path: root/sys/dev/mps
Commit message (Collapse) | Author | Age | Files | Lines
* mpr/mps: when sending reset on removal, include target in message | Warner Losh | 2024-12-28 | 1 | -2/+6
  It's possible for multiple drives to be departing at the same time (if the common power rail they share goes dark, for example). To understand what's going on better, include target and handle in the messages announcing the reset to allow matching with other corresponding events.

  MFC After: 3 days
  Sponsored by: Netflix
  Reviewed by: mav
  Differential Revision: https://reviews.freebsd.org/D35092

  (cherry picked from commit ca420b4ef2ceac00f6c6905252d553a86100ab6a)
* mps/mpr: Add workaround for firmware not responding to IOC_FACTS or IOC_INIT | prateek sethi | 2024-10-16 | 1 | -8/+30
  Sometimes, especially with older firmware, mps(4) would have trouble initializing the card in one of these two steps. Add in a retry after a short delay.

  Sean Bruno and Stephen McConnell thought this was OK in the bug discussions, but never committed it. Steve indicated the delay might not be necessary, but the OP clearly needed to make it longer to make things work. I've kept the delay, and added the suggested comment.

  Ported the iocfacts part to mpr as well, since we see similar errors about once every month or two over a few thousand controllers at work. We've not seen it with IOC_INIT as far back as I can query the error log database, so I didn't port that forward. We'll see if this helps, but won't know for sure until next year (so I'm committing it now since it won't hurt and might help). We usually see this failure in connection with complicated recovery operations with a drive that's failing, though, at least in the last year's worth of failures. It's not clear this is the same as OP or not.

  PR: 212841
  Sponsored by: Netflix
  Co-authored-by: imp

  (cherry picked from commit c0e0e530ced057502f51d7a6086857305e08fae0)
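The retry-after-delay idea in the commit above can be sketched in userspace C. This is only an illustration, not the driver's code: `request_with_retry`, `flaky_request`, `no_delay`, and `struct flaky_ioc` are hypothetical names, and the attempt count and delay are assumptions chosen by the caller.

```c
#include <stddef.h>

/* Hypothetical request callback: returns 0 on success, nonzero on failure. */
typedef int (*ioc_request_fn)(void *ctx);
/* Hypothetical delay hook; a kernel driver would sleep or spin here. */
typedef void (*delay_fn)(unsigned int usec);

/*
 * Issue a request up to max_attempts times, pausing between failed
 * attempts. Returns 0 on the first success, else the last error.
 */
static int
request_with_retry(ioc_request_fn req, void *ctx, int max_attempts,
    delay_fn delay, unsigned int delay_usec)
{
	int error = -1;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		error = req(ctx);
		if (error == 0)
			return (0);
		/* Only wait if another attempt will follow. */
		if (attempt + 1 < max_attempts && delay != NULL)
			delay(delay_usec);
	}
	return (error);
}

/* Hypothetical stand-in for firmware that ignores the first few requests. */
struct flaky_ioc {
	int failures_left;
	int calls;
};

static int
flaky_request(void *ctx)
{
	struct flaky_ioc *ioc = ctx;

	ioc->calls++;
	if (ioc->failures_left > 0) {
		ioc->failures_left--;
		return (1);
	}
	return (0);
}

/* No-op delay so the sketch runs instantly. */
static void
no_delay(unsigned int usec)
{
	(void)usec;
}
```

With a `struct flaky_ioc` that fails twice, three attempts succeed on the third call, while two attempts return the error unchanged.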
* mps(4): Correct a typo in a source code comment | Gordon Bergling | 2024-07-13 | 1 | -2/+2
  - s/vender/vendor/

  (cherry picked from commit a9d7f098b86576006f5aeb312521bfde5ac77c77)
* mps: Handle errors from copyout() in ioctl handlers | Mark Johnston | 2024-01-02 | 1 | -10/+21
  In preparation for adding a __result_use_check annotation to copyin() and related functions, start checking for errors from copyout() in the mps(4) user command handler. This should make it easier to catch bugs.

  Reviewed by: imp, asomers
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D43176

  (cherry picked from commit bcf4a7c7ace21a01d10003de9c7692f0887526c1)
* mpr, mps: Establish busdma boundaries for memory pools | Kenneth D. Merry | 2023-12-20 | 1 | -2/+4
  Most all of the memory used by the cards in the mpr(4) and mps(4) drivers is required, according to the specs and Broadcom developers, to be within a 4GB segment of memory.

  This includes:

    System Request Message Frames pool
    Reply Free Queues pool
    ReplyDescriptorPost Queues pool
    Chain Segments pool
    Sense Buffers pool
    SystemReply message pool

  We got a bug report from Dwight Engen, who ran into data corruption in the BAE port of FreeBSD:

  > We have a port of the FreeBSD mpr driver to our kernel and recently I found an issue under heavy load where a DMA may go to the wrong address. The test system is a Supermicro X10SRH-CLN4F with the onboard SAS3008 controller setup with 2 enterprise Micron SSDs in RAID 0 (striped). I have debugged the issue and narrowed down that the errant DMA is one that has a segment that crosses a 4GB physical boundary. There are more details I can provide if you'd like, but with the attached patch in place I can no longer re-create the issue.
  >
  > I'm not sure if this is a known limit of the card (have not found a datasheet/programming docs for the chip) or our system is just doing something a bit different. Any helpful info or insight would be welcome.
  >
  > Anyway, just thought this might be helpful info if you want to apply a similar fix to FreeBSD. You can ignore/discard the commit message as it is my internal commit (blkio is our own tool we use to write/read every block of a device with CRC verification which is how I found the problem).

  The commit message was:

  > [PATCH 8/9] mpr: fix memory corrupting DMA when sg segment crosses 4GB boundary
  >
  > Test case was two SSD's in RAID 0 (stripe). The logical disk was then partitioned into two partitions. One partition had lots of filesystem I/O and the other was initially filled using blkio with CRCable data and then read back with blkio CRC verify in a loop. Eventually blkio would report a bad CRC block because the physical page being read-ahead into didn't contain the right data. If the physical address in the arq/segs was for example 0x500003000 the data would actually be DMAed to 0x400003000.

  The original patch was against mpr(4) before busdma templates were introduced, and only affected the buffer pool (sc->buffer_dmat) in the mpr(4) driver.

  After some discussion with Dwight and the LSI/Broadcom developers and looking through the driver, it looks like most of the queues in the driver are ok, because they limit the memory used to memory below 4GB. The buffer queue and the chain frames seem to be the exceptions.

  This is pretty much the same between the mpr(4) and mps(4) drivers.

  So, apply a 4GB boundary limitation for the buffer and chain frame pools in the mpr(4) and mps(4) drivers.

  Reported by: Dwight Engen <dwight.engen@gmail.com>
  Reviewed by: imp
  Obtained from: Dwight Engen <dwight.engen@gmail.com>
  Differential Revision: <https://reviews.freebsd.org/D43008>

  (cherry picked from commit 264610a86e14f8e123d94c3c3bd9632d75c078a3)
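To make the 4GB boundary condition concrete, here is a small userspace sketch of the check that a boundary constraint expresses: a segment is acceptable only if its first and last byte fall in the same 4GB-aligned window. The names `crosses_boundary` and `BOUNDARY_4G` are hypothetical; in the driver the constraint is applied through busdma rather than checked by hand like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4GB boundary, i.e. 2^32 bytes. */
#define BOUNDARY_4G (UINT64_C(1) << 32)

/*
 * Return true if the segment [addr, addr + len) spans more than one
 * boundary-aligned window. A zero-length segment never crosses.
 */
static bool
crosses_boundary(uint64_t addr, uint64_t len, uint64_t boundary)
{
	if (len == 0)
		return (false);
	return ((addr / boundary) != ((addr + len - 1) / boundary));
}
```

For instance, an 8KB segment starting at 0xFFFFF000 ends above the 4GB mark and is flagged, while a 4KB segment at 0x500003000 stays inside one window and is not.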
* mprutil: "fix user reply buffer (64)..." warnings | Alan Somers | 2023-10-05 | 2 | -3/+3
  Depending on the card's firmware version, it may return different length responses for MPI2_FUNCTION_IOC_FACTS. But the first part of the response contains the length of the rest, so query it first to get the length and then use that to size the buffer for the full response.

  Also, correctly zero-initialize MPI2_IOC_FACTS_REQUEST. It only worked by luck before.

  PR: 264848
  Reported by: Julien Cigar <julien@perdition.city>
  Sponsored by: Axcient
  Reviewed by: scottl, imp
  Differential Revision: https://reviews.freebsd.org/D38739

  (cherry picked from commit 7d154c4dc64e61af7ca536c4e9927fa07c675a83)
* sys: Remove $FreeBSD$: one-line .c pattern | Warner Losh | 2023-08-23 | 8 | -16/+0
  Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/

  Similar commit in current:
  (cherry picked from commit 685dc743dc3b)
* sys: Remove $FreeBSD$: two-line .h pattern | Warner Losh | 2023-08-23 | 20 | -42/+0
  Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/

  Similar commit in current:
  (cherry picked from commit 95ee2897e98f)
* spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD | Warner Losh | 2023-07-25 | 22 | -22/+22
  The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.

  Discussed with: pfg
  MFC After: 3 days
  Sponsored by: Netflix

  (cherry picked from commit 4d846d260e2b9a3d4d0a701462568268cbfe7a5b)
* mps: Fix a typo in a source code comment | Zhenlei Huang | 2023-05-17 | 1 | -1/+1
  - s/feild/field/

  MFC after: 3 days

  (cherry picked from commit 5bcbdb0b2eb4c14ef0a8671c996337acdefb7f72)
* Fix kernel memory disclosures in mpr and mps | Alan Somers | 2023-03-22 | 1 | -3/+4
  In every mpr and mps ioctl that copies kernel data to userland, validate that the requested length does not exceed the size of the kernel's buffer.

  Note that all of these ioctls already required root access.

  Sponsored by: Axcient
  Reviewed by: imp
  Differential Revision: https://reviews.freebsd.org/D38842

  (cherry picked from commit 72aad3f9028af12e6c56a3a461b46a153abd7b24)
* mps(4): Remove a double word in a source code comment | Gordon Bergling | 2022-09-07 | 1 | -1/+1
  - s/the the/the/

  (cherry picked from commit d6f9a3c0a8b11fa0e26e364266e37805ca1dcca2)
* mpr/mps/mpt: verify cfg page ioctl lengths | Ed Maste | 2022-04-04 | 1 | -0/+13
  *_CFG_PAGE ioctl handlers in the mpr, mps, and mpt drivers allocated a buffer of a caller-specified size, but copied to it a fixed size header. Add checks that the size is at least the required minimum.

  Note that the device nodes are owned by root:operator with 0640 permissions so the ioctls are not available to unprivileged users.

  This change includes suggestions from scottl, markj and mav.

  Two of the mpt cases were reported by Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative; scottl reported the third case in mpt. Same issue found in mpr and mps after discussion with imp.

  Reported by: Lucas Leong (@_wmliang_), Trend Micro Zero Day Initiative
  Reviewed by: imp, mav
  MFC after: 3 days
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D34692

  (cherry picked from commit 8276c4149b5fc7c755d6b244fbbf6dae1939f087)
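The minimum-length check described above can be illustrated with a short userspace sketch. `struct cfg_page_header` and `check_cfg_page_len` are hypothetical stand-ins, not the driver's actual types or functions; the point is only that a caller-supplied size must be validated against the fixed-size header before a buffer of that size is allocated and written.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-size configuration page header. */
struct cfg_page_header {
	uint8_t page_version;
	uint8_t page_length;
	uint8_t page_number;
	uint8_t page_type;
};

/*
 * Reject caller-specified buffer lengths that cannot hold the header
 * that will be copied into the buffer. Returns 0 if acceptable.
 */
static int
check_cfg_page_len(size_t user_len)
{
	if (user_len < sizeof(struct cfg_page_header))
		return (EINVAL);
	return (0);
}
```

A handler would call a check like this first and return the error before allocating anything.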
* mps/mpr: Add missing newlines in error messages. | Alexander Motin | 2022-03-02 | 1 | -18/+18
  MFC after: 1 week

  (cherry picked from commit 074bed4f486b4fa54e4d9bd2fccfad3cce732ba1)
* mps/mpr: Relax doorbell polling precision. | Alexander Motin | 2022-01-23 | 1 | -1/+2
  It does not matter how often we check firmware for crashes.

  MFC after: 2 weeks

  (cherry picked from commit 1849bc5f3ff04c128e85173aa84472a19b784e64)
* mps(4): Fix unmatched devq release. | Warner Losh | 2022-01-06 | 1 | -16/+22
  Port 9781c28c6d63 and a8837c77efd0 to the mps driver.

  Before this change devq was frozen only if some command was sent to the target after reset started, but release was called always. This change freezes the devq immediately, leaving mprsas_action_scsiio() check only to cover race condition due to different lock devq use.

  This should also avoid unnecessary requeue of the commands, creating additional log noise and confusing some broken apps.

  It also avoids a 'busy' requeue of I/Os failing when we're doing recovery that takes longer than the normal busy timeout. These I/Os failing can lead to filesystems being unmounted in the force unmount case for I/O errors.

  Sponsored by: Netflix
  Reviewed by: mav
  Differential Revision: https://reviews.freebsd.org/D33228

  (cherry picked from commit a10253cffea84c0c980a36ba6776b00ed96c3e3b)
* Fix "set but not used" warnings in the mps driver. | Scott Long | 2022-01-06 | 3 | -7/+0
  (cherry picked from commit bcce9c5bedfafd6f0f76c022c8a1e45fa8e9fd0a)
* mps: Fix debugging line | Warner Losh | 2022-01-06 | 1 | -1/+1
  Print cm instead of sc here, as is done in mpr. We can get the sc from cm, but not vice versa.

  Sponsored by: Netflix

  (cherry picked from commit b086bc0bf1dd78b161e3ba7a5732fc49ea3c1b82)
* mps/mpr(4): Move xpt_register_async() out of lock. | Alexander Motin | 2021-10-14 | 1 | -8/+10
  It fixes lock order reversal between SIM and device locks.

  Also remove registration for AC_FOUND_DEVICE, unused for a while now.

  MFC after: 1 month

  (cherry picked from commit 02d8194012a9a0e367a92c7f89567b808bf0e9a8)
* mpr/mps: Minor state machine fix | Warner Losh | 2021-09-03 | 1 | -0/+1
  When a DMA chain can't be loaded, set the state to STATE_INQUEUE so that the mp[rs]_complete_command can properly fail the command.

  Sponsored by: Netflix

  (cherry picked from commit 33755dbb207878c10fd99de39dadf89fad713bc7)
* Fix mpr(4) and mps(4) state transitions and a use-after-free panic. | Kenneth D. Merry | 2021-09-03 | 4 | -29/+47
  When the mpr(4) and mps(4) drivers probe a SATA device, they issue an ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the target is fully set up in the driver. The drivers wait for completion of the identify command, and have a 5 second timeout. If the timeout fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be freed later.

  That is where the use-after-free problem comes in. Once the ATA Identify times out, the driver sends a target reset, and then frees any identify commands that have timed out. But, once the target reset completes, commands that were queued to the drive are returned to the driver by the controller.

  At that point, the driver (in mp{s,r}_intr_locked()) looks up the command descriptor for that particular SMID, marks it CM_STATE_BUSY and sends it on for completion handling.

  The problem at this stage is that the command has already been freed, and put on the free queue, so its state is CM_STATE_FREE. If INVARIANTS are turned on, we get a panic as soon as this command is allocated, because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.

  So, the solution is to not free ATA Identify commands that get stuck until they actually return from the controller. Hopefully this works correctly on older firmware versions. If not, it could result in commands hanging around indefinitely. But, the alternative is a use-after-free panic or assertion (in the INVARIANTS case).

  This also tightens up the state transitions between CM_STATE_FREE, CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen once, and we have assertions to make sure that commands are in the correct state before transitioning to the next state. Also, for each state assertion, we print out the current state of the command if it is incorrect.

  mp{s,r}.c:
    Add a new sysctl variable, dump_reqs_alltypes, that controls the behavior of the dump_reqs sysctl. If dump_reqs_alltypes is non-zero, it will dump all commands, not just the commands that are in the CM_STATE_INQUEUE state. (You can see the commands that are in the queue by using mp{s,r}util debug dumpreqs.)

    Make sure that the INQUEUE -> BUSY state transition happens in one place, the mp{s,r}_complete_command routine.

  mp{s,r}_sas.c:
    Make sure we print the current command type in command state assertions.

  mp{s,r}_sas_lsi.c:
    Add a new completion handler, mp{s,r}sas_ata_id_complete. This completion handler will free data allocated for an ATA Identify command and free the command structure.

    In mp{s,r}_ata_id_timeout, do not set the command state to CM_STATE_BUSY. The command is still in queue in the controller. Since we were blocking waiting for this command to complete, there was no completion handler previously. Set the completion handler, so that whenever the command does come back, it will get freed properly.

    Do not free ATA Identify commands that have timed out in mp{s,r}sas_add_device(). Wait for them to actually come back from the controller.

  mp{s,r}var.h:
    Add a dump_reqs_alltypes variable for the new dump_reqs_alltypes sysctl.

    Make sure we print the current state for state transition asserts.

  This was tested in the Spectra Logic test bed (as described in the review), as well as Netflix's Open Connect fleet (where panics dropped from a dozen or two a month to zero).

  Reviewed by: imp@ (who is handling the commit with ken's OK)
  Sponsored by: Spectra Logic
  Differential Revision: https://reviews.freebsd.org/D25476

  (cherry picked from commit 175ad3d00318a345790eecf2f5a33cd16b119e55)
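The FREE / BUSY / INQUEUE discipline described in this commit can be sketched as a tiny userspace state machine in which every transition asserts the expected prior state. The state names mirror those in the commit message; the `struct command` layout and the `cm_alloc`, `cm_enqueue`, `cm_complete`, and `cm_free` helpers are hypothetical simplifications, not the driver's functions.

```c
#include <assert.h>

enum cm_state {
	CM_STATE_FREE,
	CM_STATE_BUSY,
	CM_STATE_INQUEUE
};

/* Hypothetical, minimal command descriptor. */
struct command {
	enum cm_state state;
};

/* FREE -> BUSY: only a free command may be handed out. */
static void
cm_alloc(struct command *cm)
{
	assert(cm->state == CM_STATE_FREE);
	cm->state = CM_STATE_BUSY;
}

/* BUSY -> INQUEUE: the command has been handed to the controller. */
static void
cm_enqueue(struct command *cm)
{
	assert(cm->state == CM_STATE_BUSY);
	cm->state = CM_STATE_INQUEUE;
}

/* INQUEUE -> BUSY: the single place where completion dequeues. */
static void
cm_complete(struct command *cm)
{
	assert(cm->state == CM_STATE_INQUEUE);
	cm->state = CM_STATE_BUSY;
}

/* BUSY -> FREE: freeing a command still in the queue would assert. */
static void
cm_free(struct command *cm)
{
	assert(cm->state == CM_STATE_BUSY);
	cm->state = CM_STATE_FREE;
}
```

In this model the use-after-free corresponds to calling `cm_free()` on a command that is still `CM_STATE_INQUEUE`, which the assertion would catch instead of letting it reach the free list.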
* Mark some sysctls as CTLFLAG_MPSAFE. | Alexander Motin | 2021-08-25 | 1 | -3/+3
  MFC after: 2 weeks

  (cherry picked from commit b776de6796fa0cd1b7dfaad75402e10907d47f29)
* mpr/mps(4): Make device mapping some more robust. | Alexander Motin | 2021-05-24 | 1 | -13/+26
  Allow new enclosure to replace previously existing one if there is no completely unused table entry, same as it is done for devices.

  If we can not process DPM due to corruption -- wipe it and restart from scratch. Otherwise I don't see a way to recover persistence if something goes wrong and there is no BIOS to recover it for us.

  Together this solves a problem that appeared when 9300-8i firmware update to 16.00.10.00 somehow switched its mapping mode from Device Persistence to Enclosure/Slot without wiping the DPM table. It made HBA completely unusable, since overflowed and conflicting mapping table was unable to map any of enclosures and so devices.

  Also while there make some enclosure mapping errors more informative.

  MFC after: 1 month
  Sponsored by: iXsystems, Inc.

  (cherry picked from commit b99419aee49e2cc53747730be4d0ec4f9b330eb2)
* Remove unused wrappers around kproc_create() and kproc_exit(). | John Baldwin | 2021-03-29 | 1 | -4/+0
  Sponsored by: Netflix

  (cherry picked from commit 645b15e558dc102ff70a6332b1d0b0aa733fd2bb)
* mpr, mps: Fix an off-by-one bug in the BTDH_MAPPING ioctl | Mark Johnston | 2021-01-08 | 1 | -1/+1
  The device mapping table contains sc->max_devices entries, so only indices in [0, sc->max_devices) are valid.

  MFC after: 3 days
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D27964
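The half-open range [0, max_devices) stated above translates into a strict less-than comparison. The helper below is a hypothetical userspace illustration (`mapping_index_valid` is not a driver function); an off-by-one of this kind typically comes from accepting `index == max_devices`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A table with max_devices entries has valid indices 0 .. max_devices - 1,
 * so the bound check must be strict.
 */
static bool
mapping_index_valid(uint32_t index, uint32_t max_devices)
{
	return (index < max_devices);
}
```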
* mpr, mps: Fix a stack buffer overflow in the user passthru ioctl | Mark Johnston | 2021-01-08 | 1 | -16/+16
  Previously we copied in the request into a stack-allocated structure that could be smaller than the request size. Furthermore, we checked the request size only after doing the copyin.

  Fix this by allocating a buffer to hold the request, then copying the buffer's contents into a command descriptor. This is a bit heavy-handed but I expect the overhead will not be noticeable. The approach of copying the header in first is susceptible to TOCTOU problems.

  Reviewed by: imp
  Reported by: maxpl0it@protonmail.com
  MFC after: 3 days
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D27963
* Make MAXPHYS tunable. Bump MAXPHYS to 1M. | Konstantin Belousov | 2020-11-28 | 1 | -2/+2
  Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

  Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value.

  Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get private MAXPHYS-like constant; their conversion is out of scope for this work.

  Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, were either submitted by, or based on changes by mav.

  Suggested by: mav (*)
  Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D27225

  Notes: svn path=/head/; revision=368124
* Introduce support of SCSI Command Priority. | Alexander Motin | 2020-10-25 | 1 | -0/+2
  SAM-3 specification introduced concept of Task Priority, that was renamed to Command Priority in SAM-4, and supported by all modern SCSI transports. It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and 0 - default. SAT specification for SATA devices translates priorities 1-3 into NCQ high priority.

  This change adds new "priority" field into empty spots of struct ccb_scsiio and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL. Respective support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers for both initiator and where applicable target roles. Minimal support was added to CTL to receive the priority value from different frontends, pass it between HA controllers and report in few places.

  This patch does not add consumers of this functionality, so nothing should really change yet, since the field is still set to 0 (default) on initiator and not actively used on target. Those are to be implemented separately.

  I've confirmed priority working on WD Red SATA disks connected via mpr(4) and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).

  While there, added missing tag_action support to ocs_fc(4) initiator role.

  MFC after: 1 month
  Relnotes: yes
  Sponsored by: iXsystems, Inc.

  Notes: svn path=/head/; revision=367044
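The priority scheme summarized in this commit (0 is the default, 1 the highest, 15 the lowest, and 1-3 translated to NCQ high priority for SATA devices) can be written down as a small classifier. This is a sketch of the mapping as the commit message states it; the enum and function names are hypothetical and do not come from CAM or the drivers.

```c
#include <stdbool.h>

/* Hypothetical two-level view of what a SATA device sees. */
enum ncq_prio {
	NCQ_PRIO_NORMAL,
	NCQ_PRIO_HIGH
};

/* Command Priority is a value 0..15, where 0 means "default". */
static bool
command_priority_valid(unsigned int prio)
{
	return (prio <= 15);
}

/* Priorities 1-3 map to NCQ high priority; everything else is normal. */
static enum ncq_prio
sat_ncq_priority(unsigned int prio)
{
	if (prio >= 1 && prio <= 3)
		return (NCQ_PRIO_HIGH);
	return (NCQ_PRIO_NORMAL);
}
```

Because the field defaults to 0 on the initiator, existing I/O classifies as normal priority under this mapping, which matches the commit's note that nothing should change yet.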
* Bring the request_descriptor union into harmony internally. No | Scott Long | 2020-10-13 | 1 | -3/+3
  functional change.

  Notes: svn path=/head/; revision=366668
* Refine the busdma template interface. Provide tools for filling in fields | Scott Long | 2020-09-14 | 3 | -30/+24
  that can be extended, but also ensure compile-time type checking. Refactor common code out of arch-specific implementations. Move the mpr and mps drivers to this new API. The template type remains visible to the consumer so that it can be allocated on the stack, but should be considered opaque.

  Notes: svn path=/head/; revision=365706
* Convert the mps driver to use busdma templates | Scott Long | 2020-09-11 | 3 | -89/+46
  Notes: svn path=/head/; revision=365644
* mps: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 22 | -282/+14
  Notes: svn path=/head/; revision=365203
* Remove extra memset() left after r342388. | Alexander Motin | 2020-08-04 | 1 | -1/+0
  This memset() wiped MPI2_FUNCTION_SCSI_TASK_MGMT set by mprsas_alloc_tm(), that broke target reset on device removal, making later re-insertion into the same slot impossible, since firmware was still waiting for the driver to finish with the removed device.

  MFC after: 1 week
  Sponsored by: iXsystems, Inc.

  Notes: svn path=/head/; revision=363852
* mpr(4), mps(4): Stop checking for failures from malloc(M_WAITOK). | Mark Johnston | 2020-07-27 | 3 | -30/+0
  PR: 240545
  Submitted by: Andrew Reiter <arr@watson.org>
  Reviewed by: imp
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D25766

  Notes: svn path=/head/; revision=363608
* Add a small hack to the ioctl header files so that both mpr and mps can | Scott Long | 2020-04-16 | 1 | -0/+2
  be included. This isn't a great solution, but fixing it correctly is a bigger task and this is the lesser of the existing evils.

  Notes: svn path=/head/; revision=360001
* Centralize compatibility translation macros. | Brooks Davis | 2020-04-14 | 1 | -10/+1
  Copy the CP, PTRIN, etc macros from freebsd32.h into a sys/abi_compat.h and replace existing definitions with includes where required. This eliminates duplicate code and allows Linux and FreeBSD compatibility headers to be included in the same files.

  Input from: cem, jhb
  Obtained from: CheriBSD
  MFC after: 2 weeks
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D24275

  Notes: svn path=/head/; revision=359937
* Increase buffer in mprsas_log_command() from 192 to 224 bytes. | Alexander Motin | 2020-03-13 | 1 | -1/+1
  192 bytes are not enough to print long commands, such as ATA COMMAND PASS THROUGH(16), that makes debug output difficult to read.

  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.

  Notes: svn path=/head/; revision=358959
* Remove support for all pre FreeBSD 11.0 versions from mpr and mps. | Warner Losh | 2020-02-26 | 3 | -279/+2
  Remove a number of workarounds for older versions of FreeBSD. FreeBSD stable/10 was branched over 6 years ago. All of these changes date from about that time or earlier. These workarounds are extensive and get in the way of understanding the current flow in the driver.

  Notes: svn path=/head/; revision=358351
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) | Pawel Biernacki | 2020-02-26 | 1 | -6/+10
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.

  This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.

  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT

  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718

  Notes: svn path=/head/; revision=358333
* Before issuing the REMOVE_DEVICE command to the firmware, make sure that all | Warner Losh | 2020-02-25 | 2 | -18/+55
  commands have completed. It's not OK to force complete any pending commands before we send the REMOVE_DEVICE. Instead, make sure that all pending commands are complete before sending that. By trying to second guess the firmware here, we run the risk of completing commands twice, which leads to corruption.

  This removes the forced completion of commands introduced in r218811. So it's a partial backout of that commit, but replaces it with a more robust mechanism. Either these commands will complete due to the TARGET RESET, or they will timeout and be aborted, but they will all complete.

  Add assert that all commands are complete to REMOVE_DEVICE completion routine. We attempt to assure this programmatically, so we shouldn't have any commands in the queue because we've waited for them all.

  Any commands that make it into our action routine after we mark the target in removal will complete immediately with an error.

  When we're removing a target that's not a volume, advertise up the stack that it's actually gone, as opposed to having a transient selection error we should retry. Do this both in the action routine, and when we get a notification of an aborted command. We don't do this for volumes because the driver tries hard not to advertise to the OS a volume has disappeared.

  Apply these changes to both mpr and mps since they are based on quite similar designs.

  Discussed with: scottl@
  Differential Revision: https://reviews.freebsd.org/D23768

  Notes: svn path=/head/; revision=358308
* Advertise the MPI Message Version that's contained in the IOCFacts message | Scott Long | 2020-02-07 | 2 | -2/+13
  in the sysctl block for the driver. mpsutil/mprutil needs this so it can know how big of a buffer to allocate when requesting the IOCFacts from the controller. This eliminates the kernel console messages about wrong allocation sizes.

  Reported by: imp

  Notes: svn path=/head/; revision=357651
* mps(4): add missing cam(4) dependency | Conrad Meyer | 2020-01-19 | 1 | -0/+1
  On a MINIMAL kernel, mps.ko wouldn't load because it uses the xpt_hold_boot symbol from CAM, but didn't have a dependency on cam(4).

  (CEM: Some context: when linking loaded modules, the kernel dynamic linker only looks for definitions in explicitly marked dependency modules. Also, the identical mpr(4) driver uses the same CAM function, but already had the correct MODULE_DEPEND(), so no similar change is needed there.)

  Submitted by: Greg V <greg AT unrelenting.technology>
  Reviewed by: imp, myself
  Differential Revision: https://reviews.freebsd.org/D23272

  Notes: svn path=/head/; revision=356901
* Fix leak in state machine for commands. | Warner Losh | 2019-11-24 | 1 | -0/+1
  When we get a device departed message from the firmware, we send a TARGET_RESET to the device to let the firmware know we're done and as part of the recovery process. This will abort all the commands. While the documentation says the IOC is responsible for writing the completion message for all the commands pending with an aborted status, we sometimes have queued commands for the target that haven't been completed so are in the INQUEUE state. So, when we later complete the pending CCB as aborted, these commands are freed and we hit the "state not busy" panic. Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do that here as well.

  In talking to Ken, Scott and Justin, they recommended a series of tests to see if this is 100% safe. Those tests are ongoing, but preliminary tests suggest this is safe as we see no duplicate completions when we hit this case at work. We have a machine that has a dodgy power supply which usually doesn't apply power to a few drives, but sometimes does when the machine is under heavy load so we get a rash of the connect / disconnect messages over half an hour. Without this change, we'd see state not busy panic. With this change, the drives just annoyingly come and go without affecting the rest of the machine, but without a complete error injection test suite, it's hard to know if all edge cases are now covered or not.

  Discussed with: scottl, ken, gibbs

  Notes: svn path=/head/; revision=355056
* Fix bugs in recovery path and improve cm tracking | Warner Losh | 2019-07-08 | 4 | -17/+35
  Eliminate the TIMEDOUT state. This state really conveyed two different concepts: I timed out during recovery (and my command got put on the recovery queue), and I timed out during discovery (which doesn't). Separate those two concepts into two flags. Use the TIMEDOUT flag to fail requests as timed out. Use the on queue flag to remove them from the queue.

  In mps_intr_locked for MPI2_RPY_DESCRIPT_FLAGS_ADDRESS_REPLY message type, when completing commands, ignore the ones that are not in state INQUEUE. They were already completed as part of the recovery process. When we complete them twice, we wind up with entries on the free queue that are marked as busy, triggering asserts.

  Reviewed by: scottl (earlier version, just for mpr)
  Differential Revision: https://reviews.freebsd.org/D20785

  Notes: svn path=/head/; revision=349849
* Fix busy status leak in case of incorrect passthrough args. | Alexander Motin | 2019-05-30 | 1 | -2/+4
  MFC after: 1 week

  Notes: svn path=/head/; revision=348417
* Extract eventfilter declarations to sys/_eventfilter.h | Conrad Meyer | 2019-05-20 | 1 | -0/+3
  This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially.

  EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

  As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files.

  LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change).

  No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.

  Notes: svn path=/head/; revision=347984
* Add missing newline to debug printf. | Warner Losh | 2019-05-08 | 1 | -1/+1
  Notes: svn path=/head/; revision=347237
* Add missing break statements. Coverity CID 1400446. | Scott Long | 2019-03-27 | 1 | -0/+2
  Reported by: mav

  Notes: svn path=/head/; revision=345573
* Add event table decoding for SAS Broadcast Primitive events. | Scott Long | 2019-03-24 | 1 | -0/+10
  Notes: svn path=/head/; revision=345485
* Fix a transposition error from the previous commit | Scott Long | 2019-03-24 | 1 | -1/+1
  Notes: svn path=/head/; revision=345482