path: root/sys/dev/nvmf
Commit message | Author | Age | Files | Lines
* nvmf: Add support for DIOCGIDENT | John Baldwin | 2025-11-17 | 2 | -0/+6
  This mirrors commit 6d0001d44490becdd20d627ce663c72a30b9aac3 but for nvmf(4).
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D53339
* nvmft: Honor any IOCCSZ limit imposed by the transport | John Baldwin | 2025-11-10 | 3 | -0/+21
  Sponsored by: Chelsio Communications
* nvmf: Add a transport hook to limit the maximum command capsule size | John Baldwin | 2025-11-10 | 4 | -0/+22
  NVMe controllers advertise the largest supported size of a command capsule in the controller data (IOCCSZ). Smart NIC offload transports may have a cap on the size of the largest data PDU that can be received. These transports can implement this hook to lower the advertised IOCCSZ and thereby limit the in-capsule data payload sent by remote hosts.
  Sponsored by: Chelsio Communications
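IOCCSZ is reported in 16-byte units and covers the whole command capsule: the 64-byte submission queue entry plus any in-capsule data. The following is a minimal sketch of how a transport's byte limit might clamp the advertised value. The helper names and the "0 means no limit" convention are assumptions for illustration; this is not the nvmf(4) hook's actual signature.

```c
#include <assert.h>
#include <stdint.h>

#define IOCCSZ_UNIT	16u	/* IOCCSZ is expressed in 16-byte units */
#define NVME_SQE_BYTES	64u	/* a capsule always starts with a 64-byte SQE */

/*
 * Hypothetical helper: clamp an advertised IOCCSZ so the capsule never
 * exceeds max_capsule_bytes.  0 means "no transport limit" (assumption).
 */
static uint32_t
clamp_ioccsz(uint32_t ioccsz, uint32_t max_capsule_bytes)
{
	uint32_t limit;

	/* The smallest meaningful value is 4 units, i.e. a bare SQE. */
	assert(ioccsz >= NVME_SQE_BYTES / IOCCSZ_UNIT);
	if (max_capsule_bytes == 0)
		return (ioccsz);
	assert(max_capsule_bytes >= NVME_SQE_BYTES);
	limit = max_capsule_bytes / IOCCSZ_UNIT;	/* round down */
	return (ioccsz < limit ? ioccsz : limit);
}

/* In-capsule data bytes a host may send for a given IOCCSZ. */
static uint32_t
ioccsz_icd_bytes(uint32_t ioccsz)
{
	return (ioccsz * IOCCSZ_UNIT - NVME_SQE_BYTES);
}
```

For example, a controller advertising 260 units (64 + 4096 bytes) behind a transport that can only receive 2112-byte capsules would advertise 132 units instead, leaving 2048 bytes of in-capsule data.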
* nvmf: Honor any data transfer limit imposed by the transport | John Baldwin | 2025-11-10 | 3 | -3/+17
  If the transport has a data transfer limit, restrict I/O transfers to the largest multiple of MPS that fits within the limit.
  Sponsored by: Chelsio Communications
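The "largest multiple of MPS that fits within the limit" is a round-down to a multiple of the memory page size. The following is a small sketch of combining a controller limit with a transport limit. The function name and the convention that 0 means "unlimited" are assumptions for this example, not nvmf(4) code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: combine the controller's maximum transfer size
 * (assumed to already be a multiple of MPS) with a transport limit.
 * A limit of 0 means "unlimited" for both inputs (assumption).
 */
static uint64_t
max_xfer_size(uint64_t ctrlr_max, uint64_t transport_max, uint32_t mps)
{
	uint64_t max = ctrlr_max;

	assert(mps != 0);
	assert(transport_max == 0 || transport_max >= mps);
	if (transport_max != 0) {
		/* Largest multiple of MPS that fits within the transport limit. */
		uint64_t t = transport_max - transport_max % mps;

		if (max == 0 || t < max)
			max = t;
	}
	return (max);
}
```

With a 4 KiB MPS, a 100000-byte transport limit yields 98304 bytes (24 pages), so no single I/O request overruns what the transport can receive.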
* nvmf: Add a transport hook to limit the size of host data transfers | John Baldwin | 2025-11-10 | 4 | -0/+24
  Smart NIC offload transports may have a cap on the size of the largest data PDU that can be received. Allow these transports to enforce a cap on the size of an I/O request submitted by the nvmf(4) host.
  NB: The controller is able to advertise a maximum-supported PDU size during TCP negotiation, but there is no way in the protocol to advertise a maximum size that the host can receive.
  Sponsored by: Chelsio Communications
* sbuf_delete() after sbuf_finish() & add SBUF_INCLUDENUL | David E. O'Brien | 2025-10-31 | 1 | -1/+1
  Reviewed by: adrian, imp
  Differential Revision: https://reviews.freebsd.org/D53254
* nvmf: Fix a typo in a source code comment | Gordon Bergling | 2025-08-25 | 1 | -1/+1
  - s/tranfers/transfers/
  MFC after: 3 days
* libnvmf: Add nvmf_nqn_valid_strict() function | John Baldwin | 2025-07-28 | 1 | -40/+0
  This moves the checks previously under #ifdef STRICT in nvmf_nqn_valid() into a separate helper for userland. This requires that the NQN starts with "nqn.YYYY-MM." followed by at least one additional character.
  Reviewed by: asomers
  Differential Revision: https://reviews.freebsd.org/D48767
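The strict rule above can be checked character by character against a "nqn.YYYY-MM." template. The following is a minimal sketch of that rule only. It is not libnvmf's implementation, and it checks just the digit and punctuation shape (for example, month 13 is not rejected here).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the strict prefix rule: "nqn.YYYY-MM." followed by at
 * least one more character.  'D' in the template stands for a digit.
 */
static bool
nqn_prefix_ok(const char *nqn)
{
	static const char tmpl[] = "nqn.DDDD-DD.";
	size_t len = sizeof(tmpl) - 1;

	assert(nqn != NULL);
	if (strlen(nqn) <= len)		/* need a character after the prefix */
		return (false);
	for (size_t i = 0; i < len; i++) {
		if (tmpl[i] == 'D') {
			if (!isdigit((unsigned char)nqn[i]))
				return (false);
		} else if (nqn[i] != tmpl[i])
			return (false);
	}
	return (true);
}
```

The well-known discovery NQN "nqn.2014-08.org.nvmexpress.discovery" passes, while "nqn.2014-08." alone fails because nothing follows the prefix.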
* nvmf: Auto-reconnect periodically after a disconnect | John Baldwin | 2025-07-09 | 3 | -5/+131
  Use a timer in the nvmf(4) driver to periodically trigger a devctl "RECONNECT" notification. A trigger in the /etc/devd/nvmf.conf file invokes "nvmecontrol reconnect nvmeX" upon each notification. This differs from iSCSI, which uses a dedicated daemon (iscsid(8)) to wait inside a custom ioctl for an iSCSI initiator event to occur, but I think this design might be simpler.
  Similar to nvme-cli, the interval between reconnection attempts is specified in seconds by the --reconnect-delay argument to the connect and reconnect commands. Note that nvme-cli uses -c as the short option for this argument, but that letter was already taken, so nvmecontrol uses -r. The default is 10 seconds to match Linux.
  In addition, a second timeout can be used to force a full detach of the disconnected nvmeX device after the controller loss timeout expires. The timeout for this is specified in seconds by the --ctrl-loss-tmo/-l options (identical to nvme-cli). The default is 600 seconds.
  Either of these timers can be disabled by setting the timer to 0. In that case, the associated action (devctl notifications or full detach) will not occur after a disconnect.
  Note that this adds a dedicated taskqueue for nvmf tasks instead of using taskqueue_thread, as the controller loss task could deadlock waiting for the completion of other tasks queued to taskqueue_thread. (Specifically, tearing down the CAM SIM can trigger destroy_dev_sched_cb() and wait for the callback to run, but the callback is scheduled to run in a task on taskqueue_thread. Possibly, destroy_dev_sched should be using a dedicated taskqueue.)
  Reviewed by: imp (earlier version)
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D50222
* newbus: replace leftover device unit wildcards | Ahmad Khalifa | 2025-06-21 | 1 | -1/+1
  Reviewed by: imp, jhb
  Approved by: imp (mentor)
  Differential Revision: https://reviews.freebsd.org/D50913
* machine/stdarg.h -> sys/stdarg.h | Brooks Davis | 2025-06-11 | 1 | -1/+1
  Switch to using sys/stdarg.h for va_list type and va_* builtins.
  Make an attempt to insert the include in a sensible place. Where style(9) was followed this is easy; where it was ignored, aim for the first block of sys/*.h headers and don't get too fussy or try to fix other style bugs.
  Reviewed by: imp
  Exp-run by: antoine (PR 286274)
  Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
* nvmft: Fix a resource leak for SET_FEATURES/ASYNC_EVENT_CONFIGURATION | John Baldwin | 2025-05-30 | 1 | -0/+1
  The received command capsule was not freed after sending the success response.
  Fixes: a15f7c96a276 ("nvmft: The in-kernel NVMe over Fabrics controller")
  Sponsored by: Chelsio Communications
* nvmft: Export more info for a ctl port for use by ctladm | John Baldwin | 2025-02-20 | 2 | -0/+27
  In particular, export a "port" entry as well as an array of "host" entries for each active connection.
  Reviewed by: asomers
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D48775
* nvmft: Switch the per-port lock from sx(9) to mtx(9) | John Baldwin | 2025-02-20 | 3 | -44/+62
  This is needed to avoid LORs for a following commit.
  Sponsored by: Chelsio Communications
* nvmf: Add NVMF_CONNECTION_STATUS ioctl | John Baldwin | 2025-01-31 | 3 | -0/+37
  This returns an nvlist indicating if a Fabrics host is connected and the time of the most recent disconnection.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D48219
* nvme/nvmf: Add NVME_GET_CONTROLLER_DATA ioctl to fetch cached cdata | John Baldwin | 2025-01-31 | 1 | -0/+3
  Both nvme and nvmf cache a copy of the controller's identify data in the softc. Add an ioctl to fetch this copy of the cdata.
  This is primarily useful for allowing commands like 'nvmecontrol devlist' to work against a disconnected Fabrics host.
  Reviewed by: dab, imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D48218
* nvmft: Don't offline a port being removed if it is already offline | John Baldwin | 2025-01-31 | 1 | -1/+7
  This is generally harmless but can trigger spurious warnings on the console due to duplicate attempts to disable LUNs.
  Sponsored by: Chelsio Communications
* nvmf: Refactor reconnection support | John Baldwin | 2025-01-24 | 3 | -22/+48
  Save more data associated with a new association, including the network address of the remote controller. This permits reconnecting an association without providing the address or other details. To use this new mode, provide only an existing device ID to nvmecontrol's reconnect command. An address can still be provided to request a different address or other different settings for the new association.
  The saved data includes an entire Discovery Log page entry to aim to be compatible with other transports in the future. When a remote controller is connected to via a Discovery Log page entry (nvmecontrol connect-all), the raw entry is used. When a remote controller is connected to via an explicit address, an entry is synthesized from the parameters.
  Note that this is a pseudo-ABI break for the ioctls used by nvmf(4) in that the nvlists for handoff and reconnect now use a slightly different set of elements. Since this is only present in main I did not bother implementing compatibility shims.
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D48214
* nvmf: Defer the post-sync shutdown handler to SHUTDOWN_PRI_LAST | John Baldwin | 2025-01-13 | 1 | -1/+13
  nda(4) has its own shutdown handler that runs at SHUTDOWN_PRI_DEFAULT and calls ndaflush(), which could run after the nvmf handler. Instead, give the flush a chance to run before the graceful shutdown of the controller.
  While here, be a bit more defensive in the post-sync case and shut down the consumers (sim and /dev/nvmeXnY devices) before destroying the queue pairs so that if any requests are submitted after the post-sync handler they fail gracefully instead of trying to use a destroyed queue pair.
  Reported by: Sony Arpita Das <sonyarpitad@chelsio.com>
  Sponsored by: Chelsio Communications
* nvmf: fix build with __assert_unreachable() addition to userland | Kyle Evans | 2025-01-13 | 1 | -1/+0
  <assert.h> now has a usable definition, so we don't need to shim it out in the nvmf header anymore.
  Reviewed by: emaste, jhb
  Differential Revision: https://reviews.freebsd.org/D48078
* nvmf: Switch several ioctls to using nvlists | John Baldwin | 2024-12-30 | 13 | -207/+399
  For requests that hand off queues from userspace to the kernel as well as the request to fetch reconnect parameters from the kernel, switch from using flat structures to nvlists. In particular, this will permit adding support for additional transports in the future without breaking the ABI of the structures.
  Note that this is an ABI break for the ioctls used by nvmf(4) and nvmft(4). Since this is only present in main I did not bother implementing compatibility shims.
  Inspired by: imp (suggestion on a different review)
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D48230
* nvmft: Fix typo in error message if an I/O queue fails to handoff | John Baldwin | 2024-12-28 | 1 | -1/+1
  Sponsored by: Chelsio Communications
* nvmf: Track SQ flow control | John Baldwin | 2024-11-11 | 3 | -5/+37
  This isn't really needed since the host driver never submits more commands to a queue than it can hold, but I noticed that the recently-added SQ head and tail sysctl nodes were not updating. This fixes that and also uses these values to assert that we never submit a command while a queue pair is full.
  Sponsored by: Chelsio Communications
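SQ flow control follows the usual NVMe ring convention: the host advances the tail when it submits, completions report the controller's SQ head, and one slot is kept empty so that "full" and "empty" are distinguishable. The following minimal model uses hypothetical names and is not the nvmf(4) data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical submission queue state. */
struct sq_state {
	uint16_t size;	/* number of SQ entries */
	uint16_t head;	/* SQHD most recently reported in a completion */
	uint16_t tail;	/* next slot the host will fill */
};

/* Full when advancing the tail would make it equal to the head. */
static bool
sq_full(const struct sq_state *sq)
{
	assert(sq->size >= 2);
	return ((sq->tail + 1) % sq->size == sq->head);
}

/* Commands submitted but not yet consumed by the controller. */
static unsigned
sq_outstanding(const struct sq_state *sq)
{
	return ((unsigned)((sq->tail + sq->size - sq->head) % sq->size));
}

/* Host submits a command: must never happen while the queue is full. */
static void
sq_submit(struct sq_state *sq)
{
	assert(!sq_full(sq));
	sq->tail = (sq->tail + 1) % sq->size;
}

/* A completion carries the controller's new SQ head pointer. */
static void
sq_update_head(struct sq_state *sq, uint16_t sqhd)
{
	assert(sqhd < sq->size);
	sq->head = sqhd;
}
```

Under this convention a queue of N entries holds at most N - 1 outstanding commands, which is the condition the new assertion guards.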
* nvmf: Schedule requests across multiple I/O queues | John Baldwin | 2024-11-11 | 2 | -5/+5
  Similar to nvme(4), use the current CPU to select which I/O queue to use.
  The assignment in nvmf_attach() had to be moved down since sc->num_io_queues is initialized in nvmf_establish_connection().
  Note that nvmecontrol(8) still defaults to using a single I/O queue for an association.
  Sponsored by: Chelsio Communications
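Selecting an I/O queue from the current CPU amounts to mapping a CPU index onto a smaller or equal number of queues. The following sketch shows one plausible proportional mapping. It is an assumption for illustration, not copied from nvmf(4) or nvme(4). In the kernel, the CPU index would come from curcpu and the CPU count from mp_ncpus.

```c
#include <assert.h>

/*
 * Hypothetical mapping: spread ncpus CPUs evenly over nqueues I/O
 * queues so that neighboring CPUs share a queue when queues are scarce.
 */
static unsigned
io_queue_for_cpu(unsigned cpu, unsigned ncpus, unsigned nqueues)
{
	assert(ncpus > 0 && nqueues > 0 && cpu < ncpus);
	return ((unsigned)((unsigned long long)cpu * nqueues / ncpus));
}
```

With nvmecontrol's default of a single I/O queue, every CPU maps to queue 0. With 4 queues on 8 CPUs, CPUs 6 and 7 share queue 3.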
* nvmf: Fix an off by one error when scanning active namespace IDs | John Baldwin | 2024-11-05 | 1 | -2/+2
  The active namespace list query fetches namespaces greater than the passed in namespace ID, not greater than or equal to the passed in namespace ID. Thus, a multi-page request should start with the last namespace ID from the previous page, not that ID plus 1.
  While here, make use of NVME_GLOBAL_NAMESPACE_TAG instead of a magic number to handle the edge case that the last namespace ID in a page is the largest valid namespace ID.
  Reviewed by: chuck
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D47393
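The paging rule is easiest to see against a simulated controller. The following sketch uses hypothetical names and a tiny page size in place of the real 1024-entry list. It resumes each query from the last ID of a full page, and it stops once that ID is the largest valid NSID (one below the 0xFFFFFFFF global tag). Resuming from "last ID + 1" instead would silently skip that next namespace, because the query already returns only IDs strictly greater than the one passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_NSID	0xffffffffu	/* stands in for NVME_GLOBAL_NAMESPACE_TAG */

/*
 * Simulated controller: copy up to 'len' active NSIDs strictly greater
 * than 'after' (active[] sorted ascending), zero-filling unused slots.
 */
static void
fetch_active_page(const uint32_t *active, size_t nactive, uint32_t after,
    uint32_t *page, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < nactive && n < len; i++)
		if (active[i] > after)
			page[n++] = active[i];
	while (n < len)
		page[n++] = 0;
}

/* Walk all pages and return how many active namespaces were seen. */
static size_t
scan_active(const uint32_t *active, size_t nactive, size_t page_len)
{
	uint32_t page[16];
	uint32_t after = 0;
	size_t seen = 0;

	assert(page_len > 0 && page_len <= 16);
	for (;;) {
		fetch_active_page(active, nactive, after, page, page_len);
		for (size_t i = 0; i < page_len; i++) {
			if (page[i] == 0)
				return (seen);	/* short page: list exhausted */
			seen++;
		}
		/* Full page: resume from the last ID, not the last ID + 1. */
		after = page[page_len - 1];
		/* Nothing can follow the largest valid NSID. */
		if (after >= GLOBAL_NSID - 1)
			return (seen);
	}
}
```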
* nvmf: Deregister the post_sync eventhandler correctly during detach | John Baldwin | 2024-11-02 | 1 | -1/+1
  Previously the handler was removed from the wrong eventhandler list.
  Fixes: f46d4971b5af nvmf: Handle shutdowns more gracefully
  Sponsored by: Chelsio Communications
* nvmf_tcp: Correct padding calculation | John Baldwin | 2024-11-02 | 1 | -1/+1
  PDU data alignment (PDA) isn't necessarily a power of 2, just a multiple of 4, so use roundup() instead of roundup2() to compute the PDU data offset (PDO).
  Sponsored by: Chelsio Communications
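The difference between the two sys/param.h macros matters once the alignment is not a power of 2. roundup() divides and multiplies. roundup2() masks with ~(y - 1), which is only correct for powers of 2. The following functions reproduce the two macro formulas so the effect can be checked in userland. The 24-byte header and the 12-byte alignment are example values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as roundup(x, y): valid for any y > 0. */
static uint32_t
round_up_any(uint32_t x, uint32_t y)
{
	assert(y > 0);
	return ((x + y - 1) / y * y);
}

/* Same arithmetic as roundup2(x, y): only valid when y is a power of 2. */
static uint32_t
round_up_pow2(uint32_t x, uint32_t y)
{
	return ((x + (y - 1)) & ~(y - 1));
}
```

For a 24-byte header and a 12-byte alignment, roundup() correctly leaves the data offset at 24, while the mask-based formula yields 32, which is not even a multiple of 12. For power-of-2 alignments such as 8, both formulas agree.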
* nvmf: Add sysctl nodes for each queue pair | John Baldwin | 2024-11-02 | 3 | -4/+53
  These report the queue size, queue head, queue tail, and the number of commands submitted.
  Sponsored by: Chelsio Communications
* mbuf: Don't force all M_EXTPG mbufs to be read-only | John Baldwin | 2024-10-31 | 1 | -1/+1
  Some M_EXTPG mbufs are read-only (e.g. those backing sendfile requests), but others are not. Add a flags argument to mb_alloc_ext_pgs that can be used to set M_RDONLY when needed rather than setting it unconditionally. Update mb_unmapped_to_ext to preserve M_RDONLY from the unmapped mbuf.
  Reviewed by: gallatin
  Differential Revision: https://reviews.freebsd.org/D46783
* nvmf: Fail pass through commands while a controller is not associated | John Baldwin | 2024-10-17 | 1 | -0/+9
  Previously this just dereferenced NULL qp pointers and panicked. Instead, use a shared lock on the connection lock to protect access to the qp pointers and allocate a request. If the controller is not associated, fail the request with ECONNABORTED.
  Possibly this should be honoring kern.nvmf.fail_on_disconnection and block waiting for a reconnect request while disconnected if that tunable is false.
  Reported by: Suhas Lokesha <suhas@chelsio.com>
  Sponsored by: Chelsio Communications
* nvmf: Narrow scope of sim lock in nvmf_sim_io | John Baldwin | 2024-09-26 | 1 | -2/+1
  nvmf_submit_request() handles races with concurrent queue pair destruction (or the queue pair being destroyed between nvmf_allocate_request and nvmf_submit_request), so the lock is not needed here. This avoids holding the lock across transport-specific logic such as queueing mbufs for PDUs to a socket buffer, etc.
  Holding the lock across nvmf_allocate_request() ensures that the queue pair pointers in the softc are still valid as shutdown attempts will block on the lock before destroying the queue pairs.
  Sponsored by: Chelsio Communications
* nvmf: Always use xpt_done instead of xpt_done_direct | John Baldwin | 2024-09-26 | 1 | -1/+1
  The last reference on a pending I/O request might be held by an mbuf in the socket buffer. When this mbuf is freed, the I/O request is completed, which triggers completion of the CCB. However, this can occur with locks held (e.g. with so_snd locked when the mbuf is freed by sbdrop()), raising a LOR between so_snd and the CAM device lock. Instead, defer CCB completion processing to a thread where locks are not held.
  Sponsored by: Chelsio Communications
* ctl: Move extern for control_softc into <cam/ctl/ctl_private.h> | John Baldwin | 2024-09-25 | 1 | -2/+0
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D46778
* nvmft: Defer datamove operations to a pool of taskqueue threads | John Baldwin | 2024-09-24 | 3 | -2/+112
  Some block devices may request datamove operations from an ithread context while holding locks. Queue datamove operations to a taskqueue backed by a thread pool to safely permit blocking allocations, etc. in datamove handling.
  Reviewed by: asomers
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D46551
* nvmf_tcp: Fully honor kern.nvmf.tcp.max_transmit_data for C2H_DATA PDUs | John Baldwin | 2024-09-05 | 1 | -12/+19
  The previous version of tcp_send_controller_data avoided sending a chain of multiple mbufs that exceeded the limit, but if an individual mbuf was larger than the limit it was sent as a single, over-sized PDU. Fix by using m_split() to split individual mbufs larger than the limit.
  Note that this is not a protocol error, per se, as there is no limit on C2H_DATA PDU lengths (unlike the MAXH2CDATA parameter). This fix just honors the administrative limit more faithfully. This case is also very unlikely with the default limit of 256k.
  Sponsored by: Chelsio Communications
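The packing behavior can be modeled over a list of buffer lengths. Whole buffers are packed into a PDU while they fit under the limit, and any buffer that alone exceeds the limit is split, as m_split() would do. This sketch is hypothetical planning logic based on the description above, not tcp_send_controller_data itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical planner: write PDU payload lengths to out[] so that no
 * PDU exceeds 'limit'.  Returns the number of PDUs.
 */
static size_t
plan_c2h_pdus(const size_t *lens, size_t n, size_t limit, size_t *out,
    size_t max_out)
{
	size_t npdu = 0, cur = 0;

	assert(limit > 0);
	for (size_t i = 0; i < n; i++) {
		size_t len = lens[i];

		while (len > 0) {
			if (cur > 0 && cur + len > limit) {
				/* Next buffer would overflow: flush this PDU. */
				assert(npdu < max_out);
				out[npdu++] = cur;
				cur = 0;
			} else if (len > limit) {
				/* Oversized buffer (cur == 0): split off a full PDU. */
				assert(npdu < max_out);
				out[npdu++] = limit;
				len -= limit;
			} else {
				/* Buffer fits: append it to the current PDU. */
				cur += len;
				len = 0;
			}
		}
	}
	if (cur > 0) {
		assert(npdu < max_out);
		out[npdu++] = cur;
	}
	return (npdu);
}
```

For buffers of 100, 300 and 50 bytes with a 256-byte limit, this plans PDUs of 100, 256 and 94 bytes. Before the fix, the 300-byte buffer would have gone out as a single over-sized PDU.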
* nvmfd/nvmft: Fix a typo "whiled" -> "while" | John Baldwin | 2024-09-03 | 1 | -1/+1
  Sponsored by: Chelsio Communications
* nvmf_tcp: Correct calculation of number of TTAGs to allocate | John Baldwin | 2024-07-30 | 1 | -1/+1
  The increment of 1 was intended to convert qp->maxr2t from 0's based to 1 based before multiplying by the queue length.
  Sponsored by: Chelsio Communications
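Because MAXR2T is negotiated as a 0's based value, the +1 must be applied before the multiplication. The helper below is hypothetical and only illustrates the intended arithmetic. The commit does not show the exact form of the old expression.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: each of 'qsize' commands may have up to
 * (maxr2t + 1) outstanding R2Ts, each needing its own transfer tag.
 */
static uint32_t
ttags_needed(uint32_t maxr2t_0based, uint32_t qsize)
{
	assert(qsize > 0);
	return ((maxr2t_0based + 1) * qsize);	/* convert to 1's based first */
}
```

With MAXR2T = 0 (one R2T per command) and a 128-entry queue, 128 tags are needed. A misplaced increment such as maxr2t * qsize + 1 would instead yield just 1.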
* nvmf_tcp: Update R2T accounting stats when aborting command buffers | John Baldwin | 2024-07-30 | 1 | -0/+5
  If a queue pair is destroyed (e.g. due to the TCP connection dropping) while a host to controller data transfer is in progress, the pending_r2ts counter can be non-zero. This can later trigger an assertion failure when the capsule is freed. To fix, update the relevant R2T accounting stats when aborting active command buffers during queue pair destruction.
  Sponsored by: Chelsio Communications
* nvmf_tcp: Avoid setting some unused parameters in tcp_allocate_qpair | John Baldwin | 2024-07-30 | 1 | -2/+3
  Specifically, some parameters only apply to either controller or host queue pairs but not both.
  Sponsored by: Chelsio Communications
* nvmf_tcp: Use min() to simplify a few statements | John Baldwin | 2024-07-30 | 1 | -9/+3
  Sponsored by: Chelsio Communications
* nvmf_tcp: Rename max_c2hdata sysctl to max_transmit_data | John Baldwin | 2024-07-25 | 1 | -1/+1
  This sysctl sets a cap on the maximum payload of transmitted data PDUs including both C2H_DATA and H2C_DATA PDUs, not just C2H_DATA PDUs.
  Sponsored by: Chelsio Communications
* nvmft: Handle qpair allocation failures during handoff | John Baldwin | 2024-07-23 | 1 | -0/+10
  If the transport fails to create a queue pair, fail with an error rather than dereferencing a NULL pointer.
  Sponsored by: Chelsio Communications
* nvmf_tcp: Don't require a data digest for PDUs without data | John Baldwin | 2024-07-22 | 1 | -10/+14
  If a PDU (such as a Command Capsule PDU) on a connection that has enabled data digests does not have a data section, it will not have the PDU data digest flag set. The previous check was requiring this flag to be present on all PDU types that support data sections even if no data was included in the PDU.
  Sponsored by: Chelsio Communications
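The corrected rule is that the data digest flag is only required when digests are enabled and the PDU actually carries a data section. In the sketch below, the DDGSTF bit value (bit 1 of the common header flags) follows the NVMe/TCP transport specification. The helper name and its boolean inputs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CH_FLAG_DDGSTF	0x02	/* common header flag: data digest present */

/*
 * Hypothetical check: returns false only when a required DDGSTF flag
 * is missing.  PDUs without a data section never require it.
 */
static bool
ddgst_ok(bool ddgst_enabled, bool has_data, uint8_t flags)
{
	if (!ddgst_enabled || !has_data)
		return (true);
	return ((flags & CH_FLAG_DDGSTF) != 0);
}
```

A Command Capsule PDU with no in-capsule data now passes even though DDGSTF is clear, which is the case the old check rejected.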
* nvmf: Use device_set_descf() | Mark Johnston | 2024-06-16 | 1 | -3/+1
  No functional change intended.
  MFC after: 1 week
* nvmf: Handle shutdowns more gracefully | John Baldwin | 2024-06-05 | 4 | -7/+114
  If an association is disconnected during a clean shutdown, abort all pending and future I/O requests with an error to avoid hangs either due to filesystem unmounts or a stuck GEOM event.
  If an association is connected during a clean shutdown, gracefully disconnect from the remote controller and close the open queues.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45462
* nvmf: Permit failing I/O requests while disconnected | John Baldwin | 2024-06-05 | 4 | -7/+36
  Add a kern.nvmf.fail_on_disconnection sysctl similar to the kern.iscsi.fail_on_disconnection sysctl. This causes pending I/O requests to fail with an error if an association is disconnected instead of requeueing to be retried once the association is reconnected. As with iSCSI, the default is to queue and retry operations.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45308
* nvmf: Rescan namespaces after reconnecting | John Baldwin | 2024-06-05 | 1 | -0/+2
  While a host was disconnected from a remote controller, namespaces might have been added or removed, or had their properties altered. Rescan the namespaces after reconnecting to detect any such changes.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45461
* nvmf: Rescan all namespaces if the changed NS log page is too large | John Baldwin | 2024-06-05 | 3 | -1/+51
  Previously this just punted with a warning message.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45460
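Per the NVMe base specification, the Changed Namespace List log page holds up to 1024 namespace IDs. If more namespaces changed than that, the first entry is set to FFFFFFFFh. That overflow is presumably the "too large" case handled here. The following sketch shows the decision only. The helper name and the return convention (-1 for "rescan all") are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANGED_NS_MAX		1024u		/* entries in the log page */
#define CHANGED_NS_OVERFLOW	0xffffffffu	/* first entry when > 1024 changed */

/*
 * Hypothetical helper: returns -1 if every namespace must be rescanned,
 * otherwise the number of listed NSIDs (the list ends at the first 0).
 */
static long
changed_ns_rescans(const uint32_t *log, size_t nentries)
{
	long count = 0;

	assert(nentries > 0 && nentries <= CHANGED_NS_MAX);
	if (log[0] == CHANGED_NS_OVERFLOW)
		return (-1);
	for (size_t i = 0; i < nentries && log[i] != 0; i++)
		count++;
	return (count);
}
```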
* nvmf: Factor out most of nvmf_rescan_ns into a helper routine | John Baldwin | 2024-06-05 | 1 | -22/+30
  This function accepts a namespace ID and associated namespace data from IDENTIFY and takes care of updating nvmeXnY and ndaZ.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45459
* nvmf: Refactor nvmf_add_namespaces to be more generic | John Baldwin | 2024-06-05 | 1 | -27/+47
  Rename to nvmf_scan_active_namespaces and accept an additional callback function and callback argument. The callback is invoked on each active namespace enumerated by the active namespace list from the IDENTIFY command.
  Reviewed by: imp
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D45458