| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
This mirrors commit 6d0001d44490becdd20d627ce663c72a30b9aac3 but for
nvmf(4).
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D53339
|
| |
|
|
| |
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
| |
NVMe controllers advertise the largest supported size of a command
capsule in the controller data (IOCCSZ). Smart NIC offload transports
may have a cap on the size of the largest data PDU that can be
received. These transports can implement this hook to limit the
advertised IOCCSZ to limit the in-capsule-data payload sent by remote
hosts.
Sponsored by: Chelsio Communications
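A minimal sketch of the kind of clamp such a hook might apply, assuming the transport expresses its cap as a number of in-capsule data bytes. IOCCSZ is counted in 16-byte units and covers the 64-byte submission queue entry plus any in-capsule data. The function name and its parameters are hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define SQE_BYTES	64	/* size of an NVMe submission queue entry */
#define IOCCSZ_UNIT	16	/* IOCCSZ is expressed in 16-byte units */

/*
 * Hypothetical helper: limit an advertised IOCCSZ so that the
 * in-capsule data never exceeds max_icd_bytes.  The result never
 * drops below the 64-byte SQE (IOCCSZ of 4).
 */
static uint32_t
cap_ioccsz(uint32_t ioccsz, size_t max_icd_bytes)
{
	uint32_t cap;

	cap = (uint32_t)((SQE_BYTES + max_icd_bytes) / IOCCSZ_UNIT);
	return (ioccsz < cap ? ioccsz : cap);
}
```

With an 8 KiB cap, an advertised IOCCSZ of 4100 (64 KiB of in-capsule data) would be reduced to 516.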
|
| |
|
|
|
|
|
| |
If the transport has a data transfer limit, restrict I/O transfers to
the largest multiple of MPS that fits within the limit.
Sponsored by: Chelsio Communications
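The rounding described above can be sketched in userland C as follows. The helper name, its parameters, and the convention that a limit of 0 means "no transport limit" are assumptions made for illustration, not details of the committed change.

```c
#include <stddef.h>

/*
 * Hypothetical helper: restrict max_io to the largest multiple of
 * mps (the memory page size in bytes) that fits within limit.
 * A limit of 0 is treated as "no limit" in this sketch.
 */
static size_t
clamp_to_mps_multiple(size_t max_io, size_t limit, size_t mps)
{
	size_t capped;

	if (limit == 0 || mps == 0)
		return (max_io);
	capped = (limit / mps) * mps;	/* round the limit down to MPS */
	return (capped < max_io ? capped : max_io);
}
```

For example, a 70000-byte limit with a 4096-byte MPS yields 17 pages, or 69632 bytes.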
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Smart NIC offload transports may have a cap on the size of the largest
data PDU that can be received. Allow these transports to enforce a
cap on the size of an I/O request submitted by the nvmf(4) host.
NB: The controller is able to advertise a maximum-supported PDU size
during TCP negotiation, but there is no way in the protocol to
advertise a maximum size that the host can receive.
Sponsored by: Chelsio Communications
|
| |
|
|
|
| |
Reviewed by: adrian, imp
Differential Revision: https://reviews.freebsd.org/D53254
|
| |
|
|
|
|
| |
- s/tranfers/transfers/
MFC after: 3 days
|
| |
|
|
|
|
|
|
|
|
| |
This moves the checks previously under #ifdef STRICT in
nvmf_nqn_valid() into a separate helper for userland. This
requires that the NQN starts with "nqn.YYYY-MM." followed by at
least one additional character.
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D48767
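As an illustration of the prefix rule only, the sketch below checks for "nqn.YYYY-MM." followed by at least one more character. The function name is hypothetical, and the real helper may apply further checks (for example on the month value or the overall length) that are not modeled here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check for the "nqn.YYYY-MM." prefix rule.
 * The prefix is 12 characters long, so a valid name needs at
 * least 13 characters.
 */
static bool
nqn_has_strict_prefix(const char *nqn)
{
	if (strlen(nqn) < 13)
		return (false);
	if (strncmp(nqn, "nqn.", 4) != 0)
		return (false);
	/* Four-digit year at offsets 4..7. */
	for (int i = 4; i < 8; i++) {
		if (!isdigit((unsigned char)nqn[i]))
			return (false);
	}
	if (nqn[8] != '-')
		return (false);
	/* Two-digit month at offsets 9..10. */
	if (!isdigit((unsigned char)nqn[9]) ||
	    !isdigit((unsigned char)nqn[10]))
		return (false);
	return (nqn[11] == '.');
}
```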
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a timer in the nvmf(4) driver to periodically trigger a devctl
"RECONNECT" notification. A trigger in the /etc/devd/nvmf.conf file
invokes "nvmecontrol reconnect nvmeX" upon each notification. This
differs from iSCSI which uses a dedicated daemon (iscsid(8)) to wait
inside a custom ioctl for an iSCSI initiator event to occur, but I
think this design might be simpler.
Similar to nvme-cli, the interval between reconnection attempts is
specified in seconds by the --reconnect-delay argument to the connect
and reconnect commands. Note that nvme-cli uses -c as the short option
for this argument, but that was already taken so nvmecontrol uses -r.
The default is 10 seconds to match Linux.
In addition, a second timeout can be used to force a full detach of a
disconnected nvmeX device after the controller loss timeout
expires. The timeout for this is specified in seconds by the
--ctrl-loss-tmo/-l options (identical to nvme-cli). The default is
600 seconds.
Either of these timers can be disabled by setting the timer to 0. In
that case, the associated action (devctl notifications or full detach)
will not occur after a disconnect.
Note that this adds a dedicated taskqueue for nvmf tasks instead of
using taskqueue_thread as the controller loss task could deadlock
waiting for the completion of other tasks queued to taskqueue_thread.
(Specifically, tearing down the CAM SIM can trigger
destroy_dev_sched_cb() and waits for the callback to run, but the
callback is scheduled to run in a task on taskqueue_thread. Possibly,
destroy_dev_sched should be using a dedicated taskqueue.)
Reviewed by: imp (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D50222
|
| |
|
|
|
|
| |
Reviewed by: imp, jhb
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D50913
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Switch to using sys/stdarg.h for va_list type and va_* builtins.
Make an attempt to insert the include in a sensible place. Where
style(9) was followed this is easy, where it was ignored, aim for the
first block of sys/*.h headers and don't get too fussy or try to fix
other style bugs.
Reviewed by: imp
Exp-run by: antoine (PR 286274)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
|
| |
|
|
|
|
|
|
| |
The received command capsule was not freed after sending the success
response.
Fixes: a15f7c96a276 ("nvmft: The in-kernel NVMe over Fabrics controller")
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
| |
In particular, export a "port" entry as well as an array of "host"
entries for each active connection.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48775
|
| |
|
|
|
|
| |
This is needed to avoid LORs for a following commit.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
| |
This returns an nvlist indicating if a Fabrics host is connected and
the time of the most recent disconnection.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48219
|
| |
|
|
|
|
|
|
|
|
|
| |
Both nvme and nvmf cache a copy of the controller's identify data in
the softc. Add an ioctl to fetch this copy of the cdata. This is
primarily useful for allowing commands like 'nvmecontrol devlist' to
work against a disconnected Fabrics host.
Reviewed by: dab, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48218
|
| |
|
|
|
|
|
| |
This is generally harmless but can trigger spurious warnings on the
console due to duplicate attempts to disable LUNs.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Save more data associated with a new association including the network
address of the remote controller. This permits reconnecting an
association without providing the address or other details. To use
this new mode, provide only an existing device ID to nvmecontrol's
reconnect command. An address can still be provided to request a
different address or other different settings for the new association.
The saved data includes an entire Discovery Log page entry to aim to
be compatible with other transports in the future. When a remote
controller is connected to via a Discovery Log page entry (nvmecontrol
connect-all), the raw entry is used. When a remote controller is
connected to via an explicit address, an entry is synthesized from the
parameters.
Note that this is a pseudo-ABI break for the ioctls used by nvmf(4) in
that the nvlists for handoff and reconnect now use a slightly
different set of elements. Since this is only present in main I did
not bother implementing compatibility shims.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48214
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nda(4) has its own shutdown handler, which runs at SHUTDOWN_PRI_DEFAULT
and calls ndaflush(), and it could run after the nvmf handler. Instead,
give the flush a chance to run before the graceful shutdown of the
controller.
While here, be a bit more defensive in the post-sync case and shut down
the consumers (sim and /dev/nvmeXnY devices) before destroying the
queue pairs so that if any requests are submitted after the post-sync
handler they fail gracefully instead of trying to use a destroyed
queue pair.
Reported by: Sony Arpita Das <sonyarpitad@chelsio.com>
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
| |
<assert.h> now has a usable definition, so we don't need to shim it out
in the nvmf header anymore.
Reviewed by: emaste, jhb
Differential Revision: https://reviews.freebsd.org/D48078
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For requests that handoff queues from userspace to the kernel as well
as the request to fetch reconnect parameters from the kernel, switch
from using flat structures to nvlists. In particular, this will
permit adding support for additional transports in the future without
breaking the ABI of the structures.
Note that this is an ABI break for the ioctls used by nvmf(4) and
nvmft(4). Since this is only present in main I did not bother
implementing compatibility shims.
Inspired by: imp (suggestion on a different review)
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D48230
|
| |
|
|
| |
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
| |
This isn't really needed since the host driver never submits more
commands to a queue than it can hold, but I noticed that the
recently-added SQ head and tail sysctl nodes were not updating. This
fixes that and also uses these values to assert that we never
submit a command while a queue pair is full.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
| |
Similar to nvme(4), use the current CPU to select which I/O queue to
use. The assignment in nvmf_attach() had to be moved down since
sc->num_io_queues is initialized in nvmf_establish_connection().
Note that nvmecontrol(8) still defaults to using a single I/O queue
for an association.
Sponsored by: Chelsio Communications
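One plausible CPU-to-queue mapping is a simple modulo, sketched below. The function name is hypothetical and the actual mapping used by the driver may differ. The sketch also shows why the queue count has to be known before any selection is made.

```c
#include <stddef.h>

/*
 * Hypothetical mapping from a CPU ID to an I/O queue index.
 * num_io_queues must already be initialized; a count of 0 would
 * otherwise divide by zero, so this sketch falls back to queue 0.
 */
static size_t
select_io_queue(unsigned int cpu, size_t num_io_queues)
{
	if (num_io_queues == 0)
		return (0);
	return (cpu % num_io_queues);
}
```

With a single I/O queue (the nvmecontrol(8) default noted above), every CPU maps to queue 0.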
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The active namespace list query fetches namespaces greater than the
passed in namespace ID, not greater than or equal to the passed in
namespace ID. Thus, a multi-page request should start with the last
namespace ID from the previous page, not that ID plus 1.
While here, make use of NVME_GLOBAL_NAMESPACE_TAG instead of a magic
number to handle the edge case that the last namespace ID in a page is
the largest valid namespace ID.
Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D47393
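The paging rule can be sketched against a mock controller as shown below. The mock, the 4-entry page size (real pages hold 1024 entries), and all function names are assumptions for illustration. The stop condition for the largest valid namespace ID is one reading of the edge case described above.

```c
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_NS_TAG	0xffffffffu	/* value of NVME_GLOBAL_NAMESPACE_TAG */
#define PAGE_ENTRIES	4		/* shortened for the sketch */

/* Mock set of active namespace IDs, in ascending order. */
static const uint32_t mock_active[] = { 1, 2, 3, 4, 5, 9 };

/* Fill page with active NSIDs strictly greater than nsid, zero padded. */
static void
mock_identify_active_list(uint32_t nsid, uint32_t page[PAGE_ENTRIES])
{
	size_t n = 0;

	for (size_t i = 0; i < PAGE_ENTRIES; i++)
		page[i] = 0;
	for (size_t i = 0; i < sizeof(mock_active) / sizeof(mock_active[0]) &&
	    n < PAGE_ENTRIES; i++) {
		if (mock_active[i] > nsid)
			page[n++] = mock_active[i];
	}
}

/* Collect every active NSID; returns the number found. */
static size_t
scan_active_namespaces(uint32_t *out, size_t max_out)
{
	uint32_t page[PAGE_ENTRIES];
	uint32_t nsid = 0;
	size_t count = 0;

	for (;;) {
		mock_identify_active_list(nsid, page);
		for (size_t i = 0; i < PAGE_ENTRIES; i++) {
			if (page[i] == 0)
				return (count);	/* end of the list */
			if (count < max_out)
				out[count] = page[i];
			count++;
		}
		/* Resume from the last ID itself, not that ID plus 1. */
		nsid = page[PAGE_ENTRIES - 1];
		if (nsid >= GLOBAL_NS_TAG - 1)
			return (count);	/* no larger valid ID exists */
	}
}
```

Restarting from the last ID plus 1 would skip namespace 5 in this mock, because the query only returns IDs greater than the one passed in.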
|
| |
|
|
|
|
|
| |
Previously the handler was removed from the wrong eventhandler list.
Fixes: f46d4971b5af nvmf: Handle shutdowns more gracefully
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
| |
PDU data alignment (PDA) isn't necessarily a power of 2, just a
multiple of 4, so use roundup() instead of roundup2() to compute the
PDU data offset (PDO).
Sponsored by: Chelsio Communications
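The difference between the two macros can be seen in a small userland sketch. The two functions below mirror the classic definitions of roundup() and roundup2(); their names and the sample header length are chosen for illustration only.

```c
#include <stdint.h>

/* Generic round-up: correct for any non-zero alignment. */
static uint32_t
pdo_roundup(uint32_t hlen, uint32_t pda)
{
	return (((hlen + (pda - 1)) / pda) * pda);
}

/* Mask-based round-up: only correct when pda is a power of 2. */
static uint32_t
pdo_roundup2(uint32_t hlen, uint32_t pda)
{
	return ((hlen + (pda - 1)) & ~(pda - 1));
}
```

For a 24-byte header and an alignment of 12, the generic form gives 24, while the mask-based form gives 32, which is not a multiple of 12. For a power-of-2 alignment such as 8, both give 24.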
|
| |
|
|
|
|
|
| |
These report the queue size, queue head, queue tail, and the number of
commands submitted.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
| |
Some M_EXTPG mbufs are read-only (e.g. those backing sendfile
requests), but others are not. Add a flags argument to
mb_alloc_ext_pgs that can be used to set M_RDONLY when needed rather
than setting it unconditionally. Update mb_unmapped_to_ext to
preserve M_RDONLY from the unmapped mbuf.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D46783
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this just dereferenced NULL qp pointers and panicked.
Instead, use a shared lock on the connection lock to protect access to
the qp pointers and allocate a request. If the controller is not
associated, fail the request with ECONNABORTED.
Possibly this should be honoring kern.nvmf.fail_on_disconnection and
block waiting for a reconnect request while disconnected if that
tunable is false.
Reported by: Suhas Lokesha <suhas@chelsio.com>
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
nvmf_submit_request() handles races with concurrent queue pair
destruction (or the queue pair being destroyed between
nvmf_allocate_request and nvmf_submit_request), so the lock is not
needed here. This avoids holding the lock across transport-specific
logic such as queueing mbufs for PDUs to a socket buffer, etc.
Holding the lock across nvmf_allocate_request() ensures that the queue
pair pointers in the softc are still valid as shutdown attempts will
block on the lock before destroying the queue pairs.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The last reference on a pending I/O request might be held by an mbuf
in the socket buffer. When this mbuf is freed, the I/O request is
completed which triggers completion of the CCB. However, this can
occur with locks held (e.g. with so_snd locked when the mbuf is freed
by sbdrop()) raising a LOR between so_snd and the CAM device lock.
Instead, defer CCB completion processing to a thread where locks are
not held.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
| |
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46778
|
| |
|
|
|
|
|
|
|
|
|
| |
Some block devices may request datamove operations from an ithread
context while holding locks. Queue datamove operations to a taskqueue
backed by a thread pool to safely permit blocking allocations, etc. in
datamove handling.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46551
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous version of tcp_send_controller_data avoided sending a
chain of multiple mbufs that exceeded the limit, but if an individual
mbuf was larger than the limit it was sent as a single, over-sized
PDU. Fix by using m_split() to split individual mbufs larger than the
limit.
Note that this is not a protocol error, per se, as there is no limit
on C2H_DATA PDU lengths (unlike the MAXH2CDATA parameter). This fix
just honors the administrative limit more faithfully. This case is
also very unlikely with the default limit of 256k.
Sponsored by: Chelsio Communications
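The splitting behavior can be modeled in userland by planning PDU payload sizes from a list of buffer lengths. This is a sketch under stated assumptions: the function names are hypothetical, buffers stand in for mbufs, and the real code operates on mbuf chains with m_split() rather than on plain lengths.

```c
#include <stddef.h>

/* Record one PDU payload length, counting even past max_pdus. */
static void
emit_pdu(size_t *pdus, size_t max_pdus, size_t *n, size_t len)
{
	if (*n < max_pdus)
		pdus[*n] = len;
	(*n)++;
}

/*
 * Hypothetical planner: pack buffers into PDUs whose payload never
 * exceeds limit.  A buffer larger than limit is split into
 * limit-sized pieces instead of being sent as one over-sized PDU.
 * Returns the number of PDUs needed.
 */
static size_t
plan_pdus(const size_t *bufs, size_t nbufs, size_t limit,
    size_t *pdus, size_t max_pdus)
{
	size_t cur = 0, n = 0;

	if (limit == 0)
		return (0);
	for (size_t i = 0; i < nbufs; i++) {
		size_t len = bufs[i];

		while (len > limit) {
			/* Flush pending data, then carve off a full PDU. */
			if (cur != 0) {
				emit_pdu(pdus, max_pdus, &n, cur);
				cur = 0;
			}
			emit_pdu(pdus, max_pdus, &n, limit);
			len -= limit;
		}
		if (cur + len > limit) {
			emit_pdu(pdus, max_pdus, &n, cur);
			cur = 0;
		}
		cur += len;
	}
	if (cur != 0)
		emit_pdu(pdus, max_pdus, &n, cur);
	return (n);
}
```

With buffers of 100, 700, and 50 bytes and a 256-byte limit, the plan is four PDUs of 100, 256, 256, and 238 bytes.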
|
| |
|
|
| |
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
| |
The increment of 1 was intended to convert qp->maxr2t from 0's based
to 1's based before multiplying by the queue length.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
|
| |
If a queue pair is destroyed (e.g. due to the TCP connection dropping)
while a host to controller data transfer is in progress, the
pending_r2ts counter can be non-zero. This can later trigger an
assertion failure when the capsule is freed. To fix, update the
relevant R2T accounting stats when aborting active command buffers
during queue pair destruction.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
| |
Specifically, some parameters only apply to either controller or host
queue pairs but not both.
Sponsored by: Chelsio Communications
|
| |
|
|
| |
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
| |
This sysctl sets a cap on the maximum payload of transmitted data PDUs
including both C2H_DATA and H2C_DATA PDUs, not just C2H_DATA PDUs.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
| |
If the transport fails to create a queue pair, fail with an error
rather than dereferencing a NULL pointer.
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
|
|
|
|
| |
If a PDU (such as a Command Capsule PDU) on a connection that has
enabled data digests does not have a data section, it will not have
the PDU data digest flag set. The previous check was requiring
this flag to be present on all PDU types that support data sections
even if no data was included in the PDU.
Sponsored by: Chelsio Communications
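The relaxed check can be sketched as a small predicate. The function name and the flag constant are stand-ins for illustration. Rejecting a set flag on a connection without data digests is an assumption of this sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_DDGST	0x02	/* stand-in for the data digest flag */

/*
 * Hypothetical validation of the data digest flag for a PDU type
 * that supports a data section.
 */
static bool
ddgst_flag_ok(bool data_digests_enabled, size_t data_len, uint8_t flags)
{
	bool has_flag = (flags & SKETCH_FLAG_DDGST) != 0;

	if (!data_digests_enabled)
		return (!has_flag);	/* assumption: flag must be clear */
	if (data_len == 0)
		return (true);		/* no data section, flag not required */
	return (has_flag);		/* data present, flag required */
}
```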
|
| |
|
|
|
|
| |
No functional change intended.
MFC after: 1 week
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If an association is disconnected during a clean shutdown, abort all
pending and future I/O requests with an error to avoid hangs either due
to filesystem unmounts or a stuck GEOM event.
If an association is connected during a clean shutdown, gracefully
disconnect from the remote controller and close the open queues.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45462
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add a kern.nvmf.fail_on_disconnection sysctl similar to the
kern.iscsi.fail_on_disconnection sysctl. This causes pending I/O
requests to fail with an error if an association is disconnected
instead of requeueing to be retried once the association is
reconnected. As with iSCSI, the default is to queue and retry
operations.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45308
|
| |
|
|
|
|
|
|
|
|
| |
While a host was disconnected from a remote controller, namespaces
might have been added, removed, or had their properties altered. Rescan the
namespaces after reconnecting to detect any such changes.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45461
|
| |
|
|
|
|
|
|
| |
Previously this just punted with a warning message.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45460
|
| |
|
|
|
|
|
|
|
| |
This function accepts a namespace ID and associated namespace data
from IDENTIFY and takes care of updating nvmeXnY and ndaZ.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45459
|
| |
|
|
|
|
|
|
|
|
|
| |
Rename to nvmf_scan_active_namespaces and accept an additional
callback function and callback argument. The callback is invoked on
each active namespace enumerated by the active namespace list from the
IDENTIFY command.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45458
|