| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
MFC after: 1 week
(cherry picked from commit dd256c3fa738b6941f58355c077224c9a227b169)
(cherry picked from commit 6edef6b6364162932379bd172930d82b067177cf)
|
| |
|
|
|
|
|
| |
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44845
(cherry picked from commit 5e3e4442305d9e5af9862fac73feb0d7f37d4b56)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the DIOCGIDENT ioctl to both nvme controller device
nodes and namespace device nodes.
This information was already available via the nda(4) device node.
However, mapping /dev/nvmeX to /dev/ndaY device nodes is not
straightforward, so it's better to get it directly from the /dev/nvme
device node.
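A minimal userland sketch of querying the ident string directly from an
nvme device node. The helper name nvme_get_ident is hypothetical, and the
sketch assumes DIOCGIDENT and DISK_IDENT_SIZE from <sys/disk.h>; on other
systems it compiles to a stub that always fails:

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/disk.h>   /* DIOCGIDENT, DISK_IDENT_SIZE */
#endif

/*
 * Copy the ident string of the device node at `path` into `out`.
 * Returns 0 on success and -1 on any failure. On non-FreeBSD systems
 * this hypothetical helper is a stub that returns -1.
 */
int
nvme_get_ident(const char *path, char *out, size_t outlen)
{
#ifdef __FreeBSD__
    char ident[DISK_IDENT_SIZE];
    int fd, error;

    if (outlen == 0)
        return (-1);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);
    memset(ident, 0, sizeof(ident));
    error = ioctl(fd, DIOCGIDENT, ident);
    close(fd);
    if (error != 0)
        return (-1);
    strlcpy(out, ident, outlen);
    return (0);
#else
    (void)path;
    (void)out;
    (void)outlen;
    return (-1);
#endif
}
```

A caller would pass a path such as /dev/nvme0 (controller) or a namespace
node; opening these nodes typically requires sufficient privileges.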
PR: 290259
Sponsored by: ConnectWise
Submitted by: imp (mostly)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1875
(cherry picked from commit 6d0001d44490becdd20d627ce663c72a30b9aac3)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NVMe spec allows the Table BIR (TBIR) and PBA BIR (PBIR) to
be 0, 4, or 5. The existing NVMe driver basically only has support
for 4, perhaps under the assumption that BAR4 is 64-bit and also
occupies BAR5.
This change adds support for BAR5, covering the case where BAR4
and BAR5 might both be present and 32-bit, where the Table BIR
might be 4 and the PBA BIR might be 5, or vice versa.
The NVMe spec (in the SR-IOV section) also permits VFs to use BIR=2,
so I haven't added stricter checks on which BIR will be permitted
by the driver.
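As an illustration only, the BIR handling described above can be sketched
in plain C. The names below (msix_bir, msix_plan_bars, struct msix_bars)
are hypothetical and are not the driver's; the sketch assumes BAR0 holds
the controller registers and is already mapped, and that the BIR lives in
the low 3 bits of the MSI-X Table and PBA offset registers:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_BIR_MASK   0x7u    /* low 3 bits of the Table/PBA registers */
#define PCI_BAR0_OFFSET 0x10u   /* config-space offset of BAR0 */

/* Extract the BAR indicator (BIR) from an MSI-X Table or PBA register. */
static inline unsigned
msix_bir(uint32_t reg)
{
    return (reg & MSIX_BIR_MASK);
}

/* Config-space offset of the BAR named by a BIR value. */
static inline unsigned
bir_to_bar_offset(unsigned bir)
{
    return (PCI_BAR0_OFFSET + 4 * bir);
}

struct msix_bars {
    bool     map_table;     /* table lives in a BAR other than BAR0 */
    bool     map_pba;       /* PBA needs its own, separate mapping */
    unsigned table_bir;
    unsigned pba_bir;
};

/*
 * Decide which extra BARs to map. No BIR value is rejected here, which
 * mirrors the decision not to add stricter checks.
 */
struct msix_bars
msix_plan_bars(uint32_t table_reg, uint32_t pba_reg)
{
    struct msix_bars plan;

    plan.table_bir = msix_bir(table_reg);
    plan.pba_bir = msix_bir(pba_reg);
    plan.map_table = (plan.table_bir != 0);
    /* Map the PBA BAR only when it is neither BAR0 nor the table BAR. */
    plan.map_pba = (plan.pba_bir != 0 && plan.pba_bir != plan.table_bir);
    return (plan);
}
```

With a table register of 0x2004 and a PBA register of 0x3005, the plan
asks for both BAR4 and BAR5 to be mapped, which is the 32-bit BAR4/BAR5
case the change covers.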
This enables FreeBSD on Google Compute Engine C4 Machines.
MFC after: 3 days
Reviewed by: imp
Sponsored by: Google
Co-authored-by: Matt Delco <delco@google.com>
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Differential Revision: https://reviews.freebsd.org/D53140
(cherry picked from commit 7b32f4f0a7fe9b1b2f5a3905ca15f656713255ad)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* Rename delta_t to avoid misleading simplistic syntax highlighters
* Simplify the increment calculation
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D52973
(cherry picked from commit 4070ae0e9a60715199f83004e7ebdfb169fc8cfc)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Amazon EC2 m7i cloud instances use PCI hotplug rather than ACPI
hotplug. The card is removed and detach is called to remove the drive
from the system. The hardware is no longer present at this point, but
the bridge doesn't translate the now-missing hardware reads to all ff's
leading us to conclude the hardware is there and we need to do a proper
shutdown of it. Fix this oversight by asking the bridge if the device is
still present as well. We need both tests since on some systems one can
remove the card w/o a hotplug event and we want to fail-safe in those
cases.
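The combined test can be sketched as below. This is an illustration with
a hypothetical helper name, not the driver code; it assumes the caller
has already read the controller status register and asked the parent
bridge whether the device is present:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Treat the controller as gone if either test says so:
 *  - register reads come back as all ff's (card pulled w/o an event), or
 *  - the bridge reports that the device is no longer present.
 */
bool
ctrlr_is_gone(uint32_t csts, bool bridge_says_present)
{
    if (csts == 0xffffffffu)
        return (true);
    if (!bridge_says_present)
        return (true);
    return (false);
}
```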
Convert gone to a bool while I'm here and update a comment about
shutting down the controller and why that's important.
Tested by: cperciva
Sponsored by: Netflix
(cherry picked from commit dc95228d98474aba940e3885164912b419c5579d)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't post an AER for this page, so there's no need to be able to swap
it to host byte order. It's not one of the standard defined pages that
can post via AER, and the vendor's public docs for this temperature page
don't suggest it's possible to get over or under event changes. Since
nvmecontrol no longer needs the swap routine, remove it since it's
now unused.
Sponsored by: Netflix
Reviewed by: chuck
Differential Revision: https://reviews.freebsd.org/D44659
(cherry picked from commit 97b77de2d951b4946fb3219a99c98f2dd4c0120f)
|
| |
|
|
|
|
|
|
| |
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44448
(cherry picked from commit 21d3a84db481e84cf240f6802b1a4110854eaec5)
|
| |
|
|
|
|
|
|
| |
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44447
(cherry picked from commit 7fa8adb8c5cd46979b76770794ac1b6584e8baa7)
|
| |
|
|
|
|
|
|
|
|
|
| |
This is used in NVMe over Fabrics to enumerate a list of available
controllers.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44446
(cherry picked from commit 88ecf154c7c5a2e413a81ae1b0511b0295265b99)
|
| |
|
|
|
|
|
|
| |
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44445
(cherry picked from commit b354bb04cb51f373e997cb8911c32dc93243c1d7)
|
| |
|
|
|
|
|
|
|
|
|
| |
nvme(4) doesn't check this flag, but Fabrics implementations may need
to set this flag in the log page attributes cdata field.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44444
(cherry picked from commit cbda1886ab1cd3ec2847b7da5136d3bb68d56101)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is not used in nvme(4) but is used in NVMe over Fabrics
transports which use SGLs to describe buffers instead of PRPs.
While here, adjust the shift value for the FUSE field to be relative
to the 'fuse' member of 'struct nvme_command'.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44443
(cherry picked from commit b8cb8dd3625d7396ea98152d89e1e64b16e77bc6)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Fabrics capsules use an SGL structure instead of prp1/2 addresses to
describe the data buffer used for a command. The SGL structure is
added to a union with the existing prp1/2 fields.
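A rough sketch of such a layout, with hypothetical type and member names.
It assumes a 16-byte SGL descriptor that overlays the two 8-byte PRP
entries at bytes 24-39 of a 64-byte command, and collapses the other
command dwords into arrays for brevity:

```c
#include <stddef.h>
#include <stdint.h>

/* 16-byte SGL descriptor (sketch). */
struct sgl_descriptor_sketch {
    uint64_t address;
    uint32_t length;
    uint8_t  reserved[3];
    uint8_t  type;
};

/* 64-byte command with the data pointer as a PRP/SGL union (sketch). */
struct command_sketch {
    uint32_t dw0_to_dw5[6];         /* bytes 0-23 */
    union {
        struct {
            uint64_t prp1;          /* PCIe transports */
            uint64_t prp2;
        };
        struct sgl_descriptor_sketch sgl;   /* Fabrics capsules */
    };
    uint32_t dw10_to_dw15[6];       /* bytes 40-63 */
};

/* The union must not change the size or layout of the command. */
_Static_assert(sizeof(struct sgl_descriptor_sketch) == 16, "SGL size");
_Static_assert(offsetof(struct command_sketch, prp1) == 24, "prp1 offset");
_Static_assert(offsetof(struct command_sketch, sgl) == 24, "sgl offset");
_Static_assert(sizeof(struct command_sketch) == 64, "command size");
```

Because both views share storage, existing code that fills prp1/prp2
keeps working while Fabrics code fills the sgl member instead.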
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44442
(cherry picked from commit f21a54d19080510bce279183f4bf07d5315bd179)
|
| |
|
|
|
|
|
|
|
|
| |
These are useful for NVMe over Fabrics.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44441
(cherry picked from commit 1931b75e004f25cf1d2db809bfd9baba40c04521)
|
| |
|
|
|
|
|
|
| |
Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43605
(cherry picked from commit 5650bd3fe8eff1043ef3df33b5bdd7b24b5f2bc0)
|
| |
|
|
|
|
|
|
|
|
|
| |
This macro accepts a field name and a value for the field and
constructs the shifted field value.
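A sketch of what such a macro can look like. The macro and field names
here (FIELD_PUT, EXAMPLE_PSDT) are made up for illustration, and the
sketch assumes each field has an unshifted _MASK and a _SHIFT constant:

```c
#include <stdint.h>

/* Hypothetical field: 2 bits wide, starting at bit 6. */
#define EXAMPLE_PSDT_SHIFT  6
#define EXAMPLE_PSDT_MASK   0x3

/* Build a shifted field value from a field name and an unshifted value. */
#define FIELD_PUT(name, val) (((val) & name##_MASK) << name##_SHIFT)

/* Example use: place a 2-bit selector into a flags byte. */
uint8_t
make_flags(unsigned psdt)
{
    return ((uint8_t)FIELD_PUT(EXAMPLE_PSDT, psdt));
}
```

Masking before shifting means an out-of-range value cannot spill into
neighbouring fields.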
Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43604
(cherry picked from commit 3a477a9b70a34dc0686630599c27f022b1e07bd8)
|
| |
|
|
|
|
|
|
|
|
| |
A few of these omitted a shift of 0, but this is more consistent.
Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43602
(cherry picked from commit 8488fc417fc24af37fa6f1e880f09a5023670950)
|
| |
|
|
|
|
|
|
|
|
|
| |
The current macro always builds a full mask for a named field, so use
the M suffix for mask.
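For illustration, a full-mask macro in this style might look like the
sketch below. FIELD_MASK and EXAMPLE_SCT are hypothetical names, assuming
the same unshifted _MASK plus _SHIFT constants per field:

```c
#include <stdint.h>

/* Hypothetical field: 3 bits wide, starting at bit 9. */
#define EXAMPLE_SCT_SHIFT   9
#define EXAMPLE_SCT_MASK    0x7

/* Full, in-place mask for a named field. */
#define FIELD_MASK(name) (name##_MASK << name##_SHIFT)

/* Example use: clear the field without touching the other bits. */
uint16_t
clear_sct(uint16_t status)
{
    return ((uint16_t)(status & ~FIELD_MASK(EXAMPLE_SCT)));
}
```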
Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43601
(cherry picked from commit 1dade1f255ee535ad357211395b46188bece52dc)
|
| |
|
|
|
|
|
|
| |
Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43595
(cherry picked from commit 479680f235dd89cdabe0503312b3d23f322f4000)
|
| |
|
|
|
|
|
|
|
|
|
| |
In particular, don't try to byteswap the values as 64-bit integers and
always print a non-empty version as a string.
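A sketch of the string-based approach, with a hypothetical helper name.
It assumes each firmware slot revision is 8 bytes of ASCII with no NUL
terminator, and that an all-zero slot counts as empty:

```c
#include <stdio.h>
#include <string.h>

#define FW_REV_LEN 8    /* bytes per firmware slot revision (assumed) */

/*
 * Format one slot revision into `out`. Returns 1 for a non-empty slot
 * and 0 for an empty one. The bytes are treated as characters, so no
 * byte swapping is involved.
 */
int
format_fw_revision(const char *rev, char *out, size_t outlen)
{
    static const char empty[FW_REV_LEN];    /* all zero bytes */

    if (memcmp(rev, empty, FW_REV_LEN) == 0) {
        snprintf(out, outlen, "Empty");
        return (0);
    }
    /* "%.*s" stops after FW_REV_LEN bytes even without a terminator. */
    snprintf(out, outlen, "%.*s", FW_REV_LEN, rev);
    return (1);
}
```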
Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44121
(cherry picked from commit 7485926e09a08fbfe83a9bc908e7d43aaca4c172)
|
| |
|
|
|
|
| |
MFC after: 1 week
(cherry picked from commit b46c7b1ed4e5307c689df72ea8a0b69e02456905)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
struct nvme_hmb_desc contains a pad field which was not getting
initialized before being synced. This doesn't have much consequence but
triggers a report from KMSAN, which verifies that host-filled DMA memory
is initialized before it is made visible to the device. So, let's just
initialize it properly.
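One way to do that, sketched with hypothetical names, is to zero the
whole descriptor before filling in the live fields. The layout below is
an assumption for illustration and byte-order conversion is left out:

```c
#include <stdint.h>
#include <string.h>

/* Host memory buffer descriptor (sketch of the layout). */
struct hmb_desc_sketch {
    uint64_t addr;
    uint32_t size;
    uint32_t reserved;  /* pad; must not be left uninitialized */
};

/* Fill a descriptor so that every byte, pad included, is defined. */
void
hmb_desc_fill(struct hmb_desc_sketch *desc, uint64_t addr, uint32_t size)
{
    memset(desc, 0, sizeof(*desc));
    desc->addr = addr;
    desc->size = size;
}
```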
Reported by: KMSAN
Reviewed by: mav, imp
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43090
(cherry picked from commit d9b7301bb791faab48b6c7733c34078427b9a374)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
KIOXIA CD8 SSDs routinely take ~25 seconds to delete a non-empty
namespace. In some cases, such as hot-plug, it takes longer, triggering
a timeout and controller reset after just 30 seconds. Linux has had a
separate 60-second timeout for the admin queue for many years. This
patch does the same, which also keeps the two systems consistent.
Sponsored by: iXsystems, Inc.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42454
(cherry picked from commit 8d6c0743e36e3cff9279c40468711a82db98df23)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running nvme passthrough commands through the ioctl interface,
memory is mapped with vmapbuf() but not unmapped. This results in leaked
memory whenever a process executes an nvme passthrough command with a
data buffer. This can be replicated with a simple C function (error
checks skipped for brevity):
    #include <sys/ioctl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <dev/nvme/nvme.h>

    void leak_memory(int nvme_ns_fd, uint16_t nblocks) {
        struct nvme_pt_command pt = {
            .cmd = {
                .opc = NVME_OPC_READ,
                .cdw12 = nblocks - 1, // Block count is zero-based
            },
            .len = nblocks * 512, // Assumes devices with 512 byte lba
            .is_read = 1, // Reads and writes should both trigger leak
        };
        void *buf;
        // The alignment argument was missing; 4096 is an assumed value
        posix_memalign(&buf, 4096, nblocks * 512);
        pt.buf = buf;
        ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt);
        free(buf);
    }
Signed-off-by: David Sloan <david.sloan@eideticom.com>
PR: 273626
Reviewed by: imp, markj
MFC after: 1 week
(cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we're suspending, we get messages about waiting for the controller
to reset. These are in error: we're not waiting for it to reset. We put
the recovery state as part of suspending, so we should suppress these as
a false positive.
Also remove a stray debug that's left over from earlier versions of
the recovery code that no longer makes sense.
Sponsored by: Netflix
(cherry picked from commit 1d6021cd72689f54093af4ed77066a2f8abde664)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when we suspend, we need to tear down all the qpairs. We call
nvme_admin_qpair_abort_aers with the admin qpair lock held, but the
tracker it will call for the pending AER also locks it (recursively)
hitting an assert. This routine is called without the qpair lock held
when we destroy the device entirely in a number of places. Add an assert
to this effect and drop the qpair lock before calling it.
nvme_admin_qpair_abort_aers then locks the qpair lock to traverse the
list, dropping it around calls to nvme_qpair_complete_tracker, and
restarting the list scan after picking it back up.
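The drop-the-lock-and-restart pattern can be sketched in portable C as
below. All names are hypothetical, and a boolean with asserts stands in
for the real mutex so that a recursive acquisition trips an assert, much
like the one described above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tracker {
    struct tracker *next;
    bool is_aer;
    bool done;
};

struct qpair_sketch {
    bool lock_held;             /* stand-in for the qpair mutex */
    struct tracker *outstanding;
    int completed;
};

static void
qp_lock(struct qpair_sketch *q)
{
    assert(!q->lock_held);      /* recursion would be a bug */
    q->lock_held = true;
}

static void
qp_unlock(struct qpair_sketch *q)
{
    assert(q->lock_held);
    q->lock_held = false;
}

/* Completion takes the lock itself, so it must be called unlocked. */
static void
complete_tracker(struct qpair_sketch *q, struct tracker *tr)
{
    struct tracker **pp;

    qp_lock(q);
    for (pp = &q->outstanding; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == tr) {
            *pp = tr->next;     /* unlink from the outstanding list */
            break;
        }
    }
    qp_unlock(q);
    tr->next = NULL;
    tr->done = true;
    q->completed++;
}

/* Abort every pending AER; the caller must not hold the lock. */
void
abort_aers(struct qpair_sketch *q)
{
    struct tracker *tr;

    assert(!q->lock_held);
    qp_lock(q);
    tr = q->outstanding;
    while (tr != NULL) {
        if (tr->is_aer) {
            qp_unlock(q);       /* drop around the completion call */
            complete_tracker(q, tr);
            qp_lock(q);
            tr = q->outstanding;    /* list changed: restart the scan */
            continue;
        }
        tr = tr->next;
    }
    qp_unlock(q);
}
```

Restarting from the head costs a rescan per aborted AER, which is
acceptable here because only a handful of AERs are ever outstanding.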
Note: If interrupts are still running, there's a tiny window for these
AERs: If one fires just an instant after we manually complete it, then
we'll be fine: we set the state of the queue to 'waiting' and we ignore
interrupts while 'waiting'. We know we'll destroy all the queue state
with these pending interrupts before looking at them again and we know
all the TRs will have been completed or rescheduled. So either way we're
covered.
Also, tidy up the failure case as well: failing a queue is a superset of
disabling it, so no need to call disable first. This solves some
locking issues with recursion since we don't need to recurse. Set the
qpair state of failed queues to RECOVERY_FAILED and stop scheduling the
watchdog. Assert we're not failed when we're enabling a qpair, since
failure currently is one-way. Make failure a little less verbose.
Next, kill the pre/post reset stuff. It's completely bogus: since we
disable the qpairs, we don't need to also hold the lock through the
reset; disabling will cause the ISR to return early. This keeps us from
recursing on the recovery lock when resuming. We only need the recovery
lock to avoid a specific race between the timer and the ISR.
Finally, kill NVME_RESET_2X. It's been a major release since we put it
in and nobody has used it as far as I can tell. And it was a motivator
for the pre/post uglification.
These are all interrelated, so need to be done at the same time.
Sponsored by: Netflix
Reviewed by: jhb
Tested by: jhb (made sure suspend / resume worked)
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D41866
(cherry picked from commit da8324a9258f1791cd10423103c1746646e33104)
|
| |
|
|
|
|
|
|
|
|
|
| |
Normally, we poll the device every so often to see if commands have
timed out. However, we'll go into the recovery state as part of failing
the drive. To account for all possibilities, if we're failed when we get
into the polling function, just stop polling: Party is over.
Sponsored by: Netflix
(cherry picked from commit d95431624f934fe4740211738fc787808005b14e)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a basically uncontended spinlock that we take out while the ISR is
running. This has two effects: First, when we get a timeout, we can
safely call the nvme_qpair_process_completions w/o racing any ISRs.
Second, we can use it to ensure that we don't reset the card while
the ISRs are active (right now we just sleep and hope for the best,
which usually is fine, but not always).
Sponsored by: Netflix
MFC After: 2 weeks
Reviewed by: chuck, gallatin
Differential Revision: https://reviews.freebsd.org/D41452
(cherry picked from commit 8052b01e7e4113fa8296ce43c354116b0a1774b7)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Next phase of error recovery: Eliminate the RECOVERY_START phase, since
we don't need to wait to start recovery. Eliminate the RECOVERY_RESET
phase since it is transient; we now transition from RECOVERY_NORMAL into
RECOVERY_WAITING.
In normal mode, read the status of the controller. If it is in failed
state, or appears to be hot-plugged, jump directly to reset which will
sort out the proper things to do. This will cause all pending I/O to
complete with an abort status before the reset.
When in the NORMAL state, call the interrupt handler. This will complete
all pending transactions when interrupts are broken or temporarily
misbehaving. We then check all the pending completions for timeouts. If
we have abort enabled, then we'll send an abort. Otherwise we'll assume
the controller is wedged and needs a reset. By calling the interrupt
handler here, we'll avoid an issue with the current code where we
transitioned to RECOVERY_START which prevented any completions from
happening. Now completions happen. In addition, any follow-on I/O that is
scheduled in the completion routines will be submitted, rather than
queued, because the recovery state is correct. This also fixes a problem
where I/O would time out, but never complete, leading to hung I/O.
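The decision flow in NORMAL mode can be sketched as a small pure
function. The enum values, struct, and function name are illustrative
only; the sketch assumes the caller has already read the controller
status and run the interrupt handler before reporting whether any
command is still timed out:

```c
#include <stdbool.h>

enum recovery_state { RECOVERY_NORMAL, RECOVERY_WAITING, RECOVERY_FAILED };
enum timeout_action { ACTION_NONE, ACTION_ABORT, ACTION_RESET };

struct timeout_inputs {
    bool ctrlr_failed_or_gone;  /* status shows failure or hot-unplug */
    bool timed_out_after_isr;   /* still timed out after running the ISR */
    bool abort_enabled;         /* aborts are allowed for timeouts */
};

/* Decide what the timeout handler does; may move *state to WAITING. */
enum timeout_action
timeout_decide(enum recovery_state *state, const struct timeout_inputs *in)
{
    if (*state != RECOVERY_NORMAL)
        return (ACTION_NONE);   /* a reset is pending or the queue failed */
    if (in->ctrlr_failed_or_gone) {
        *state = RECOVERY_WAITING;
        return (ACTION_RESET);  /* jump directly to reset */
    }
    if (!in->timed_out_after_isr)
        return (ACTION_NONE);   /* the ISR completed everything */
    if (in->abort_enabled)
        return (ACTION_ABORT);
    *state = RECOVERY_WAITING;  /* assume the controller is wedged */
    return (ACTION_RESET);
}
```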
Resetting remains the same as before; only when we choose to reset has
changed.
A nice side effect of these changes is that we now do I/O when
interrupts to the card are totally broken. Follow-on commits will improve
the error reporting and logging when this happens. Performance will be
awful, but will at least be minimally functional.
There is a small race when we're checking the completions if interrupts
are working, but this is handled in a future commit.
Sponsored by: Netflix
MFC After: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36922
(cherry picked from commit d4959bfcd110ea471222c7dd87775ba1f4e3d1d9)
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When we went to having a shared timeout routine, failing the timed-out
transaction code was inadvertently dropped. Reinstate it.
Fixes: 502dc84a8b670
Sponsored by: Netflix
MFC After: 2 weeks
Reviewed by: chuck, jhb
Differential Revision: https://reviews.freebsd.org/D36921
(cherry picked from commit 2a6b7055a980f7e7543dfdbda4aa0c356133b77d)
|
| |
|
|
| |
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
| |
|
|
| |
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
| |
|
|
| |
Sponsored by: Chelsio Communications
|
| |
|
|
|
|
| |
nvme driver predates, it seems, mtx_padalign. Modernize.
Sponsored by: Netflix
|
| |
|
|
|
|
|
|
|
|
|
| |
The two bools in nvme_request create a 6 byte hole today. Move them to
after retries to fill the 4 byte hole there and add a spare[2] to make
nvme_request 8 bytes smaller. spare[2] isn't strictly necessary, but
documents how many bytes we have left in that hole, as the number of
booleans will increase shortly.
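The effect can be reproduced with two toy structs. These are not the
real nvme_request fields; the member names are invented, and the sizes
quoted assume a typical LP64 ABI with 1-byte bool:

```c
#include <stdbool.h>
#include <stdint.h>

/* Before: the bools sit between pointers, leaving a 6 byte hole. */
struct request_before {
    void    *cmd;
    bool     flag_a;
    bool     flag_b;    /* 6 bytes of padding follow on LP64 */
    void    *cb_arg;
    int32_t  retries;   /* 4 bytes of padding follow on LP64 */
    void    *next;
};

/* After: the bools and an explicit spare[2] fill the hole after retries. */
struct request_after {
    void    *cmd;
    void    *cb_arg;
    int32_t  retries;
    bool     flag_a;
    bool     flag_b;
    bool     spare[2];  /* documents the bytes still free in this slot */
    void    *next;
};
```

On such an ABI the first struct is 40 bytes and the second is 32, the
same 8 byte saving described above.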
Suggested by: chuck
Sponsored by: Netflix
|
| |
|
|
|
|
|
|
| |
Rather than have a table to walk through, use a sparse array.
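A sketch of the sparse-array approach using designated initializers. The
table contents and names are illustrative and cover only a few NVM I/O
opcodes; unlisted slots are NULL and fall back to a default string:

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed directly by opcode; unlisted entries are NULL. */
static const char *const io_opcode_names[256] = {
    [0x00] = "FLUSH",
    [0x01] = "WRITE",
    [0x02] = "READ",
    [0x09] = "DATASET MANAGEMENT",
};

/* O(1) lookup: no walk over (opcode, string) pairs is needed. */
const char *
opcode_name(const char *const *table, uint8_t opc)
{
    const char *name = table[opc];

    return (name != NULL ? name : "UNKNOWN");
}
```

Since the index is a uint8_t and the array has 256 slots, the lookup
cannot run off the end of the table.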
Suggested by: jhb
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D41353
|
| |
|
|
|
|
|
|
|
|
|
| |
Fix comment to note we should grab additional data from the error log
page, but don't currently (it's unclear if we should do that here
and other places in nvd that want it, or if we should let nvd / the
nda periph make the request).
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41315
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When manually completing an I/O, we do so because we have no status back
from the card. Note M, CRD and P are all 0 because this is an artificial
event (and phase isn't checked when it's completed this way). There's no
MORE information in the error log page and there's no delayed retry
(CRD=0) and we don't currently request CRD to be set to anything other
than 0 and thus don't implement delayed retry.
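As a sketch, building such an artificial status word could look like the
following. The macro and function names are hypothetical; the bit layout
assumed is P at bit 0, SC at bits 8:1, SCT at bits 11:9, CRD at bits
13:12, M at bit 14 and DNR at bit 15:

```c
#include <stdint.h>

#define ST_SC_SHIFT     1
#define ST_SC_MASK      0xffu
#define ST_SCT_SHIFT    9
#define ST_SCT_MASK     0x7u
#define ST_DNR_SHIFT    15
#define ST_DNR_MASK     0x1u

/*
 * Build the status for a manually completed command. Only SCT, SC and
 * DNR are set; P, CRD and M stay 0 because no completion came from the
 * card.
 */
uint16_t
manual_status(unsigned sct, unsigned sc, unsigned dnr)
{
    uint16_t status = 0;

    status |= (uint16_t)((sct & ST_SCT_MASK) << ST_SCT_SHIFT);
    status |= (uint16_t)((sc & ST_SC_MASK) << ST_SC_SHIFT);
    status |= (uint16_t)((dnr & ST_DNR_MASK) << ST_DNR_SHIFT);
    return (status);
}
```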
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41314
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When we're resetting, and there's outstanding I/O that we're cancelling,
only report we're cancelling the I/O once rather than once per
I/O. Likewise when we reschedule the I/O. We don't need to say for each
one that we're cancelling/rescheduling something, and then report the
I/O that we're doing. Likewise with cancelling admin commands (we never
retry them here, so a similar change isn't needed).
Sponsored by: Netflix
Reviewed by: chuck, mav
Differential Revision: https://reviews.freebsd.org/D41313
|
| |
|
|
|
|
|
|
|
| |
Add admin commands capacity management, lockdown and fabrics commands.
Add I/O copy command.
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41311
|
| |
|
|
|
|
|
|
|
|
| |
get_admin_opcode_string and get_io_opcode_string are identical, but
start with different tables. Use a helper routine that takes an argument
to implement these instead. A future commit will refine this further.
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41310
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Both nvme_dump_command and nvme_qpair_print_command print nvme
commands. The latter is better. Recode the one call to
nvme_dump_command to use nvme_qpair_print_command and delete the
former. No sense having two nearly identical routines. A future commit
will convert to sbuf.
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41309
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Both nvme_dump_completion and nvme_qpair_print_completion print
completions. The latter is better. Recode the two instances of
nvme_dump_completion to use nvme_qpair_print_completion and delete the
former. No sense having two nearly identical routines. A future commit
will convert this to sbuf.
Sponsored by: Netflix
Reviewed by: chuck
Differential Revision: https://reviews.freebsd.org/D41308
|
| |
|
|
|
|
|
|
|
| |
Adds support for detection of the S3X NVMe controller found in the
13" MacBook Pro 2017 without Touch Bar (MacBook14,1).
It is known to be used in the following MacBooks:
- Retina MacBook 2016 (MacBook9,1)
- 13" MacBook Pro 2016 without Touch Bar (MacBook13,1)
- 13" MacBook Pro 2016 with Touch Bar (MacBook13,2)
|
| |
|
|
|
|
|
|
| |
This avoids encoding CAM-specific knowledge in nvme_qpair.c.
Reviewed by: chuck, imp, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41119
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add CAM_NVME_STATUS_ERROR error code. Flag all NVME commands that
completed with an error status as CAM_NVME_STATUS_ERROR (a new value)
instead of CAM_REQ_CMP_ERR. This indicates to the upper layers of CAM
that the 'cpl' field for nvmeio CCBs is valid and can be examined for
error recovery, if desired.
No functional change. nda will still see these as errors, call
ndaerror() to get the error recovery action, etc. cam_periph_error will
select the same case as before (even w/o the change, though the change
makes it explicit).
Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41085
|
| |
|
|
|
|
| |
Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D40763
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Replace a magic number with CTS_NVME_VALID_SPEC.
- Set the transport and protocol versions the same as for XPT_PATH_INQ.
Probably we shouldn't bother with setting the version in the 'spec'
member of ccb_trans_settings_nvme at all and use the transport
and/or protocol version field instead.
Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D40616
|
| |
|
|
|
|
|
|
|
| |
We already run nda by default on all the !x86 architectures. Switch the
default to nda. nda creates nvd compatibility links by default, so this
should be a nop. If this causes problems for your application, set
hw.nvme.use_nvd=1 in your loader.conf.
Sponsored by: Netflix
|