author | Warner Losh <imp@FreeBSD.org> | 2025-07-10 15:56:26 +0000 |
---|---|---|
committer | Warner Losh <imp@FreeBSD.org> | 2025-07-10 16:17:01 +0000 |
commit | d78d04b17cb2498186e8fd2681f224a760e75b28 (patch) | |
tree | 9e75d79687a30929afa18d6805f7c9d7f348e165 /lib/Transforms/Scalar/(public-mirror) | |
parent | 5a656ef632de2f363f37484b0128aa60b688bf32 (diff) |
HGST disks that are sick are returning 44/0 for START UNIT (which we
ignore) and then 4/2 on READ CAPACITY. START UNIT should be enough for
READ CAPACITY to either succeed or return UNIT ATTENTION. However, we
get NOT READY + 4/2 back. I've seen this on several models of HGST
drives. Invalidate
the peripheral when we detect this condition. This is likely the least
bad thing we can do: It removes access to daX, but leaves passY so logs
may be extracted (if awkwardly). Removing daX access removes the disk
device that causes the geom problems outlined below.
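
The check described above can be sketched as a small predicate. This is
only an illustration, not the code in this change: struct sense_triple
and should_invalidate_periph() are hypothetical names, and it assumes
"4/2" means ASC/ASCQ 0x04/0x02 reported with the NOT READY sense key.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCSI sense key NOT READY. */
#define SK_NOT_READY 0x02

/* Hypothetical container for decoded sense data; not a CAM structure. */
struct sense_triple {
	uint8_t key;	/* sense key */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/*
 * Hypothetical helper: true when READ CAPACITY came back with
 * NOT READY + 4/2 even though START UNIT was already attempted.
 * Assumption: "4/2" is ASC 0x04 / ASCQ 0x02.
 */
static bool
should_invalidate_periph(bool start_unit_attempted, struct sense_triple s)
{
	return (start_unit_attempted &&
	    s.key == SK_NOT_READY && s.asc == 0x04 && s.ascq == 0x02);
}
```

In this sketch, a true result is the point at which the peripheral would
be invalidated, dropping daX while leaving passY available.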

Although the timeout is 5s for READ_CAPACITY, we wait the full 30s for
READ_CAPACITY_16. This stalls booting, because we start to taste as soon
as we release the final hold, and the tasting means g_wait_idle() now
takes over 5 minutes to clear since we do this for all the opens. Even
using a timeout of 3s instead of 30s leads to boot times of almost 5
minutes in these cases. Other, downstream operations are also taking a
while, so it's not just a matter of adjusting the timeout. Failing the
periph early solves the bulk of this problem (the tasting-related
delays). What the HBA does is HBA specific, and some have firmware that
is also confused by this when it enumerates or discovers the drive,
leading to long (but still shorter than 5 minutes) delays. This patch
won't solve that aspect of startup delays with sick disks.

Perhaps we should fail the periph when START UNIT fails with the same
codes we check in the read capacity path. I'm reluctant to make such a
global change since it's in cam_periph, and there seems to be no good
way to flag that we want this behavior. It's also a bit magical when it
runs (some drives report 44/0 always, and some just report it on START
UNIT, and these HGST drives fall into the latter category).

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D51218
Diffstat (limited to 'lib/Transforms/Scalar/(public-mirror)')
0 files changed, 0 insertions, 0 deletions