path: root/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
* Merge OpenZFS support into HEAD.
  Matt Macy, 2020-08-25; 1 file changed, -1193/+0

  The primary benefit is maintaining a completely shared code base with
  the community, allowing FreeBSD to receive new features sooner and with
  less effort.

  I would advise against doing 'zpool upgrade' or creating indispensable
  pools using new features until this change has had a month+ to soak.

  Work on merging FreeBSD support into what was at the time "ZFS on
  Linux" began in August 2018. I first publicly proposed transitioning
  FreeBSD to (new) OpenZFS on December 18th, 2018. FreeBSD support in
  OpenZFS was finally completed in December 2019. A CFT for downstreaming
  OpenZFS support into FreeBSD was first issued on July 8th. All issues
  that were reported have been addressed or, for a couple of less
  critical matters, have pull requests in progress with OpenZFS.
  iXsystems has tested and dogfooded extensively internally. The TrueNAS
  12 release is based on OpenZFS with some additional features that have
  not yet made it upstream.

  Improvements include: project quotas, encrypted datasets, allocation
  classes, vectorized raidz, vectorized checksums, various command line
  improvements, and zstd compression.

  Thanks to those who have helped along the way: Ryan Moeller, Allan
  Jude, Zack Welch, and many others.

  Sponsored by:	iXsystems, Inc.
  Differential Revision:	https://reviews.freebsd.org/D25872

  Notes:
      svn path=/head/; revision=364746

* Avoid the GEOM topology lock recursion when we automatically expand a pool.
  Pawel Jakub Dawidek, 2020-04-25; 1 file changed, -2/+6

  The steps to reproduce the problem:

      mdconfig -a -t swap -s 3g -u 0
      gpart create -s GPT md0
      gpart add -t freebsd-zfs -s 1g md0
      zpool create -o autoexpand=on foo md0p1
      gpart resize -i 1 -s 2g md0

  Notes:
      svn path=/head/; revision=360325

* If the autoexpand pool property is turned on and the vdev is healthy,
  try to expand the pool automatically when we detect an underlying GEOM
  provider size change.
  Pawel Jakub Dawidek, 2019-03-30; 1 file changed, -0/+24

  Obtained from:	Fudo Security
  Tested in:	AWS

  Notes:
      svn path=/head/; revision=345728

* MFV/ZoL: Disable LBA weighting on files and SSDs
  Alexander Motin, 2019-03-08; 1 file changed, -3/+3

  The LBA weighting makes sense on rotational media where the outer
  tracks have twice the bandwidth of the inner tracks. However, it is
  detrimental on nonrotational media such as solid state disks, where the
  only effect is to ensure that metaslabs enter the best-fit allocation
  behavior sooner, which is detrimental to performance. It also makes no
  sense on files where the underlying filesystem can arrange things
  however it wants.

  Author: Richard Yao <ryao@gentoo.org>
  Signed-off-by: Richard Yao <ryao@gentoo.org>
  Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Closes #3712
  zfsonlinux/zfs@fb40095f5f0853946f8150481ca22602d1334dfe

  To reduce code divergence this merge replaces equivalent but different
  FreeBSD code for detecting non-rotating vdev media.

  MFC after:	1 month

  Notes:
      svn path=/head/; revision=344936

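  The FreeBSD-side detection mentioned above asks GEOM for the provider's
  rotation rate. A minimal sketch, assuming the standard
  GEOM::rotation_rate attribute and the sys/disk.h constants (not copied
  from the committed change):

      uint16_t rate;

      /* Ask GEOM whether the underlying provider rotates. */
      if (g_getattr("GEOM::rotation_rate", cp, &rate) == 0 &&
          rate == DISK_RR_NON_ROTATING)
              vd->vdev_nonrot = B_TRUE;   /* SSD/file: skip LBA weighting */
      else
              vd->vdev_nonrot = B_FALSE;  /* rotational or unknown */
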
* The way ZFS searches for its vdevs is the following: first it looks
  for a vdev that has the same name as the one stored in metadata and
  that has all VDEV labels in place. If it cannot find a GEOM provider
  with the given name and all VDEV labels, it will scan all GEOM
  providers for the best match (the most VDEV labels available), but
  here the name is ignored.
  Pawel Jakub Dawidek, 2019-02-19; 1 file changed, -0/+6

  In the case where the ZFS pool is created, e.g. using a GPT partition
  label:

      # zpool create tank /dev/gpt/tank

  everything works, and on every import ZFS will pick /dev/gpt/tank and
  not /dev/da0p4. The problem occurs when da0p4 is extended and ZFS is
  unable to find all VDEV labels in /dev/gpt/tank anymore (the VDEV
  labels stored at the end of the partition are now somewhere else). In
  this case it will scan all GEOM providers and will pick the first one
  with the best match, i.e. da0p4.

  Fix this problem by checking the VDEV/provider name even if we get the
  same match. If the name is the same as the one we have in the pool's
  metadata, prefer this GEOM provider.

  Reported by:	oshogbo, Michal Mroz <m.mroz@fudosecurity.com>
  Tested by:	Michal Mroz <m.mroz@fudosecurity.com>
  Obtained from:	Fudo Security

  Notes:
      svn path=/head/; revision=344316

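  A minimal sketch of the tie-break described above; best, best_count,
  match_count, and cfg_name are illustrative names, not the committed
  code:

      /* Prefer more labels; on a tie, prefer the name from the config. */
      if (match_count > best_count ||
          (match_count == best_count &&
           strcmp(pp->name, cfg_name) == 0)) {
              best = pp;
              best_count = match_count;
      }
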
* In the vdev_geom_open_by_path() function we assume that the vdev path
  starts with "/dev/". Make sure this is the case.
  Pawel Jakub Dawidek, 2019-02-19; 1 file changed, -1/+1

  Notes:
      svn path=/head/; revision=344314

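  A minimal sketch of such a guard; the exact errno returned is an
  assumption:

      if (strncmp(vd->vdev_path, "/dev/", 5) != 0)
              return (EINVAL);
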
* Remove BIO_ORDERED flag from BIO_FLUSH sent by ZFS.
  Alexander Motin, 2019-01-30; 1 file changed, -1/+0

  In all cases where ZFS sends BIO_FLUSH, it first waits for all related
  writes to complete, so its BIO_FLUSH does not care about strict
  ordering. Removing the flag makes life much easier, at least for the
  NVMe driver, whose hardware has no concept of request ordering and
  relies completely on software.

  MFC after:	2 weeks
  Sponsored by:	iXsystems, Inc.

  Notes:
      svn path=/head/; revision=343586

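  A sketch of the flush path after the change; the field setup follows
  the shape of the surrounding vdev_geom code and should be read as an
  assumption:

      case ZIO_TYPE_IOCTL:
              bp->bio_cmd = BIO_FLUSH;
              /* No BIO_ORDERED: dependent writes have already been
               * waited on, so the flush need not act as a barrier. */
              bp->bio_data = NULL;
              bp->bio_offset = cp->provider->mediasize;
              bp->bio_length = 0;
              break;
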
* Fix an nvpair leak in vdev_geom_read_config().
  Mark Johnston, 2018-09-17; 1 file changed, -9/+12

  Also change the behaviour slightly: instead of freeing "config" if the
  last nvlist doesn't pass the tests, return the last config that did
  pass those tests. This matches the comment at the beginning of the
  function.

  PR:		230704
  Diagnosed by:	avg
  Reviewed by:	asomers, avg
  Tested by:	Mark Martinec <Mark.Martinec@ijs.si>
  Approved by:	re (gjb)
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation
  Differential revision:	https://reviews.freebsd.org/D17202

  Notes:
      svn path=/head/; revision=338724

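  A minimal sketch of the "keep the last passing config" pattern; the
  helper names are hypothetical, and nvlist_free(NULL) is a no-op:

      nvlist_t *config, *best = NULL;

      for (int l = 0; l < VDEV_LABELS; l++) {
              if (read_label_nvlist(cp, l, &config) != 0)  /* hypothetical */
                      continue;
              if (config_passes_checks(config)) {          /* hypothetical */
                      nvlist_free(best);   /* drop the previous winner */
                      best = config;
              } else {
                      nvlist_free(config); /* don't leak the reject */
              }
      }
      return (best);
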
* MFV r336991, r337001:
  Alexander Motin, 2018-07-31; 1 file changed, -0/+1

  9102 zfs should be able to initialize storage devices

  The first access to a disk block can incur a performance penalty on
  some platforms (e.g. AWS's EBS, VMware VMDKs). Therefore it is
  recommended that volumes be "thick provisioned", where supported by the
  platform (VMware). Thick provisioning is time consuming and often is
  ignored. If the thick provision step is omitted, customers will see
  suboptimal performance until we have written to all parts of the LUN.
  ZFS should be able to initialize any unused storage to remove any
  first-write penalty that exists.

  illumos/illumos-gate@094e47e980b0796b94b1b8f51f462a64d246e516

  Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Reviewed by: Prakash Surya <prakash.surya@delphix.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: George Wilson <george.wilson@delphix.com>

  Notes:
      svn path=/head/; revision=337007

* This originated from ZFS On Linux, as
  https://github.com/zfsonlinux/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd
  Sean Eric Fagan, 2018-06-08; 1 file changed, -0/+1

  During scans (scrubs or resilvers), it sorts the blocks in each
  transaction group by block offset; the result can be a significant
  improvement. (On my test system just now, into which I put some effort
  to introduce fragmentation since I set it up yesterday, a scrub went
  from 1h2m to 33.5m with the changes.) I've seen similar ratios on
  production systems.

  Approved by:	Alexander Motin
  Obtained from:	ZFS On Linux
  Relnotes:	Yes (improved scrub performance, with tunables)
  Differential Revision:	https://reviews.freebsd.org/D15562

  Notes:
      svn path=/head/; revision=334844

* MFV r329762: 8961 SPA load/import should tell us why it failed
  Alexander Motin, 2018-02-22; 1 file changed, -0/+2

  illumos/illumos-gate@3ee8c80c747c4aa3f83351a6920f30c411236e1b

  When we fail to open or import a storage pool, we typically don't get
  any additional diagnostic information, just "no pool found" or "can not
  import". While there may be no additional user-consumable information,
  we should at least make this situation easier to debug/diagnose for
  developers and support. For example, we could start by using
  `zfs_dbgmsg()` to log each thing that we try when importing, and which
  things failed. E.g. "tried uberblock of txg X from label Y of device
  Z". Also, we could log each of the stages that we go through in
  `spa_load_impl()`.

  Reviewed by: George Wilson <george.wilson@delphix.com>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
  Approved by: Dan McDonald <danmcd@joyent.com>
  Author: Pavel Zakharov <pavel.zakharov@delphix.com>

  Notes:
      svn path=/head/; revision=329765

* MFV r329502: 7614 zfs device evacuation/removal
  Alexander Motin, 2018-02-21; 1 file changed, -0/+1

  illumos/illumos-gate@5cabbc6b49070407fb9610cfe73d4c0e0dea3e77
  https://www.illumos.org/issues/7614:

  This project allows top-level vdevs to be removed from the storage pool
  with “zpool remove”, reducing the total amount of storage in the pool.
  This operation copies all allocated regions of the device to be removed
  onto other devices, recording the mapping from old to new location.
  After the removal is complete, read and free operations to the removed
  (now “indirect”) vdev must be remapped and performed at the new
  location on disk. The indirect mapping table is kept in memory whenever
  the pool is loaded, so there is minimal performance overhead when doing
  operations on the indirect vdev.

  The size of the in-memory mapping table will be reduced when its
  entries become “obsolete” because they are no longer used by any block
  pointers in the pool. An entry becomes obsolete when all the blocks
  that use it are freed. An entry can also become obsolete when all the
  snapshots that reference it are deleted, and the block pointers that
  reference it have been “remapped” in all filesystems/zvols (and
  clones). Whenever an indirect block is written, all the block pointers
  in it will be “remapped” to their new (concrete) locations if possible.
  This process can be accelerated by using the “zfs remap” command to
  proactively rewrite all indirect blocks that reference indirect
  (removed) vdevs.

  Note that when a device is removed, we do not verify the checksum of
  the data that is copied. This makes the process much faster, but if it
  were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
  possible to copy the wrong data, when we have the correct data on e.g.
  the other side of the mirror. Therefore, mirror and raidz devices can
  not be removed.

  Reviewed by: Alex Reece <alex@delphix.com>
  Reviewed by: George Wilson <george.wilson@delphix.com>
  Reviewed by: John Kennedy <john.kennedy@delphix.com>
  Reviewed by: Prakash Surya <prakash.surya@delphix.com>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: Richard Laager <rlaager@wiktel.com>
  Reviewed by: Tim Chase <tim@chase2k.com>
  Approved by: Garrett D'Amore <garrett@damore.org>
  Author: Prashanth Sreenivasa <pks@delphix.com>

  Notes:
      svn path=/head/; revision=329732

* zfs: fix formatting in a log statement
  Alan Somers, 2018-02-16; 1 file changed, -1/+1

  Submitted by:	Dave Baukus <daveb@spectralogic.com>
  MFC after:	3 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=329412

* Fix assertion when ZFS fails to open certain devices
  Alan Somers, 2017-11-30; 1 file changed, -27/+30

  "panic: vdev_geom_close_locked: cp->private is NULL"

  This panic will result if ZFS fails to open a device due to either of
  the following reasons:

  1) The device's sector size is greater than 8KB.
  2) ZFS wants to open the device RW, but it can't be opened for writing.

  The solution is to change the initialization order to ensure that the
  assertion will be satisfied.

  PR:		221066
  Reported by:	David NewHamlet <wheelcomplex@gmail.com>
  Reviewed by:	avg
  MFC after:	3 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D13278

  Notes:
      svn path=/head/; revision=326401

* vdev_geom_close: close errored consumer even if vdev_reopening is set
  Andriy Gapon, 2017-10-31; 1 file changed, -3/+8

  If vdev_geom_close doesn't close the consumer, then the subsequent call
  to vdev_geom_open() would be just a NOP and would always return
  success. Thus, at present vdev_reopen() would always succeed for
  vdev_geom devices even if the underlying provider is in an error state.
  The problem was introduced as a result of an optimization in rS308055.

  The most significant manifestation of the problem is that the
  zio_vdev_io_done() --> vdev_probe() --> SPA_ASYNC_PROBE -->
  spa_async_probe() --> vdev_reopen() chain of calls and events becomes a
  NOP as well. This chain is invoked when zio_vdev_io_done() detects an
  "unexpected" error from the lower-level I/O.

  Additionally, that call path may race with the SPA_ASYNC_REMOVE path
  because of the asynchronous nature of them both. So, SPA_ASYNC_PROBE
  may erroneously mark a vdev as being healthy after SPA_ASYNC_REMOVE
  marked it as removed.

  Reviewed by:	asomers, mav
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D12731

  Notes:
      svn path=/head/; revision=325228

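  A minimal sketch of the resulting close logic, assuming the consumer
  health checks shown here match the intent (Giant handling omitted):

      static void
      vdev_geom_close(vdev_t *vd)
      {
              struct g_consumer *cp = vd->vdev_tsd;

              g_topology_lock();
              /* Keep the consumer across a reopen only if it is healthy. */
              if (!vd->vdev_reopening ||
                  (cp != NULL && ((cp->flags & G_CF_ORPHAN) != 0 ||
                  (cp->provider != NULL && cp->provider->error != 0))))
                      vdev_geom_close_locked(vd);
              g_topology_unlock();
      }
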
* Fix the error message when creating a zpool on a too-small device
  Alan Somers, 2017-10-23; 1 file changed, -17/+19

  Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by
  path. It's redundant with the check in vdev_open, and failing to attach
  here results in the wrong error message being printed. However, still
  check for it in some other situations:

  * When opening by guids, so we don't get bogged down reading from slow
    devices like floppy drives.
  * In vdev_geom_read_pool_label for the same reason, because we iterate
    over all providers.
  * If the caller requests that we verify the guid, because then we'll
    have to read from the device before vdev_open verifies the size.

  PR:		222227
  Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
  Reviewed by:	avg, mav
  MFC after:	3 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D12531

  Notes:
      svn path=/head/; revision=324940

* fix memory leak in g_bio zone introduced in r320452, another ABD fallout
  Andriy Gapon, 2017-09-20; 1 file changed, -7/+18

  I overlooked the fact that ZIO_IOCTL_PIPELINE does not include the
  ZIO_STAGE_VDEV_IO_DONE stage. We do allocate a struct bio for an ioctl
  zio (a disk cache flush), but we never freed it.

  This change splits bio handling into two groups: one for normal
  read/write I/O that passes data around and, thus, needs the ABD data
  transform; the other group is for "data-less" I/O such as trim and
  cache flush.

  PR:		222288
  Reported by:	Dan Nelson <dnelson@allantgroup.com>
  Tested by:	Borja Marcos <borjam@sarenet.es>
  MFC after:	10 days

  Notes:
      svn path=/head/; revision=323796

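  A sketch of the split, assuming the dispatch happens on the zio type as
  described (the completion-side details are illustrative):

      switch (zio->io_type) {
      case ZIO_TYPE_READ:
      case ZIO_TYPE_WRITE:
              /* Carries data: needs the ABD transform; the bio is kept
               * and released later in the VDEV_IO_DONE stage. */
              break;
      case ZIO_TYPE_IOCTL:    /* cache flush */
      case ZIO_TYPE_FREE:     /* trim */
              /* Data-less: no VDEV_IO_DONE stage runs for these, so the
               * bio must be destroyed in the completion callback. */
              g_destroy_bio(bp);
              break;
      }
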
* Fix some ZFS debugging messages
  Alan Somers, 2017-08-15; 1 file changed, -4/+4

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      Be more careful about the use of provider names vs vdev names in
      ZFS_LOG statements.

  MFC after:	3 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=322546

* fix a regression in r320452, ZFS ABD import
  Andriy Gapon, 2017-07-18; 1 file changed, -0/+8

  I overlooked the fact that the vdev_op_io_done hook is called even if
  the actual I/O is skipped, for example, in the case of a missing vdev.
  Arguably, this could be considered an issue in the zio pipeline engine,
  but for now I am adding defensive code to check for io_bp being NULL,
  along with assertions that this happens only when it can really be
  expected.

  PR:		220691
  Reported by:	peter, cy
  Tested by:	cy
  MFC after:	1 week
  X-MFC with:	r320156, r320452

  Notes:
      svn path=/head/; revision=321111

* fix an architectural problem introduced in r320156, ZFS ABD import
  Andriy Gapon, 2017-06-28; 1 file changed, -7/+10

  The implementation of ZFS refcount_t uses the emulated illumos mutex
  (the sx lock) and waiting memory allocation when ZFS_DEBUG is enabled.
  This makes refcount_t unsuitable for use in the GEOM g_up thread, where
  sleeping is prohibited.

  When importing the ABD change I modified vdev_geom using the illumos
  vdev_disk as an example. As a result, I added a call to abd_return_buf
  in vdev_geom_io_intr. The latter is called on the g_up thread while the
  former uses refcount_t.

  This change fixes the problem by deferring the abd_return_buf call to
  the previously unused vdev_geom_io_done, which is called on a ZFS zio
  taskqueue thread where sleeping is allowed.

  A side bonus of this change is that now a vdev zio has a pointer to its
  corresponding bio while the zio is active.

  Reported by:	Shawn Webb <shawn.webb@hardenedbsd.org>
  Tested by:	Shawn Webb <shawn.webb@hardenedbsd.org>
  MFC after:	1 week
  X-MFC with:	r320156

  Notes:
      svn path=/head/; revision=320452

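  A minimal sketch of the deferral, assuming the interrupt/done split
  described above (the exact field names and the read/write distinction
  are assumptions):

      static void
      vdev_geom_io_intr(struct bio *bp)
      {
              zio_t *zio = bp->bio_caller1;

              /* g_up thread: no sleeping, so no abd_return_buf() here. */
              zio->io_error = bp->bio_error;
              zio_interrupt(zio);     /* hand off to a zio taskqueue */
      }

      static void
      vdev_geom_io_done(zio_t *zio)
      {
              struct bio *bp = zio->io_bio;

              if (bp == NULL)
                      return;         /* the actual I/O was skipped */
              if (zio->io_type == ZIO_TYPE_READ)
                      abd_return_buf_copy(zio->io_abd, bp->bio_data,
                          zio->io_size);  /* may sleep: fine on a taskq */
              else
                      abd_return_buf(zio->io_abd, bp->bio_data,
                          zio->io_size);
              g_destroy_bio(bp);
              zio->io_bio = NULL;
      }
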
* MFV r318946: 8021 ARC buf data scatter-ization
  Andriy Gapon, 2017-06-20; 1 file changed, -2/+16

  illumos/illumos-gate@770499e185d15678ccb0be57ebc626ad18d93383
  https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc626ad18d93383
  https://www.illumos.org/issues/8021

  The ARC buf data project (known simply as "ABD" since its genesis in
  the ZoL community) changes the way the ARC allocates `b_pdata` memory
  from using linear `void *` buffers to using scatter/gather lists of
  fixed-size 1KB chunks. This improves ZFS's performance by helping to
  defragment the address space occupied by the ARC, in particular for
  cases where compressed ARC is enabled. It could also ease future work
  to allocate pages directly from `segkpm` for minimal-overhead memory
  allocations, bypassing the `kmem` subsystem.

  This is essentially the same change as the one which recently landed in
  ZFS on Linux, although they made some platform-specific changes while
  adapting this work to their codebase:

  1. Implemented the equivalent of the `segkpm` suggestion for future
     work mentioned above to bypass issues that they've had with the
     Linux kernel memory allocator.
  2. Changed the internal representation of the ABD's scatter/gather
     list so it could be used to pass I/O directly into Linux block
     device drivers. (This feature is not available in the illumos block
     device interface yet.)

  FreeBSD notes:
  - the actual (default) chunk size is 4KB (despite the text above
    saying 1KB)
  - we can try to reimplement ABDs, so that they are not permanently
    mapped into the KVA unless explicitly requested, especially on
    platforms with scarce KVA
  - we can try to use unmapped I/O and avoid intermediate allocation of
    a linear, virtual memory mapped buffer
  - we can try to avoid extra data copying by referring to chunks /
    pages in the original ABD

  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: George Wilson <george.wilson@delphix.com>
  Reviewed by: Paul Dagnelie <pcd@delphix.com>
  Reviewed by: John Kennedy <john.kennedy@delphix.com>
  Reviewed by: Prakash Surya <prakash.surya@delphix.com>
  Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Reviewed by: Chris Williamson <chris.williamson@delphix.com>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: Dan Kimmel <dan.kimmel@delphix.com>
  MFC after:	3 weeks

  Notes:
      svn path=/head/; revision=320156

* vdev_geom may associate multiple vdevs per g_consumer
  Alan Somers, 2017-05-11; 1 file changed, -49/+65

  vdev_geom.c currently uses the g_consumer's private field to point to a
  vdev_t. That way, a GEOM event can cause a change to a ZFS vdev. For
  example, when you remove a disk, the vdev's status will change to
  REMOVED. However, vdev_geom will sometimes attach multiple vdevs to the
  same GEOM consumer. If this happens, then geom events will only be
  propagated to one of the vdevs. Fix this by storing a linked list of
  vdevs in the g_consumer's private field.

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      * g_consumer.private now stores a linked list of vdev pointers
        associated with the consumer instead of just a single vdev
        pointer.
      * Change vdev_geom_set_physpath's signature to more closely match
        vdev_geom_set_rotation_rate
      * Don't bother calling g_access in vdev_geom_set_physpath. It's
        guaranteed that we've already accessed the consumer by the time
        we get here.
      * Don't call vdev_geom_set_physpath in vdev_geom_attach. Instead,
        call it in vdev_geom_open, after we know that the open has
        succeeded.

  PR:		218634
  Reviewed by:	gibbs
  MFC after:	1 week
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D10391

  Notes:
      svn path=/head/; revision=318189

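  A sketch of the linked-list shape this describes, using sys/queue.h;
  the type names and the per-vdev hook are assumptions, not the committed
  definitions:

      #include <sys/queue.h>

      struct consumer_vdev_elem {
              SLIST_ENTRY(consumer_vdev_elem) elems;
              vdev_t                          *vd;
      };
      SLIST_HEAD(consumer_priv_t, consumer_vdev_elem);

      /* A GEOM event handler now reaches every vdev on the consumer: */
      struct consumer_priv_t *priv = cp->private;
      struct consumer_vdev_elem *elem;

      SLIST_FOREACH(elem, priv, elems)
              vdev_geom_orphan_one(elem->vd);  /* hypothetical hook */
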
* Fix vdev_geom_attach_by_guids for partitioned disks
  Alan Somers, 2017-04-13; 1 file changed, -31/+44

  When opening a vdev whose path is unknown, vdev_geom must find a geom
  provider with a label whose guids match the desired vdev. However, due
  to partitioning, it is possible that two non-synonymous providers will
  share some labels. For example, if the first partition starts at the
  beginning of the drive, then ada0 and ada0p1 will share the first
  label. More troubling, if the last partition runs to the end of the
  drive, then ada0p3 and ada0 will share the last label. If vdev_geom
  opens ada0 when it should've opened ada0p3, then the pool won't be
  readable. If it opens ada0 when it should've opened ada0p1, then it
  will corrupt some other partition when it writes the 3rd and 4th
  labels.

  The easiest way to reproduce this problem is to install a mirrored root
  pool with the default partition layout, then swap the positions of the
  two boot drives and reboot. Whether the bug manifests depends on the
  order in which geom lists its providers, which is arbitrary.

  Fix this situation by modifying the search algorithm to prefer geom
  providers that have all four labels intact. If no such provider exists,
  then open whichever provider has the most.

  Reviewed by:	mav
  MFC after:	3 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D10365

  Notes:
      svn path=/head/; revision=316760

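  A minimal sketch of the preference logic; best_pp, best_count, and the
  label-counting helper are illustrative names:

      best_count = 0;
      best_pp = NULL;
      LIST_FOREACH(pp, &gp->provider, provider) {
              /* hypothetical: count labels whose guids match the vdev */
              count = count_matching_labels(pp, pool_guid, vdev_guid);
              if (count > best_count) {
                      best_pp = pp;
                      best_count = count;
              }
              if (best_count == VDEV_LABELS)
                      break;  /* all four intact: cannot do better */
      }
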
* Add vdev_reopening support to vdev_geom.
  Alexander Motin, 2016-10-28; 1 file changed, -1/+12

  It allows us to avoid extra GEOM provider flapping without significant
  need. Since GEOM got resize support, we don't need to reopen the
  provider to get its new size. If the provider was orphaned and is no
  longer valid, ZFS should already know that, and in such a case the
  reopen should be done in full as expected.

  MFC after:	2 weeks

  Notes:
      svn path=/head/; revision=308055

* Matching GUIDs, handle possible race on vdev detach.
  Alexander Motin, 2016-10-28; 1 file changed, -56/+63

  In the case of a vdev detach causing top-level mirror vdev destruction,
  the leaf vdev changes its GUID to that of the destroyed mirror, which
  creates a race condition where the GUID in the vdev label may not match
  the one in the pool config.

  This change replicates a logic nuance of vdev_validate() by adding a
  special exception, matching the vdev GUID against the top-level vdev
  GUID. Since this exception is not completely reliable (it may give
  false positives if we fail to erase the label on a detached vdev), use
  it only as a last resort.

  A quick way to reproduce this scenario now is to detach a vdev from a
  pool with autoexpand enabled. During vdev detach the autoexpand logic
  tries to reopen the remaining vdev, which always fails now since the
  in-memory configuration is already updated, while the on-disk labels
  are not yet.

  MFC after:	2 weeks

  Notes:
      svn path=/head/; revision=308051

* Improve a few debugging log messages.
  Alexander Motin, 2016-10-28; 1 file changed, -3/+3

  Notes:
      svn path=/head/; revision=308049

* fix zfs pool creation accidentally broken by r305331
  Andriy Gapon, 2016-09-06; 1 file changed, -1/+2

  The upstream change introduced a new load state, SPA_LOAD_CREATE, and
  the vdev_geom code needs to be aware of it.

  Tested by:	cy
  MFC after:	1 week
  X-MFC with:	r305331

  Notes:
      svn path=/head/; revision=305456

* Fix uninitialized variable from r300881
  Alan Somers, 2016-06-21; 1 file changed, -1/+1

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      Initialize needs_update in vdev_geom_set_physpath

  PR:		210409
  Reported by:	kp
  Reviewed by:	kp
  Approved by:	re (hrs)
  MFC after:	4 weeks
  X-MFC-With:	300881
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=302058

* Avoid issuing spa config updates for physical path when not necessary
  Alan Somers, 2016-05-27; 1 file changed, -21/+42

  ZFS's configuration needs to be updated whenever the physical path for
  a device changes, but not when a new device is introduced. This is
  because new devices necessarily cause config updates, but only if they
  are actually accepted into the pool.

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      Split vdev_geom_set_physpath out of vdev_geom_attrchanged. When
      setting the vdev's physical path, only request a config update if
      the physical path has changed. Don't request it when opening a
      device for the first time, because the config sync will happen
      anyway upstack.

  sys/geom/geom_dev.c
      Split g_dev_set_physpath and g_dev_set_media out of
      g_dev_attrchanged

  Submitted by:	will, asomers
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D6428

  Notes:
      svn path=/head/; revision=300881

* Speed up vdev_geom_open_by_guids
  Alan Somers, 2016-05-17; 1 file changed, -43/+87

  The speedup is hard to measure because the only time
  vdev_geom_open_by_guids gets called on many drives at the same time is
  during boot. But with vdev_geom_open hacked to always call
  vdev_geom_open_by_guids, operations like "zpool create" speed up by
  65%.

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      * Read all of a vdev's labels in parallel instead of sequentially.
      * In vdev_geom_read_config, don't read the entire label, including
        the uberblock. That's a waste of RAM. Just read the vdev config
        nvlist. Reduces the IO and RAM involved with tasting from 1MB to
        448KB.

  Reviewed by:	avg
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D6153

  Notes:
      svn path=/head/; revision=300059

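  A sketch of the parallel read of just the config area of each label;
  the offsets follow vdev_label_offset(), and the completion plumbing is
  omitted (treat the details as assumptions):

      for (l = 0; l < VDEV_LABELS; l++) {
              bp = g_alloc_bio();
              bp->bio_cmd = BIO_READ;
              bp->bio_offset = vdev_label_offset(psize, l,
                  offsetof(vdev_label_t, vl_vdev_phys));
              bp->bio_length = VDEV_PHYS_SIZE; /* config only: 4x112KB = 448KB */
              bp->bio_data = labels[l];
              bp->bio_done = NULL;             /* collected synchronously */
              g_io_request(bp, cp);            /* all four bios in flight */
      }
      /* ...then wait for and unpack all four completions. */
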
* Fix a use-after-free when "zpool import" fails
  Alan Somers, 2016-04-29; 1 file changed, -4/+2

  Clear vd->vdev_tsd in vdev_geom_close_locked instead of
  vdev_geom_detach. In the latter function, it would fail to happen in
  certain circumstances where cp->private was unset. Ideally, the latter
  should never happen, but it can happen when a vdev open fails, or where
  spares are involved.

  MFC after:	4 weeks
  X-MFC-With:	298786
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=298814

* Refactor vdev_geom_attach and friends to reduce code duplication
  Alan Somers, 2016-04-29; 1 file changed, -119/+97

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      Move checks for the provider's sectorsize and mediasize into a
      single location in vdev_geom_attach. Remove the zfs::vdev::taste
      class; it's ok to use the regular vdev class for tasting.
      Consolidate guid checks into a single location in vdev_attach_ok.
      Consolidate some error handling code from vdev_geom_attach into
      vdev_geom_detach, closing a resource leak of geom consumers in the
      process.

  Reviewed by:	avg
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D5974

  Notes:
      svn path=/head/; revision=298786

* Don't corrupt ZFS label's physpath attribute when booting while a disk
  is missing
  Alan Somers, 2016-04-15; 1 file changed, -3/+4

  Prior to this change, vdev_geom_open_by_path would call
  vdev_geom_attach prior to verifying the device's GUIDs.
  vdev_geom_attach calls vdev_geom_attrchange to set the physpath in the
  vdev object. The result is that if the disk could not be found, then
  the labels for other disks in the same TLD would overwrite the missing
  disk's physpath with the physpath of whichever disk currently has the
  same devname as the missing one used to have.

  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=298072

* Add more debugging statements in vdev_geom.c
  Alan Somers, 2016-04-14; 1 file changed, -5/+22

  Log a debugging message whenever geom functions fail in
  vdev_geom_attach. Printing these messages is controlled by
  vfs.zfs.debug.

  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=298017

* Update a debugging message in vdev_geom_open_by_guids for consistency
  with similar messages elsewhere in the file.
  Alan Somers, 2016-04-14; 1 file changed, -1/+2

  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=297986

* Fix rare double free in vdev_geom_attrchanged
  Alan Somers, 2016-04-12; 1 file changed, -16/+3

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
      Don't drop the g_topology_lock before freeing old_physpath. That
      opens up a race where one thread can call vdev_geom_attrchanged,
      set old_physpath, drop the g_topology_lock, then block trying to
      acquire the SCL_STATE lock. Then another thread can come into
      vdev_geom_attrchanged, set old_physpath to the same value, and
      proceed to free it. When the first thread resumes, it will free the
      same location.

      It turns out that the SCL_STATE lock isn't needed. It was
      originally added by gibbs to protect vd->vdev_physpath while
      updating the same. However, the update process subsequently was
      switched to an atomic operation (a pointer swap). Now, there is no
      need for the SCL_STATE lock, and hence no need to drop the
      g_topology_lock.

  Reviewed by:	delphij
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D5413

  Notes:
      svn path=/head/; revision=297868

* Alike to r293708, relax the pool check in vdev_geom_open_by_path().
  Alexander Motin, 2016-04-07; 1 file changed, -1/+9

  The old check made it impossible to open a spare disk by a known path,
  which kind of worked only because the same fix was applied to
  vdev_geom_attach_by_guids() in r293708.

  MFC after:	1 week

  Notes:
      svn path=/head/; revision=297672

* Make ZFS ignore stripe sizes above SPA_MAXASHIFT (8KB).
  Alexander Motin, 2016-03-10; 1 file changed, -1/+1

  If a device has a stripe size bigger than the maximal sector size
  supported by ZFS, there is nothing that can be done to avoid
  read-modify-write cycles. Taking that stripe size into account will
  only reduce space efficiency and pointlessly bother the user with
  warnings that cannot be fixed.

  Discussed with:	smh

  Notes:
      svn path=/head/; revision=296615

* Make ZFS more picky about GEOM stripe sizes and offsets.
  Alexander Motin, 2016-03-10; 1 file changed, -1/+2

  Use of misaligned or non-power-of-2 stripes is not really useful for
  ZFS, since an increased ashift won't help to avoid read-modify-write
  cycles, and will only reduce pool space efficiency and compression
  rates.

  Notes:
      svn path=/head/; revision=296613

* MFV r296505: 6531 Provide mechanism to artificially limit disk performance
  Alexander Motin, 2016-03-08; 1 file changed, -1/+2

  Reviewed by: Paul Dagnelie <pcd@delphix.com>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: George Wilson <george.wilson@delphix.com>
  Approved by: Dan McDonald <danmcd@omniti.com>
  Author: Prakash Surya <prakash.surya@delphix.com>

  illumos/illumos-gate@97e81309571898df9fdd94aab1216dfcf23e060b

  Notes:
      svn path=/head/; revision=296510

* Create an API to reset a struct bio (g_reset_bio).
  Warner Losh, 2016-02-17; 1 file changed, -1/+1

  This is mandatory for all struct bio you get back from
  g_{new,alloc}_bio. Temporary bios that you create on the stack or
  elsewhere should use this before first use of the bio, and between
  uses of the bio. At the moment, it is nothing more than a wrapper
  around bzero, but that may change in the future. The wrapper also
  removes one place where we encode the size of struct bio in the KBI.

  Notes:
      svn path=/head/; revision=295707

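  A minimal usage sketch for a stack-allocated bio; the BIO_FLUSH setup
  is illustrative:

      struct bio bio;         /* temporary bio on the stack */

      g_reset_bio(&bio);      /* not bzero(): keeps sizeof(struct bio)
                                 out of the KBI */
      bio.bio_cmd = BIO_FLUSH;
      bio.bio_length = 0;
      /* ...issue and wait; call g_reset_bio(&bio) again before reuse. */
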
* Quell harmless CID about unchecked return value in nvlist_get_guids.
  Alan Somers, 2016-01-19; 1 file changed, -2/+2

  The return value doesn't need to be checked, because nvlist_get_guids's
  callers check the returned values of the guids.

  Coverity CID:	1341869
  MFC after:	1 week
  X-MFC-With:	292066
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=294358

* Disallow zvol-backed ZFS pools
  Alan Somers, 2016-01-19; 1 file changed, -4/+15

  Using zvols as backing devices for ZFS pools is fraught with panics and
  deadlocks. For example, attempting to online a missing device in the
  presence of a zvol can cause a panic when vdev_geom tastes the zvol.
  Better to completely disable vdev_geom from ever opening a zvol. The
  solution relies on setting a thread-local variable during
  vdev_geom_open, and returning EOPNOTSUPP during zvol_open if that
  thread-local variable is set.

  Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open. Its
  intent was to prevent a recursive mutex acquisition panic. However, the
  new check for the thread-local variable also fixes that problem.

  Also, fix a panic in vdev_geom_taste_orphan. For an unknown reason,
  this function was set to panic. But it can occur that a device
  disappears during tasting, and it causes no problems to ignore this
  departure.

  Reviewed by:	delphij
  MFC after:	1 week
  Relnotes:	yes
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D4986

  Notes:
      svn path=/head/; revision=294329

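  A sketch of the thread-local guard using the Solaris-compatibility TSD
  API; the key name and exact call sites are assumptions based on the
  description above:

      static uint_t zfs_geom_probe_vdev_key;  /* tsd_create()d at init */

      /* vdev_geom_open(): mark this thread as probing a vdev... */
      VERIFY(tsd_set(zfs_geom_probe_vdev_key, vd) == 0);
      /* ...taste GEOM providers (may reach zvol_open())... */
      VERIFY(tsd_set(zfs_geom_probe_vdev_key, NULL) == 0);

      /* zvol_open(): refuse to be tasted by a vdev-probing thread. */
      if (tsd_get(zfs_geom_probe_vdev_key) != NULL)
              return (SET_ERROR(EOPNOTSUPP));
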
* Fix race condition involving ZFS remove events
  Alan Somers, 2016-01-14; 1 file changed, -1/+0

  When a ZFS drive disappears, ZFS sends a resource.fs.zfs.removed event
  to userland. A userland program like zfsd(8) can use that event, for
  example to activate a hotspare. The current code contains a race
  condition: vdev_geom sends the sysevent _before_ spa.c updates the
  vdev's status, causing userland processes to see pool state that does
  not reflect the device removal. This change moves the sysevent to
  spa.c, closing the race.

  Reviewed by:	delphij, Sean Eric Fagan
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D4902

  Notes:
      svn path=/head/; revision=294027

* Fix importing l2arc device by guid
  Alan Somers, 2016-01-11; 1 file changed, -1/+9

  After r292066, vdev_geom verifies both the vdev and pool guids of
  device labels during open. However, spare and l2arc devices don't have
  pool guids, so opening them by guid will fail (opening by path, when
  the pathname is known, still succeeds). This change allows a vdev to be
  opened by guid if the label contains no pool_guid, which is the case
  for inactive spares and l2arc devices.

  PR:		292066
  Reported by:	delphij
  Reviewed by:	delphij, smh
  MFC after:	2 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D4861

  Notes:
      svn path=/head/; revision=293708

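  A minimal sketch of the relaxed match; the nvlist keys are the standard
  ZFS config names, while the surrounding logic is illustrative:

      uint64_t pool_guid = 0, vdev_guid;

      if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_GUID, &vdev_guid) != 0)
              return (B_FALSE);
      /* Spares and l2arc carry no pool guid: tolerate its absence. */
      (void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
          &pool_guid);

      return (vdev_guid == vd->vdev_guid &&
          (pool_guid == 0 || pool_guid == spa_guid(vd->vdev_spa)));
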
* Record physical path information in ZFS Vdevs
  Alan Somers, 2016-01-11; 1 file changed, -32/+98

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
      If available, record the physical path of a vdev in ZFS meta-data.
      Do this both when opening the vdev, and when receiving an attribute
      change notification from GEOM.

      Make vdev_geom_close() synchronous instead of deferring its work to
      a GEOM event handler. There is no benefit to deferring the work and
      this prevents a future open call from referencing a consumer that
      is scheduled for destruction. The close followed by an immediate
      open will occur during a vdev reprobe triggered by any type of I/O
      error.

      Consolidate vdev_geom_close() and vdev_geom_detach() into
      vdev_geom_close() and vdev_geom_close_locked(). This also moves the
      cross linking operations between vdev and GEOM consumer into a
      single place (linking in vdev_geom_attach() and unlinking in
      vdev_geom_close_locked()).

  Submitted by:	gibbs, asomers
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D4524

  Notes:
      svn path=/head/; revision=293677

* Change an important error message from ZFS_LOG to printf
  Alan Somers, 2015-12-11; 1 file changed, -1/+1

  Submitted by:	gibbs
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp

  Notes:
      svn path=/head/; revision=292069

* During vdev_geom_open, require that the vdev guids match the device's
  label, except during split, add, or create operations.
  Alan Somers, 2015-12-10; 1 file changed, -39/+55

  This fixes a bug where the wrong disk could be returned, and higher
  layers of ZFS would immediately eject it again.

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
      o When opening by GUID, require both the pool and vdev GUIDs to
        match. While it is highly unlikely for two vdevs to have the
        same vdev GUIDs, the ZFS storage pool allocator only guarantees
        they are unique within a pool.
      o Modify the open behavior to:
        - If we are opening a vdev that hasn't previously been opened,
          open by path without checking GUIDs.
        - Otherwise, open by path and verify GUIDs.
        - If that fails, search all geom providers for a device with
          matching GUIDs.
        - If that fails, return ENOENT.

  Submitted by:	gibbs, asomers
  Reviewed by:	smh
  MFC after:	4 weeks
  Sponsored by:	Spectra Logic Corp
  Differential Revision:	https://reviews.freebsd.org/D4486

  Notes:
      svn path=/head/; revision=292066

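  A minimal sketch of that open sequence; every helper name here is
  hypothetical (the committed code inlines these steps):

      if (!previously_opened(vd)) {
              /* create/add/split: trust the configured path */
              cp = open_by_path(vd);
      } else {
              cp = open_by_path_and_check_guids(vd);
              if (cp == NULL)         /* path moved: fall back to guids */
                      cp = search_all_providers_by_guids(vd);
      }
      if (cp == NULL)
              return (ENOENT);
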
* Disable TRIM on file backed ZFS vdevs and fix TRIM on init
  Steven Hartland, 2014-11-17; 1 file changed, -0/+5

  After r265152, TRIM requests are ZIO_TYPE_FREE instead of
  ZIO_TYPE_IOCTL; this meant file-backed vdevs attempted to process the
  ZIO as a write, causing a panic.

  We now disable TRIM on file-backed vdevs and ASSERT the ZIO types
  supported by each vdev type to ensure we explicitly support the ZIO
  type being processed.

  Also ensure that TRIM on init is not processed for devices which
  declare they don't support TRIM via vdev_notrim.

  PR:		195061, 194976, 191573
  Sponsored by:	Multiplay

  Notes:
      svn path=/head/; revision=274619

* MFV r274272 and diff reduction with upstream.
  Xin LI, 2014-11-09; 1 file changed, -8/+8

  Illumos issue:
      5244 zio pipeline callers should explicitly invoke next stage

  Tested with:	ztest plus ZFS over GELI configuration
  MFC after:	1 month

  Notes:
      svn path=/head/; revision=274304