aboutsummaryrefslogtreecommitdiff
path: root/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge OpenZFS support in to HEAD.Matt Macy2020-08-251-127/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The primary benefit is maintaining a completely shared code base with the community allowing FreeBSD to receive new features sooner and with less effort. I would advise against doing 'zpool upgrade' or creating indispensable pools using new features until this change has had a month+ to soak. Work on merging FreeBSD support in to what was at the time "ZFS on Linux" began in August 2018. I first publicly proposed transitioning FreeBSD to (new) OpenZFS on December 18th, 2018. FreeBSD support in OpenZFS was finally completed in December 2019. A CFT for downstreaming OpenZFS support in to FreeBSD was first issued on July 8th. All issues that were reported have been addressed or, for a couple of less critical matters there are pull requests in progress with OpenZFS. iXsystems has tested and dogfooded extensively internally. The TrueNAS 12 release is based on OpenZFS with some additional features that have not yet made it upstream. Improvements include: project quotas, encrypted datasets, allocation classes, vectorized raidz, vectorized checksums, various command line improvements, zstd compression. Thanks to those who have helped along the way: Ryan Moeller, Allan Jude, Zack Welch, and many others. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25872 Notes: svn path=/head/; revision=364746
* MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixesAndriy Gapon2019-11-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@555d674d5d4b8191dc83723188349d28278b2431 https://github.com/illumos/illumos-gate/commit/555d674d5d4b8191dc83723188349d28278b2431 https://www.illumos.org/issues/10592 This is a collection of recent fixes from ZoL: 8eef997679b Error path in metaslab_load_impl() forgets to drop ms_sync_lock 928e8ad47d3 Introduce auxiliary metaslab histograms 425d3237ee8 Get rid of space_map_update() for ms_synced_length 6c926f426a2 Simplify log vdev removal code 21e7cf5da89 zdb -L should skip leak detection altogether df72b8bebe0 Rename range_tree_verify to range_tree_verify_not_present 75058f33034 Remove unused vdev_t fields Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Author: Serapheim Dimitropoulos <serapheim@delphix.com> MFC after: 4 weeks Notes: svn path=/head/; revision=354948
* MFV r354382,r354385: 10601 10757 Pool allocation classesAndriy Gapon2019-11-211-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@663207adb1669640c01c5ec6949ce78fd806efae https://github.com/illumos/illumos-gate/commit/663207adb1669640c01c5ec6949ce78fd806efae 10601 Pool allocation classes https://www.illumos.org/issues/10601 illumos port of ZoL Pool allocation classes. Includes at least these two commits: 441709695 Pool allocation classes misplacing small file blocks cc99f275a Pool allocation classes 10757 Add -gLp to zpool subcommands for alt vdev names https://www.illumos.org/issues/10757 Port from ZoL of d2f3e292d Add -gLp to zpool subcommands for alt vdev names Note that a subsequent ZoL commit changed -p to -P a77f29f93 Change full path subcommand flag from -p to -P Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Portions contributed by: Håkan Johansson <f96hajo@chalmers.se> Portions contributed by: Richard Yao <ryao@gentoo.org> Portions contributed by: Chunwei Chen <david.chen@nutanix.com> Portions contributed by: loli10K <ezomori.nozomu@gmail.com> Author: Don Brady <don.brady@delphix.com> 11541 allocation_classes feature must be enabled to add log device illumos/illumos-gate@c1064fd7ce62fe763a4475e9988ffea3b22137de https://github.com/illumos/illumos-gate/commit/c1064fd7ce62fe763a4475e9988ffea3b22137de https://www.illumos.org/issues/11541 After the allocation_classes feature was integrated, one can no longer add a log device to a pool unless that feature is enabled. There is an explicit check for this, but it is unnecessary in the case of log devices, so we should handle this better instead of forcing the feature to be enabled. Author: Jerry Jelinek <jerry.jelinek@joyent.com> FreeBSD notes. I faithfully added the new -g, -L, -P flags, but only -g does something: vdev GUIDs are displayed instead of device names. -L, resolve symlinks, and -P, display full disk paths, do nothing at the moment. The use of special vdevs is backward compatible for read-only access, so root pools should be bootable, but exercise caution. MFC after: 4 weeks Notes: svn path=/head/; revision=354941
* MFC r353611: 10330 merge recent ZoL vdev and metaslab changesAndriy Gapon2019-10-161-1/+0
| | | | | | | | | | | | | | | | | | | illumos/illumos-gate@a0b03b161c4df3cfc54fbc741db09b3bdc23ffba https://github.com/illumos/illumos-gate/commit/a0b03b161c4df3cfc54fbc741db09b3bdc23ffba https://www.illumos.org/issues/10330 3 recent ZoL changes in the vdev and metaslab code which we can pull over: PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load() PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap encoding Author: Serapheim Dimitropoulos <serapheimd@gmail.com> Obtained from: illumos, ZoL MFC after: 2 weeks Notes: svn path=/head/; revision=353612
* MFV r336948: 9112 Improve allocation performance on high-end systemsAlexander Motin2018-07-311-8/+10
| | | | | | | | | | | | | | | | | | | | On high-end systems running async sequential write workloads, especially NUMA systems with flash or NVMe storage, one significant performance bottleneck is selecting a metaslab to do allocations from. This process can be parallelized, providing significant performance increases for these workloads. illumos/illumos-gate@f78cdc34af236a6199dd9e21376f4a46348c0d56 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Alexander Motin <mav@FreeBSD.org> Approved by: Gordon Ross <gwr@nexenta.com> Author: Paul Dagnelie <pcd@delphix.com> Notes: svn path=/head/; revision=336949
* MFV r331695, 331700: 9166 zfs storage pool checkpointAlexander Motin2018-03-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@8671400134a11c848244896ca51a7db4d0f69da4 The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Notes: svn path=/head/; revision=331701
* MFV r329502: 7614 zfs device evacuation/removalAlexander Motin2018-02-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@5cabbc6b49070407fb9610cfe73d4c0e0dea3e77 https://www.illumos.org/issues/7614: This project allows top-level vdevs to be removed from the storage pool with “zpool remove”, reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now “indirect”) vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become “obsolete” because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been “remapped” in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be “remapped” to their new (concrete) locations if possible. This process can be accelerated by using the “zfs remap” command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. Therefore, mirror and raidz devices can not be removed. Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Approved by: Garrett D'Amore <garrett@damore.org> Author: Prashanth Sreenivasa <pks@delphix.com> Notes: svn path=/head/; revision=329732
* MFV r315290, r315291: 7303 dynamic metaslab selectionAlexander Motin2017-03-241-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@8363e80ae72609660f6090766ca8c2c18aa53f0c https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18 https://www.illumos.org/issues/7303 This change introduces a new weighting algorithm to improve metaslab selection. The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result, the metaslab weight now encodes the type of weighting algorithm used (size-based vs segment-based). This also introduce a new allocation tracing facility and two new dcmds to help debug allocation problems. Each zio now contains a zio_alloc_list_t structure that is populated as the zio goes through the allocations stage. Here's an example of how to use the tracing facility: > c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace MSID DVA ASIZE WEIGHT RESULT VDEV - 0 400 0 NOT_ALLOCATABLE ztest.0a - 0 400 0 NOT_ALLOCATABLE ztest.0a - 0 400 0 ENOSPC ztest.0a - 0 200 0 NOT_ALLOCATABLE ztest.0a - 0 200 0 NOT_ALLOCATABLE ztest.0a - 0 200 0 ENOSPC ztest.0a 1 0 400 1 x 8M 17b1a00 ztest.0a > 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace MSID DVA ASIZE WEIGHT RESULT VDEV - 0 200 0 NOT_ALLOCATABLE mirror-2 - 0 200 0 NOT_ALLOCATABLE mirror-0 1 0 200 1 x 4M 112ae00 mirror-1 - 1 200 0 NOT_ALLOCATABLE mirror-2 - 1 200 0 NOT_ALLOCATABLE mirror-0 1 1 200 1 x 4M 112b000 mirror-1 - 2 200 0 NOT_ALLOCATABLE mirror-2 If the metaslab is using segment-based weighting then the WEIGHT column will display the number of segments available in the bucket where the allocation attempt was made. Author: George Wilson <george.wilson@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Don Brady <don.brady@intel.com> Approved by: Richard Lowe <richlowe@richlowe.net> Notes: svn path=/head/; revision=315896
* MFV r304155: 7090 zfs should improve allocation order and throttle allocationsAlexander Motin2016-09-031-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | illumos/illumos-gate@0f7643c7376dd69a08acbfc9d1d7d548b10c846a https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b 10c846a https://www.illumos.org/issues/7090 When write I/Os are issued, they are issued in block order but the ZIO pipelin e will drive them asynchronously through the allocation stage which can result i n blocks being allocated out-of-order. It would be nice to preserve as much of the logical order as possible. In addition, the allocations are equally scattered across all top-level VDEVs but not all top-level VDEVs are created equally. The pipeline should be able t o detect devices that are more capable of handling allocations and should allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced as fuller devices will tend to be slower than empty devices. The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation of work to each top-level vdev. The allocation logic utilizes a reservation system to reserve allocations that will be performed by the allocator. Once an allocatio n is successfully completed it's scheduled on a given top-level vdev. Each top- level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab groups and round robin across all eligible metaslab groups to distribute the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the allocation queue is emptied. Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Author: George Wilson <george.wilson@delphix.com> Notes: svn path=/head/; revision=305331
* MFV r275540:Xin LI2014-12-081-2/+2
| | | | | | | | | | | | | | | | | | When importing a pool, don't assume that the passed pool configuration at vdev_load is always vaild. It's possible that a stale configuration that comes with extra vdevs, where metaslab_init() would fail because of lower layer returns error. Change the code to make metaslab_init() handle and return errors from lower layer and pass it back to upper layer and handle it there. Illumos issue: 5213 panic in metaslab_init due to space_map_open returning ENXIO MFC after: 2 weeks Notes: svn path=/head/; revision=275594
* MFV r269010:Xin LI2014-07-261-31/+36
| | | | | | | | | | | | | | | | | | Import Illumos changes to address the following Illumos issues: 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4984 device selection should use fragmentation metric MFC after: 2 weeks Notes: svn path=/head/; revision=269118
* MFV r258371,r258372: 4101 metaslab_debug should allow for fine-grained controlAndriy Gapon2013-11-281-28/+36
| | | | | | | | | | | | | | | | | | | | | | 4101 metaslab_debug should allow for fine-grained control 4102 space_maps should store more information about themselves 4103 space map object blocksize should be increased 4104 ::spa_space no longer works 4105 removing a mirrored log device results in a leaked object 4106 asynchronously load metaslab illumos/illumos-gate@0713e232b7712cd27d99e1e935ebb8d5de61c57d Note that some tunables have been removed and some new tunables have been added. Of particular note, FreeBSD-only knob vfs.zfs.space_map_last_hope is removed as it was a nop for some time now (after one of the previous merges from upstream). MFC after: 11 days Sponsored by: HybridCluster [merge] Notes: svn path=/head/; revision=258717
* Enhance the ZFS vdev layer to maintain both a logical and a physicalJustin T. Gibbs2013-08-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | minimum allocation size for devices. Use this information to automatically increase ZFS's minimum allocation size for new top-level vdevs to a value that more closely matches the optimum device allocation size. Use GEOM's stripesize attribute, if set, as the physical sector size of the GEOM. Calculate the minimum blocksize of each metaslab class. Use the calculated value instead of SPA_MINBLOCKSIZE (512b) when determining the likelyhood of compression yeilding a reduction in physical space usage. Report devices with sub-optimal block size configuration in "zpool status". Also properly fail attempts to attach devices with a logical block size greater than 8kB, since this will cause corruption to ZFS's label area. Sponsored by: Spectra Logic Corporaion MFC after: 2 weeks Background ========== Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1) Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2) The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: Add the SPA_ASHIFT constant. ZFS currently has a hard upper limit of 13 (8k) for ashift and this constant is used to both document and enforce this limit. sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h: Add the VDEV_AUX_ASHIFT_TOO_BIG error code. Add fields for exporting the configured, logical, and physical ashift to the vdev_stat_t structure. Add VDEV_STAT_VALID() macro which can be used to verify the presence of required vdev_stat_t fields in nvlist data. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: Provide a SYSCTL_PROC handler for "max_auto_ashift". Since the limit is only referenced long after boot when a create operation occurs, there's no compelling need for it to be a boot time configurable tunable. This also allows the validation code for the max_auto_ashift value to be contained within the sysctl handler. Populate the new fields in the vdev_stat_t structure. Fail vdev opens if the vdev reports an ashift larger than SPA_MAXASHIFT. Propogate vdev_logical_ashift and vdev_physical_ashift between child and parent vdevs as is done for vdev_ashift. In vdev_open(), restore code that fails opens for devices where vdev_ashift grows. This can only happen now if the device's logical ashift grows, which means it really isn't safe to use the device. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c: Update the vdev_open() API so that both logical (what was just ashift before) and physical ashift are reported. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h: Add two new fields, vdev_physical_ashift and vdev_logical_ashift, to vdev_t. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c: Add vdev_ashift_optimize(). Call it anytime a new top-level vdev is allocated. cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: Add text for the VDEV_AUX_ASHIFT_TOO_BIG error. For each sub-optimally configured leaf vdev, report configured and native block sizes. cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT. This status is reported on healthy pools containing vdevs configured to use a block size smaller than their reported physical block size. cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c: Update find_vdev_problem() and supporting functions to provide the full vdev_stat_t structure to problem checking routines, and to allow decent into replacing vdevs. Add a vdev_non_native_ashift() validator which is used on the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT. cddl/contrib/opensolaris/lib/libzpool/common/kernel.c: cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h: Enhance sysctl userland stubs now that a SYSCTL_PROC handler is used in vdev.c. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h: When the group membership of a metaslab class changes (i.e. when a vdev is added or removed from a pool), walk the group list to determine the smallest block size currently available and record this in the metaslab class. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c: Add the metaslab_class_get_minblocksize() accessor. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In zio_compress_data(), take the minimum blocksize as an input parameter instead of assuming SPA_MINBLOCKSIZE. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c: In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum blocksize of the device. The l2arc code performs has it's own code for deciding if compression is worth while, so this effectively disables zio_compress_data() from second guessing the original decision. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c: In zio_write_bp_init(), use the minimum blocksize of the normal metaslab class when compressing data. Notes: svn path=/head/; revision=254591
* MFV r247580:Martin Matuska2013-03-191-1/+2
| | | | | | | | | | | | | | Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd. Notes: svn path=/projects/libzfs_core/; revision=248498
* ZFS tries to allocate blocks evenly across all devices. This means whenMartin Matuska2011-07-181-0/+3
| | | | | | | | | | | | | | | | devices are imbalanced zfs will lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full devices and move onto devices which have more availability. New loader tunable: vfs.zfs.mg_alloc_failures (min = 8) Illumos-gate changeset: 13379:4df42cc92254 Obtained from: Illumos (Bug #1051) MFC after: 2 weeks Notes: svn path=/head/; revision=224177
* Finally... Import the latest open-source ZFS version - (SPA) 28.Pawel Jakub Dawidek2011-02-271-8/+14
| | | | | | | | | | | | | | | | | | Few new things available from now on: - Data deduplication. - Triple parity RAIDZ (RAIDZ3). - zfs diff. - zpool split. - Snapshot holds. - zpool import -F. Allows to rewind corrupted pool to earlier transaction group. - Possibility to import pool in read-only mode. MFC after: 1 month Notes: svn path=/head/; revision=219089
* Update ZFS metaslab code from OpenSolaris.Martin Matuska2010-08-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This provides a noticeable write speedup, especially on pools with less than 30% of free space. Detailed information (OpenSolaris onnv changesets and Bug IDs): 11146:7e58f40bcb1c 6826241 Sync write IOPS drops dramatically during TXG sync 6869229 zfs should switch to shiny new metaslabs more frequently 11728:59fdb3b856f6 6918420 zdb -m has issues printing metaslab statistics 12047:7c1fcc8419ca 6917066 zfs block picking can be improved Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066) MFC after: 2 weeks Notes: svn path=/head/; revision=211931
* Merge ZFS version 15 and almost all OpenSolaris bugfixes referencedMartin Matuska2010-07-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in Solaris 10 updates 141445-09 and 142901-14. Detailed information: (OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers) 7844:effed23820ae 6755435 zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01) 7897:e520d8258820 6748436 inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01) 7965:b795da521357 6740164 zpool attach can create an illegal root pool (141909-02) 8084:b811cc60d650 6769612 zpool_import() will continue to write to cachefile even if altroot is set (N/A) 8121:7fd09d4ebd9c 6757430 want an option for zdb to disable space map loading and leak tracking (141445-01) 8129:e4f45a0bfbb0 6542860 ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01) 8188:fd00c0a81e80 6761100 want zdb option to select older uberblocks (141445-01) 8190:6eeea43ced42 6774886 zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01) 8225:59a9961c2aeb 6737463 panic while trying to write out config file if root pool import fails (141445-01) 8227:f7d7be9b1f56 6765294 Refactor replay (141445-01) 8228:51e9ca9ee3a5 6572357 libzfs should do more to avoid mnttab lookups (141909-01) 6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01) 8241:5a60f16123ba 6328632 zpool offline is a bit too conservative (141445-01) 6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01) 6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01) 6747698 checksum failures after offline -t / export / import / scrub (141445-01) 6745863 ZFS writes to disk after it has been offlined (141445-01) 6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01) 6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01) 6758107 I/O should never suspend during spa_load() (141445-01) 6776548 codereview(1) runs off the page when faced with multi-line comments (N/A) 6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01) 8242:e46e4b2f0a03 6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01) 8269:03a7e9050cfd 6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01) 6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01) 6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01) 6595194 "zfs get" VALUE column is as wide as NAME (141445-01) 6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01) 6396518 ASSERT strings shouldn't be pre-processed (141445-01) 8274:846b39508aff 6713916 scrub/resilver needlessly decompress data (141445-01) 8343:655db2375fed 6739553 libzfs_status msgid table is out of sync (141445-01) 6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01) 6784108 zfs_realloc() should not free original memory on failure (141445-01) 8525:e0e0e525d0f8 6788830 set large value to reservation cause core dump (141445-01) 6791064 want sysevents for ZFS scrub (141445-01) 6791066 need to be able to set cachefile on faulted pools (141445-01) 6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01) 6792134 getting multiple properties on a faulted pool leads to confusion (141445-01) 8547:bcc7b46e5ff7 6792884 Vista clients cannot access .zfs (141445-01) 8632:36ef517870a3 6798384 It can take a village to raise a zio (141445-01) 8636:7e4ce9158df3 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01) 6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01) 6702206 ZFS read/writer lock contention throttles sendfile() benchmark (141445-01) 6780491 Zone on a ZFS filesystem has poor fork/exec performance (141445-01) 6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01) 8692:692d4668b40d 6801507 ZFS read aggregation should not mind the gap (141445-01) 8697:e62d2612c14d 6633095 creating a filesystem with many properties set is slow (141445-01) 8768:dfecfdbb27ed 6775697 oracle crashes when overwriting after hitting quota on zfs (141909-01) 8811:f8deccf701cf 6790687 libzfs mnttab caching ignores external changes (141445-01) 6791101 memory leak from libzfs_mnttab_init (141445-01) 8845:91af0d9c0790 6800942 smb_session_create() incorrectly stores IP addresses (N/A) 6582163 Access Control List (ACL) for shares (141445-01) 6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A) 6800184 Panic at smb_oplock_conflict+0x35() (N/A) 8876:59d2e67b4b65 6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01) 8924:5af812f84759 6789318 coredump when issue zdb -uuuu poolname/ (141445-01) 6790345 zdb -dddd -e poolname coredump (141445-01) 6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01) 6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01) 6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01) 9030:243fd360d81f 6815893 hang mounting a dataset after booting into a new boot environment (141445-01) 9056:826e1858a846 6809691 'zpool create -f' no longer overwrites ufs infomation (141445-01) 9179:d8fbd96b79b3 6790064 zfs needs to determine uid and gid earlier in create process (141445-01) 9214:8d350e5d04aa 6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01) 6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01) 9229:e3f8b41e5db4 6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01) 9230:e4561e3eb1ef 6821169 offlining a device results in checksum errors (141445-01) 6821170 ZFS should not increment error stats for unavailable devices (141445-01) 6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01) 9234:bffdc4fc05c4 6792139 recovering from a suspended pool needs some work (141445-01) 6794830 reboot command hangs on a failed zfs pool (141445-01) 9246:67c03c93c071 6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01) 9276:a8a7fc849933 6816124 System crash running zpool destroy on broken zpool (141445-03) 9355:09928982c591 6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03) 9391:413d0661ef33 6710376 log device can show incorrect status when other parts of pool are degraded (141445-03) 9396:f41cf682d0d3 (part already merged) 6501037 want user/group quotas on ZFS (141445-03) 6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03) 6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03) 6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03) 9404:319573cd93f8 6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03) 9412:4aefd8704ce0 6717022 ZFS DMU needs zero-copy support (141445-03) 9425:e7ffacaec3a8 6799895 spa_add_spares() needs to be protected by config lock (141445-03) 6826466 want to post sysevents on hot spare activation (141445-03) 6826468 spa 'allowfaulted' needs some work (141445-03) 6826469 kernel support for storing vdev FRU information (141445-03) 6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03) 6826471 I/O errors after device remove probe can confuse FMA (141445-03) 6826472 spares should enjoy some of the benefits of cache devices (141445-03) 9443:2a96d8478e95 6833711 gang leaders shouldn't have to be logical (141445-03) 9463:d0bd231c7518 6764124 want zdb to be able to checksum metadata blocks only (141445-03) 9465:8372081b8019 6830237 zfs panic in zfs_groupmember() (141445-03) 9466:1fdfd1fed9c4 6833162 phantom log device in zpool status (141445-03) 9469:4f68f041ddcd 6824968 add ZFS userquota support to rquotad (141445-03) 9470:6d827468d7b5 6834217 godfather I/O should reexecute (141445-03) 9480:fcff33da767f 6596237 Stop looking and start ganging (141909-02) 9493:9933d599bc93 6623978 lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06) 9512:64cafcbcc337 6801810 Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A) 9515:d3b739d9d043 6586537 async zio taskqs can block out userland commands (142901-09) 9554:787363635b6a 6836768 zfs_userspace() callback has no way to indicate failure (N/A) 9574:1eb6a6ab2c57 6838062 zfs panics when an error is encountered in space_map_load() (141909-02) 9583:b0696cd037cc 6794136 Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03) 9630:e25a03f552e0 6776104 "zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06) 9653:a70048a304d1 6664765 Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06) 9688:127be1845343 6841321 zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A) 6843069 zfs get userused@S-1-... doesn't work (N/A) 9873:8ddc892eca6e 6847229 assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06) 9904:d260bd3fd47c 6838344 kernel heap corruption detected on zil while stress testing (141445-06) 9951:a4895b3dd543 6844900 zfs_ioc_userspace_upgrade leaks (N/A) 10040:38b25aeeaf7a 6857012 zfs panics on zpool import (141445-06) 10000:241a51d8720c 6848242 zdb -e no longer works as expected (N/A) 10100:4a6965f6bef8 6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07) 10160:a45b03783d44 6861983 zfs should use new name <-> SID interfaces (N/A) 6862984 userquota commands can hang (141445-06) 10299:80845694147f 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A) 10302:a9e3d1987706 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A) 10575:2a8816c5173b (partial merge) 6882227 spa_async_remove() shouldn't do a full clear (142901-14) 10800:469478b180d9 6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09) 6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A) 10801:e0bf032e8673 (partial merge) 6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09) 10810:b6b161a6ae4a 6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09) 10890:499786962772 6807339 spurious checksum errors when replacing a vdev (142901-13) 11249:6c30f7dfc97b 6906110 bad trap panic in zil_replay_log_record (142901-13) 6906946 zfs replay isn't handling uid/gid correctly (142901-13) 11454:6e69bacc1a5a 6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10) 11546:42ea6be8961b (partial merge) 6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09) Discussed with: pjd Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 months Notes: svn path=/head/; revision=209962
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.Pawel Jakub Dawidek2008-11-171-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris Notes: svn path=/head/; revision=185029
* Please welcome ZFS - The last word in file systems.Pawel Jakub Dawidek2007-04-061-0/+69
ZFS file system was ported from OpenSolaris operating system. The code in under CDDL license. I'd like to thank all SUN developers that created this great piece of software. Supported by: Wheel LTD (http://www.wheel.pl/) Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/) Supported by: Sentex (http://www.sentex.net/) Notes: svn path=/head/; revision=168404