path: root/sys/cddl/contrib/opensolaris/uts/common/fs
Commit message | Author | Age | Files | Lines
* ZFS: Allow setting checksum=skein on boot pools | Allan Jude | 2020-06-19 | 1 | -10/+1
  PR: 245889
  Reported by: delphij
  Sponsored by: Klara Inc.
  Notes: svn path=/head/; revision=362396
* Fix export_args ex_flags field so that it is 64 bits, the same as mnt_flags. | Rick Macklem | 2020-06-14 | 1 | -4/+4
  Since mnt_flags was upgraded to 64 bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32 bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32 bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.

  This patch revises "struct export_args" to make ex_flags 64 bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size).

  This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.)

  This patch also deletes the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls.

  Reviewed by: kib, freqlabs
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D25088
  Notes: svn path=/head/; revision=362158
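The 32-bit truncation described in the export_args commit above can be illustrated with a small userland sketch. Everything prefixed with demo_ or DEMO_ is hypothetical: the structs only mimic the shape described in the message (fixed-width integers stand in for int, uid_t and gid_t), and the flag values are invented for illustration, not real FreeBSD export flags.

```c
#include <stdint.h>

/* Hypothetical flags; DEMO_EXFLAG_HIGH sits above bit 31 on purpose. */
#define DEMO_EXFLAG_LOW   (UINT64_C(1) << 3)
#define DEMO_EXFLAG_HIGH  (UINT64_C(1) << 40)

/* Old shape: the copy of the mount flags is only 32 bits wide. */
struct demo_old_export_args {
	uint32_t ex_flags;      /* stands in for the 32-bit "int" */
};

/* New shape as described in the message: 64-bit flags, group list by pointer. */
struct demo_new_export_args {
	uint64_t  ex_flags;
	uint32_t  ex_uid;       /* stands in for uid_t */
	int       ex_ngroups;
	uint32_t *ex_groups;    /* stands in for gid_t *, sized by the caller */
};

/* Returns 1 if "flag" survives a copy into the old 32-bit field. */
static int demo_old_keeps(uint64_t mnt_flags, uint64_t flag)
{
	struct demo_old_export_args o;

	o.ex_flags = (uint32_t)mnt_flags;   /* high 32 bits are dropped here */
	return (((uint64_t)o.ex_flags & flag) != 0);
}

/* Returns 1 if "flag" survives a copy into the new 64-bit field. */
static int demo_new_keeps(uint64_t mnt_flags, uint64_t flag)
{
	struct demo_new_export_args n = { 0 };

	n.ex_flags = mnt_flags;
	return ((n.ex_flags & flag) != 0);
}
```

Low-order flags survive either way, which is why the old layout happened to work; only a flag defined above bit 31 is lost.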
* fix up r362047: a call to zvol_*_minors() was not hidden from userland | Andriy Gapon | 2020-06-11 | 1 | -0/+2
  Reported by: CI/FreeBSD-head-powerpc64-build
  MFC after: 5 weeks
  X-MFC with: r362047
  Notes: svn path=/head/; revision=362048
* rework how ZVOLs are updated in response to DSL operations | Andriy Gapon | 2020-06-11 | 10 | -72/+212
  With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations, avoiding problems like the one reported in bug 203864.

  This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details.

  This change also heavily borrows from similar code in ZoL, but there are many differences too. See:
  - https://github.com/zfsonlinux/zfs/commit/a0bd735adb1b1eb81fef10b4db102ee051c4d4ff
  - https://github.com/zfsonlinux/zfs/issues/3681
  - https://github.com/zfsonlinux/zfs/issues/2217

  PR: 203864
  MFC after: 5 weeks
  Sponsored by: CyberSecure
  Differential Revision: https://reviews.freebsd.org/D23478
  Notes: svn path=/head/; revision=362047
* Don't block on the range lock in zfs_getpages(). | Mark Johnston | 2020-05-20 | 3 | -22/+67
  After r358443 the vnode object lock no longer synchronizes concurrent zfs_getpages() and zfs_write() (which must update vnode pages to maintain coherence). This created a potential deadlock between ZFS range locks and VM page busy locks: a fault on a mapped file will cause the fault page to be busied, after which zfs_getpages() locks a range around the file offset in order to map adjacent, resident pages; zfs_write() locks the range first, and then must busy vnode pages when synchronizing.

  Solve this by adding a non-blocking mode for ZFS range locks, and using it in zfs_getpages(). If zfs_getpages() fails to acquire the range lock, only the fault page will be populated.

  Reported by: bdrewery
  Reviewed by: avg
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D24839
  Notes: svn path=/head/; revision=361287
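The fallback described in the zfs_getpages() commit above (try the range lock without blocking, and populate only the fault page on failure) can be sketched as follows. This is a single-threaded userland model under stated assumptions, not the ZFS range lock code: all demo_ names are hypothetical, the lock table is a fixed array, and no real synchronization is performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RANGES 8

struct demo_range {
	uint64_t off;
	uint64_t len;
	bool     held;
};

struct demo_rangelock {
	struct demo_range r[DEMO_MAX_RANGES];
};

/* True if the held range "a" overlaps [off, off + len). */
static bool demo_overlaps(const struct demo_range *a, uint64_t off, uint64_t len)
{
	return (a->held && off < a->off + a->len && a->off < off + len);
}

/* Non-blocking acquire: returns NULL on conflict or when the table is full. */
static struct demo_range *
demo_rangelock_tryenter(struct demo_rangelock *rl, uint64_t off, uint64_t len)
{
	struct demo_range *slot = NULL;

	for (size_t i = 0; i < DEMO_MAX_RANGES; i++) {
		if (demo_overlaps(&rl->r[i], off, len))
			return (NULL);
		if (!rl->r[i].held && slot == NULL)
			slot = &rl->r[i];
	}
	if (slot != NULL) {
		slot->off = off;
		slot->len = len;
		slot->held = true;
	}
	return (slot);
}

static void demo_rangelock_exit(struct demo_range *r)
{
	r->held = false;
}

/*
 * getpages-style policy: try to cover "wanted" pages starting at the
 * page containing fault_off; if the range is busy, fall back to
 * populating the fault page alone instead of waiting.
 */
static uint64_t
demo_pages_to_populate(struct demo_rangelock *rl, uint64_t fault_off,
    uint64_t page_size, uint64_t wanted)
{
	uint64_t start = fault_off - (fault_off % page_size);
	struct demo_range *r;

	r = demo_rangelock_tryenter(rl, start, wanted * page_size);
	if (r == NULL)
		return (1);
	demo_rangelock_exit(r);
	return (wanted);
}
```

The point of the design is that the faulting thread, which already holds a busy page, never sleeps on a lock that a writer holding the range might need that page to release.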
* zfs: reject read(2) of a dirfd with EISDIR | Kyle Evans | 2020-05-19 | 1 | -0/+6
  This is independent of the recently-discussed global change, which is still in review/discussion stage. This is effectively a measure for consistency in the ZFS world, where FreeBSD was the only platform (as far as I could find) that allowed this.

  What ZFS exposes is decidedly not useful for any real purposes, to paraphrase (hopefully faithfully) jhb's findings when exploring this: The size of a directory in ZFS is the number of directory entries within. When reading a directory, you would instead get the leading part of its raw contents; the amount you get being dictated by the "size," i.e. number of directory entries. There's decidedly (luckily) no stack disclosure happening here, though the behavior is bizarre and almost certainly a historical accident.

  This change has already been upstreamed to OpenZFS.

  MFC after: 1 week
  Notes: svn path=/head/; revision=361238
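From userland, the behavior discussed in the EISDIR commit above can be probed with a few lines of C. This is a minimal sketch; demo_read_dir_errno is a hypothetical helper name, and the result depends on the platform and file system: a system that rejects directory reads reports EISDIR, while one that still allows them returns 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open "path" read-only and attempt a plain read(2) on the descriptor.
 * Returns 0 if the read succeeded, otherwise the errno value observed
 * (from open(2) if the open itself failed).
 */
static int demo_read_dir_errno(const char *path)
{
	char buf[64];
	int fd, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (errno);
	saved = (read(fd, buf, sizeof(buf)) < 0) ? errno : 0;
	close(fd);
	return (saved);
}
```

Calling demo_read_dir_errno(".") on a directory is enough to see which of the two behaviors a given system implements.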
* Correct the order of arguments to copyin() for Q_SETQUOTA. | John Baldwin | 2020-05-18 | 1 | -1/+1
  MFC after: 2 weeks
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D24656
  Notes: svn path=/head/; revision=361220
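For context on the copyin() fix above: FreeBSD's copyin(9) takes the user-space source first and the kernel destination second. The sketch below uses a userland stand-in with the same parameter order; demo_copyin, demo_quota and demo_setquota are hypothetical names, and memcpy replaces the real user-to-kernel copy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userland stand-in with the parameter order of copyin(9):
 * source (user address) first, destination (kernel address) second.
 */
static int demo_copyin(const void *uaddr, void *kaddr, size_t len)
{
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Hypothetical quota record, only here to have something to copy. */
struct demo_quota {
	unsigned long long limit;
};

/* Copies the caller-supplied record into the "kernel" copy. */
static int demo_setquota(const void *user_arg, struct demo_quota *kernel_copy)
{
	return (demo_copyin(user_arg, kernel_copy, sizeof(*kernel_copy)));
}
```

With the two pointer arguments swapped, the copy would run in the wrong direction, which is the class of mistake the one-line fix addresses.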
* Avoid the GEOM topology lock recursion when we automatically expand a pool. | Pawel Jakub Dawidek | 2020-04-25 | 1 | -2/+6
  The steps to reproduce the problem:

      mdconfig -a -t swap -s 3g -u 0
      gpart create -s GPT md0
      gpart add -t freebsd-zfs -s 1g md0
      zpool create -o autoexpand=on foo md0p1
      gpart resize -i 1 -s 2g md0

  Notes: svn path=/head/; revision=360325
* Make ZFS depend on xdr.ko only. It doesn't need kernel RPC. | Gleb Smirnoff | 2020-04-17 | 1 | -1/+1
  Differential Revision: https://reviews.freebsd.org/D24408
  Notes: svn path=/head/; revision=360037
* MFOpenZFS: ZVOLs should not be allowed to have children | Ryan Moeller | 2020-03-25 | 4 | -6/+81
  zfs create, receive and rename can bypass this hierarchy rule. Update both userland and kernel module to prevent this issue and use pyzfs unit tests to exercise the ioctls directly.

  Note: this commit slightly changes the zfs_ioc_create() ABI. This allows differentiating a generic error (EINVAL) from the specific case where we tried to create a dataset below a ZVOL (ZFS_ERR_WRONG_PARENT).

  Reviewed-by: Paul Dagnelie <pcd@delphix.com>
  Reviewed-by: Matt Ahrens <mahrens@delphix.com>
  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed-by: Tom Caputi <tcaputi@datto.com>
  Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
  Approved by: mav (mentor)
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  openzfs/zfs@d8d418ff0cc90776182534bce10b01e9487b63e4
  Notes: svn path=/head/; revision=359303
* MFOpenZFS: make zil max block size tunable | Alexander Motin | 2020-03-19 | 5 | -31/+82
  We've observed that on some highly fragmented pools, most metaslab allocations are small (~2-8KB), but there are some large, 128K allocations. The large allocations are for ZIL blocks. If there is a lot of fragmentation, the large allocations can be hard to satisfy.

  The most common impact of this is that we need to check (and thus load) lots of metaslabs from the ZIL allocation code path, causing sync writes to wait for metaslabs to load, which can take a second or more. In the worst case, we may not be able to satisfy the allocation, in which case the ZIL will resort to txg_wait_synced() to ensure the change is on disk.

  To provide a workaround for this, this change adds a tunable that can reduce the size of ZIL blocks.

  External-issue: DLPX-61719
  Reviewed-by: George Wilson <george.wilson@delphix.com>
  Reviewed-by: Paul Dagnelie <pcd@delphix.com>
  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
  Closes #8865
  openzfs/zfs@b8738257c2607c73c731ce8e0fd73282b266d6ef
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=359112
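The effect of a ZIL block size tunable like the one described above can be sketched as a simple clamp. This is an illustration only: demo_zil_maxblocksize and demo_zil_pick_blksz are hypothetical names, and the only figure taken from the commit message is the 128K upper bound.

```c
#include <stdint.h>

/* 128K, the large ZIL allocation size mentioned in the message. */
#define DEMO_ZIL_HARD_MAX  (UINT64_C(128) * 1024)

/* Hypothetical tunable; an administrator could lower it on fragmented pools. */
static uint64_t demo_zil_maxblocksize = DEMO_ZIL_HARD_MAX;

/* Picks the block size to allocate for a log write of "wanted" bytes. */
static uint64_t demo_zil_pick_blksz(uint64_t wanted)
{
	uint64_t cap = demo_zil_maxblocksize;

	/* The tunable can only shrink blocks, never exceed the hard maximum. */
	if (cap > DEMO_ZIL_HARD_MAX)
		cap = DEMO_ZIL_HARD_MAX;
	return (wanted < cap ? wanted : cap);
}
```

Lowering the cap trades larger log blocks for allocations that are easier to satisfy when free space is fragmented.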
* Fix infinite scan on a pool with only special allocations | Alexander Motin | 2020-03-16 | 1 | -3/+6
  An attempt to run scrub or resilver on a new pool containing only special allocations (special vdev added on creation) caused an infinite loop, because dsl_scan_should_clear() limits memory usage to 5% of pool size, which it calculated accounting only for the normal allocation class. Adding the special and, just in case, dedup classes fixes the issue.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Signed-off-by: Alexander Motin <mav@FreeBSD.org>
  Sponsored-By: iXsystems, Inc.
  Closes #10106
  Closes #8694
  openzfs/zfs@fa130e010c2ff9b33aba11d2699b667e454b3ccb
  Notes: svn path=/head/; revision=359018
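The arithmetic behind the infinite scan fix above can be sketched as follows. This is a simplified model, not the dsl_scan_should_clear() code: the demo_ names are hypothetical, and the "used >= limit" comparison is an assumption chosen to show why a zero limit can never be satisfied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes allocated in each allocation class of a pool. */
struct demo_pool_alloc {
	uint64_t normal;
	uint64_t special;
	uint64_t dedup;
};

/* Old behavior as described: 5% of the normal class only. */
static uint64_t demo_limit_old(const struct demo_pool_alloc *a)
{
	return (a->normal / 20);
}

/* Fixed behavior: 5% of normal + special + dedup. */
static uint64_t demo_limit_new(const struct demo_pool_alloc *a)
{
	return ((a->normal + a->special + a->dedup) / 20);
}

/* Simplified check: the scan must flush once usage reaches the limit. */
static bool demo_should_clear(uint64_t used, uint64_t limit)
{
	return (used >= limit);
}
```

With a pool that holds data only in the special class, the old limit is zero, so the check stays true even with nothing queued and the scan cannot make progress.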
* zfs dmu_read: loosen the assertion. | Konstantin Belousov | 2020-03-06 | 1 | -2/+2
  Since the switch to the lockless grab, shared busy for ahead/behind pages allows other threads to validate and map the pages readonly.

  Reviewed by: avg, jeff, markj
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D23986
  Notes: svn path=/head/; revision=358719
* Remove vfs.zfs.top_maxinflight tunable/sysctl. | Alexander Motin | 2020-03-05 | 1 | -9/+0
  It has been dead since the sorted scrub import at r334844.

  MFC after: 1 week
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=358683
* Increase number of write completion threads, matching ZoL. | Alexander Motin | 2020-03-03 | 1 | -1/+1
  Our iSCSI benchmarks on a large 80-core system show that the previous limit of 8 threads can be a bottleneck. At some points this change increases write IOPS by as much as 50%. I am still not sure that so many threads are really required, but we tested lower amounts and got no significant benefits, while latencies were a bit worse, so we decided not to diverge.

  MFC after: 1 week
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=358580
* Eliminate object locking in zfs where possible with the new lockless grab APIs. | Jeff Roberson | 2020-02-28 | 2 | -33/+19
  Reviewed by: kib, markj, mmacy
  Differential Revision: https://reviews.freebsd.org/D23848
  Notes: svn path=/head/; revision=358443
* remove stray space symbol in r358380 | Andriy Gapon | 2020-02-27 | 1 | -1/+1
  MFC after: 1 week
  X-MFC with: r358380
  Notes: svn path=/head/; revision=358382
* use ZFS_MAX_DATASET_NAME_LEN instead of MAXPATHLEN for dataset names | Andriy Gapon | 2020-02-27 | 1 | -12/+12
  The change affects only FreeBSD specific code as the common code already mostly uses the more idiomatic and correct ZFS_MAX_DATASET_NAME_LEN.

  MFC after: 1 week
  Notes: svn path=/head/; revision=358381
* dsl_dataset_promote_sync: populate 'oldname' before using it | Andriy Gapon | 2020-02-27 | 1 | -0/+4
  It's very unlikely that zfsvfs_update_fromname() and zvol_rename_minors() ever did anything during the promote operation as the old name was not initialized.

  MFC after: 1 week
  Notes: svn path=/head/; revision=358380
* MFZoL: Relax restriction on zfs_ioc_next_obj() iteration | Alexander Motin | 2020-02-26 | 1 | -2/+1
  Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It doesn't need to be created in that snapshot to correspond to a removed file. In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2".

  Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Signed-off-by: Tim Chase <tim@onlight.com>
  Closes #2081
  zfsonlinux/zfs@7290cd3c4ed19fb3f75b8133db2e36afcdd24beb
  MFC after: 1 week
  Notes: svn path=/head/; revision=358357
* MFZoL: Fix resilver writes in vdev_indirect_io_start | Alexander Motin | 2020-02-26 | 1 | -8/+15
  This patch addresses an issue found in ztest where resilver write zios that were passed to an indirect vdev would end up being handled as though they were resilver read zios. This caused issues where the zio->io_abd would be both read to and written from at the same time, causing asserts to fail.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed by: Matt Ahrens <matt@delphix.com>
  Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
  Signed-off-by: Tom Caputi <tcaputi@datto.com>
  Closes #8193
  zfsonlinux/zfs@5aa95ba0d3502779695341b5f55fa5ba1d3330ff
  MFC after: 1 week
  Notes: svn path=/head/; revision=358342
* Fix patch mismerge in r358336. | Alexander Motin | 2020-02-26 | 1 | -2/+2
  MFC after: 1 week
  Notes: svn path=/head/; revision=358340
* MFZoL: Fix issue with scanning dedup blocks as scan ends | Alexander Motin | 2020-02-26 | 1 | -0/+16
  This patch fixes an issue discovered by ztest where dsl_scan_ddt_entry() could add I/Os to the dsl scan queues between when the scan had finished all required work and when the scan was marked as complete. This caused the scan to spin indefinitely without ending.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
  Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
  Signed-off-by: Tom Caputi <tcaputi@datto.com>
  Closes #8010
  zfsonlinux/zfs@5e0bd0ae056e26de36dee3c199c6fcff8f14ee15
  MFC after: 1 week
  Notes: svn path=/head/; revision=358339
* MFZoL: Fix 2 small bugs with cached dsl_scan_phys_t | Alexander Motin | 2020-02-26 | 1 | -1/+4
  This patch corrects 2 small bugs where scn->scn_phys_cached was not properly updated to match the primary copy when it needed to be. The first resulted in the pause state not being properly updated and the second resulted in the cached version being completely zeroed even if the primary was not.

  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Reviewed-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
  Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
  Signed-off-by: Tom Caputi <tcaputi@datto.com>
  Closes #8010
  zfsonlinux/zfs@8cb119e3dc0ac6c90b1517fbadc021b7e9741fc6
  MFC after: 1 week
  Notes: svn path=/head/; revision=358337
* MFZoL: Fix txg_sync_thread hang in scan_exec_io() | Alexander Motin | 2020-02-26 | 1 | -2/+13
  When scn->scn_maxinflight_bytes has not been initialized it's possible to hang on the condition variable in scan_exec_io(). This issue was uncovered by ztest and is only possible when deduplication is enabled through the following call path.

      txg_sync_thread()
        spa_sync()
          ddt_sync_table()
            ddt_sync_entry()
              dsl_scan_ddt_entry()
                dsl_scan_scrub_cb()
                  dsl_scan_enqueuei()
                    scan_exec_io()
                      cv_wait()

  Resolve the issue by always initializing scn_maxinflight_bytes to a reasonable minimum value. This value will be recalculated in dsl_scan_sync() to pick up changes to zfs_scan_vdev_limit and the addition/removal of vdevs.

  Reviewed-by: Tom Caputi <tcaputi@datto.com>
  Reviewed by: George Melikov <mail@gmelikov.ru>
  Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Closes #7098
  zfsonlinux/zfs@f90a30ad1b32a971f62a540f8944e42f99b254ce
  MFC after: 1 week
  Notes: svn path=/head/; revision=358336
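Why an uninitialized in-flight limit can hang the sync thread, as described in the scan_exec_io() commit above, can be shown with a small model. This is a sketch under assumptions: the demo_ names and the 1 MiB floor are hypothetical, and the wait condition is a simplification of a "wait while in-flight bytes have reached the limit" loop.

```c
#include <stdint.h>

/* Hypothetical floor so the limit is never zero. */
#define DEMO_MIN_INFLIGHT  (UINT64_C(1) << 20)

/* Computes the in-flight byte limit, clamped to the floor. */
static uint64_t demo_maxinflight(uint64_t per_vdev_limit, uint64_t nvdevs)
{
	uint64_t v = per_vdev_limit * nvdevs;

	return (v > DEMO_MIN_INFLIGHT ? v : DEMO_MIN_INFLIGHT);
}

/* Simplified wait condition: block while the limit has been reached. */
static int demo_would_wait(uint64_t inflight, uint64_t maxinflight)
{
	return (inflight >= maxinflight);
}
```

With a limit of zero the condition holds even when nothing is in flight, so no completion will ever wake the waiter; a nonzero floor removes that case.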
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) | Pawel Biernacki | 2020-02-26 | 11 | -12/+24
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren't properly marked). Use it in preparation for a general review of all nodes.

  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.

  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.

  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
* Remove duplicate dbufs accounting. | Alexander Motin | 2020-02-07 | 3 | -4/+9
  Since AVL already has an embedded element counter, use dn_dbufs_count only for dbufs not counted there (bonus buffers) and just add them. This removes two atomics per dbuf life cycle. According to the profiler it reduces time spent by dbuf_destroy() inside the bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.

  This counter is used only on illumos, so for FreeBSD it was just a waste of time.

  MFC after: 2 weeks
  Notes: svn path=/head/; revision=357657
* Reduce number of atomic_add() calls in aggsum. | Alexander Motin | 2020-02-06 | 2 | -33/+34
  Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to re-borrow after the flush. But since asc_borrowed and asc_delta are accessed only while holding asc_lock, it makes no sense to modify as_lower_bound and as_upper_bound in multiple steps. Instead, the new code uses only 2 atomics in all cases, one per as_*_bound variable. I think even that is overkill; a simple atomic store and load could be used here, since all modifications are done under the as_lock, but there are no such primitives in ZFS code now.

  While there, make the borrow code consider the previous borrow value, to reduce the chance, on mixed request patterns, of needing to borrow again if a much larger request follows a tiny one that needed a borrow.

  Also reduce as_numbuckets from uint64_t to u_int. It makes no sense to use such a large division operation on every aggsum_add().

  Reviewed by: Brian Behlendorf, Paul Dagnelie
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=357639
* Few microoptimizations to dbuf layer. | Alexander Motin | 2020-02-04 | 2 | -28/+16
  Move db_link into the same cache line as db_blkid and db_level. This significantly reduces avl_add() time in dbuf_create() on systems with large RAM and a huge number of dbufs per dnode.

  Avoid a few accesses to dbuf_caches[].size, which is highly congested under high IOPS and never stays in cache for a long time. Use the local value we receive from zfs_refcount_add_many() anyway.

  Remove the cache_size_bytes_max bump from dbuf_evict_one(). I don't see a point in doing it on dbuf eviction after we have done it on insertion in dbuf_rele_and_unlock().

  Reviewed by: mahrens, Brian Behlendorf
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=357502
* Unblock kstat.zfs.misc.dbufstats sysctls. | Alexander Motin | 2020-02-03 | 1 | -9/+7
  It is not so broken that it should be hidden after we spent the time to collect it.

  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=357453
* Provide O_SEARCH | Kyle Evans | 2020-02-02 | 1 | -4/+8
  O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value.

  This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH.

  This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei.

  [0] https://pubs.opengroup.org/onlinepubs/9699919799/

  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D23247
  Notes: svn path=/head/; revision=357412
* zfs: light refactor to indicate cachedlookup in zfs_lookup | Kyle Evans | 2020-02-02 | 1 | -17/+22
  If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will have been done in the caller (vfs_cache_lookup). This is a part of D23247, which may skip the earlier VEXEC check as well if the root fd was opened with O_SEARCH.

  This one required slightly more work as zfs_lookup may also be called indirectly as VOP_LOOKUP or a couple of other places where we must do the check.

  Notes: svn path=/head/; revision=357411
* zfs: ZFS_WLOCK_TEARDOWN_INACTIVE_WLOCKED -> ZFS_TEARDOWN_INACTIVE_WLOCKED | Mateusz Guzik | 2020-02-01 | 3 | -3/+3
  Fix up the argument used in one case as well.

  Notes: svn path=/head/; revision=357357
* zfs: convert z_teardown_inactive_lock to sleepable read-mostly lock | Mateusz Guzik | 2020-01-31 | 2 | -9/+10
  This eliminates a global serialisation point. It only gets write locked on unmount.

  Sample result doing an incremental -j 40 build:

      before: 173.30s user 458.97s system 2595% cpu 24.358 total
      after:  168.58s user 254.92s system 2211% cpu 19.147 total

  Notes: svn path=/head/; revision=357322
* zfs: provide macros to handle z_teardown_inactive_lock | Mateusz Guzik | 2020-01-31 | 4 | -14/+32
  Notes: svn path=/head/; revision=357321
* zfs: fix spurious lock contention during path lookup | Mateusz Guzik | 2020-01-30 | 3 | -0/+43
  ZFS tracks if anything denies VEXEC to allow for a quick check for the common case of path traversal. Use it.

  Differential Revision: https://reviews.freebsd.org/D22224
  Notes: svn path=/head/; revision=357282
* zfs: use VOP_NEED_INACTIVE | Mateusz Guzik | 2020-01-30 | 1 | -0/+24
  Big thanks to Greg V for testing previous versions of the patch.

  Differential Revision: https://reviews.freebsd.org/D22130
  Notes: svn path=/head/; revision=357281
* Map ECKSUM and EFRAGS from ZFS onto real errnos. | Alexander Motin | 2020-01-13 | 1 | -5/+4
  Make it less confusing when, for example, stat sets errno to 122 because a checksum failed in ZFS:

      Before: getfacl: /foo/bar: stat() failed: Unknown error: 122
      After:  getfacl: /foo/bar: stat() failed: Integrity check failed

  Submitted by: Ryan Moeller <ryan@ixsystems.com>
  Reviewed by: mckusick, mav
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D22973
  Notes: svn path=/head/; revision=356707
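The before/after output in the errno mapping commit above can be modeled with a small lookup. This is an illustrative sketch only: the demo_ names are hypothetical, 122 is the value quoted in the message, and 97 for the integrity error is an assumption about FreeBSD's EINTEGRITY value rather than something the message states. The EFRAGS mapping is left out because the message does not say which errno it maps to.

```c
#include <string.h>

#define DEMO_ECKSUM_OLD   122  /* private ZFS value quoted in the message */
#define DEMO_EINTEGRITY   97   /* assumed value of FreeBSD's EINTEGRITY */

/* Maps the private ZFS checksum error onto a value userland knows. */
static int demo_map_zfs_errno(int zfs_error)
{
	return (zfs_error == DEMO_ECKSUM_OLD ? DEMO_EINTEGRITY : zfs_error);
}

/* Tiny stand-in for strerror(3) covering just this case. */
static const char *demo_strerror(int error)
{
	if (error == DEMO_EINTEGRITY)
		return ("Integrity check failed");
	return ("Unknown error");
}
```

The mapping matters only at the boundary to userland, where tools print whatever strerror(3) knows about the returned number.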
* Add KERNEL_PANICKED macro for use in place of direct panicstr tests | Mateusz Guzik | 2020-01-12 | 2 | -2/+2
  Notes: svn path=/head/; revision=356655
* zfs: add missing CTLFLAG_MPSAFE annotations | Mateusz Guzik | 2020-01-12 | 1 | -3/+6
  Notes: svn path=/head/; revision=356651
* vfs: prealloc vnodes in getnewvnode_reserve | Mateusz Guzik | 2020-01-11 | 3 | -7/+7
  Having a reserved vnode count does not guarantee that getnewvnode won't block later. Said blocking partially defeats the purpose of reserving in the first place.

  Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations.

  Notes: svn path=/head/; revision=356643
* zfs: plug a vnode reserve leak in zfs_make_xattrdir | Mateusz Guzik | 2020-01-07 | 1 | -0/+1
  Notes: svn path=/head/; revision=356436
* vfs: drop the mostly unused flags argument from VOP_UNLOCK | Mateusz Guzik | 2020-01-03 | 8 | -44/+44
  Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro.

  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D21427
  Notes: svn path=/head/; revision=356337
* Remove page locking for queue operations. | Mark Johnston | 2019-12-28 | 2 | -6/+0
  With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock.

  Reviewed by: jeff
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D22885
  Notes: svn path=/head/; revision=356157
* vfs: flatten vop vectors | Mateusz Guzik | 2019-12-16 | 2 | -0/+6
  This eliminates the following loop from all VOP calls:

      while(vop != NULL && \
          vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
              vop = vop->vop_default;

  Reviewed by: jeff
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22738
  Notes: svn path=/head/; revision=355790
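The idea behind flattening the vop vectors, as in the commit above, is to resolve inherited operations once instead of walking the vop_default chain on every call. The sketch below models that with a single operation slot; it is not the FreeBSD implementation, and every demo_ name (including the vop_lookup slot used here) is hypothetical.

```c
#include <stddef.h>

typedef int (*demo_vop_fn)(void);

struct demo_vop_vector {
	struct demo_vop_vector *vop_default;   /* parent vector, or NULL */
	demo_vop_fn             vop_lookup;    /* one operation, for brevity */
};

/* Call-time resolution: walk up the chain until an implementation is found. */
static demo_vop_fn demo_resolve_walk(const struct demo_vop_vector *v)
{
	while (v != NULL && v->vop_lookup == NULL)
		v = v->vop_default;
	return (v != NULL ? v->vop_lookup : NULL);
}

/* One-time flattening: copy the inherited implementation into the vector. */
static void demo_flatten(struct demo_vop_vector *v)
{
	if (v->vop_lookup == NULL)
		v->vop_lookup = demo_resolve_walk(v->vop_default);
}

/* Example default implementation used by the base vector. */
static int demo_default_lookup(void)
{
	return (7);
}
```

After demo_flatten() a caller can invoke v->vop_lookup directly, which removes the pointer-chasing loop from the hot path at the cost of doing the walk once up front.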
* Use a callout instead of timeout(9) for delayed zio's. | John Baldwin | 2019-12-13 | 2 | -0/+15
  Reviewed by: avg
  Differential Revision: https://reviews.freebsd.org/D22597
  Notes: svn path=/head/; revision=355726
* vfs: locking primitives which elide ->v_vnlock and shared locking disablement | Mateusz Guzik | 2019-12-11 | 1 | -1/+5
  Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockmgr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features.

  With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel.

  Reviewed by: jeff (previous version)
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22665
  Notes: svn path=/head/; revision=355633
* vfs: introduce v_irflag and make v_type smaller | Mateusz Guzik | 2019-12-08 | 3 | -6/+5
  The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time.

  v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED.

  Reviewed by: kib, jeff
  Differential Revision: https://reviews.freebsd.org/D22715
  Notes: svn path=/head/; revision=355537
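The byte accounting in the v_irflag commit above (a 4-byte type shrunk to 1 byte, with 2 of the 3 freed bytes reused for a flag field) can be checked with a small layout sketch. The structs are hypothetical stand-ins, not the real struct vnode, the spare byte is made explicit for illustration, and the flag value is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value for the new read-mostly flag field. */
#define DEMO_VIRF_DOOMED  0x0001

/* New layout: 1 byte of type, 1 spare byte, 2 bytes of flags. */
struct demo_vnode_head {
	uint8_t  v_type;
	uint8_t  v_spare;     /* the one freed byte that stays unused */
	uint16_t v_irflag;
};

/* Reads the doomed state from the read-mostly flag field. */
static int demo_is_doomed(const struct demo_vnode_head *vp)
{
	return ((vp->v_irflag & DEMO_VIRF_DOOMED) != 0);
}
```

The three fields together occupy the 4 bytes the old type field used, so the flag lands next to data that is read on nearly every operation without growing the structure.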
* Fix an inverted condition introduced in r353539. | Mark Johnston | 2019-12-06 | 1 | -1/+1
  This would have most likely resulted in read errors causing page leaks.

  Submitted by: jeff
  Notes: svn path=/head/; revision=355471
* Add a VN_OPEN_INVFS flag. | Konstantin Belousov | 2019-11-29 | 1 | -2/+3
  vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers.

  ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in a wait for unrelated buffers while owning the ZFS vnode lock (and ZFS does not use the buffer cache). VN_OPEN_INVFS allows the caller to skip bwillwrite().

  Note that ZFS is still incorrect there, because it starts a write on an mp and locks a vnode while holding another vnode lock.

  Reported by: Willem Jan Withagen <wjw@digiware.nl>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=355211