Commit message (Author, Age, Files, Lines)
* unlink, rmdir: call notify_upper from VOP pre method instead of syscall (Konstantin Belousov, 13 days, 2 files, -2/+2)
Suppose that there are two or more nullfs mounts over some fs, and suppose that we unlink a file on one of the nullfs mounts. This way notify_upper gets called from the lower vnode as well, allowing the other nullfs mounts to note that and drop their caches for the unlinked vnode. PR: 254210 Reviewed by: olce Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D48825
* FreeBSD: Remove some illumos compat from vnode.h (Alexander Motin, 2024-12-03, 2 files, -22/+2)
Should make no difference, just some dead code cleanup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Martin Matuska <mm@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16808
* FreeBSD: Return ifndef IN_BASE back to fix the build (Alexander Motin, 2024-12-03, 1 file, -0/+2)
FreeBSD's libprocstat seems to build kernel code in user space, which does not work here due to undefined vnode_t. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Martin Matuska <mm@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16808
* vnode: Make the vop_vector reference a pointer to const (Mark Johnston, 2024-11-26, 2 files, -3/+3)
No functional change intended. MFC after: 1 week
* zfs: merge openzfs/zfs@d0a91b9f8 (Martin Matuska, 2024-11-24, 37 files, -907/+956)
Notable upstream pull request merges:
 #16643 -multiple Change rangelock handling in FreeBSD's zfs_getpages()
 #16697 46c4f2ce0 dsl_dataset: put IO-inducing frees on the pool deadlist
 #16740 -multiple BRT: Rework structures and locks to be per-vdev
 #16743 a60ed3822 L2ARC: Move different stats updates earlier
 #16758 8dc452d90 Fix some nits in zfs_getpages()
 #16759 534688948 Remove hash_elements_max accounting from DBUF and ARC
 #16766 9a81484e3 ZAP: Reduce leaf array and free chunks fragmentation
 #16773 457f8b76e BRT: More optimizations after per-vdev splitting
 #16782 0ca82c568 L2ARC: Stop rebuild before setting spa_final_txg
 #16785 d76d79fd2 zio: Avoid sleeping in the I/O path
 #16791 ae1d11882 BRT: Clear bv_entcount_dirty on destroy
 #16796 b3b0ce64d FreeBSD: Lock vnode in zfs_ioctl()
 #16797 d0a91b9f8 FreeBSD: Reduce copy_file_range() source lock to shared
Obtained from: OpenZFS
OpenZFS commit: d0a91b9f88a47316158508bf304a61baa8c99c10
* FreeBSD: Reduce copy_file_range() source lock to shared (Alexander Motin, 2024-11-23, 1 file, -1/+1)
Linux locks the copy_file_range() source as shared. FreeBSD was doing so as well, but was then changed to exclusive, partly because the KPI of that time did so, and partly, it seems, out of caution. Considering zfs_clone_range() uses range locks on both source and destination, neither should require exclusive vnode locks. But one step at a time, just sync it with Linux for now. Reviewed-by: Alan Somers <asomers@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16789 Closes #16797
* FreeBSD: Lock vnode in zfs_ioctl() (Alexander Motin, 2024-11-23, 2 files, -4/+4)
Previously the vnode was not locked there, unlike on Linux. That required locking it in vn_flush_cached_data(), which recursed on the lock when called from zfs_clone_range() with the vnode already locked. Reviewed-by: Alan Somers <asomers@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16789 Closes #16796
* Fix a potential page leak in mappedread_sf() (Mark Johnston, 2024-11-13, 1 file, -1/+3)
mappedread_sf() may allocate pages; if it fails to populate a page and can't free it, it needs to ensure that the page is placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice; it was noticed by code inspection. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #16643
* Revert commit 8733bc277a383cf59f38a83956f4f523869cfc90 (Kirk McKusick, 2024-11-13, 1 file, -18/+4)
Author: Mateusz Guzik <mjg@FreeBSD.org> Date: Thu Sep 14 16:13:01 2023 +0000 vfs: don't provoke recycling non-free vnodes without a good reason If the total number of free vnodes is at or above target, there is no point creating more of them. This commit was done as a performance optimization but ends up causing slowdowns when doing operations on many files. Requested by: re (cperciva) MFC after: 1 minute
* Fix "vrefact: wrong use count 0" with DRM (Edward Tomasz Napierala, 2024-11-13, 1 file, -2/+2)
Bump the vnode use count, not its hold count. This fixes a panic triggered by fstatat(..., AT_EMPTY_PATH) on DRM device nodes, which happens to be what glxinfo(1) from Ubuntu Jammy is doing. PR: kern/274538 Reviewed By: kib (earlier version), olce Differential Revision: https://reviews.freebsd.org/D47391
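The difference between the two counters can be illustrated with a small userland model. This is only a sketch: the method names echo the kernel's vhold(), vref() and vrefact(), but the class and its bodies are assumptions made for illustration, not the kernel implementation. It shows why holding a vnode is not enough for a later call that asserts an existing use reference.

```python
class Vnode:
    """Toy model of a vnode's two counters; not the kernel structure."""

    def __init__(self):
        self.usecount = 0  # active users of the vnode
        self.holdcnt = 0   # references that only keep the vnode from being freed

    def vhold(self):
        # Hold reference: does not make the vnode "in use".
        self.holdcnt += 1

    def vref(self):
        # Use reference (simplified: the hold bookkeeping is ignored here).
        self.usecount += 1

    def vrefact(self):
        # Only legal when a use reference is already known to exist.
        if self.usecount <= 0:
            raise AssertionError(f"vrefact: wrong use count {self.usecount}")
        self.usecount += 1


vp = Vnode()
vp.vhold()          # bumping the hold count only ...
try:
    vp.vrefact()    # ... leaves the use count at 0, so this trips
except AssertionError as exc:
    print(exc)
```

In this model, replacing the vhold() call with vref() lets the later vrefact() succeed, which mirrors the one-line nature of the fix described above.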
* vnode.9: Document vnode_if.awk and vnode_if.src (Mateusz Piotrowski, 2024-10-14, 1 file, -3/+25)
Discussed with: bjk, imp Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D27196
* vnode.h: add comment line about VIRF_ flags (Konstantin Belousov, 2024-10-13, 1 file, -0/+1)
Sponsored by: The FreeBSD Foundation MFC after: 3 days
* kinfo_{vmobject,vmentry}: move copy of paths into the vnode handling scope (Konstantin Belousov, 2024-10-08, 2 files, -7/+5)
Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D46970
* kinfo_vmobject: report backing object of the SysV shm segments (Konstantin Belousov, 2024-10-07, 2 files, -1/+15)
Use reserved word for kvo_flags. Mark such objects with KVMO_FLAG_SYSVSHM. Provide the segment key in kvo_vn_fileid, since a vnode can never back a shm mapping. Provide the sequence number in kvo_vn_fsid_freebsd11. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D46959
* kinfo_vmentry: report mappings of the SysV shm segments (Konstantin Belousov, 2024-10-07, 2 files, -0/+11)
Mark such mappings with the new flag KVME_FLAG_SYSVSHM. Provide the segment key in kve_vn_fileid, since a vnode can never back a shm mapping. Provide the sequence number in kve_vn_fsid_freebsd11. Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D46959
* unix: Use a dedicated mtx pool for vnode name locks (John Baldwin, 2024-10-02, 1 file, -4/+6)
mtxpool_sleep should be used as a leaf lock since it is a general purpose pool shared across many consumers. Holding a pool lock while acquiring other locks (e.g. the socket buffer lock in soreserve()) can trigger LOR warnings for unrelated code. Reviewed by: glebius Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D46792
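The idea of a mutex pool, and of giving one consumer its own pool, can be sketched in userland. The class and variable names below are hypothetical; the sketch only shows that a key hashes to one lock within a pool, and that a dedicated pool never hands out a lock that unrelated consumers of the shared pool might be holding.

```python
import threading


class MutexPool:
    """Fixed-size pool of locks; a key is mapped to one lock by hashing."""

    def __init__(self, size):
        self._locks = [threading.Lock() for _ in range(size)]

    def lock_for(self, key):
        # The same key always maps to the same lock within this pool.
        return self._locks[hash(key) % len(self._locks)]


# Stand-in for a general-purpose pool shared by many consumers.
shared_pool = MutexPool(8)
# Hypothetical dedicated pool used by a single subsystem.
name_lock_pool = MutexPool(8)

vnode_key = 42  # stands in for a vnode address
with name_lock_pool.lock_for(vnode_key):
    # Other locks may be taken here without involving any lock from
    # shared_pool, so shared_pool locks can remain leaf locks.
    pass
```

Because the two pools share no lock objects, lock-order relationships created through the dedicated pool cannot be attributed to other users of the shared one.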
* rangelock: Disable cheat mode by default (Mark Johnston, 2024-08-27, 1 file, -1/+1)
Cheat mode is incompatible with code which locks multiple ranges in the same vnode, with at least one range being write-locked. This can arise in kern_copy_file_range(). Until that's handled somehow, avoid the problem to make the fusefs tests stable. PR: 281073 Fixes: 9ef425e560a9 ("rangelocks: add fast cheating mode") Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D46457
* unionfs: fix LINT build (Jason A. Harmening, 2024-07-13, 1 file, -2/+2)
Fix a stale variable name that snuck into a tracepoint from an earlier version of the change. Fixes: eb60ff1e "unionfs: rework locking scheme to only lock a single vnode" Reported by: jenkins
* p9fs: remove duplicated code (Danilo Egea Gondolfo, 2024-07-13, 1 file, -10/+0)
This code is using the vnode after it has been released and causing a panic when a p9fs shared volume is unmounted. In fact, it seems like it's just duplicated code left behind from a bad merge. PR: 279887 Reported by: Michael Dexter Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1323
* unionfs: rework locking scheme to only lock a single vnode (Jason A. Harmening, 2024-07-12, 4 files, -746/+979)
Instead of locking both the lower and upper vnodes, which is both complex and deadlock-prone, only lock the upper vnode, or the lower vnode if no upper vnode is present. In most cases this is all that is needed; for the cases in which both vnodes do need to be locked, this change also employs deadlock-avoiding techniques such as LK_NOWAIT and vn_lock_pair(). There are still some corner cases in which the current implementation ends up taking multiple vnode locks across different filesystems without taking special steps to avoid deadlock; those cases have been noted in the comments. Differential Revision: https://reviews.freebsd.org/D45398 Reviewed by: olce Tested by: pho
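The general shape of a non-blocking, back-off approach to taking two locks can be sketched with userland locks. This is a generic try-lock-and-back-off pattern written as an illustration; the function name lock_pair is hypothetical and the code is not the kernel's vn_lock_pair() implementation. The key property is that the caller never sleeps on one lock while holding the other.

```python
import threading


def lock_pair(first, second):
    """Acquire both locks without ever blocking while holding one of them."""
    while True:
        first.acquire()                      # may block, but nothing is held
        if second.acquire(blocking=False):   # analogous to a no-wait request
            return
        # The second lock is busy: drop the first instead of waiting on it,
        # then swap roles so the next blocking acquire targets the busy lock.
        first.release()
        first, second = second, first


a = threading.Lock()
b = threading.Lock()
lock_pair(a, b)
print(a.locked(), b.locked())
b.release()
a.release()
```

Two threads that call lock_pair() with the arguments in opposite orders cannot deadlock against each other, since each one releases what it holds before it blocks.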
* sound: Fix lock order reversals in mseq_open() (Christos Margiolis, 2024-07-06, 3 files, -11/+42)
Opening /dev/sequencer after a clean reboot yields:

lock order reversal: (sleepable after non-sleepable)
 1st 0xfffffe004a2c2c08 seqflq (seqflq, sleep mutex) @ /mnt/src/sys/dev/sound/midi/sequencer.c:754
 2nd 0xffffffff84197ed8 midistat lock (midistat lock, sx) @ /mnt/src/sys/dev/sound/midi/midi.c:1478
lock order seqflq -> midistat lock attempted at:
 0xffffffff811c9029 at witness_checkorder+0x12b9
 0xffffffff810f18a7 at _sx_xlock+0xf7
 0xffffffff8417f992 at midimapper_open+0x22
 0xffffffff84182770 at mseq_open+0xf0
 0xffffffff80e3380f at devfs_open+0x30f
 0xffffffff81b8b4b7 at VOP_OPEN_APV+0x57
 0xffffffff812da1e7 at vn_open_vnode+0x397
 0xffffffff812d96b3 at vn_open_cred+0xb23
 0xffffffff812c2c6b at openatfp+0x52b
 0xffffffff812c2711 at sys_openat+0x81
 0xffffffff84110579 at filemon_wrapper_openat+0x19
 0xffffffff81a223ae at amd64_syscall+0x39e
 0xffffffff819dd0eb at fast_syscall_common+0xf8

Expose midistat_lock to midi/midi.c so that we can acquire the lock from mseq_open() before we lock seq_lock, and introduce _locked variants of midimapper_open() and midimapper_fetch_synth().

Sponsored by: The FreeBSD Foundation MFC after: 2 days Reviewed by: dev_submerge.ch, emaste Differential Revision: https://reviews.freebsd.org/D45770
* Add an implementation of the 9P filesystem (Doug Rabson, 2024-06-19, 23 files, -1/+6694)
This is derived from swills@ fork of the Juniper virtfs with many changes by me including bug fixes, style improvements, clearer layering and more consistent logging. The filesystem is renamed to p9fs to better reflect its function and to prevent possible future confusion with virtio-fs.

Several updates and fixes from Juniper have been integrated into this version by Val Packett and these contributions along with the original Juniper authors are credited below.

To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The bhyve virtio-9p device allows access from the guest to files on the host by mapping a 'sharename' to a host path. It is possible to use p9fs as a root filesystem by adding this to /boot/loader.conf:

    vfs.root.mountfrom="p9fs:sharename"

for non-root filesystems add something like this to /etc/fstab:

    sharename /mnt p9fs rw 0 0

In both examples, substitute the share name used on the bhyve command line.

The 9P filesystem protocol relies on stateful file opens which map protocol-level FIDs to host file descriptors. The FreeBSD vnode interface doesn't really support this and we use heuristics to guess the right FID to use for file operations. This can be confused by privilege lowering and does not guarantee that the FID created for a given file open is always used for file operations, even if the calling process is using the file descriptor from the original open call. Improving this would involve changes to the vnode interface which is out-of-scope for this import.

Differential Revision: https://reviews.freebsd.org/D41844 Reviewed by: kib, emaste, dch MFC after: 3 months Co-authored-by: Val Packett <val@packett.cool> Co-authored-by: Ka Ho Ng <kahon@juniper.net> Co-authored-by: joyu <joyul@juniper.net> Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
* getblk: track "non-sterile" bufobj to avoid bo lock on miss if sterile (Ryan Libby, 2024-06-16, 3 files, -1/+18)
This is a scheme to avoid taking the bufobj lock and doing a second lookup in the case where in getblk we do an unlocked lookup and find no buf. Was there really no buf, or were we in the middle of a reassignbuf race? By tracking any use of reassignbuf with a flag, we can know if there can't have been a race because there has been no reassignbuf. Because this scheme is spoiled on the first use of reassignbuf, it is mostly only beneficial for cases where a certain vnode is never expected to use dirty bufs at all. Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D45571
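The scheme can be modeled in a few lines of userland code. The sketch below is heavily simplified and every name in it (BufObj, non_sterile, locked_lookups) is an assumption chosen for illustration; the real code works on buffer tries inside the kernel. It shows only the decision: an unlocked miss is trusted while the object has never seen a reassignbuf, and is re-checked under the lock otherwise.

```python
import threading


class BufObj:
    """Toy model of a buffer object with a 'non-sterile' flag."""

    def __init__(self):
        self.lock = threading.Lock()
        self.bufs = {}            # block number -> buffer
        self.non_sterile = False  # set on the first reassignbuf, never cleared
        self.locked_lookups = 0   # instrumentation for this example only

    def reassignbuf(self, blkno, buf):
        with self.lock:
            self.non_sterile = True
            self.bufs[blkno] = buf

    def lookup(self, blkno):
        buf = self.bufs.get(blkno)          # first, an unlocked lookup
        if buf is not None or not self.non_sterile:
            return buf                      # a hit, or a miss that cannot be a race
        with self.lock:                     # the miss might be a race: re-check
            self.locked_lookups += 1
            return self.bufs.get(blkno)


bo = BufObj()
print(bo.lookup(7), bo.locked_lookups)   # sterile: miss without taking the lock
bo.reassignbuf(1, "buf1")
print(bo.lookup(7), bo.locked_lookups)   # non-sterile: miss confirmed under lock
```

As the commit message notes, the benefit disappears after the first reassignbuf, so the model's fast path only helps objects that never dirty a buffer.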
* getblk: reduce time under bufobj lock (Ryan Libby, 2024-06-06, 3 files, -51/+100)
Use the new pctrie combined insert/lookup facility to reduce work and time under the bufobj interlock when associating a buf with a vnode. We now do one lookup in the dirty tree and one combined lookup/insert in the clean tree instead of one lookup in dirty, two in clean, and then an insert in clean. We also avoid touching the possibly unrelated buf at the tail of the queue. Also correct an issue where the actual order of the tail queue depended on the insertion order due to sign issues. Reviewed by: kib (previous version), dougm, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D45395
* Stop treating size 0 as unknown size in vnode_create_vobject(). (Pawel Jakub Dawidek, 2024-05-23, 6 files, -18/+51)
Whenever a file is created, the vnode_create_vobject() function will try to determine its size by calling vn_getsize_locked(), as size 0 is ambiguous: it means either the file size is 0 or the file size is unknown. Introduce a special value for the size argument: VNODE_NO_SIZE. Only when it is given will vnode_create_vobject() try to obtain the file's size on its own. Introduce a dedicated vnode_disk_create_vobject() for use by g_vfs_open(), so we don't have to call vn_isdisk() in the common case (for regular files). Handle the case of mediasize==0 in g_vfs_open(). Reviewed by: alc, kib, markj, olce Approved by: oshogbo (mentor), allanjude (mentor) Differential Revision: https://reviews.freebsd.org/D45244
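The effect of replacing an overloaded 0 with a dedicated sentinel can be shown with a small model. The numeric value of VNODE_NO_SIZE below is an assumption (the commit message only names the constant), and create_vobject and query_size are hypothetical stand-ins for the kernel functions.

```python
VNODE_NO_SIZE = -1  # assumed sentinel value, for illustration only


def create_vobject(size, query_size):
    """Return the size the backing object would be created with.

    query_size stands in for an expensive size lookup; it is called only
    when the caller explicitly says the size is unknown.
    """
    if size == VNODE_NO_SIZE:
        size = query_size()
    return size


calls = []


def query_size():
    calls.append("lookup")
    return 4096


print(create_vobject(0, query_size), len(calls))              # 0 is a real size
print(create_vobject(VNODE_NO_SIZE, query_size), len(calls))  # unknown: look it up
```

With the sentinel, a newly created empty file passes size 0 and skips the lookup entirely, which is the common case the change targets.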
* mqueuefs: mark newly allocated vnode as constructed, under the lock (Konstantin Belousov, 2024-05-22, 1 file, -0/+1)
Sponsored by: The FreeBSD Foundation MFC after: 1 week
* vfs_domount_update(): postpone setting MNT_UNION until VFS_MOUNT() is done (Konstantin Belousov, 2024-05-16, 1 file, -1/+8)
The file system that handles updating the mount point might do lookups during the update, in which case it could find the flag MNT_UNION set on the mp while mount point is still not updated. In particular, the rootvp->v_mount->mnt_vnodecovered is not yet set. Delay setting MNT_UNION until the mount is performed. PR: 265311 Reported by: Robert Morris <rtm@lcs.mit.edu> Reviewed by: mckusick, olce Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D45208
* tmpfs_destroy_vobject(): clear v_object under the object lock (Konstantin Belousov, 2024-05-13, 1 file, -1/+3)
Which allows tmpfs_pager_writecount_recalc() to reliably detect reclaimed vnode and make its accesses to object->un_pager.swp.private (== vp) safe against reclaim. Note that vnode instantiation already assigns v_object under the object lock. Reviewed by: markj Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D45119
* tmpfs: recalculate OBJ_TMPFS_VREF on reinstantiating node's vnode (Konstantin Belousov, 2024-05-13, 1 file, -3/+7)
Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D45119
* md: Merge two switch statements in mdstart_vnode (John Baldwin, 2024-05-10, 1 file, -24/+20)
While here, use bp->bio_cmd instead of auio.uio_rw to drive read vs write behavior. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D45155
* sound: Add missing oss_mixerinfo devnode and legacy_device fields (Christos Margiolis, 2024-05-09, 2 files, -7/+8)
They are missing from soundcard.h and are in fact used by some applications, such as OSS' ossinfo(1): http://manuals.opensound.com/developer/ossinfo.c.html The new size for filler is chosen according to the most recent official version of soundcard.h, which includes those 2 fields. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: dev_submerge.ch Differential Revision: https://reviews.freebsd.org/D45137
* unionfs_rename: fix numerous locking issues (Jason A. Harmening, 2024-04-29, 1 file, -56/+96)
There are a few places in which unionfs_rename() accesses fvp's private data without holding the necessary lock/interlock. Moreover, the implementation completely fails to handle the case in which fdvp is not the same as tdvp; in this case it simply fails to lock fdvp at all. Finally, it locks fvp while potentially already holding tvp's lock, but makes no attempt to deal with possible LOR there. Fix this by optimistically using the vnode interlock to protect the short accesses to fdvp and fvp private data, sequentially. If a file copy or shadow directory creation is required to prepare the upper FS for the rename operation, the interlock must be dropped and fdvp/fvp locked as necessary. Additionally, use ERELOOKUP (as suggested by kib@) to simplify the locking logic and eliminate unionfs_relookup() calls for file-copy/shadow-directory cases that require tdvp's lock to be dropped. Reviewed by: kib (earlier version), olce Tested by: pho Differential Revision: https://reviews.freebsd.org/D44788
* sound: Get rid of snd_clone and use DEVFS_CDEVPRIV(9) (Christos Margiolis, 2024-04-11, 9 files, -1814/+332)
Currently the snd_clone framework creates device nodes on-demand for every channel, through the dsp_clone() callback, and is responsible for routing audio to the appropriate channel(s). This patch gets rid of the whole snd_clone framework (including any related sysctls) and instead uses DEVFS_CDEVPRIV(9) to handle device opening, channel allocation and audio routing. This results in a significant reduction in code size as well as complexity.

Behavior that is preserved:
- hw.snd.basename_clone.
- Exclusive access of an audio device (i.e. VCHANs disabled).
- Multiple processes can read from/write to the device.
- A device can only be opened as many times as the maximum allowed channel number (see SND_MAXHWCHAN in pcm/sound.h).
- OSSv4 compatibility aliases are preserved.

Behavior changes:
Only one /dev/dspX device node is created (on attach) for each audio device, as opposed to the current /dev/dspX.Y devices created by snd_clone. According to the sound(4) man page, devices are not meant to be opened through /dev/dspX.Y anyway, so it is best if we do not create device nodes for them in the first place. As a result of this, modify dsp_oss_audioinfo() to print /dev/dspX in the "ai->devnode", instead of /dev/dspX.Y.

Sponsored by: The FreeBSD Foundation MFC after: 2 months Reviewed by: dev_submerge.ch, bapt, markj Differential Revision: https://reviews.freebsd.org/D44411
* unionfs_lookup(): fix wild accesses to vnode private data (Jason A. Harmening, 2024-04-09, 1 file, -7/+15)
There are a few spots in which unionfs_lookup() accesses unionfs vnode private data without holding the corresponding vnode lock or interlock. Reviewed by: kib, olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44601
* tarfs: Inherit mnt_iosize_max from the lower vnode (Mark Johnston, 2024-04-04, 1 file, -0/+2)
There is no obvious reason to use a value smaller than that. Reviewed by: des, kib MFC after: 1 week Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D44627
* sound: Get rid of snd_clone and use DEVFS_CDEVPRIV(9) (Christos Margiolis, 2024-03-31, 9 files, -1793/+329)
Currently the snd_clone framework creates device nodes on-demand for every channel, through the dsp_clone() callback, and is responsible for routing audio to the appropriate channel(s). This patch gets rid of the whole snd_clone framework (including any related sysctls) and instead uses DEVFS_CDEVPRIV(9) to handle device opening, channel allocation and audio routing. This results in a significant reduction in code size as well as complexity.

Behavior that is preserved:
- hw.snd.basename_clone.
- Exclusive access of an audio device (i.e. VCHANs disabled).
- Multiple processes can read from/write to the device.
- A device can only be opened as many times as the maximum allowed channel number (see SND_MAXHWCHAN in pcm/sound.h).
- OSSv4 compatibility aliases are preserved.

Behavior changes:
Only one /dev/dspX device node is created (on attach) for each audio device, as opposed to the current /dev/dspX.Y devices created by snd_clone. According to the sound(4) man page, devices are not meant to be opened through /dev/dspX.Y anyway, so it is best if we do not create device nodes for them in the first place. As a result of this, modify dsp_oss_audioinfo() to print /dev/dspX in the "ai->devnode", instead of /dev/dspX.Y.

Sponsored by: The FreeBSD Foundation MFC after: 2 months Reviewed by: dev_submerge.ch, markj Differential Revision: https://reviews.freebsd.org/D44411
* unionfs: implement VOP_UNP_* and remove special VSOCK vnode handling (Jason A. Harmening, 2024-03-24, 1 file, -89/+84)
unionfs has a bunch of clunky special-case code to avoid creating unionfs wrapper vnodes for AF_UNIX sockets. This was added in 2008 to address PR 118346, but in the intervening years the VOP_UNP_* operations have been added to provide a clean interface to allow sockets to work in the presence of stacked filesystems. PR: 275871 Reviewed by: kib (prior version), olce Tested by: Karlo Miličević <karlo98.m@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44288
* kerneldump: add livedump_start_vnode(9) (Vijeyalakshumi Koteeswaran, 2024-03-18, 3 files, -10/+25)
livedump_start_vnode(9) is introduced such that the live minidump on the system could take a vnode. This interface could be used to extend support for the existing framework in downstream. Bump __FreeBSD_version for introducing livedump_start_vnode(9). Sponsored by: Juniper Networks, Inc. Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D43471
* unionfs: accommodate underlying FS calls that may re-lock (Jason A. Harmening, 2024-03-10, 3 files, -60/+289)
Since non-doomed unionfs vnodes always share their primary lock with either the lower or upper vnode, any forwarded call to the base FS which transiently drops that upper or lower vnode lock may result in the unionfs vnode becoming completely unlocked during that transient window. The unionfs vnode may then become doomed by a concurrent forced unmount, which can lead to either or both of the following:

--Complete loss of the unionfs lock: in the process of being doomed, the unionfs vnode switches back to the default vnode lock, so even if the base FS VOP reacquires the upper/lower vnode lock, that no longer translates into the unionfs vnode being relocked. This will then violate that caller's locking assumptions as well as various assertions that are enabled with DEBUG_VFS_LOCKS.

--Complete loss of reference on the upper/lower vnode: the caller normally holds a reference on the unionfs vnode, while the unionfs vnode in turn holds references on the upper/lower vnodes. But in the course of being doomed, the unionfs vnode will drop the latter set of references, which can effectively lead to the base FS VOP executing with no references at all on its vnode, violating the assumption that vnodes can't be recycled during these calls and (if lucky) violating various assertions in the base FS.

Fix this by adding two new functions, unionfs_forward_vop_start_pair() and unionfs_forward_vop_finish_pair(), which are intended to bookend any forwarded VOP which may transiently unlock the relevant vnode(s). These functions are currently only applied to VOPs that modify file state (and require vnode reference and lock state to be identical at call entry and exit), as the common reason for transiently dropping locks is to update filesystem metadata.

Reviewed by: olce Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44076
* uipc_bindat(): Explicitly specify exclusive locking for the new vnode (Jason A. Harmening, 2024-03-10, 1 file, -1/+12)
When calling VOP_CREATE(), uipc_bindat() reuses the componentname object from the preceding lookup operation, which is likely to specify LK_SHARED. Furthermore, the VOP_CREATE() interface technically only requires the newly-created vnode to be returned with a shared lock. However, the socket layer requires the new vnode to be locked exclusive and asserts to that effect. In most cases, this is not a practical concern because most if not all base-layer filesystems (certainly FFS, ZFS, and msdosfs at least) always return the vnode locked exclusive regardless of the lock flags. However, it is an issue for unionfs which uses cn_lkflags to determine how the new unionfs wrapper vnode should be locked. While it would be easy enough to work around this issue within unionfs itself, it seems better for the socket layer to be explicit about its locking requirements when issuing VOP_CREATE(). Reviewed by: kib, olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44047
* vn_lock_pair(): allow lkflags1/lkflags2 to be 0 if vp1/vp2 is NULL (Jason A. Harmening, 2024-03-10, 1 file, -2/+4)
It's a bit strange to require the caller to pass contrived lock flags if the corresponding vnode is NULL, simply to appease the assertion that exactly one of LK_SHARED or LK_EXCLUSIVE must be set. On the other hand, we still want to catch cases in which completely bogus or corrupt flags are passed even if the corresponding vnode is NULL. Therefore, specifically allow empty flags for lkflags1/lkflags2 iff the respective vp1/vp2 param is NULL. Reviewed by: kib, olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44046
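The relaxed check can be written out as a small predicate. This is a sketch of the rule as stated in the message, not the kernel assertion itself: the bit values assigned to LK_SHARED and LK_EXCLUSIVE here are hypothetical, and lkflags_ok is an illustrative name.

```python
# Hypothetical bit values, chosen only for this illustration.
LK_SHARED = 0x1
LK_EXCLUSIVE = 0x2


def lkflags_ok(vp, lkflags):
    """Return True if lkflags is acceptable for the (possibly absent) vnode."""
    if vp is None and lkflags == 0:
        return True  # empty flags are allowed only when there is no vnode
    lock_type = lkflags & (LK_SHARED | LK_EXCLUSIVE)
    # Otherwise exactly one of the two lock types must be requested, even
    # for a missing vnode, so that bogus flags are still caught.
    return lock_type in (LK_SHARED, LK_EXCLUSIVE)


vnode = object()  # stands in for a non-NULL vnode pointer
print(lkflags_ok(None, 0))                          # True
print(lkflags_ok(vnode, 0))                         # False
print(lkflags_ok(None, LK_SHARED | LK_EXCLUSIVE))   # False
```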
* mount_nullfs(8): document -o cache and vfs.nullfs.cache_vnodes (Konstantin Belousov, 2024-03-08, 1 file, -2/+8)
Sponsored by: The FreeBSD Foundation MFC after: 1 week
* msdosfs: fix potential inode collision on FAT12 and FAT16 (Stefan Eßer, 2024-02-20, 3 files, -5/+24)
FAT file systems do not use inodes, instead all file meta-information is stored in directory entries. FAT12 and FAT16 use a fixed size area for root directories, with typically 512 entries of 32 bytes each (for a total of 16 KB) on hard disk formats. The file system data is stored in clusters of typically 512 to 4096 bytes, depending on the size of the file system. The current code uses the offset of a DOS 8.3 style directory entry as a pseudo-inode, which leads to inode values of 0 to 16368 for typical root directories with 512 entries. Sub-directories use 2 cluster length plus the byte offset of the directory entry in the data area for the pseudo-inode, which may be as low as 1024 in case of 512 byte clusters. A sub-directory in cluster 2 and with 512 byte clusters will therefore lead to a re-use of inode 1024 when there are at least 32 DOS 8.3 style filenames in the root directory (or 11 14-character Windows long file names, each of which takes up 3 directory entries). FAT32 file systems are not affected by this issue and FAT12/FAT16 file systems with larger cluster sizes are unlikely to have as many directory entries in the root directory as are required to cause the collision. This commit leads to inode numbers that are guaranteed to not collide for all valid FAT12 and FAT16 file system parameters. It does also provide a small speed-up due to more efficient use of the vnode cache. Approved by: mckusick MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D43978
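The collision described for the old scheme can be reproduced with plain arithmetic. The two formulas below are a reading of the description above (byte offset of the entry for the root directory; two cluster lengths plus the byte offset in the data area for sub-directories), and the function names are hypothetical; this is not the msdosfs code.

```python
DIRENT_SIZE = 32  # bytes per FAT directory entry


def root_entry_pseudo_inode(index):
    """Old scheme, root directory: byte offset of the entry (zero-based index)."""
    return index * DIRENT_SIZE


def subdir_entry_pseudo_inode(cluster_size, data_area_offset):
    """Old scheme, sub-directory: two cluster lengths plus data-area offset."""
    return 2 * cluster_size + data_area_offset


def colliding_root_index(cluster_size, data_area_offset, root_entries=512):
    """Return the root entry index sharing this pseudo-inode, or None."""
    ino = subdir_entry_pseudo_inode(cluster_size, data_area_offset)
    index, remainder = divmod(ino, DIRENT_SIZE)
    if remainder == 0 and index < root_entries:
        return index
    return None


# First entry of a sub-directory stored in the first data cluster,
# on a file system with 512-byte clusters.
print(subdir_entry_pseudo_inode(512, 0))   # 1024
print(root_entry_pseudo_inode(32))         # 1024, the same pseudo-inode
print(colliding_root_index(512, 0))        # 32
```

With larger clusters the sub-directory values start beyond the range a 512-entry root directory can produce, which matches the remark that such file systems are unlikely to hit the collision.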
* unionfs: upgrade the vnode lock during fsync() if necessary (Jason A. Harmening, 2024-02-18, 2 files, -1/+10)
If the underlying upper FS supports shared locking for write ops, as is the case with ZFS, VOP_FSYNC() may only be called with the vnode lock held shared. In this case, temporarily upgrade the lock for those unionfs maintenance operations which require exclusive locking. While here, make unionfs inherit the upper FS' support for shared write locking. Since the upper FS is the target of VOP_GETWRITEMOUNT() this is what will dictate the locking behavior of any unionfs caller that uses vn_start_write() + vn_lktype_write(), so unionfs must be prepared for the caller to only hold a shared vnode lock in these cases. Found in local testing of unionfs atop ZFS with DEBUG_VFS_LOCKS. MFC after: 2 weeks Reviewed by: kib, olce Differential Revision: https://reviews.freebsd.org/D43817
* unionfs: cache upper/lower mount objects (Jason A. Harmening, 2024-02-18, 3 files, -19/+24)
Store the upper/lower FS mount objects in unionfs per-mount data and use these instead of the v_mount field of the upper/lower root vnodes. As described in the referenced PR, it is unsafe to access this field on the unionfs unmount path as ZFS rollback may have obliterated the v_mount field of the upper or lower root vnode. Use these stored objects to slightly simplify other code that needs access to the upper/lower mount objects as well. PR: 275870 Reported by: Karlo Miličević <karlo98.m@gmail.com> Tested by: Karlo Miličević <karlo98.m@gmail.com> Reviewed by: kib (prior version), olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D43815
* fusefs: only test for incoherency if FN_SIZECHANGE is set (Emil Tsalapatis, 2024-02-09, 1 file, -2/+2)
FUSE emits spurious incoherency warnings in writethrough mode. The warnings are triggered by setattr calls generated by vnode truncation turning the cached va_size vattr stale, causing comparisons with the fresh version provided by the server to fail. Only validate the vnode's va_size vattr if the FN_SIZECHANGE flag is set. This is a part of the research work at RCSLab, University of Waterloo. Reviewed by: asomers MFC after: 1 week Pull Request: https://github.com/freebsd/freebsd-src/pull/1110
* kcmp(2): implement for devfs files (Konstantin Belousov, 2024-01-24, 1 file, -0/+9)
Compare not the vnodes, which differ between mount points, but the actual cdev referenced by the devfs node. Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
* kcmp(2): implement for vnode files (Konstantin Belousov, 2024-01-24, 3 files, -0/+11)
Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
* sysctl vm.objects/vm.swap_objects: do not fill vnode info if jailed (Konstantin Belousov, 2024-01-16, 1 file, -1/+5)
Reported by: Shawn Webb via markj Reviewed by: jhb, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Add vnode_pager_clean_{a,}sync(9) (Konstantin Belousov, 2024-01-11, 16 files, -118/+82)
Bump __FreeBSD_version for ZFS use. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43356