path: root/sys
Commit message | Author | Age | Files | Lines
* Fix bug 253158 - Panic: snapacct_ufs2: bad block - mksnap_ffs(8) crash | Kirk McKusick | 2021-02-12 | 1 | -67/+70
  The panic reported in 253158 arises because the /mnt/.snap/.factory snapshot allocated the last block in the filesystem. The snapshot code allocates the last block in the filesystem as a way of setting its length to be the size of the filesystem. Part of taking a snapshot is to remove all the earlier snapshots from the image of the newest snapshot so that newer snapshots will not claim the blocks of the earlier snapshots. The panic occurs when the new snapshot finds that both it and an earlier snapshot claim the same block.

  The fix is to set the size of the snapshot to be one block after the last block in the filesystem. This block can never be allocated since it is not a valid block in the filesystem. This extra block is used as a place to store the initial list of blocks that the snapshot has already copied and is used to avoid a deadlock in and speed up the ffs_copyonwrite() function.

  Reported by: Harald Schmalzbauer
  Tested by: Peter Holm
  PR: 253158
  Sponsored by: Netflix
* fifo: minor comment and assert improvements. | Konstantin Belousov | 2021-02-12 | 2 | -4/+6
  In particular, replace the note that the reload through vget() is obsolete with an explanation of why this code is required.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_unlock: assert that IN_ENDOFF is not leaked past locked scope | Konstantin Belousov | 2021-02-12 | 1 | -0/+3
  This catches both missed processing of IN_ENDOFF and missed application of VOP_VPUT_PAIR() after a VOP that created an entry in the directory.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs softdep: Force processing of VI_OWEINACT vnodes when there is inode shortage | Konstantin Belousov | 2021-02-12 | 2 | -0/+63
  Such vnodes prevent inode reuse, and should be force-cleared when ffs_valloc() is unable to find a free inode.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* softdep_request_cleanup: wait for softdep_request_clean_flush() to pass | Konstantin Belousov | 2021-02-12 | 1 | -0/+6
  if we noted that a parallel request is active and declined to overload the system with a redundant parallel sync of the vnodes. We still need to wait for the flush to finish to see whether any resources were freed.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_inactive(): stop hiding ERELOOKUP from ffs_truncate(), return it. | Konstantin Belousov | 2021-02-12 | 2 | -6/+5
  VFS should then retry inactivation when possible. This should provide timely removal of unlinked, unreferenced inodes.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Stop ignoring ERELOOKUP from VOP_INACTIVE() | Konstantin Belousov | 2021-02-12 | 3 | -16/+42
  When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying.

  This is part of the effort to ensure that an unlinked, unreferenced vnode does not prevent its inode from being reused.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs vnops: brace softdep_prelink() with DOINGSUJ instead of DOINGSOFTDEP | Konstantin Belousov | 2021-02-12 | 1 | -6/+6
  because softdep_prelink() is reverted to a NOP for the non-J case. There is no need to do anything before ufs_direnter() in the SU/non-J case; everything required to sync the directory is done in VOP_VPUT_PAIR().

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs softdep: remove will_direnter argument of softdep_prelink() | Konstantin Belousov | 2021-02-12 | 3 | -45/+15
  Originally this was done in 8a1509e442bc9a075 to forcibly cover cases where a hole in the directory could be created by extending into an indirect block, since the dependency of writing out the indirect block is not tracked.

  This resulted in an excessive amount of directory fsyncing, since every creation of a new entry forced an fsync before it. This is not needed; it is enough to fsync when IN_NEEDSYNC is set, and VOP_VPUT_PAIR() provides the required hook to perform only the required syncing.

  The series of changes culminating in this commit puts the performance of metadata-intensive loads back to that before 8a1509e442bc9a075.

  Analyzed by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_direnter: directory truncation does not need special case for rename | Konstantin Belousov | 2021-02-12 | 4 | -26/+23
  In the ufs_rename case, tdvp is locked from the place where ufs_direnter() is done until VOP_VPUT_PAIR(), which means that we no longer need to handle rename specially in ufs_direnter(). Truncation, if possible, is done in the same way in ffs_vput_pair() both for rename and for other VOPs calling ufs_direnter().

  Remove the isrename argument and set IN_ENDOFF if ufs_direnter() succeeded and the directory needs truncation.

  In ffs_vput_pair(), stop verifying the condition that the directory needs truncation when IN_ENDOFF is set; instead, assert that the condition is true.

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_rename: use VOP_VPUT_PAIR and rely on directory sync/truncation there | Konstantin Belousov | 2021-02-12 | 1 | -28/+6
  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_direnter: move directory truncation to ffs_vput_pair(). | Konstantin Belousov | 2021-02-12 | 3 | -25/+46
  VOP_VPUT_PAIR() provides the hook to do the truncation right before unlock, which is required since truncation might need to fsync(), which itself might unlock the directory vnode.

  Set new flag IN_ENDOFF which indicates that i_endoff is valid and should be checked against inode size. Excessive size is chomped, but this operation is advisory and failure to truncate should not result in the failure of the main VOP.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_vput_pair(): try harder to recover from the vnode reclaim | Konstantin Belousov | 2021-02-12 | 1 | -3/+36
  In particular, if unlock_vp is false, save vp's inode number and generation. If ffs_inotovp() can re-create the vnode with the same number and generation after we finished with handling dvp, then we most likely raced with unmount, and were able to restore atomicity of open. We use FFSV_REPLACE_DOOMED there, to drop the old vnode.

  This additional recovery is not strictly required, but it improves the quality of the implementation.

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* FFS: implement special VOP_VPUT_PAIR(). | Konstantin Belousov | 2021-02-12 | 1 | -0/+55
  It clears the IN_NEEDSYNC flag on dvp before returning, by applying ffs_syncvnode() until it succeeds or returns an error other than ERELOOKUP. IN_NEEDSYNC cleanup is required to avoid creating holes in directories when they are extended into an indirect block.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
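The retry rule in the commit above ("until success or an error different from ERELOOKUP") can be illustrated with a small userland sketch. Everything named `sketch_*` or `SKETCH_*` is hypothetical and stands in for kernel state; the mock sync routine and the numeric value used for ERELOOKUP are assumptions made only so the loop can run outside the kernel. It is not the code from ffs_vput_pair().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal ERELOOKUP value. */
#define SKETCH_ERELOOKUP (-5)

/* Hypothetical model of a directory inode that still needs syncing. */
struct sketch_dir {
	int needsync;    /* models the IN_NEEDSYNC flag */
	int relookups;   /* how many times the mock sync asks for a retry */
	int final_error; /* result the mock sync returns once retries run out */
};

/* Mock sync: requests a retry a few times, then succeeds or fails. */
static int
sketch_sync(struct sketch_dir *d)
{
	if (d->relookups > 0) {
		d->relookups--;
		return (SKETCH_ERELOOKUP);
	}
	if (d->final_error == 0)
		d->needsync = 0; /* flag is cleared only after a successful sync */
	return (d->final_error);
}

/* Keep syncing while the flag is set and the only "error" is a retry request. */
static int
sketch_sync_until_settled(struct sketch_dir *d)
{
	int error = 0;

	assert(d != NULL);
	while (d->needsync) {
		error = sketch_sync(d);
		if (error != SKETCH_ERELOOKUP)
			break;
	}
	return (error);
}
```

A caller would treat a zero return as "the directory is clean" and any other value as a real failure; the retry requests themselves never escape the loop.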
* nfsserver: use VOP_VPUT_PAIR(). | Konstantin Belousov | 2021-02-12 | 1 | -16/+22
  Apply VOP_VPUT_PAIR() to the end of vnode operations after the VOP_MKNOD(), VOP_MKDIR(), VOP_LINK(), VOP_SYMLINK(), VOP_CREATE().

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_snapshot: use VOP_VPUT_PAIR after VOP_CREATE. | Konstantin Belousov | 2021-02-12 | 1 | -2/+7
  If the snapshot embryo was reclaimed under us, return error outright.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Use VOP_VPUT_PAIR() for eligible VFS syscalls. | Konstantin Belousov | 2021-02-12 | 3 | -21/+18
  The current list is limited to the cases where UFS needs to handle vput(dvp) specially, which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(), VOP_LINK(), and VOP_SYMLINK().

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* nullfs: provide special bypass for VOP_VPUT_PAIR | Konstantin Belousov | 2021-02-12 | 1 | -0/+49
  Generic bypass cannot understand the rules of liveness for the VOP.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Add VOP_VPUT_PAIR() with trivial default implementation. | Konstantin Belousov | 2021-02-12 | 2 | -0/+24
  The VOP is intended to be used in situations where VFS has two referenced locked vnodes, typically a directory vnode dvp and a vnode vp that is linked from the directory, and at least dvp is vput(9)ed. The child vnode can also be vput-ed, or optionally left referenced and locked.

  There, at least UFS may need to do some actions with dvp which cannot be done while vp is also locked, so its lock might be dropped temporarily. For instance, in some cases UFS needs to sync dvp to avoid a filesystem state that is currently not handled by either the kernel or fsck. Having such a VOP provides the necessary context for the filesystem, which can do correct locking and handle potential reclamation of vp after relock.

  Trivial implementation does vput(dvp) and optionally vput(vp).

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
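The "trivial implementation" described in the commit above can be modelled in a few lines of userland C. This is only a sketch: `sketch_vnode`, `sketch_vput()` and `sketch_vput_pair()` are hypothetical names, the vnode is reduced to a reference count and a lock bit, and the argument list is an assumption rather than the kernel's actual VOP signature. The `unlock_vp` name is borrowed from the ffs_vput_pair() commit message earlier in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified vnode: one reference count, one lock bit. */
struct sketch_vnode {
	int refs;
	bool locked;
};

/* Models vput(9): unlock the vnode and drop one reference. */
static void
sketch_vput(struct sketch_vnode *vp)
{
	assert(vp != NULL && vp->locked && vp->refs > 0);
	vp->locked = false;
	vp->refs--;
}

/*
 * Models the default behaviour: always release the directory vnode,
 * and release the child only when the caller asks for it.
 */
static void
sketch_vput_pair(struct sketch_vnode *dvp, struct sketch_vnode *vp,
    bool unlock_vp)
{
	sketch_vput(dvp);
	if (vp != NULL && unlock_vp)
		sketch_vput(vp);
}
```

With `unlock_vp` false the child stays referenced and locked for the caller, which is the case a filesystem-specific implementation has to handle carefully if it drops the child's lock in between.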
* vn_open(): If the vnode is reclaimed during open(2), do not return error. | Konstantin Belousov | 2021-02-12 | 2 | -4/+9
  Most future operations on the returned file descriptor will fail anyway, and the application should be ready to handle those failures. Not forcing it to understand the transient failure mode on open, which is implementation-specific, should make us less special without loss of reporting of errors.

  Suggested by: chs
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ufs_direnter/SU: unconditionally UFS_UPDATE inode when extending directory | Konstantin Belousov | 2021-02-12 | 1 | -3/+1
  for all kinds of async/SU mount variants.

  Submitted by: mckusick
  Reviewed by: chs
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_syncvnode: only clear IN_NEEDSYNC after successful sync | Konstantin Belousov | 2021-02-12 | 1 | -1/+2
  If it is cleared before the sync, other threads might see the inode without the flag set, because syncing could unlock it.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Merge ufs_fhtovp() into ffs_inotovp(). | Konstantin Belousov | 2021-02-12 | 3 | -30/+17
  The function alone was not used for anything but ffs_fstovp() for a long time.

  Suggested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs_inotovp(): interface to convert (ino, gen) into alive vnode | Konstantin Belousov | 2021-02-12 | 4 | -29/+40
  It generalizes the VFS_FHTOVP() interface, making it possible to fetch the inode without faking a filehandle. It also adds the ffs flags argument, which allows control of the ffs_vgetf() call.

  Requested by: mckusick
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* ffs: Add FFSV_REPLACE_DOOMED flag to ffs_vgetf() | Konstantin Belousov | 2021-02-12 | 2 | -4/+8
  It specifies that the caller requests a fresh non-doomed vnode. If a doomed vnode is found in the hash, it should behave similarly to FFSV_REPLACE. Or, to put it differently, the flag is the same as FFSV_REPLACE, but only when the found hashed vnode is doomed.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
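The relationship between the two flags, as the commit above words it, boils down to one predicate. The sketch below is illustrative only: the `SKETCH_*` bit values and the function name are hypothetical, and the real ffs_vgetf() does far more than make this decision.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real FFSV_* constants are not reproduced here. */
#define SKETCH_REPLACE        0x1
#define SKETCH_REPLACE_DOOMED 0x2

/*
 * Decide whether a vnode found in the hash should be replaced by a
 * fresh one: always for REPLACE, and for REPLACE_DOOMED only when
 * the vnode that was found is doomed.
 */
static bool
sketch_should_replace(int flags, bool found_is_doomed)
{
	if ((flags & SKETCH_REPLACE) != 0)
		return (true);
	return ((flags & SKETCH_REPLACE_DOOMED) != 0 && found_is_doomed);
}
```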
* ffs: call ufsdirhash_dirtrunc() right after setting directory size | Konstantin Belousov | 2021-02-12 | 3 | -6/+13
  Later processing of ffs_truncate() might temporarily unlock the directory vnode, causing unsynchronized dirhash and inode sizes if the update is postponed to UFS_TRUNCATE() callers.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* buf SU hooks: track buf_start() calls with B_IOSTARTED flag | Konstantin Belousov | 2021-02-12 | 3 | -7/+27
  and only call buf_complete() if previously started. Some error paths, like CoW failure, might skip buf_start() and do bufdone(), which itself calls buf_complete().

  Various SU handle_written_XXX() functions check that I/O was started and that incomplete parts of the buffer data were reverted before restoring them. This is a useful invariant, which B_IOSTARTED at the buffer layer allows us to keep instead of changing check-and-panic into check-and-return.

  Reported by: pho
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
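The bookkeeping described in the commit above, where the completion hook runs only if the start hook ran, can be sketched as follows. The names `sketch_buf`, `sketch_buf_start()`, `sketch_buf_complete()` and `sketch_bufdone()` are hypothetical, the flag value is made up, and the counter exists only so the effect is observable; the kernel's buffer layer is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_B_IOSTARTED 0x1 /* hypothetical bit value */

/* Hypothetical, minimal buffer: a flag word plus a counter for observation. */
struct sketch_buf {
	int flags;
	int completions; /* how many times the completion hook really ran */
};

/* Start hook: remember that I/O was started on this buffer. */
static void
sketch_buf_start(struct sketch_buf *bp)
{
	assert(bp != NULL);
	bp->flags |= SKETCH_B_IOSTARTED;
}

/* Completion hook: a no-op unless the start hook ran first. */
static void
sketch_buf_complete(struct sketch_buf *bp)
{
	assert(bp != NULL);
	if ((bp->flags & SKETCH_B_IOSTARTED) == 0)
		return; /* e.g. an error path that skipped the start hook */
	bp->flags &= ~SKETCH_B_IOSTARTED;
	bp->completions++;
}

/* Models the generic "I/O done" path, which always calls the hook. */
static void
sketch_bufdone(struct sketch_buf *bp)
{
	sketch_buf_complete(bp);
}
```

Because the guard lives in the hook itself, the callers of the completion path need no knowledge of which error paths skipped the start.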
* ffs_vnops.c: Move opt_*.h includes to the top. | Konstantin Belousov | 2021-02-12 | 1 | -2/+3
  as it is done in other places. Header files might need options defined for correct operation.

  Reviewed by: chs, mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Fix blackhole/reject routes. | Alexander V. Chernikov | 2021-02-11 | 1 | -2/+56
  Traditionally the *BSD routing stack required some interface data to be supplied for blackhole/reject routes. This led to a variety of hacks in routing daemons when inserting such routes.

  With the recent routing stack changes, a gateway sockaddr without RTF_GATEWAY started to be treated differently, purely as a link identifier. This change broke net/bird, which installs blackhole routes with a 127.0.0.1 gateway without RTF_GATEWAY flags.

  Fix this by automatically constructing the necessary gateway data at the rtsock level if RTF_REJECT/RTF_BLACKHOLE is set.

  Reported by: Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
  Reviewed by: donner
  MFC after: 1 week
* cam: Properly find the sim in the assertion in xpt_pollwait(). | John Baldwin | 2021-02-11 | 1 | -1/+2
  I had missed merging this fixup into 447b3557a9cc5f00a301be8404339f21a9a0faa8 before pushing it.

  Pointy hat to: jhb
  MFC after: 2 weeks
* iscsi: Mark iSCSI CAM sims as non-pollable. | John Baldwin | 2021-02-11 | 1 | -9/+1
  Previously, iscsi_poll() just panicked. This meant if you got a panic on a box when using the iSCSI initiator, the attempt to shutdown would trigger a nested panic and never write out a core. Now, CCB's sent to iSCSI devices (such as the synchronize-cache request in dashutdown()) just fail with a timeout during a panic shutdown.

  Reviewed by: scottl, mav
  MFC after: 2 weeks
  Sponsored by: Chelsio
  Differential Revision: https://reviews.freebsd.org/D28455
* cam: Don't permit crashdumps on non-pollable devices. | John Baldwin | 2021-02-11 | 3 | -4/+7
  If a disk's SIM doesn't support polling, then it can't be used to store crashdumps. Leave d_dump NULL in that case so that dumpon(8) fails gracefully rather than having dumps fail at crash time.

  Reviewed by: scottl, mav, imp
  MFC after: 2 weeks
  Sponsored by: Chelsio
  Differential Revision: https://reviews.freebsd.org/D28454
* cam: Permit non-pollable sims. | John Baldwin | 2021-02-11 | 3 | -1/+15
  Some CAM sim drivers do not support polling (notably iscsi(4)). Rather than using a no-op poll routine that always times out requests, permit a SIM to set a NULL poll callback. cam_periph_runccb() will fail polled requests on non-pollable sims immediately as if they had timed out.

  Reviewed by: scottl, mav (earlier version)
  Reviewed by: imp
  MFC after: 2 weeks
  Sponsored by: Chelsio
  Differential Revision: https://reviews.freebsd.org/D28453
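The idea in the commit above, a NULL poll callback meaning "fail polled requests at once, as if they had timed out", can be sketched in userland C. All `sketch_*` names, the status enum and the `pending` counter are hypothetical; this does not reproduce cam_periph_runccb() or any CAM structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request outcomes. */
enum sketch_status { SKETCH_REQ_DONE, SKETCH_REQ_TIMEOUT };

/* Hypothetical SIM: a NULL poll callback marks it as non-pollable. */
struct sketch_sim {
	void (*poll)(struct sketch_sim *);
	int pending; /* outstanding requests the poll routine can retire */
};

/* Example poll routine for a pollable SIM: retires one request per call. */
static void
sketch_poll_one(struct sketch_sim *sim)
{
	if (sim->pending > 0)
		sim->pending--;
}

/* Run a polled request with a bounded number of poll attempts. */
static enum sketch_status
sketch_run_polled(struct sketch_sim *sim, int max_polls)
{
	assert(sim != NULL);
	if (sim->poll == NULL)
		return (SKETCH_REQ_TIMEOUT); /* fail at once, no busy loop */
	while (sim->pending > 0 && max_polls-- > 0)
		sim->poll(sim);
	return (sim->pending == 0 ? SKETCH_REQ_DONE : SKETCH_REQ_TIMEOUT);
}
```

Compared with a no-op poll routine, the NULL check reaches the same result without spinning through the whole timeout, and it gives other code a cheap way to ask whether a device can be polled at all.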
* Widen ifnet_detach_sxlock coverage | Kristof Provost | 2021-02-11 | 3 | -7/+11
  Widen the ifnet_detach_sxlock to cover the entire vnet sysuninit code. This ensures that we can't end up having the vnet_sysuninit free the UDP pcb while the detach code is running and trying to purge the UDP pcb.

  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28530
* mlx4, mthca: Silence warnings about no-op alignment operations | Mark Johnston | 2021-02-11 | 4 | -9/+13
  Since commit 8fa6abb6f4f64f ("Expose clang's alignment builtins and use them for roundup2/rounddown2"), clang emits warnings for several alignment operations in these drivers because the operation is a no-op. The compiler is arguably being too strict here, but in the meantime let's silence the warnings by conditionally compiling the alignment operations.

  Reviewed by: arichardson, hselasky
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D28576
* [udp] fix possible mbuf and lock leak in udp_input(). | Andrey V. Elsukov | 2021-02-11 | 1 | -5/+8
  In the error case we can leave `inp' locked; we also need to free the mbuf chain `m' in the same case. Release the lock and use the `badunlocked' label to exit with the mbuf freed. Also modify the UDP error statistic to match the IPv6 code.

  Remove the redundant INP_RUNLOCK() from the `if (last == NULL)' block; there is no way to reach this point with `inp' locked.

  Obtained from: Yandex LLC
  MFC after: 3 days
  Sponsored by: Yandex LLC
* [udp6] fix possible panic due to lack of locking. | Andrey V. Elsukov | 2021-02-11 | 1 | -33/+28
  The lookup for IPv6 multicast addresses corresponding to the destination address in the datagram is protected by the NET_EPOCH section. Access to each PCB is protected by INP_RLOCK during comparison. But access to the socket's so_options field is not protected, and in some cases it is possible that the PCB pointer is still valid but inp_socket is not.

  The patch widens lock holding to protect access to inp_socket. It copies the locking strategy from IPv4 UDP handling.

  PR: 232192
  Obtained from: Yandex LLC
  MFC after: 3 days
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D28232
* arm64: if_dwc is also needed by intel stratix10 platform | Emmanuel Vadot | 2021-02-10 | 1 | -2/+3
  MFC after: 3 days
* arm64: Add a SOC_BRCM_NS2 option | Emmanuel Vadot | 2021-02-10 | 3 | -3/+5
  Only compile files needed for this platform if the option is enabled in the kernel config file. Add the option to GENERIC.

  MFC after: 3 days
* arm64: Make thunderx vnic file depend on soc_cavm_thunderx | Emmanuel Vadot | 2021-02-10 | 1 | -4/+4
  MFC after: 3 days
* arm64: Order sys/conf/files.arm64 | Emmanuel Vadot | 2021-02-10 | 1 | -420/+474
  This is now easier to read and see what's compiled-in.
  No functional changes intended.

  MFC after: 3 days
* netgraph/ng_bridge: Add counters for the first link, too | Lutz Donnerhacke | 2021-02-10 | 1 | -21/+38
  For broadcast, multicast and unknown unicast, the replication loop sends a copy of the packet to each link besides the first one. This special path is handled later, but the counters are not updated. Factor out the common send and count actions as a function.

  Reviewed by: kp
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28537
* vm: Honour the "noreuse" flag to vm_page_unwire_managed() | Mark Johnston | 2021-02-10 | 1 | -1/+1
  This flag indicates that the page should be enqueued near the head of the inactive queue, skipping the LRU queue. It is used when unwiring pages from the buffer cache following direct I/O or after I/O when POSIX_FADV_NOREUSE or _DONTNEED advice was specified, or when sendfile(SF_NOCACHE) completes. For the direct I/O and sendfile cases we only enqueue the page if we decide not to free it, typically because it's mapped.

  Pass "noreuse" through to vm_page_release_toq() so that we actually honour the desired LRU policy for these scenarios.

  Reported by: bdrewery
  Reviewed by: alc, kib
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D28555
* Fix non-IPv6 build post 57785538c6e0d7e8ca0f161ab95bae10fd304047. | Cy Schubert | 2021-02-10 | 1 | -4/+0
  57785538c6e0d7e8ca0f161ab95bae10fd304047 changed the test for FreeBSD from __FreeBSD_version to __FreeBSD__. However, this test was performed before sys/param.h was included, therefore __FreeBSD_version was never defined. As the test was never true, opt_random_ip_id.h was never included.

  Submitted by: bdragon
  Reported by: bdragon
  MFC after: 1 week
  X-MFC with: 57785538c6e0d7e8ca0f161ab95bae10fd304047
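The ordering problem mentioned in the commit above (a preprocessor test evaluated before the header that defines the macro has been included) is easy to reproduce in isolation. In this sketch `SKETCH_OS_VERSION` is a hypothetical macro standing in for a version macro that a system header would define, and the plain `#define` in the middle plays the role of that `#include`; the numeric value is arbitrary.

```c
/* Test made too early: the macro is not defined yet, so this is false. */
#ifdef SKETCH_OS_VERSION
#define SKETCH_SEEN_BEFORE_HEADER 1
#else
#define SKETCH_SEEN_BEFORE_HEADER 0
#endif

/* Stands in for including the header that defines the macro. */
#define SKETCH_OS_VERSION 1400000

/* The same test, made after the definition, is true. */
#ifdef SKETCH_OS_VERSION
#define SKETCH_SEEN_AFTER_HEADER 1
#else
#define SKETCH_SEEN_AFTER_HEADER 0
#endif

/* Accessors so the two outcomes can be checked at run time. */
static int
sketch_seen_before(void)
{
	return (SKETCH_SEEN_BEFORE_HEADER);
}

static int
sketch_seen_after(void)
{
	return (SKETCH_SEEN_AFTER_HEADER);
}
```

The preprocessor gives no diagnostic for the early test; the guarded block is silently skipped, which is why such a mistake can go unnoticed until a particular kernel configuration fails to build.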
* netgraph/ng_bridge: Document staleness in multithreaded operation | Lutz Donnerhacke | 2021-02-09 | 1 | -1/+4
  In the data path of ng_bridge(4), the only value of the host struct which needs to be modified is the staleness, which is reset every time a frame is received. It's safe to leave the code as it is.

  This patch is part of a series to make ng_bridge(4) multithreaded.

  Reviewed by: kp
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D28546
* netgraph/ng_bridge: Merge internal structures | Lutz Donnerhacke | 2021-02-09 | 2 | -48/+44
  In an earlier version of ng_bridge(4) the externally visible host entry structure was a strict subset of the internal one. So the internal view was a direct annotation of the external structure. This strict inheritance was lost many versions ago. There is no need to encapsulate a part of the internal representation as a separate structure.

  This patch is a preparation to make the internal structure read only in the data path in order to make ng_bridge(4) multithreaded.

  Reviewed by: kp
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D28545
* Set file mode during zfs_write | Antonio Russo | 2021-02-09 | 1 | -0/+1
  Apply https://github.com/openzfs/zfs/pull/11576

  Direct commit from upstream openzfs. Full commit message below:

  Set file mode during zfs_write

  3d40b65 refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update.

  3d40b65 accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function).

  The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode.

  Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
  Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
  Signed-off-by: Antonio Russo <aerusso@aerusso.net>
  Closes #11474
  Closes #11576

  Obtained from: openzfs/zfs@f8ce8aed0
  MFC after: 0 days
  Sponsored by: iXsystems, Inc.
* cache: assorted comment fixups | Mateusz Guzik | 2021-02-09 | 1 | -7/+12
* Revert "amd64: implement strlen in assembly" | Mateusz Guzik | 2021-02-09 | 2 | -66/+1
  This reverts commit af366d353b84bdc4e730f0fc563853abc338271c.

  Trips over '\xa4' byte and terminates early, as found in lib/libc/gen/setdomainname_test:setdomainname_basic testcase.

  However, keep moving libkern/strlen.c out of conf/files.

  Reported by: lwhsu
* arm32: Align arguments of sync_icache() syscall to cacheline size. | Michal Meloun | 2021-02-09 | 1 | -6/+3
  Otherwise, we may miss synchronization of the last cacheline.

  MFC after: 3 days
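Why an unaligned range can miss its last cacheline is easiest to see with numbers. The sketch below shows one common way to widen a byte range to cacheline boundaries; it assumes the line size is a power of two, uses hypothetical names (`sketch_range`, `sketch_align_to_cacheline()`), and is not the arm32 code touched by the commit above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: an aligned start address and length. */
struct sketch_range {
	uintptr_t start;
	size_t len;
};

/*
 * Widen [addr, addr + len) so that it starts and ends on cacheline
 * boundaries. `line` must be a power of two.
 */
static struct sketch_range
sketch_align_to_cacheline(uintptr_t addr, size_t len, size_t line)
{
	struct sketch_range r;
	uintptr_t mask, end;

	assert(line != 0 && (line & (line - 1)) == 0);
	mask = (uintptr_t)line - 1;
	end = (addr + len + mask) & ~mask; /* round the end up */
	r.start = addr & ~mask;            /* round the start down */
	r.len = (size_t)(end - r.start);
	return (r);
}
```

For example, with 64-byte lines the range starting at 0x1030 with length 0x20 touches the lines at 0x1000 and 0x1040. A loop that steps one line at a time from the unaligned start for the unaligned length would visit only the first of them, whereas the widened range (start 0x1000, length 0x80) covers both.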