aboutsummaryrefslogtreecommitdiff
path: root/sys/sys/buf.h
Commit message (Collapse)AuthorAgeFilesLines
* vnode: move write cluster support data to inodes.Konstantin Belousov2021-02-211-1/+15
| | | | | | | | | | | | The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679
* Remove #define _KERNEL hacks from libprocstatKonstantin Belousov2021-02-211-1/+2
| | | | | | | | | | | | | | | | | | Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in userspace, assuming that the consumer has an idea what it is for. Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h, sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the same caveat. Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h being unusable in userspace, where it override struct buf with its own definition. Instead, provide struct m_buf and struct m_vnode and adapt code to use local variants. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D28679
* buf SU hooks: track buf_start() calls with B_IOSTARTED flagKonstantin Belousov2021-02-121-5/+11
| | | | | | | | | | | | | | | | | and only call buf_complete() if previously started. Some error paths, like CoW failire, might skip buf_start() and do bufdone(), which itself call buf_complete(). Various SU handle_written_XXX() functions check that io was started and incomplete parts of the buffer data reverted before restoring them. This is a useful invariant that B_IO_STARTED on buffer layer allows to keep instead of changing check and panic into check and return. Reported by: pho Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundations
* Make MAXPHYS tunable. Bump MAXPHYS to 1M.Konstantin Belousov2020-11-281-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav (*) Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225 Notes: svn path=/head/; revision=368124
* vmapbuf: don't smuggle address or length in bufBrooks Davis2020-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situtation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784 Notes: svn path=/head/; revision=366911
* Use unlocked page lookup for inmem() to avoid object lock contentionBryan Drewery2020-10-091-0/+1
| | | | | | | | | | Reviewed By: kib, markj Submitted by: mlaier Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D26653 Notes: svn path=/head/; revision=366594
* Add unlocked/SMR fast path to getblk()Conrad Meyer2020-07-241-0/+4
| | | | | | | | | | | | | | | | Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked() API wrapping this functionality. Use it for a fast path in getblkx(), falling back to locked lookup if we raced a thread changing the buf's identity. Reported by: Attilio Reviewed by: kib, markj Testing: pho (in progress) Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D25782 Notes: svn path=/head/; revision=363482
* This commit enables a UFS filesystem to do a forcible unmount whenChuck Silvers2020-05-251-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the underlying media fails or becomes inaccessible. For example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just like is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing of some dependencies require a read. For example if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Coauthered by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088 Notes: svn path=/head/; revision=361491
* sys/vm: quiet -Wwrite-stringsRyan Libby2020-02-231-1/+1
| | | | | | | | | Discussed with: kib Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23796 Notes: svn path=/head/; revision=358256
* Currently the breadn_flags() and getblkx() interfaces are passedKirk McKusick2019-12-031-7/+8
| | | | | | | | | | | | | | | | | | the vnode, logical block number, and size of data block that is being requested. They then use the VOP_BMAP function to calculate the mapping from logical block number to physical block number from which to access the data. This change expands the interface to also pass the physical block number in cases where the VOP_MAP function may no longer work, for example when a file is being truncated. No functional change. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix Notes: svn path=/head/; revision=355371
* White space cleanup. No functional change.Kirk McKusick2019-11-201-2/+2
| | | | | | | Sponsored by: Netflix Notes: svn path=/head/; revision=354873
* Args for buf_track() might be unusedRavi Pokala2019-10-251-1/+1
| | | | | | | | | | | | | | If neither FULL_BUF_TRACKING nor BUF_TRACKING are defined, then the body of buf_track() becomes empty. Mark the arguments with "__unused" so the compiler doesn't complain about unused arguments in that case. Reported by: Bruce Leverett (Panasas) Reviewed by: cem (on IRC) MFC after: 1 month Sponsored by: Panasas Notes: svn path=/head/; revision=354102
* (1/6) Replace busy checks with acquires where it is trival to do so.Jeff Roberson2019-10-151-1/+2
| | | | | | | | | | | | | | This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548 Notes: svn path=/head/; revision=353535
* buf: Add B_INVALONERR flag to discard dataConrad Meyer2019-09-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | Setting the B_INVALONERR flag before a synchronous write causes the buf cache to forcibly invalidate contents if the write fails (BIO_ERROR). This is intended to be used to allow layers above the buffer cache to make more informed decisions about when discarding dirty buffers without successful write is acceptable. As a proof of concept, use in msdosfs to handle failures to mark the on-disk 'dirty' bit during rw mount or ro->rw update. Extending this to other filesystems is left as future work. PR: 210316 Reviewed by: kib (with objections) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21539 Notes: svn path=/head/; revision=352233
* Remove long-dead BUF_ASSERT_{,UN}HELD assertionsConrad Meyer2019-09-051-4/+0
| | | | | | | | | | These were fully neutered in r177676 (2008), but not removed at the time for unclear reasons. They're totally dead code, so go ahead and yank them now. No functional change. Notes: svn path=/head/; revision=351899
* Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.Gleb Smirnoff2019-01-151-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho Notes: svn path=/head/; revision=343030
* Detect and optimize reads from the hole on UFS.Konstantin Belousov2018-05-131-0/+3
| | | | | | | | | | | | | | | | | | | | | - Create getblkx(9) variant of getblk(9) which can return error. - Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP was performed before the buffer is created, and EJUSTRETURN returned in case the requested block does not exist. - Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and allocating the pages for it), copying from zero_region instead. The end result is less page allocations and buffer recycling when a hole is read, which is important for some benchmarks. Requested and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14917 Notes: svn path=/head/; revision=333576
* Further parallelize the buffer cache.Jeff Roberson2018-02-201-3/+5
| | | | | | | | | | | | | | | | | | | | | | Provide multiple clean queues partitioned into 'domains'. Each domain manages its own bufspace and has its own bufspace daemon. Each domain has a set of subqueues indexed by the current cpuid to reduce lock contention on the cleanq. Refine the sleep/wakeup around the bufspace daemon to use atomics as much as possible. Add a B_REUSE flag that is used to requeue bufs during the scan to approximate LRU rather than locking the queue on every use of a frequently accessed buf. Implement bufspace_reserve with only atomic_fetchadd to avoid loop restarts. Reviewed by: markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14274 Notes: svn path=/head/; revision=329612
* Merge biodone_finish() back into biodone(). The primary purpose isKirk McKusick2018-02-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to make the order of operations clearer to avoid the race condition that was fixed in r328914. In particular, this commit corrects a similar race that existed in the soft updates callback. Doing some sleuthing through the SVN repository, it appears that bufdone_finish() was added to support XFS: ------------------------------------------------------------------------ r153192 | rodrigc | 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) | 13 lines Changes imported from XFS for FreeBSD project: - add fields to struct buf (needed by XFS) - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3 - b_pin_count, count of pinned buffer - add new B_MANAGED flag - add breada() function to initiate asynchronous I/O on read-ahead blocks. - add bufdone_finish(), bpin(), bunpin_wait() functions Patches provided by: kan Reviewed by: phk Silence on: arch@ ------------------------------------------------------------------------ It does not appear to ever have been used for anything else. XFS was disconnected in r241607: ------------------------------------------------------------------------ r241607 | attilio | 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) | 5 lines Disconnect non-MPSAFE XFS from the build in preparation for dropping GIANT from VFS. This is not targeted for MFC. ------------------------------------------------------------------------ and removed entirely in r247631: ------------------------------------------------------------------------ r247631 | attilio | 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) | 5 lines Garbage collect XFS bits which are now already completely disconnected from the tree since few months. This is not targeted for MFC. ------------------------------------------------------------------------ Since XFS support is gone, there is no reason to retain biodone_finish(). Suggested by: Warner Losh (imp) Discussed with: cem, kib Tested by: Peter Holm (pho) Notes: svn path=/head/; revision=329078
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Continuing efforts to provide hardening of FFS, this change adds aKirk McKusick2017-09-221-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static fucntion and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho) Notes: svn path=/head/; revision=323923
* Add the definition of maxbcachebuf to sys/buf.h.Rick Macklem2017-06-191-0/+1
| | | | | | | | | | | | | r320070 removed the definition of maxbcachebuf from sys/param.h to fix the build for arm. This patch adds the definition of maxbcachebuf to sys/buf.h, which should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S. Suggested by: kib MFC after: 2 weeks Notes: svn path=/head/; revision=320126
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* Release laundered vnode pages to the head of the inactive queue.Mark Johnston2016-11-231-1/+3
| | | | | | | | | | | | | | | | | The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589 Notes: svn path=/head/; revision=309062
* Add BUF_TRACKING and FULL_BUF_TRACKING buffer debuggingConrad Meyer2016-10-311-0/+20
| | | | | | | | | | | | | | | | | | Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366 Notes: svn path=/head/; revision=308155
* Generalize UFS buffer pager to allow it serving other filesystemsKonstantin Belousov2016-10-281-0/+7
| | | | | | | | | | | | | | | which also use buffer cache. Most important addition to the code is the handling of filesystems where the block size is less than the machine page size, which might require reading several buffers to validate single page. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=308026
* Remove prototypes missed in r303951.Mark Johnston2016-08-161-3/+0
| | | | Notes: svn path=/head/; revision=304235
* Remove b_pin_count from struct buf.Mark Johnston2016-08-111-1/+0
| | | | | | | | | | | | | It was added in r153192 for XFS and doesn't appear to have been used for anything else. XFS was disconnected in r241607 and removed entirely in r247631. Reported by: mlaier Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D7468 Notes: svn path=/head/; revision=303951
* Remove lockmgr_waiters(9) and BUF_LOCKWAITERS(9); they were not usedEdward Tomasz Napierala2016-08-051-6/+0
| | | | | | | | | | | for anything. Reviewed by: kib@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7420 Notes: svn path=/head/; revision=303767
* sys/sys: minor spelling fixes.Pedro F. Giffuni2016-05-031-1/+1
| | | | | | | | | While the changes are minor, these headers are very visible. MFC after: 2 weeks Notes: svn path=/head/; revision=298981
* PRINT_BUF_FLAGS: Remove removed DIRTY/PERSIST flagsConrad Meyer2016-04-291-2/+2
| | | | | | | | | | | | This is a follow-up to r298789, which removed the B_DIRTY and B_PERSISTENT flags. This changeset removes them from the associated %b bit description string as well. Reviewed by: pfg Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=298791
* bufs: make B_DIRTY and B_PERSISTENT flags availablePedro F. Giffuni2016-04-291-2/+2
| | | | | | | | | | It appears these flags were related to ext2fs but are completely unused nowadays. Retire them. Suggested by: mckusick Notes: svn path=/head/; revision=298789
* Bump bio_cmd and bio_*flags from 8 bits to 16.Warner Losh2016-04-141-2/+2
| | | | | | | Differential Revision: https://reviews.freebsd.org/D5784 Notes: svn path=/head/; revision=297955
* Add comment about where b_iocmd and b_ioflags come from.Warner Losh2016-04-141-2/+2
| | | | Notes: svn path=/head/; revision=297951
* A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().Gleb Smirnoff2015-12-161-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix Notes: svn path=/head/; revision=292373
* Tweak some unused field defines to have the correct number of zeroes.Benno Rice2015-12-041-3/+3
| | | | | | | | | Reviewed by: jhb Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4365 Notes: svn path=/head/; revision=291728
* As a step towards the elimination of PG_CACHED pages, rework the handlingMark Johnston2015-09-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached. This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range. Reviewed by: alc, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3726 Notes: svn path=/head/; revision=288431
* - Make 'struct buf *buf' private to vfs_bio.c. Having a global variableJeff Roberson2015-07-291-2/+1
| | | | | | | | | | | | | | | | | 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285993
* Refactor unmapped buffer address handling.Jeff Roberson2015-07-231-15/+16
| | | | | | | | | | | | | | | | | | | | - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285819
* Handle errors from background write of the cylinder group blocks.Konstantin Belousov2015-06-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | First, on the write error, bufdone() call from ffs_backgroundwrite() panics because pbrelvp() cleared bp->b_bufobj, while brelse() would try to re-dirty the copy of the cg buffer. Handle this by setting B_INVAL for the case of BIO_ERROR. Second, we must re-dirty the real buffer containing the cylinder group block data when background write failed. Real cg buffer was already marked clean in ffs_bufwrite(). After the BV_BKGRDINPROG flag is cleared on the real cg buffer in ffs_backgroundwrite(), buffer scan may reuse the buffer at any moment. The result is lost write, and if the write error was only transient, we get corrupted bitmaps. We cannot re-dirty the original cg buffer in the ffs_backgroundwritedone(), since the context is not sleepable, preventing us from sleeping for origbp' lock. Add BV_BKGDERR flag (protected by the buffer object lock), which is converted into delayed write by brelse(), bqrelse() and buffer scan. In collaboration with: Conrad Meyer <cse.cem@gmail.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation (kib), EMC/Isilon storage division (Conrad) MFC after: 2 weeks Notes: svn path=/head/; revision=284887
* Add B_KVAALLOC and B_UNMAPPED to the buf flag name list.Mark Johnston2015-04-071-2/+2
| | | | | | | | | Differential Revision: https://reviews.freebsd.org/D1895 Submitted by: Conrad Meyer MFC after: 1 week Notes: svn path=/head/; revision=281225
* - In vnode_pager_generic_getpages() use different free counters forGleb Smirnoff2015-03-061-0/+2
| | | | | | | | | | | | | | | synchronous and asynchronous requests. The latter can saturate the I/O and we do not want them to affect regular paging. - Allocate the pbuf at the very beginning of the function, so that if we are low on certain kind of pbufs don't even proceed to BMAP, but sleep. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix Notes: svn path=/head/; revision=279688
* Merge from projects/sendfile:Gleb Smirnoff2014-11-231-4/+9
| | | | | | | | | | | | | | | | | o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=274914
* Fix typo in flag name.Warner Losh2014-07-071-1/+1
| | | | Notes: svn path=/head/; revision=268375
* Naughty NANDFS was using hidden unused flag, hiding the fact that theWarner Losh2014-07-071-1/+1
| | | | | | | | | flag was used and wasn't really available. Change the name without fixing any laying issues that might be present in NANDFS' use of this flag. Notes: svn path=/head/; revision=268374
* Both cluster_rbuild() and cluster_wbuild() sometimes set the pagesKonstantin Belousov2013-08-221-0/+1
| | | | | | | | | | | | | | | | shared busy without first draining the hard busy state. Previously it went unnoticed since VPO_BUSY and m->busy fields were distinct, and vm_page_io_start() did not verified that the passed page has VPO_BUSY flag cleared, but such page state is wrong. New implementation is more strict and catched this case. Drain the busy state as needed, before calling vm_page_sbusy(). Tested by: pho, jkim Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=254668
* - Convert the bufobj lock to rwlock.Jeff Roberson2013-05-311-6/+5
| | | | | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf Notes: svn path=/head/; revision=251171
* - Add a new general purpose path-compressed radix trie which can be usedJeff Roberson2013-05-121-2/+0
| | | | | | | | | | | | | | with any structure containing a uint64_t index. The tree code auto-generates type safe wrappers. - Eliminate the buf splay and replace it with pctrie. This is not only significantly faster with large files but also allows for the possibility of shared locking. Reviewed by: alc, attilio Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=250551
* Do not remap usermode pages into KVA for physio.Konstantin Belousov2013-03-191-1/+1
| | | | | | | | Sponsored by: The FreeBSD Foundation Tested by: pho Notes: svn path=/head/; revision=248515
* Add a helper function vfs_bio_bzero_buf() to zero the portion of theKonstantin Belousov2013-03-191-0/+1
| | | | | | | | | | | | buffer, transparently handling mapped or unmapped buffers. Its intent is to replace the use of bzero(bp->b_data) in cases where the buffer might be unmapped, to avoid unneeded upgrades. Sponsored by: The FreeBSD Foundation Tested by: pho Notes: svn path=/head/; revision=248510