path: root/sys/sys/buf.h
* Convert lockmgr locks from using simple locks to using mutexes.
  Author: Jason Evans  Date: 2000-10-04  Files: 1  Lines: -4/+11
  Add lockdestroy() and appropriate invocations, which corresponds to
  lockinit() and must be called to clean up after a lockmgr lock is no
  longer needed.
  Notes: svn path=/head/; revision=66615
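A minimal sketch of the lockinit()/lockdestroy() pairing this entry introduces; the exact lockinit() argument list has varied between FreeBSD releases, so the parameters shown are illustrative:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/lockmgr.h>

    static struct lock example_lock;

    static void
    example_init(void)
    {
        lockinit(&example_lock, PRIBIO, "exlock", 0, 0);
    }

    static void
    example_fini(void)
    {
        /* New requirement from this commit: pair every lockinit(). */
        lockdestroy(&example_lock);
    }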
* Major update to the way synchronization is done in the kernel.
  Author: Jason Evans  Date: 2000-09-07  Files: 1  Lines: -1/+1
  Highlights include:
  * Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
    alpha port is still in transition and currently uses both.)
  * Per-CPU idle processes.
  * Interrupts are run in their own separate kernel threads and can be
    preempted (i386 only).
  Partially contributed by: BSDi (BSD/OS)
  Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
  Notes: svn path=/head/; revision=65557
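A hedged illustration of the shift from spl*() to mutex(9). The early SMPng code spelled the operations mtx_enter()/mtx_exit(); the later mtx_lock()/mtx_unlock() names are used below, and the old spl pattern appears only in the comment:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    /*
     * Before: critical sections raised the interrupt priority level,
     * e.g. s = splbio(); ... splx(s);
     * After: a mutex provides real mutual exclusion on SMP.
     */
    static struct mtx example_mtx;

    static void
    example(void)
    {
        mtx_init(&example_mtx, "example", NULL, MTX_DEF);

        mtx_lock(&example_mtx);
        /* ... touch the shared data structure ... */
        mtx_unlock(&example_mtx);

        mtx_destroy(&example_mtx);
    }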
* Disable LK_CANRECURSE on buffer locks.
  Author: Kirk McKusick  Date: 2000-07-26  Files: 1  Lines: -1/+1
  The recursion is needed only for certain uses of snapshots and currently
  appears to be causing some other problems. So for now, I am reverting to
  the old semantics until I have had time to investigate what is causing
  the other problems.
  Notes: svn path=/head/; revision=63898
* This patch corrects the first round of panics and hangs reported with
  the new snapshot code.
  Author: Kirk McKusick  Date: 2000-07-24  Files: 1  Lines: -2/+3
  Update addaliasu to correctly implement the semantics of the old
  checkalias function. When a device vnode first comes into existence,
  check to see if an anonymous vnode for the same device was created at
  boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
  creating a new vnode for the device. This corrects a problem which
  caused the kernel to panic when taking a snapshot of the root
  filesystem.

  Change the calling convention of vn_write_suspend_wait() to be the same
  as vn_start_write().

  Split out softdep_flushworklist() from softdep_flushfiles() so that it
  can be used to clear the work queue when suspending filesystem
  operations.

  Access to buffers becomes recursive so that snapshots can recursively
  traverse their indirect blocks using ffs_copyonwrite() when checking
  for the need for copy on write when flushing one of their own indirect
  blocks. This eliminates a deadlock between the syncer daemon and a
  process taking a snapshot.

  Ensure that softdep_process_worklist() can never block because of a
  snapshot being taken. This eliminates a problem with buffer starvation.

  Cleanup change in ffs_sync(), which did not synchronously wait when
  MNT_WAIT was specified. The result was an unclean filesystem panic when
  doing a forcible unmount with heavy filesystem I/O in progress.

  Return a zeroed block when reading a block that was not in use at the
  time that a snapshot was taken. Normally, these blocks should never be
  read. However, the readahead code will occasionally read them, which
  can cause unexpected behavior.

  Clean up the debugging code that ensures that no blocks are written on
  a filesystem while it is suspended. Snapshots must explicitly label the
  blocks that they are writing during the suspension so that they do not
  cause a `write on suspended filesystem' panic.

  Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
  prevent a race condition that would permit the same block to be copied
  twice. This change eliminates an unexpected soft updates inconsistency
  in fsck caused by the double allocation.

  Use bqrelse rather than brelse for buffers that will be needed again
  soon by the snapshot code. This improves snapshot performance.
  Notes: svn path=/head/; revision=63788
* Add snapshots to the fast filesystem.
  Author: Kirk McKusick  Date: 2000-07-11  Files: 1  Lines: -0/+1
  Most of the changes support the gating of system calls that cause
  modifications to the underlying filesystem. The gating can be enabled
  by any filesystem that needs to consistently suspend operations by
  adding the vop_stdgetwritemount to their set of vnops. Once gating is
  enabled, the function vfs_write_suspend stops all new write operations
  to a filesystem, allows any filesystem-modifying system calls already
  in progress to complete, then syncs the filesystem to disk and returns.
  The function vfs_write_resume allows the suspended write operations to
  begin again.

  Gating is not added by default for all filesystems, as for SMP systems
  it adds two extra locks to such critical kernel paths as the write
  system call. Thus, gating should only be added as needed.

  Details on the use and current status of snapshots in FFS can be found
  in /sys/ufs/ffs/README.snapshot, so for brevity and timeliness they are
  not included here. Unless and until you create a snapshot file, these
  changes should have no effect on your system (famous last words).
  Notes: svn path=/head/; revision=62976
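A sketch of the gating sequence described above, under the assumption that vfs_write_suspend()/vfs_write_resume() take only the mount point; later FreeBSD versions add a flags argument and an error return, so treat the prototypes as approximate:

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    /*
     * Gating sketch: stop new writes, let in-flight modifications drain
     * (the filesystem is sync'ed before vfs_write_suspend() returns), do
     * the snapshot work, then let writers continue.
     */
    static void
    snapshot_example(struct mount *mp)
    {
        vfs_write_suspend(mp);

        /* ... filesystem is quiescent: record the snapshot here ... */

        vfs_write_resume(mp);
    }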
* Virtualizes & untangles the bioops operations vector.Poul-Henning Kamp2000-06-161-2/+37
| | | | | | | Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@ Notes: svn path=/head/; revision=61724
* Back out the previous change to the queue(3) interface.
  Author: Jake Burkholder  Date: 2000-05-26  Files: 1  Lines: -10/+10
  It was not discussed and should probably not happen.
  Requested by: msmith and others
  Notes: svn path=/head/; revision=60938
* Change the way that the queue(3) structures are declared; don't assume
  that the type argument to *_HEAD and *_ENTRY is a struct.
  Author: Jake Burkholder  Date: 2000-05-23  Files: 1  Lines: -10/+10
  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
  Notes: svn path=/head/; revision=60833
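For readers unfamiliar with queue(3), this is the classic declaration style the backed-out change would have altered: the type argument is a bare struct tag, and the macros supply the struct keyword themselves:

    #include <sys/queue.h>

    struct item {
        int               value;
        TAILQ_ENTRY(item) link;     /* expands to pointers to struct item */
    };

    TAILQ_HEAD(itemhead, item);     /* expands to struct itemhead { ... } */

    static struct itemhead items = TAILQ_HEAD_INITIALIZER(items);

    static void
    add_item(struct item *ip)
    {
        TAILQ_INSERT_TAIL(&items, ip, link);
    }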
* Separate the struct bio related stuff out of <sys/buf.h> into
  <sys/bio.h>.
  Author: Poul-Henning Kamp  Date: 2000-05-05  Files: 1  Lines: -117/+0
  <sys/bio.h> is now a prerequisite for <sys/buf.h>, but it shall not be
  made a nested include, according to bde's teachings on the subject of
  nested includes.

  Disk drivers and similar stuff below specfs::strategy() should no
  longer need to include <sys/buf.h> unless they need caching of data.

  Still a few bogus uses of struct buf to track down.

  Repocopy by: peter
  Notes: svn path=/head/; revision=60041
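The include discipline that follows from this split, as a short illustration:

    /*
     * <sys/bio.h> is a prerequisite of <sys/buf.h> but is deliberately
     * not nested inside it, so code that still needs the buffer cache
     * lists both, in this order.  Drivers below specfs::strategy() can
     * usually include <sys/bio.h> alone.
     */
    #include <sys/param.h>
    #include <sys/bio.h>    /* struct bio: the raw I/O request */
    #include <sys/buf.h>    /* struct buf: only if caching of data is needed */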
* Convert the vm_pager_strategy() interface to take a struct bio instead
  of a struct buf.
  Author: Poul-Henning Kamp  Date: 2000-05-03  Files: 1  Lines: -9/+7
  Don't try to examine B_ASYNC; it is a layering violation to do so. The
  only current user of this interface is vn(4) which, since it emulates a
  disk interface, operates on struct bio already.
  Notes: svn path=/head/; revision=59915
* Give struct bio its own callback mechanism.
  Author: Poul-Henning Kamp  Date: 2000-05-01  Files: 1  Lines: -6/+11
  Notes: svn path=/head/; revision=59840
* s/biowait/bufwait/g
  Author: Poul-Henning Kamp  Date: 2000-04-29  Files: 1  Lines: -1/+1
  Prodded by: several.
  Notes: svn path=/head/; revision=59762
* Clone the {b|bio}_offset field, and make sure it is always initialized
  in struct bio.
  Author: Poul-Henning Kamp  Date: 2000-04-25  Files: 1  Lines: -1/+1
  Eventually, bio_offset will probably obsolete the bio_blkno and
  bio_pblkno fields.

  Remove the special hack in atapi-cd.c to determine if bio_offset was
  valid.
  Notes: svn path=/head/; revision=59623
* Don't declare common variables in include files:
  Author: Poul-Henning Kamp  Date: 2000-04-18  Files: 1  Lines: -1/+1
  Move buftimelock to vfs_bio.c, where it is initialized.
  Notes: svn path=/head/; revision=59358
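The general rule being applied, sketched with an illustrative lock type and file names (only the extern-versus-definition split matters; the actual change moved the definition of buftimelock into vfs_bio.c):

    /* --- in the header (e.g. sys/buf.h): declaration only --- */
    struct timelock {
        int state;
    };
    extern struct timelock buftimelock;

    /* --- in exactly one .c file (e.g. kern/vfs_bio.c): the definition --- */
    struct timelock buftimelock = { 0 };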
* Complete the bio/buf divorce for all code below devfs::strategy.
  Author: Poul-Henning Kamp  Date: 2000-04-15  Files: 1  Lines: -1/+10
  Exceptions:
    Vinum untouched. This means that it cannot be compiled.
    Greg Lehey is on the case.
    CCD not converted yet, casts to struct buf (still safe).
    atapi-cd casts to struct buf to examine B_PHYS.
  Notes: svn path=/head/; revision=59249
* Clone bio versions of certain bits of infrastructure:
  Author: Poul-Henning Kamp  Date: 2000-04-02  Files: 1  Lines: -6/+61
    devstat_end_transaction_bio()
    bioq_* versions of bufq_*, incl bioqdisksort()
  The corresponding "buf" versions will disappear when no longer used.

  Move b_offset, b_data and b_bcount to struct bio.

  Add BIO_FORMAT as a hack for fd.c etc.

  We are now largely ready to start converting drivers to use struct bio
  instead of struct buf.
  Notes: svn path=/head/; revision=58942
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
  Author: Poul-Henning Kamp  Date: 2000-04-02  Files: 1  Lines: -31/+36
  (Much of this done by script)

  Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

  Move b_pblkno and b_iodone_chain to struct bio while we transition;
  they will be obsoleted once bio structs chain/stack.

  Add bio_queue field for struct bio aware disksort.

  Address a lot of stylistic issues brought up by bde.
  Notes: svn path=/head/; revision=58934
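How an error check reads before and after the flag move, as a sketch of a driver completion routine (the function name is illustrative; the fields and flags follow the commit message):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bio.h>
    #include <sys/buf.h>

    static void
    example_iodone(struct buf *bp)
    {
        /*
         * Old style:
         *      if (bp->b_flags & B_ERROR)
         *              printf("I/O failed: %d\n", bp->b_error);
         */
        if (bp->b_ioflags & BIO_ERROR)
            printf("I/O failed: %d\n", bp->b_error);  /* errno still in b_error */
    }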
* Draw the outline of "struct bio".
  Author: Poul-Henning Kamp  Date: 2000-04-02  Files: 1  Lines: -11/+33
  Struct bio is the future carrier of I/O requests for "struct buf".
  Notes: svn path=/head/; revision=58926
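A rough outline of the new carrier, assembled only from fields named across these log entries and deliberately given a stand-in name; the real <sys/bio.h> definition has more members and has changed since:

    #include <sys/types.h>
    #include <sys/queue.h>

    struct bio_sketch {
        u_int       bio_cmd;        /* BIO_READ, BIO_WRITE, ...            */
        u_int       bio_flags;      /* BIO_ERROR, BIO_ORDERED, ...         */
        off_t       bio_offset;     /* byte offset on the device           */
        daddr_t     bio_blkno;      /* logical block (transitional)        */
        daddr_t     bio_pblkno;     /* physical block (transitional)       */
        caddr_t     bio_data;       /* data buffer                         */
        long        bio_bcount;     /* transfer length                     */
        int         bio_error;      /* errno on failure                    */
        void      (*bio_done)(struct bio_sketch *); /* completion callback */
        TAILQ_ENTRY(bio_sketch) bio_queue;  /* linkage for bioq disksort   */
    };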
* Change the write-behind code to take more care when starting async
  I/O's.
  Author: Matthew Dillon  Date: 2000-04-02  Files: 1  Lines: -1/+1
  The sequential read heuristic has been extended to cover writes as
  well. We continue to call cluster_write() normally, thus blocks in the
  file will still be reallocated for large (but still random) I/O's, but
  I/O will only be initiated for truly sequential writes.

  This solves a number of annoying situations, especially with DBM (hash
  method) writes, and also has the side effect of fixing a number of
  (stupid) benchmarks.

  Reviewed-by: mckusick
  Notes: svn path=/head/; revision=58909
* Rename the existing BUF_STRATEGY() to DEV_STRATEGY().
  Author: Poul-Henning Kamp  Date: 2000-03-20  Files: 1  Lines: -0/+3
  Substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo).
  Substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo).

  This patch is machine generated except for the ccd.c and buf.h parts.
  Notes: svn path=/head/; revision=58349
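What the machine-generated substitution amounts to, as plausible macro definitions; the actual <sys/buf.h> macros may differ in detail:

    #define BUF_WRITE(bp)       VOP_BWRITE((bp)->b_vp, (bp))
    #define BUF_STRATEGY(bp)    VOP_STRATEGY((bp)->b_vp, (bp))

    /* Code that talks straight to the device now uses DEV_STRATEGY(bp),
     * which is the old BUF_STRATEGY() under its new name. */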
* Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field
  in struct buf: b_iocmd.
  Author: Poul-Henning Kamp  Date: 2000-03-20  Files: 1  Lines: -5/+7
  The b_iocmd is enforced to have exactly one bit set. B_WRITE was
  bogusly defined as zero, giving rise to obvious coding mistakes.

  Also eliminate the redundant struct buf flag B_CALL; it can just as
  efficiently be done by comparing b_iodone to NULL.

  Should you get a panic or drop into the debugger complaining about
  "b_iocmd", don't continue. It is likely to write on your disk where it
  should have been reading.

  This change is a step in the direction towards a stackable BIO
  capability.

  A lot of this patch was machine generated (thanks to style(9)
  compliance!)

  Vinum users: Greg has not had time to test this yet, be careful.
  Notes: svn path=/head/; revision=58345
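A sketch of how command dispatch and completion look once b_iocmd replaces the old flags (illustrative function names):

    #include <sys/param.h>
    #include <sys/bio.h>
    #include <sys/buf.h>

    /*
     * Exactly one BIO_* command bit may be set in b_iocmd, so equality
     * tests are safe; the B_WRITE-defined-as-zero bug class goes away.
     */
    static void
    example_start(struct buf *bp)
    {
        if (bp->b_iocmd == BIO_READ) {
            /* queue a read */
        } else if (bp->b_iocmd == BIO_WRITE) {
            /* queue a write */
        }
    }

    static void
    example_done(struct buf *bp)
    {
        /* The former B_CALL flag is now just "is a callback installed?". */
        if (bp->b_iodone != NULL)
            (*bp->b_iodone)(bp);
    }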
* Several performance improvements for soft updates have been added:
  Author: Kirk McKusick  Date: 2000-01-10  Files: 1  Lines: -1/+6
  1) Fastpath deletions. When a file is being deleted, check to see if it
     was so recently created that its inode has not yet been written to
     disk. If so, the delete can proceed to immediately free the inode.
  2) Background writes: No file or block allocations can be done while
     the bitmap is being written to disk. To avoid these stalls, the
     bitmap is copied to another buffer which is written, thus leaving
     the original available for further allocations.
  3) Link count tracking. Constantly track the difference in i_effnlink
     and i_nlink so that inodes that have had no change other than
     i_effnlink need not be written.
  4) Identify buffers with rollback dependencies so that the buffer
     flushing daemon can choose to skip over them.
  Notes: svn path=/head/; revision=55697
* Change #ifdef KERNEL to #ifdef _KERNEL in the public headers.
  Author: Peter Wemm  Date: 1999-12-29  Files: 1  Lines: -6/+6
  "KERNEL" is an application space macro and the applications are
  supposed to be free to use it as they please (but cannot). This is
  consistent with the other BSDs, who made this change quite some time
  ago. More commits to come.
  Notes: svn path=/head/; revision=55205
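The resulting convention, shown on an illustrative header fragment:

    #ifndef _SYS_EXAMPLE_H_
    #define _SYS_EXAMPLE_H_

    struct example_stat {               /* visible to userland tools */
        int total;
    };

    #ifdef _KERNEL                      /* was: #ifdef KERNEL */
    void    example_kernel_only(void);  /* prototypes userland never sees */
    #endif  /* _KERNEL */

    #endif  /* !_SYS_EXAMPLE_H_ */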
* Prettiness police: Identify flags in b_xflags with BX_ to distinguish
  them from flags in b_flags, which are prefixed with B_.
  Author: Kirk McKusick  Date: 1999-12-22  Files: 1  Lines: -2/+2
  Notes: svn path=/head/; revision=54989
* Synopsis of problem being fixed: Dan Nelson originally reported that
  blocks of zeros could wind up in a file written to over NFS by a
  client.
  Author: Matthew Dillon  Date: 1999-12-12  Files: 1  Lines: -0/+7
  The problem only occurs a few times per several gigabytes of data. This
  problem turned out to be bug #3 below.

  bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from
  stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions
  can occur when a dirty NFS buffer is redirtied with new data. Otherwise
  the VFS/BIO system may end up thinking that a stage 1 NFS buffer is
  clusterable. Stage 1 NFS buffers are not clusterable.

  bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer
  (short buffers only occur near the EOF of the file). Change to only set
  when the buffer is a full biosize (usually 8K). This bug has no effect
  but should be fixed in -current anyway. It need not be backported.

  bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is
  typically only called by the update daemon). nfs_flush() does a
  multi-pass loop but due to the lack of vnode locking it is possible for
  new buffers to be added to the dirtyblkhd list while a flush operation
  is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a
  buffer which has *NOT* yet gone through its stage 1 write, causing only
  the commit rpc to be made and thus causing the contents of the buffer
  to be thrown away (never sent to the server).

  The patch also contains some cleanup, which only applies to the commit
  into -current.

  Reviewed by: dg, julian
  Originally Reported by: Dan Nelson <dnelson@emsphone.com>
  Notes: svn path=/head/; revision=54480
* Remove the B_BAD buffer flag; it is no longer used.
  Author: Poul-Henning Kamp  Date: 1999-12-10  Files: 1  Lines: -1/+1
  Notes: svn path=/head/; revision=54388
* "b_unused1" was.Poul-Henning Kamp1999-11-171-3/+2
| | | | | | | | | Fix comment for b_caller[12] fields. Spotted by: grog Notes: svn path=/head/; revision=53304
* Adjust the buffer cache to better handle small-memory machines.
  Author: Matthew Dillon  Date: 1999-10-24  Files: 1  Lines: -0/+1
  A slightly older version of this code was tested by BDE and me. Also
  fixes a lockup situation when kva gets too fragmented.

  Remove the maxvmiobufspace variable and sysctl; they are no longer
  used. Also clean up (remove) #if 0 sections from prior commits.

  This code is more of a hack, but presumably the whole buffer cache
  implementation is going to be rewritten in the next year so it's no big
  deal.
  Notes: svn path=/head/; revision=52452
* Give physio a makeover.
  Author: Poul-Henning Kamp  Date: 1999-10-09  Files: 1  Lines: -5/+3
  - Let physio take read/write compatible args and have it use
    uio->uio_rw to determine the direction.
  - physread/physwrite are now #defines for physio.
  - Remove the inversely named minphys(); dev->si_iosize_max takes over.
  - Physio() always uses pbufs.
  - Fix the check for non page-aligned transfers; now only unaligned
    transfers larger than (MAXPHYS - PAGE_SIZE) get fragmented (only
    interesting for tapes using max blocksize).
  - General wash-and-clean of code.
  Constructive input from: bde
  Notes: svn path=/head/; revision=52066
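The new shape of physio() as described in this entry; the dev_t-based prototype matches that era of the tree, while modern FreeBSD passes a struct cdev *:

    #include <sys/types.h>
    #include <sys/uio.h>

    /*
     * One entry point, direction taken from uio->uio_rw; physread() and
     * physwrite() survive only as aliases.
     */
    int physio(dev_t dev, struct uio *uio, int ioflag);

    #define physread    physio
    #define physwrite   physio

    /* Inside physio(), the direction test replaces the two entry points: */
    static void
    direction_example(struct uio *uio)
    {
        if (uio->uio_rw == UIO_READ) {
            /* set up a read transfer */
        } else {
            /* UIO_WRITE: set up a write transfer */
        }
    }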
* Fix process p_locks accounting.
  Author: Matthew Dillon  Date: 1999-09-27  Files: 1  Lines: -0/+3
  Conversions of the owner to LK_KERNPROC caused p_locks to be improperly
  accounted.
  Submitted by: Tor.Egge@fast.no
  Notes: svn path=/head/; revision=51702
* $Id$ -> $FreeBSD$
  Author: Peter Wemm  Date: 1999-08-28  Files: 1  Lines: -1/+1
  Notes: svn path=/head/; revision=50477
* Spring cleaning around strategy and disklabels/slices:
  Author: Poul-Henning Kamp  Date: 1999-08-14  Files: 1  Lines: -3/+3
  Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it
  throughout. Please see the comment in sys/conf.h about the flag
  argument.

  Remove the strategy argument from all the diskslice/label/bad144
  implementations; it should be found from the dev_t.

  Remove bogus and unused strategy1 routines.

  Remove open/close arguments from dssize(). Pick them up from dev_t.

  Remove unused and unfinished setgeom support from
  diskslice/label/bad144 code.
  Notes: svn path=/head/; revision=49771
* bufhashinit() is called with a caddr_t and is expected to return the
  same in both the alpha and i386 ports.
  Author: Peter Wemm  Date: 1999-07-09  Files: 1  Lines: -2/+2
  Notes: svn path=/head/; revision=48710
* These changes appear to give us benefits with both small (32MB) and
  large (1G) memory machine configurations.
  Author: Kirk McKusick  Date: 1999-07-08  Files: 1  Lines: -14/+4
  I was able to run 'dbench 32' on a 32MB system without bringing the
  machine to a grinding halt.

  * buffer cache hash table now dynamically allocated. This will have no
    effect on memory consumption for smaller systems and will help scale
    the buffer cache for larger systems.
  * minor enhancement to pmap_clearbit(). I noticed that all the calls to
    it used constant arguments. Making it an inline allows the constants
    to propagate to deeper inlines and should produce better code.
  * removal of inherent vfs_ioopt support through the emplacement of
    appropriate #ifdef's, with John's permission. If we do not find a use
    for it by the end of the year we will remove it entirely.
  * removal of getnewbufloops* counters & sysctl's - no longer necessary
    for debugging, getnewbuf() is now optimal.
  * buffer hash table functions removed from sys/buf.h and localized to
    vfs_bio.c
  * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ),
    allowing processes to block when too many dirty buffers are present
    in the system.
  * removal of a softdep test in bdwrite() that is no longer necessary
    now that bdwrite() no longer attempts to flush dirty buffers.
  * slight optimization added to bqrelse() - there is no reason to test
    for available buffer space on B_DELWRI buffers.
  * addition of reverse-scanning code to vfs_bio_awrite().
    vfs_bio_awrite() will attempt to locate clusterable areas in both the
    forward and reverse direction relative to the offset of the buffer
    passed to it. This will probably not make much of a difference now,
    but I believe we will start to rely on it heavily in the future if we
    decide to shift some of the burden of the clustering closer to the
    actual I/O initiation.
  * Removal of the newbufcnt and lastnewbuf counters that Kirk added.
    They do not fix any race conditions that haven't already been fixed
    by the gbincore() test done after the only call to getnewbuf().
    getnewbuf() is a static, so there is no chance of it being misused by
    other modules. ( Unless Kirk can think of a specific thing that this
    code fixes. I went through it very carefully and didn't see
    anything ).
  * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think
    this check is necessary, the buffer should flush properly whether the
    vnode is locked or not. ( yes? ).
  * removal of extra arguments passed to getnewbuf() that are not
    necessary.
  * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
    vfs_cluster.c
  * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which
    should greatly aid flushing operations in heavy load situations -
    both the pageout and update daemons will be able to operate more
    efficiently.
  * removal of b_usecount. We may add it back in later but for now it is
    useless. Prior implementations of the buffer cache never had enough
    buffers for it to be useful, and current implementations which make
    more buffers available might not benefit relative to the amount of
    sophistication required to implement a b_usecount. Straight LRU
    should work just as well, especially when most things are VMIO
    backed. I expect that (even though John will not like this
    assumption) directories will become VMIO backed some point soon.

  Submitted by: Matthew Dillon <dillon@backplane.com>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  Notes: svn path=/head/; revision=48677
* The buffer queue mechanism has been reformulated.
  Author: Kirk McKusick  Date: 1999-07-04  Files: 1  Lines: -5/+5
  Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have
  QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this
  patch clean and dirty buffers have been separated. Empty buffers with
  KVM assignments have been separated from truly empty buffers.
  getnewbuf() has been rewritten and now operates in a 100% optimal
  fashion. That is, it is able to find precisely the right kind of buffer
  it needs to allocate a new buffer, defragment KVM, or free up an
  existing buffer when the buffer cache is full (which is a steady-state
  situation for the buffer cache).

  Buffer flushing has been reorganized. Previously buffers were flushed
  in the context of whatever process hit the conditions forcing buffer
  flushing to occur. This resulted in processes blocking on conditions
  unrelated to what they were doing. This also resulted in inappropriate
  VFS stacking chains due to multiple processes getting stuck trying to
  flush dirty buffers or due to a single process getting into a situation
  where it might attempt to flush buffers recursively - a situation that
  was only partially fixed in prior commits. We have added a new daemon
  called the buf_daemon which is responsible for flushing dirty buffers
  when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
  This daemon attempts to dynamically adjust the rate at which dirty
  buffers are flushed such that getnewbuf() calls (almost) never block.

  The number of nbufs and amount of buffer space is now scaled past the
  8MB limit that was previously imposed for systems with over 64MB of
  memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
  somewhat. The number of physical buffers has been increased with the
  intention that we will manage physical I/O differently in the future.

  reassignbuf previously attempted to keep the dirtyblkhd list sorted,
  which could result in non-deterministic operation under certain
  conditions, such as when a large number of dirty buffers are being
  managed. This algorithm has been changed. reassignbuf now keeps buffers
  locally sorted if it can do so cheaply, and otherwise gives up and adds
  buffers to the head of the dirtyblkhd list. The new algorithm is
  deterministic but not perfect. The new algorithm greatly reduces
  problems that previously occurred when write_behind was turned off in
  the system.

  The P_FLSINPROG proc->p_flag bit has been replaced by the more
  descriptive P_BUFEXHAUST bit. This bit allows processes working with
  filesystem buffers to use available emergency reserves. Normal
  processes do not set this bit and are not allowed to dig into emergency
  reserves. The purpose of this bit is to avoid low-memory deadlocks.

  A small race condition was fixed in getpbuf() in vm/vm_pager.c.

  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  Notes: svn path=/head/; revision=48544
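The idea behind buf_daemon, reduced to a pseudocode-style C sketch; the helper functions and counters here are illustrative stand-ins rather than the real vfs_bio.c internals (the real knobs are the vfs.lodirtybuffers and vfs.hidirtybuffers sysctls):

    /* Illustrative helpers, not the actual kernel interfaces. */
    static int  flush_one_dirty_buffer(void);
    static void wait_until_dirty_count_exceeds(int limit);

    static int numdirtybuffers;     /* maintained by the buffer cache */
    static int lodirtybuffers;      /* vfs.lodirtybuffers */
    static int hidirtybuffers;      /* vfs.hidirtybuffers */

    static void
    buf_daemon_sketch(void)
    {
        for (;;) {
            /* Flush until we are below the low-water mark... */
            while (numdirtybuffers > lodirtybuffers) {
                if (flush_one_dirty_buffer() == 0)
                    break;          /* nothing flushable right now */
            }
            /* ...then sleep until the count passes the high-water mark,
             * so getnewbuf() callers (almost) never block themselves. */
            wait_until_dirty_count_exceeds(hidirtybuffers);
        }
    }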
* Hopefully fix the remaining glitches with the BUF_*() changes.
  Author: Peter Wemm  Date: 1999-06-29  Files: 1  Lines: -11/+2
  This should (really this time) fix pageout to swap and a couple of
  clustering cases.

  This simplifies BUF_KERNPROC() so that it unconditionally reassigns the
  lock owner rather than testing B_ASYNC and having the caller decide
  when to do the reassign. At present this is required because some
  places use B_CALL/b_iodone to free the buffers without B_ASYNC being
  set.

  Also, vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the
  buffers rather than the parent walking the cluster_head tailq.

  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  Notes: svn path=/head/; revision=48333
* The BUF_*() routines must be internally splbio() protected, otherwise
  they can cause a biodone() from a disk interrupt to spin when the
  interrupt code tries to grab the simplelock.
  Author: Peter Wemm  Date: 1999-06-27  Files: 1  Lines: -8/+37
  Masking BIO here means buftimelock and/or lk->lk_interlock shouldn't be
  held when an interrupt tries to grab them.
  Notes: svn path=/head/; revision=48273
* Make SMP work again. lockmgr() needed to be told to free the
  buftimelock interlock.
  Author: Peter Wemm  Date: 1999-06-27  Files: 1  Lines: -1/+3
  Notes: svn path=/head/; revision=48267
* Quick fix to make libcam compile. I don't know about the rest of the
  world yet.
  Author: Peter Wemm  Date: 1999-06-26  Files: 1  Lines: -1/+7
  Notes: svn path=/head/; revision=48247
* Convert buffer locking from using the B_BUSY and B_WANTED flags to
  using lockmgr locks.
  Author: Kirk McKusick  Date: 1999-06-26  Files: 1  Lines: -5/+85
  This commit should be functionally equivalent to the old semantics.
  That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes
  to take advantage of LK_SHARED and LK_RECURSIVE will be done in future
  commits.
  Notes: svn path=/head/; revision=48225
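Buffer ownership after the conversion, sketched; BUF_LOCK()'s argument list has changed over the years (a later variant also takes an interlock), so this shows the shape rather than an exact prototype:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/buf.h>

    /*
     * Instead of setting B_BUSY (and sleeping on B_WANTED), a caller now
     * takes the buffer's lockmgr lock; for now every acquisition is
     * exclusive, exactly as the commit states.
     */
    static void
    use_buffer(struct buf *bp)
    {
        BUF_LOCK(bp, LK_EXCLUSIVE);

        /* ... read or modify the buffer ... */

        BUF_UNLOCK(bp);
    }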
* Introduce two functions: physread() and physwrite() and use these
  directly in *devsw[] rather than the 46 local copies of the same
  functions.
  Author: Poul-Henning Kamp  Date: 1999-05-07  Files: 1  Lines: -1/+3
  (grog will do the same for vinum when he has time)
  Notes: svn path=/head/; revision=46625
* Remove b_proc from struct buf; it's (now) unused.
  Author: Poul-Henning Kamp  Date: 1999-05-06  Files: 1  Lines: -2/+3
  Reviewed by: dillon, bde
  Notes: svn path=/head/; revision=46580
* Remove unused fields from struct buf:
  Author: Poul-Henning Kamp  Date: 1999-05-06  Files: 1  Lines: -6/+1
    b_savekva
    b_validoff
    b_validend
  Reviewed by: dillon, bde
  Notes: svn path=/head/; revision=46566
* The VFS/BIO subsystem contained a number of hacks in order to optimize
  piecemeal, middle-of-file writes for NFS.
  Author: Alan Cox  Date: 1999-05-02  Files: 1  Lines: -4/+36
  These hacks have caused no end of trouble, especially when combined
  with mmap(). I've removed them. Instead, NFS will issue a
  read-before-write to fully instantiate the struct buf containing the
  write. NFS does, however, optimize piecemeal appends to files. For most
  common file operations, you will not notice the difference. The sole
  remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS
  uses to avoid cache coherency issues with read-merge-write style
  operations. NFS also optimizes the write-covers-entire-buffer case by
  avoiding the read-before-write. There is quite a bit of room for
  further optimization in these areas.

  The VM system marks pages fully-valid (AKA vm_page_t->valid =
  VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is
  not correct operation. The vm_pager_get_pages() code is now responsible
  for marking VM pages all-valid. A number of VM helper routines have
  been added to aid in zeroing-out the invalid portions of a VM page
  prior to the page being marked all-valid. This operation is necessary
  to properly support mmap(). The zeroing occurs most often when dealing
  with file-EOF situations.

  Several bugs have been fixed in the NFS subsystem, including bits
  handling file and directory EOF situations and buf->b_flags consistency
  issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.

  getblk() and allocbuf() have been rewritten. B_CACHE operation is now
  formally defined in comments and more straightforward in
  implementation. B_CACHE for VMIO buffers is based on the validity of
  the backing store. B_CACHE for non-VMIO buffers is based simply on
  whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and
  vice versa). biodone() is now responsible for setting B_CACHE when a
  successful read completes. B_CACHE is also set when a bdwrite() is
  initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines
  (there are only two - nfs_bwrite() and bwrite()) are now expected to
  set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE
  indirectly.

  There are a number of places in the code which were previously using
  buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been
  using buf->b_bcount. These have been fixed. getblk() now clears B_DONE
  on return because the rest of the system is so bad about dealing with
  B_DONE.

  Major fixes to NFS/TCP have been made. A server-side bug could cause
  requests to be lost by the server due to nfs_realign() overwriting
  other rpc's in the same TCP mbuf chain. The server's kernel must be
  recompiled to get the benefit of the fixes.

  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  Notes: svn path=/head/; revision=46349
* Reviewed by: Many, at different times and in different parts, including
  alan, john, me, luoqi, and kirk
  Author: Julian Elischer  Date: 1999-03-12  Files: 1  Lines: -6/+9
  Submitted by: Matt Dillon <dillon@frebsd.org>

  This change implements a relatively sophisticated fix to getnewbuf().
  There were two problems with getnewbuf(). First, the write recursion
  can lead to a system stack overflow when you have NFS and/or VN devices
  in the system. Second, the free/dirty buffer accounting was completely
  broken. Not only did the nfs routines blow it trying to manually
  account for the buffer state, but the accounting that was done did not
  work well with the purpose of their existence: figuring out when
  getnewbuf() needs to sleep.

  The meat of the change is to kern/vfs_bio.c. The remaining diffs are
  all minor except for NFS, which includes both the fixes for bp
  interaction AND fixes for a 'biodone(): buffer already done' lockup.
  Sys/buf.h also contains a chaining structure which is not used by this
  patchset but is used by other patches that are coming soon.

  This patch is delineated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF.
  (sorry for the missing T matt)
  Notes: svn path=/head/; revision=44679
* When fsync'ing a file on a filesystem using soft updates, we first try
  to write all the dirty blocks.
  Author: Kirk McKusick  Date: 1999-03-02  Files: 1  Lines: -2/+2
  If some of those blocks have dependencies, they will be remarked dirty
  when the I/O completes. On systems with really fast I/O systems, it is
  possible to get in an infinite loop trying to flush the buffers,
  because the I/O finishes before we can get all the dirty buffers off
  the v_dirtyblkhd list and into the I/O queue. (The previous algorithm
  looped over the v_dirtyblkhd list writing out buffers until the list
  emptied.) So, now we mark each buffer that we try to write so that we
  can distinguish the ones that are being remarked dirty from those that
  we have not yet tried to flush. Once we have tried to push every buffer
  once, we then push any associated metadata that is causing the
  remaining buffers to be redirtied.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  Notes: svn path=/head/; revision=44391
* Typo: s/bpreassignbuf/pbreassignbuf/ so the prototype matches its
  function.
  Author: Peter Wemm  Date: 1999-01-21  Files: 1  Lines: -2/+2
  Notes: svn path=/head/; revision=42980
* This is a rather large commit that encompasses the new swapper,
  changes to the VM system to support the new swapper, VM bug fixes,
  several VM optimizations, and some additional revamping of the VM code.
  Author: Matthew Dillon  Date: 1999-01-21  Files: 1  Lines: -6/+30
  The specific bug fixes will be documented with additional forced
  commits. This commit is somewhat rough in regards to code cleanup
  issues.
  Reviewed by: "John S. Dyson" <root@dyson.iquest.net>,
  "David Greenman" <dg@root.com>
  Notes: svn path=/head/; revision=42957
* Restored the "reallocblks" code to its former glory. What this does isDavid Greenman1998-11-131-1/+15
| | | | | | | | | basically do a on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working. Notes: svn path=/head/; revision=41124
* Convert the vnode clean/dirty attached buffer lists from LISTs to
  TAILQs.
  Author: Peter Wemm  Date: 1998-10-31  Files: 1  Lines: -6/+10
  Add a new flags field (we get this for free because of struct packing)
  for cleaner management of tailq membership. We had two spare b_flags
  slots, but they are a precious resource and may be needed for other
  things that are related to other b_flags bits. The two new flags are
  convenient to use in a separate location.
  Reviewed (in principle) by: dg
  Obtained from: John Dyson's old work-in-progress
  Notes: svn path=/head/; revision=40786
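The shape of this change, sketched with stand-in structures; the field names follow the conventions of the period (v_cleanblkhd/v_dirtyblkhd, b_vnbufs, and the BX_ prefix noted in a later entry), but the structs themselves are illustrative:

    #include <sys/queue.h>

    struct xbuf;
    TAILQ_HEAD(xbuflists, xbuf);

    struct xvnode {
        struct xbuflists v_cleanblkhd;  /* clean buffers, now a TAILQ */
        struct xbuflists v_dirtyblkhd;  /* dirty buffers, now a TAILQ */
    };

    struct xbuf {
        TAILQ_ENTRY(xbuf) b_vnbufs;     /* linkage on one of those lists */
        int               b_xflags;     /* records clean/dirty membership */
    };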