| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Notes:
svn path=/head/; revision=122747
|
|
|
|
|
|
|
|
|
|
|
| |
for dev_strategy() use.
Retire bio_driver[12] (aliases for b_io.bio_driver[12]) these fields are
reserved for device driver use and can as such never have any interest
in the buf end of things.
Notes:
svn path=/head/; revision=121297
|
|
|
|
|
|
|
| |
device drivers only.
Notes:
svn path=/head/; revision=121219
|
|
|
|
|
|
|
|
|
| |
bio_offset is the field drivers should use.
bio_pblkno remains as a convenient place to store the number of
the device drivers.
Notes:
svn path=/head/; revision=121216
|
|
|
|
|
|
|
|
| |
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
Notes:
svn path=/head/; revision=121205
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the point where it being a macro is no longer sensible, and it will
only be more so in days to come.
BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not
be used directly anymore.
Put the contents of both in the new function dev_strategy() and
make DEV_STRATEGY() call that function.
In addition, this allows us to make the rather magic bufdonebio()
helper function static.
This alse saves hunderedandsome bytes of code in a typical kernel.
Notes:
svn path=/head/; revision=121188
|
|
|
|
|
|
|
|
|
|
| |
bail out if the buffer is not already present.
- The buffer returned by incore() is not locked and should not be sent to
brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the
desired semantics.
Notes:
svn path=/head/; revision=119603
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
interlock.
- Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
buffers. Check for the BKGRDINPROG flag before recycling or throwing
away a buffer. We do this instead because it is not safe for us to move
the original buffer to a new queue from the callback on the background
write buffer.
- Remove the B_LOCKED flag and the locked buffer queue. They are no longer
used.
- The vnode interlock is used around checks for BKGRDINPROG where it may
not be strictly necessary. If we hold the buf lock the a back-ground
write will not be started without our knowledge, one may only be
completed while we're not looking. Rather than remove the code, Document
two of the places where this extra locking is done. A pass should be
done to verify and minimize the locking later.
Notes:
svn path=/head/; revision=119521
|
|
|
|
| |
Notes:
svn path=/head/; revision=118522
|
|
|
|
| |
Notes:
svn path=/head/; revision=118462
|
|
|
|
|
|
|
| |
available caller private field.
Notes:
svn path=/head/; revision=116430
|
|
|
|
|
|
|
|
| |
Retain b_bio as the first element of struct buf for now in case some code
somewhere still do the evil cast thing.
Notes:
svn path=/head/; revision=116419
|
|
|
|
|
|
|
|
|
| |
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
Notes:
svn path=/head/; revision=115456
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
is suitable for use as b_iodone for buf consumers who are not going
through the buf cache.
- Create a new function bwait() which waits for the buf to be done at a set
priority and with a specific wmesg.
- Replace several cases where the above functionality was implemented
without locking with the new functions.
Notes:
svn path=/head/; revision=112183
|
|
|
|
|
|
|
|
|
|
|
|
| |
requests whether or not the lock is available. To avoid "unlocked
buffer" panics after a crash, we just claim that all buffers
are locked when cleaning up after a system panic.
Reported by: Attila Nagy <bra@fsn.hu>
Sponsored by: DARPA & NAI Labs.
Notes:
svn path=/head/; revision=111952
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
were we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.
Reviwed by: arch
Not objected to by: mckusick
Notes:
svn path=/head/; revision=111856
|
|
|
|
| |
Notes:
svn path=/head/; revision=111694
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
Notes:
svn path=/head/; revision=111463
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h
Notes:
svn path=/head/; revision=110584
|
|
|
|
|
|
|
|
| |
Submitted by: david Xu (davidxu@)
Reviewed by: jhb@
Notes:
svn path=/head/; revision=110414
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not convinced there is anything major wrong with the patch but
them's the rules..
I am using my "David's mentor" hat to revert this as he's
offline for a while.
Notes:
svn path=/head/; revision=110190
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.
A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.
Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.
KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.
When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.
The code hasn't been tested under SMP by author due to lack of hardware.
Reviewed by: julian
Notes:
svn path=/head/; revision=109877
|
|
|
|
|
|
|
|
|
|
| |
I/O, CAM, and AIO. Still TODO: streamline useracc() checks.
Reviewed by: alc, tegge
MFC after: 7 days
Notes:
svn path=/head/; revision=109572
|
|
|
|
|
|
|
| |
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
Notes:
svn path=/head/; revision=108589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in the original hardwired sysctl implementation.
The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.
Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?
Notes:
svn path=/head/; revision=102600
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).
The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.
Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:
if (ioflags & IO_SYNC)
flags |= BA_SYNC;
For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).
I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.
Sponsored by: DARPA & NAI Labs.
Notes:
svn path=/head/; revision=100344
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on. Extensive testing has appeared to have shown no
increase in overhead.
Disadvantages
Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still N log N and optimal
when there is locality of reference).
Advantages
vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried
to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on
vnodes will be easier.
This commit along with another that Alan is working on for the VM page
global hash table will allow me to implement ranged fsync(), optimize
server-side nfs commit rpcs, and implement partial syncs by the
filesystem syncer (aka filesystem syncer would detect that someone is
trying to get the vnode lock, remembers its place, and skip to the
next vnode).
Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc
Notes:
svn path=/head/; revision=99737
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Cache a pointer to the vnode's object in the buf.
- Hold a reference to that object in addition to the vnode's reference just
to be consistent.
- Cleanup code that got the object indirectly through the vp and VOP calls.
This fixes at least one case where we were calling GETVOBJECT without a lock.
It also avoids an expensive layered call at the cost of another pointer in
struct buf.
Notes:
svn path=/head/; revision=99489
|
|
|
|
|
|
|
|
|
| |
Retire daddr64_t and use daddr_t instead.
Sponsored by: DARPA & NAI Labs.
Notes:
svn path=/head/; revision=96572
|
|
|
|
|
|
|
| |
be used.
Notes:
svn path=/head/; revision=96073
|
|
|
|
| |
Notes:
svn path=/head/; revision=96072
|
|
|
|
| |
Notes:
svn path=/head/; revision=96039
|
|
|
|
| |
Notes:
svn path=/head/; revision=96036
|
|
|
|
| |
Notes:
svn path=/head/; revision=92719
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
Notes:
svn path=/head/; revision=92363
|
|
|
|
|
|
|
|
|
| |
vm/vm_pager.c, which is the only place it is used.
* Make the QUEUE_* definitions and bufqueues local to vfs_bio.c.
* constify buf_wmesg.
Notes:
svn path=/head/; revision=91700
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove bowrite(), it is now unused.
This is the first step in getting entirely rid of BIO_ORDERED which is
a generally accepted evil thing.
Approved by: mckusick
Notes:
svn path=/head/; revision=91060
|
|
|
|
| |
Notes:
svn path=/head/; revision=91058
|
|
|
|
|
|
|
|
|
|
|
| |
against VM_WAIT in the pageout code. Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.
Hopefully MFC: before the 4.5 release
Notes:
svn path=/head/; revision=88318
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a positively niced process requests a disk I/O, make
it wait for its nice value of ticks before scheduling its
I/O request if there are any other processes with I/O
requests in the disk queue. For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
Notes:
svn path=/head/; revision=87864
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in wdrain during a write. This flag needs to be used in devices whos
strategy routines turn-around and issue another high level I/O, such as
when MD turns around and issues a VOP_WRITE to vnode backing store, in order
to avoid deadlocking the dirty buffer draining code.
Remove a vprintf() warning from MD when the backing vnode is found to be
in-use. The syncer of buf_daemon could be flushing the backing vnode at
the time of an MD operation so the warning is not correct.
MFC after: 1 week
Notes:
svn path=/head/; revision=86089
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
Notes:
svn path=/head/; revision=83366
|
|
|
|
|
|
|
|
|
|
|
| |
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.
Reviewed by: freebsd-smp, jake
Notes:
svn path=/head/; revision=82127
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
information. The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.
Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.
Notes:
svn path=/head/; revision=81933
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.
Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%. For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.
I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.
Submitted by: tegge, dillon
Notes:
svn path=/head/; revision=77115
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
Notes:
svn path=/head/; revision=76166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.
This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.
The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine. For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
Notes:
svn path=/head/; revision=75580
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
Notes:
svn path=/head/; revision=72200
|
|
|
|
|
|
|
| |
other then curproc.
Notes:
svn path=/head/; revision=70861
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution. However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box). The new
algorithm limits the number of pages laundered in the first pageout daemon
pass. If that is not sufficient then suceessive will be run without any
limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace. This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads. It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, amoung other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis. At the moment it is globalized.
Notes:
svn path=/head/; revision=70374
|