Virtual Memory System

Contributed by Matthew Dillon

Management of Physical Memory: vm_page_t

Physical memory is managed on a page-by-page basis through
the vm_page_t structure. Pages of physical
memory are categorized through the placement of their respective
vm_page_t structures on one of several paging
queues.

A page can be in a wired, active, inactive, cache, or free
state. Except for the wired state, the page is typically placed
in a doubly linked list queue representing the state that it is
in. Wired pages are not placed on any queue.

FreeBSD implements a more involved paging queue for cached
and free pages in order to implement page coloring. Each of
these states involves multiple queues arranged according to the
size of the processor's L1 and L2 caches. When a new page needs
to be allocated, FreeBSD attempts to obtain one that is
reasonably well aligned from the point of view of the L1 and L2
caches relative to the VM object the page is being allocated
for.

Additionally, a page may be held with a reference count or
locked with a busy count. The VM system also implements an
ultimate locked state for a page using the
PG_BUSY bit in the page's flags.

In general terms, each of the paging queues operates in an
LRU fashion. A page is typically placed in a wired or active
state initially. When wired, the page is usually associated
with a page table somewhere. The VM system ages the page by
scanning pages in a more active paging queue (LRU) in order to
move them to a less-active paging queue. Pages that get moved
into the cache are still associated with a VM object but are
candidates for immediate reuse. Pages in the free queue are
truly free. FreeBSD attempts to minimize the number of pages in
the free queue, but a certain minimum number of truly free pages
must be maintained in order to accommodate page allocation at
interrupt time.

If a process attempts to access a page that does not exist
in its page table but does exist in one of the paging queues
(such as the inactive or cache queues), a relatively inexpensive
page reactivation fault occurs which causes the page to be
reactivated. If the page does not exist in system memory at
all, the process must block while the page is brought in from
disk.

FreeBSD dynamically tunes its paging queues and attempts to
maintain reasonable ratios of pages in the various queues as
well as a reasonable breakdown of clean
versus dirty pages. The amount of rebalancing that occurs
depends on the system's memory load. This rebalancing is
implemented by the pageout daemon and involves laundering dirty
pages (syncing them with their backing store), noticing when
pages are actively referenced (resetting their position in the
LRU queues or moving them between queues), migrating pages
between queues when the queues are out of balance, and so forth.
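
The queue behavior described in this section can be illustrated with a small model. This is only a sketch under stated assumptions, not FreeBSD's implementation: the class PagingQueues and its methods are hypothetical names, page coloring and clean/dirty tracking are ignored, and each queue is a simple LRU list in which aging moves the oldest page one queue down.

```python
from collections import OrderedDict

# Queues ordered from most to least active. Wired pages sit on no queue,
# so they are deliberately absent from this model.
QUEUE_ORDER = ["active", "inactive", "cache", "free"]


class PagingQueues:
    """Toy model of LRU paging queues (hypothetical, for illustration only)."""

    def __init__(self):
        # Each OrderedDict acts as an LRU list: oldest entry first.
        self.queues = {name: OrderedDict() for name in QUEUE_ORDER}

    def state_of(self, page):
        """Return the name of the queue holding page, or None."""
        for name, queue in self.queues.items():
            if page in queue:
                return name
        return None

    def enqueue(self, page, state):
        """Remove page from its current queue, if any, and append it to state."""
        current = self.state_of(page)
        if current is not None:
            del self.queues[current][page]
        self.queues[state][page] = True

    def age(self, state):
        """Move the least recently used page of state one queue down."""
        index = QUEUE_ORDER.index(state)
        if index == len(QUEUE_ORDER) - 1 or not self.queues[state]:
            return None
        page, _ = self.queues[state].popitem(last=False)
        self.queues[QUEUE_ORDER[index + 1]][page] = True
        return page

    def fault(self, page):
        """Handle an access to a page that is missing from the page table."""
        resident = self.state_of(page) in ("active", "inactive", "cache")
        # Either way the page ends up at the young end of the active queue.
        self.enqueue(page, "active")
        if resident:
            return "reactivated"          # cheap: the data is still in memory
        return "read from backing store"  # expensive: the process must block


queues = PagingQueues()
for name in ("A", "B", "C"):
    queues.enqueue(name, "active")
queues.age("active")        # "A" is the oldest page and moves to inactive
print(queues.fault("A"))    # reactivated
print(queues.fault("Z"))    # read from backing store
```

In this model a page that has aged all the way to the free queue is treated like a page that was never in memory, which matches the statement that pages in the free queue are truly free.
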
FreeBSD's VM system is willing to take a reasonable number of
reactivation page faults to determine how active or how idle a
page actually is. This leads to better decisions being made as
to when to launder or swap out a page.

The Unified Buffer Cache: vm_object_t

FreeBSD implements the idea of a generic
VM object. VM objects can be associated with
backing store of various types—unbacked, swap-backed,
physical device-backed, or file-backed storage. Since the
filesystem uses the same VM objects to manage in-core data
relating to files, the result is a unified buffer cache.

VM objects can be shadowed. That is,
they can be stacked on top of each other. For example, you
might have a swap-backed VM object stacked on top of a
file-backed VM object in order to implement a MAP_PRIVATE
mmap() mapping. This stacking is also used to implement various
sharing properties, including copy-on-write, for forked address
spaces.

It should be noted that a vm_page_t can
only be associated with one VM object at a time. The VM object
shadowing implements the perceived sharing of the same page
across multiple instances.

Filesystem I/O: struct buf

vnode-backed VM objects, such as file-backed objects,
generally need to maintain their own clean/dirty info
independent from the VM system's idea of clean/dirty. For
example, when the VM system decides to synchronize a physical
page to its backing store, the VM system needs to mark the page
clean before the page is actually written to its backing store.
Additionally, filesystems need to be able to map portions of a
file or file metadata into KVM in order to operate on it.

The entities used to manage this are known as filesystem
buffers, struct buf's, or
bp's. When a filesystem needs to operate on
a portion of a VM object, it typically maps part of the object
into a struct buf and then maps the pages in the struct buf into
KVM. In the same manner, disk I/O is typically issued by
mapping portions of objects into buffer structures and then
issuing the I/O on the buffer structures. The underlying
vm_page_t's are typically busied for the duration of the I/O.
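
The relationship between a buffer and the pages it maps can be sketched as follows. This is a simplified model, not kernel code: Page, Disk, and Buf are hypothetical stand-ins for vm_page_t, a device, and struct buf, a VM object is assumed to be a plain list of pages, and "mapping" is reduced to holding references.

```python
class Page:
    """Stand-in for a vm_page_t: some data plus a busy count."""

    def __init__(self, data):
        self.data = data
        self.busy = 0


class Disk:
    """Hypothetical device that records what it saw for each page written."""

    def __init__(self):
        self.blocks = []
        self.busy_seen = []

    def write(self, page):
        self.blocks.append(page.data)
        self.busy_seen.append(page.busy)


class Buf:
    """Stand-in for a struct buf: a window onto a run of an object's pages."""

    def __init__(self, vm_object, first, count):
        # The slice holds references to the object's own pages; the buffer
        # never copies the cached data itself.
        self.pages = vm_object[first:first + count]

    def write_to(self, disk):
        for page in self.pages:
            page.busy += 1      # busy the pages for the duration of the I/O
        try:
            for page in self.pages:
                disk.write(page)
        finally:
            for page in self.pages:
                page.busy -= 1  # release them once the I/O completes


vm_object = [Page(b"a"), Page(b"b"), Page(b"c")]
disk = Disk()
Buf(vm_object, 1, 2).write_to(disk)
print(disk.blocks)      # [b'b', b'c']
print(disk.busy_seen)   # [1, 1]
```

The point of the sketch is that the buffer is only a temporary view used to issue I/O; the data stays in the pages of the object.
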
Filesystem buffers also have their own notion of being busy,
which is useful to filesystem driver code which would rather
operate on filesystem buffers instead of hard VM pages.

FreeBSD reserves a limited amount of KVM to hold mappings
from struct bufs, but it should be made clear that this KVM is
used solely to hold mappings and does not limit the ability to
cache data. Physical data caching is strictly a function of
vm_page_t's, not filesystem buffers.
However, since filesystem buffers are used to placehold I/O,
they do inherently limit the amount of concurrent I/O possible.
As there are usually a few thousand filesystem buffers
available, this is not usually a problem.

Mapping Page Tables: vm_map_t, vm_entry_t

FreeBSD separates the physical page table topology from the
VM system. All hard per-process page tables can be
reconstructed on the fly and are usually considered throwaway.
Special page tables such as those managing KVM are typically
permanently preallocated. These page tables are not
throwaway.

FreeBSD associates portions of vm_objects with address
ranges in virtual memory through vm_map_t and
vm_entry_t structures. Page tables are
directly synthesized from the
vm_map_t/vm_entry_t/
vm_object_t hierarchy. Recall that I
mentioned that physical pages are only directly associated with
a vm_object; that is not quite true.
vm_page_t's are also linked into page tables
that they are actively associated with. One
vm_page_t can be linked into several
pmaps, as page tables are called. However,
the hierarchical association holds, so all references to the
same page in the same object reference the same
vm_page_t and thus give us buffer cache
unification across the board.

KVM Memory Mapping

FreeBSD uses KVM to hold various kernel structures. The
single largest entity held in KVM is the filesystem buffer
cache, that is, the mappings relating to
struct buf entities.

Unlike Linux, FreeBSD does not map all
of physical memory into KVM. This means that FreeBSD can handle
memory configurations up to 4GB on 32-bit platforms. In fact, if
the MMU were capable of it, FreeBSD could theoretically handle
memory configurations up to 8TB on a 32-bit platform. However,
since most 32-bit platforms are only capable of mapping 4GB of
RAM, this is a moot point.

KVM is managed through several mechanisms. The main
mechanism used to manage KVM is the
zone allocator. The zone allocator takes a
chunk of KVM and splits it up into constant-sized blocks of
memory in order to allocate a specific type of structure. You
can use vmstat -m to get an overview of
current KVM utilization broken down by zone.

Tuning the FreeBSD VM System

A concerted effort has been made to make the FreeBSD kernel
dynamically tune itself. Typically you do not need to mess with
anything beyond the maxusers and
NMBCLUSTERS kernel config options. That is,
kernel compilation options specified in (typically)
/usr/src/sys/i386/conf/CONFIG_FILE.
A description of all available kernel configuration options can
be found in
/usr/src/sys/i386/conf/LINT.

In a large system configuration you may wish to increase
maxusers. Values typically range from 10 to
128. Note that raising maxusers too high can
cause the system to overflow available KVM resulting in
unpredictable operation. It is better to leave maxusers
at some reasonable number and add
other options, such as NMBCLUSTERS, to increase
specific resources.

If your system is going to use the network heavily, you may
want to increase NMBCLUSTERS. Typical values
range from 1024 to 4096.

The NBUF parameter is also traditionally
used to scale the system. This parameter determines the amount
of KVA the system can use to map filesystem buffers for I/O.
Note that this parameter has nothing whatsoever to do with the
unified buffer cache! This parameter is dynamically tuned in
3.0-CURRENT and later kernels and should generally not be
adjusted manually. We recommend that you
not try to specify an
NBUF parameter. Let the system pick it. Too
small a value can result in extremely inefficient filesystem
operation while too large a value can starve the page queues by
causing too many pages to become wired down.

By default, FreeBSD kernels are not optimized. You can set
debugging and optimization flags with the
makeoptions directive in the kernel
configuration. Note that you should not use -g
unless you can accommodate the large (typically 7 MB+) kernels
that result.

    makeoptions DEBUG="-g"
    makeoptions COPTFLAGS="-O -pipe"

Sysctl provides a way to tune kernel parameters at run-time.
You typically do not need to mess with any of the sysctl
variables, especially the VM related ones.

Run time VM and system tuning is relatively straightforward.
First, use Soft Updates on your UFS/FFS filesystems whenever
possible.
/usr/src/sys/ufs/ffs/README.softupdates
contains instructions (and restrictions) on how to configure
it.

Second, configure sufficient swap. You should have a swap
partition configured on each physical disk, up to four, even on
your work disks. You should have at least twice as much
swap space as you have main memory, and possibly even more if
you do not have a lot of memory. You should also size your swap
partition based on the maximum memory configuration you ever
intend to put on the machine so you do not have to repartition
your disks later on. If you want to be able to accommodate a
crash dump, your first swap partition must be at least as large
as main memory and /var/crash must have
sufficient free space to hold the dump.

NFS-based swap is perfectly acceptable on 4.X or later
systems, but you must be aware that the NFS server will take the
brunt of the paging load.
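
The swap sizing guidance given earlier in this section can be turned into a quick calculation. The helper below is a hypothetical sketch, not a FreeBSD tool: the function name plan_swap and its parameters are made up for illustration, and splitting the total evenly across disks is an assumption, since the text only says to place a swap partition on each disk, up to four.

```python
def plan_swap(ram_mb, disks, max_ram_mb=None, crash_dumps=True):
    """Return a list of suggested swap partition sizes in MB, one per disk."""
    if disks < 1:
        raise ValueError("need at least one disk")
    # Size for the largest memory configuration the machine may ever hold.
    target_ram = max(ram_mb, max_ram_mb or ram_mb)
    total = 2 * target_ram              # at least twice main memory
    partitions = min(disks, 4)          # one partition per disk, up to four
    per_disk = -(-total // partitions)  # ceiling division (even split assumed)
    sizes = [per_disk] * partitions
    if crash_dumps:
        # The first swap partition must be able to hold all of main memory.
        sizes[0] = max(sizes[0], target_ram)
    return sizes


print(plan_swap(512, 2))                    # [512, 512]
print(plan_swap(1024, 6))                   # [1024, 512, 512, 512]
print(plan_swap(256, 1, max_ram_mb=1024))   # [2048]
```

The crash dump check uses the planned maximum memory rather than the current amount, which is the more conservative reading of the advice to size for the largest configuration you intend to install.
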