Diffstat (limited to 'en_US.ISO8859-1/articles/geom-class/article.xml')
-rw-r--r-- en_US.ISO8859-1/articles/geom-class/article.xml | 817
1 files changed, 0 insertions, 817 deletions
diff --git a/en_US.ISO8859-1/articles/geom-class/article.xml b/en_US.ISO8859-1/articles/geom-class/article.xml
deleted file mode 100644
index 94a13553af..0000000000
--- a/en_US.ISO8859-1/articles/geom-class/article.xml
+++ /dev/null
@@ -1,817 +0,0 @@
-<?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook XML V5.0-Based Extension//EN"
- "http://www.FreeBSD.org/XML/share/xml/freebsd50.dtd">
-<article xmlns="http://docbook.org/ns/docbook"
- xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
- xml:lang="en">
- <info>
- <title>Writing a GEOM Class</title>
-
- <authorgroup>
- <author>
- <personname>
- <firstname>Ivan</firstname>
- <surname>Voras</surname>
- </personname>
- <affiliation>
- <address>
- <email>ivoras@FreeBSD.org</email>
- </address>
- </affiliation>
- </author>
- </authorgroup>
-
- <legalnotice xml:id="trademarks" role="trademarks">
- &tm-attrib.freebsd;
- &tm-attrib.intel;
- &tm-attrib.general;
- </legalnotice>
-
- <pubdate>$FreeBSD$</pubdate>
-
- <releaseinfo>$FreeBSD$</releaseinfo>
-
- <abstract>
- <para>This text documents some starting points in developing
- GEOM classes, and kernel modules in general. It is assumed
- that the reader is familiar with C userland
- programming.</para>
- </abstract>
- </info>
-
-<!-- Introduction -->
- <sect1 xml:id="intro">
- <title>Introduction</title>
-
- <sect2 xml:id="intro-docs">
- <title>Documentation</title>
-
- <para>Documentation on kernel programming is scarce. It is
- one of the few areas where there is nearly nothing in the way
- of friendly tutorials, and the phrase <quote>use the
- source!</quote> really holds true. However, there are some
- bits and pieces (some of them seriously outdated) floating
- around that should be studied before beginning to code:</para>
-
- <itemizedlist>
- <listitem>
- <para>The <link
- xlink:href="&url.books.developers-handbook;/index.html">FreeBSD
- Developer's Handbook</link> &mdash; part of the
- documentation project, it does not contain anything
- specific to kernel programming, but rather some general
- useful information.</para>
- </listitem>
-
- <listitem>
- <para>The <link
- xlink:href="&url.books.arch-handbook;/index.html">FreeBSD
- Architecture Handbook</link> &mdash; also from the
- documentation project, contains descriptions of several
- low-level facilities and procedures. The most important
- chapter is 13, <link
- xlink:href="&url.books.arch-handbook;/driverbasics.html">Writing
- FreeBSD device drivers</link>.</para>
- </listitem>
-
- <listitem>
- <para>The Blueprints section of <link
- xlink:href="http://www.freebsddiary.org">FreeBSD
- Diary</link> web site &mdash; contains several
- interesting articles on kernel
- facilities.</para>
- </listitem>
-
- <listitem>
- <para>The man pages in section 9 &mdash; for important
- documentation on kernel functions.</para>
- </listitem>
-
- <listitem>
- <para>The &man.geom.4; man page and <link
- xlink:href="http://phk.freebsd.dk/pubs/">PHK's GEOM
- slides</link> &mdash; for general introduction of the
- GEOM subsystem.</para>
- </listitem>
-
- <listitem>
- <para>Man pages &man.g.bio.9;, &man.g.event.9;,
- &man.g.data.9;, &man.g.geom.9;, &man.g.provider.9;,
- &man.g.consumer.9;, &man.g.access.9;, and others linked
- from those, for documentation on specific
- functionality.</para>
- </listitem>
-
- <listitem>
- <para>The &man.style.9; man page &mdash; for documentation
- on the coding-style conventions which must be followed for
- any code which is to be committed to the FreeBSD
- tree.</para>
- </listitem>
- </itemizedlist>
- </sect2>
- </sect1>
-
- <sect1 xml:id="prelim">
- <title>Preliminaries</title>
-
- <para>The best way to do kernel development is to have (at least)
- two separate computers. One of these would contain the
- development environment and sources, and the other would be used
- to test the newly written code by network-booting and
- network-mounting filesystems from the first one. This way if
- the new code contains bugs and crashes the machine, it will not
- mess up the sources (and other <quote>live</quote> data). The
- second system does not even require a proper display. Instead,
- it could be connected with a serial cable or KVM to the first
- one.</para>
-
- <para>Since not everybody has two or more computers handy,
- there are a few things that can be done to prepare an otherwise
- <quote>live</quote> system for developing kernel code. This
- setup is also applicable for developing in a <link
- xlink:href="http://www.vmware.com/">VMware</link> or <link
- xlink:href="http://www.qemu.org/">QEMU</link> virtual machine
- (the next best thing after a dedicated development
- machine).</para>
-
- <sect2 xml:id="prelim-system">
- <title>Modifying a System for Development</title>
-
- <para>For any kernel programming, a kernel with
- <option>INVARIANTS</option> enabled is a must-have. Enter
- these lines in the kernel configuration file:</para>
-
- <programlisting>options INVARIANT_SUPPORT
-options INVARIANTS</programlisting>
-
- <para>For more debugging, also include WITNESS
- support, which will alert you to mistakes in locking:</para>
-
- <programlisting>options WITNESS</programlisting>
-
- <para>For debugging crash dumps, a kernel with debug symbols is
- needed:</para>
-
- <programlisting> makeoptions DEBUG=-g</programlisting>
-
- <para>With the usual way of installing the kernel (<command>make
- installkernel</command>) the debug kernel will not be
- automatically installed. It is called
- <filename>kernel.debug</filename> and located in
- <filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
- convenience it should be copied to
- <filename>/boot/kernel/</filename>.</para>
-
- <para>Another convenience is enabling the kernel debugger so you
- can examine a kernel panic when it happens. For this, enter
- the following lines in your kernel configuration file:</para>
-
- <programlisting>options KDB
-options DDB
-options KDB_TRACE</programlisting>
-
- <para>For this to work you might need to set a sysctl (if it is
- not on by default):</para>
-
- <programlisting> debug.debugger_on_panic=1</programlisting>
-
- <para>Kernel panics will happen, so care should be taken with
- the filesystem cache. In particular, having softupdates might
- mean the latest file version could be lost if a panic occurs
- before it is committed to storage. Disabling softupdates
- incurs a large performance hit, and still does not guarantee
- data consistency. Mounting the filesystem with the
- <quote>sync</quote> option is needed for that. As a
- compromise, the softupdates cache delays can be shortened.
- There are three sysctls that are useful for this (best
- set in <filename>/etc/sysctl.conf</filename>):</para>
-
- <programlisting>kern.filedelay=5
-kern.dirdelay=4
-kern.metadelay=3</programlisting>
-
- <para>The numbers represent seconds.</para>
-
- <para>For debugging kernel panics, kernel core dumps are
- required. Since a kernel panic might make filesystems
- unusable, this crash dump is first written to a raw partition.
- Usually, this is the swap partition. This partition must be
- at least as large as the physical RAM in the machine. On the
- next boot, the dump is copied to a regular file. This happens
- after filesystems are checked and mounted, and before swap is
- enabled. This is controlled with two
- <filename>/etc/rc.conf</filename> variables:</para>
-
- <programlisting>dumpdev="/dev/ad0s4b"
-dumpdir="/usr/core"</programlisting>
-
- <para>The <varname>dumpdev</varname> variable specifies the swap
- partition and <varname>dumpdir</varname> tells the system
- where in the filesystem to relocate the core dump on
- reboot.</para>
-
- <para>Writing kernel core dumps is slow, so
- if you have lots of memory (&gt;256M) and lots of panics it
- could be frustrating to sit and wait while it is done (twice:
- first to write it to swap, then to relocate it to the
- filesystem). It is convenient then to limit the amount of RAM
- the system will use via a
- <filename>/boot/loader.conf</filename> tunable:</para>
-
- <programlisting> hw.physmem="256M"</programlisting>
-
- <para>If the panics are frequent and filesystems large (or you
- simply do not trust softupdates+background fsck) it is
- advisable to turn background fsck off via an
- <filename>/etc/rc.conf</filename> variable:</para>
-
- <programlisting> background_fsck="NO"</programlisting>
-
- <para>This way, the filesystems will always get checked when
- needed. Note that with background fsck, a new panic could
- happen while it is checking the disks. Again, the safest way
- is not to have many local filesystems by using another
- computer as an NFS server.</para>
- </sect2>
-
- <sect2 xml:id="prelim-starting">
- <title>Starting the Project</title>
-
- <para>For the purpose of creating a new GEOM class, an empty
- subdirectory has to be created under an arbitrary
- user-accessible directory. You do not have to create the
- module directory under <filename>/usr/src</filename>.</para>
- </sect2>
-
- <sect2 xml:id="prelim-makefile">
- <title>The Makefile</title>
-
- <para>It is good practice to create
- <filename>Makefile</filename>s for every nontrivial coding
- project, which of course includes kernel modules.</para>
-
- <para>Creating the <filename>Makefile</filename> is simple
- thanks to an extensive set of helper routines provided by the
- system. In short, here is how a minimal
- <filename>Makefile</filename> looks for a kernel
- module:</para>
-
- <programlisting>SRCS=g_journal.c
-KMOD=geom_journal
-
-.include &lt;bsd.kmod.mk&gt;</programlisting>
-
- <para>This <filename>Makefile</filename> (with changed
- filenames) will do for any kernel module, and a GEOM class can
- reside in just one kernel module. If more than one file is
- required, list it in the <envar>SRCS</envar> variable,
- separated with whitespace from other filenames.</para>
- </sect2>
- </sect1>
-
- <sect1 xml:id="kernelprog">
- <title>On FreeBSD Kernel Programming</title>
-
- <sect2 xml:id="kernelprog-memalloc">
- <title>Memory Allocation</title>
-
- <para>See &man.malloc.9;. Basic memory allocation is only
- slightly different than its userland equivalent. Most
- notably, <function>malloc</function>() and
- <function>free</function>() accept additional parameters as is
- described in the man page.</para>
-
- <para>A <quote>malloc type</quote> must be declared in the
- declaration section of a source file, like this:</para>
-
- <programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
-
- <para>To use this macro, <filename>sys/param.h</filename>,
- <filename>sys/kernel.h</filename> and
- <filename>sys/malloc.h</filename> headers must be
- included.</para>
-
- <para>There is another mechanism for allocating memory, the UMA
- (Universal Memory Allocator). It is a special type of
- allocator mainly used for speedy allocation of lists composed
- of same-sized items (for example, dynamic arrays of structs).
- See &man.uma.9; for details.</para>
- </sect2>
-
- <sect2 xml:id="kernelprog-lists">
- <title>Lists and Queues</title>
-
- <para>See &man.queue.3;. There are many cases when a list
- of things needs to be maintained. Fortunately, this data
- structure is implemented (in several ways) by C macros
- included in the system. The most used list type is TAILQ
- because it is the most flexible. It is also the one with the
- largest memory requirements (its elements are doubly-linked)
- and also the slowest (although the difference is on the
- order of a few CPU instructions, so it should not be
- taken seriously).</para>
-
- <para>If data retrieval speed is very important, see
- &man.tree.3; and &man.hashinit.9;.</para>
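-
- <para>As a rough illustration of the TAILQ macros, here is a
- minimal userland sketch. The names <varname>demo_item</varname>,
- <varname>demo_list</varname> and the <function>demo_*</function>()
- helpers are hypothetical, and the userland
- <function>malloc</function>() and <function>free</function>()
- stand in for the kernel allocator, whose calls take additional
- arguments.</para>
-

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element; the names are illustrative only. */
struct demo_item {
	int			 value;
	TAILQ_ENTRY(demo_item)	 link;	/* previous/next linkage */
};

/* Declares struct demo_list, the head of a tail queue of demo_item. */
TAILQ_HEAD(demo_list, demo_item);

/* Append a new item at the tail; returns 0 on success, -1 on failure. */
static int
demo_append(struct demo_list *list, int value)
{
	struct demo_item *it;

	it = malloc(sizeof(*it));
	if (it == NULL)
		return (-1);
	it->value = value;
	TAILQ_INSERT_TAIL(list, it, link);
	return (0);
}

/* Walk the list from head to tail and add up the stored values. */
static int
demo_sum(struct demo_list *list)
{
	struct demo_item *it;
	int sum = 0;

	TAILQ_FOREACH(it, list, link)
		sum += it->value;
	return (sum);
}

/* Remove and free every item; returns the number of items freed. */
static int
demo_drain(struct demo_list *list)
{
	struct demo_item *it;
	int n = 0;

	while ((it = TAILQ_FIRST(list)) != NULL) {
		TAILQ_REMOVE(list, it, link);
		free(it);
		n++;
	}
	assert(TAILQ_EMPTY(list));
	return (n);
}
```

-
- <para>A caller would declare a <varname>struct demo_list</varname>,
- initialize it once with <function>TAILQ_INIT</function>(), and
- then use the helpers above. In kernel code the list would
- normally also be protected by a lock.</para>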
- </sect2>
-
- <sect2 xml:id="kernelprog-bios">
- <title>BIOs</title>
-
- <para>Structure <varname remap="structname">bio</varname> is
- used for any and all Input/Output operations concerning GEOM.
- It basically contains information about what device
- ('provider') should satisfy the request, request type, offset,
- length, pointer to a buffer, and a bunch of
- <quote>user-specific</quote> flags and fields that can help
- implement various hacks.</para>
-
- <para>The important thing here is that <varname
- remap="structname">bio</varname>s are handled
- asynchronously. That means that, in most parts of the code,
- there is no analogue to userland's &man.read.2; and
- &man.write.2; calls that do not return until a request is
- done. Rather, a developer-supplied function is called as a
- notification when the request gets completed (or results in
- error).</para>
-
- <para>The asynchronous programming model (also called
- <quote>event-driven</quote>) is somewhat harder than the
- more common imperative one used in userland (at least it takes a
- while to get used to it). In some cases the helper routines
- <function>g_write_data</function>() and
- <function>g_read_data</function>() can be used, but
- <emphasis>not always</emphasis>. In particular, they cannot
- be used when a mutex is held; for example, the GEOM topology
- mutex or the internal mutex held during the
- <function>.start</function>() and <function>.stop</function>()
- functions.</para>
- </sect2>
- </sect1>
-
- <sect1 xml:id="geom">
- <title>On GEOM Programming</title>
-
- <sect2 xml:id="geom-ggate">
- <title>Ggate</title>
-
- <para>If maximum performance is not needed, a much simpler way
- of making a data transformation is to implement it in userland
- via the ggate (GEOM gate) facility. Unfortunately, there is
- no easy way to convert between, or even share code between the
- two approaches.</para>
- </sect2>
-
- <sect2 xml:id="geom-class">
- <title>GEOM Class</title>
-
- <para>GEOM classes are transformations on the data. These
- transformations can be combined in a tree-like fashion.
- Instances of GEOM classes are called
- <emphasis>geoms</emphasis>.</para>
-
- <para>Each GEOM class has several <quote>class methods</quote>
- that get called when there is no geom instance available (or
- they are simply not bound to a single instance):</para>
-
- <itemizedlist>
- <listitem>
- <para><function>.init</function> is called when GEOM becomes
- aware of a GEOM class (when the kernel module gets
- loaded).</para>
- </listitem>
-
- <listitem>
- <para><function>.fini</function> gets called when GEOM
- abandons the class (when the module gets
- unloaded).</para>
- </listitem>
-
- <listitem>
- <para><function>.taste</function> is called after
- <function>.init</function>, once for
- each provider the system has available. If applicable,
- this function will usually create and start a geom
- instance.</para>
- </listitem>
-
- <listitem>
- <para><function>.destroy_geom</function> is called when the
- geom should be disbanded.</para>
- </listitem>
-
- <listitem>
- <para><function>.ctlreq</function> is called when the user
- requests reconfiguration of an existing
- geom.</para>
- </listitem>
- </itemizedlist>
-
- <para>Also defined are the GEOM event functions, which will get
- copied to the geom instance.</para>
-
- <para>Field <function>.geom</function> in the <varname
- remap="structname">g_class</varname> structure is a LIST of
- geoms instantiated from the class.</para>
-
- <para>These functions are called from the g_event kernel
- thread.</para>
- </sect2>
-
- <sect2 xml:id="geom-softc">
- <title>Softc</title>
-
- <para>The name <quote>softc</quote> is a legacy term for
- <quote>driver private data</quote>. The name most probably
- comes from the archaic term <quote>software control
- block</quote>. In GEOM, it is a structure (more precisely, a
- pointer to a structure) that can be attached to a geom
- instance to hold whatever data is private to the geom
- instance. The softc structures of most GEOM classes have the
- following members:</para>
-
- <itemizedlist>
- <listitem>
- <para><varname>struct g_provider *provider</varname> : The
- <quote>provider</quote> this geom
- instantiates</para>
- </listitem>
-
- <listitem>
- <para><varname>uint16_t n_disks</varname> : Number of
- consumers this geom consumes</para>
- </listitem>
-
- <listitem>
- <para><varname>struct g_consumer **disks</varname> : Array
- of <varname>struct g_consumer *</varname>. (It is not
- possible to use just single indirection because the
- <varname>struct g_consumer</varname> objects are created on
- our behalf by GEOM.)</para>
- </listitem>
- </itemizedlist>
-
- <para>The <varname remap="structname">softc</varname> structure
- contains all the state of a geom instance. Every geom instance
- has its own softc.</para>
- </sect2>
-
- <sect2 xml:id="geom-metadata">
- <title>Metadata</title>
-
- <para>Format of metadata is more-or-less class-dependent, but
- MUST start with:</para>
-
- <itemizedlist>
- <listitem>
- <para>16 byte buffer for null-terminated signature (usually
- the class name)</para>
- </listitem>
-
- <listitem>
- <para>uint32 version ID</para>
- </listitem>
- </itemizedlist>
-
- <para>It is assumed that geom classes know how to handle
- metadata with version IDs lower than theirs.</para>
-
- <para>Metadata is located in the last sector of the provider
- (and thus must fit in it).</para>
-
- <para>(All this is implementation-dependent but all existing
- code works like that, and it is supported by
- libraries.)</para>
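-
- <para>The layout described above can be sketched in userland C as
- follows. This is only an illustration under stated assumptions:
- the structure and function names, the signature
- <literal>GEOM::DEMO</literal> and the little-endian encoding of
- the version field are all hypothetical choices, not taken from
- any particular class.</para>
-

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC	"GEOM::DEMO"	/* hypothetical signature */
#define DEMO_VERSION	1u		/* newest version this code knows */

/* In-memory form of the common metadata header. */
struct demo_metadata {
	char		md_magic[16];	/* null-terminated signature */
	uint32_t	md_version;	/* version ID */
};

/* Byte offset of the last sector, where the metadata is stored. */
static uint64_t
demo_md_offset(uint64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize > 0 && mediasize >= sectorsize);
	return (mediasize - sectorsize);
}

/* Serialize the header; the version is written little-endian (assumed). */
static void
demo_md_encode(const struct demo_metadata *md, unsigned char *buf)
{
	memcpy(buf, md->md_magic, sizeof(md->md_magic));
	buf[16] = (unsigned char)(md->md_version & 0xff);
	buf[17] = (unsigned char)((md->md_version >> 8) & 0xff);
	buf[18] = (unsigned char)((md->md_version >> 16) & 0xff);
	buf[19] = (unsigned char)((md->md_version >> 24) & 0xff);
}

/*
 * Parse the header.  Returns 0 if the signature matches and the version
 * is not newer than ours, -1 otherwise.
 */
static int
demo_md_decode(const unsigned char *buf, struct demo_metadata *md)
{
	memcpy(md->md_magic, buf, sizeof(md->md_magic));
	md->md_magic[sizeof(md->md_magic) - 1] = '\0';
	md->md_version = (uint32_t)buf[16] |
	    ((uint32_t)buf[17] << 8) |
	    ((uint32_t)buf[18] << 16) |
	    ((uint32_t)buf[19] << 24);
	if (strcmp(md->md_magic, DEMO_MAGIC) != 0)
		return (-1);
	if (md->md_version > DEMO_VERSION)
		return (-1);
	return (0);
}
```

-
- <para>A labeling helper would fill in the structure, encode it
- into a zeroed sector-sized buffer, and write that buffer at the
- offset returned by <function>demo_md_offset</function>(). A
- tasting function would read the same sector and decode
- it.</para>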
- </sect2>
-
- <sect2 xml:id="geom-creating">
- <title>Labeling/creating a GEOM</title>
-
- <para>The sequence of events is:</para>
-
- <itemizedlist>
- <listitem>
- <para>user calls &man.geom.8; utility (or one of its
- hardlinked friends)</para>
- </listitem>
-
- <listitem>
- <para>the utility figures out which geom class it is
- supposed to handle and searches for
- <filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
- library (usually in
- <filename>/lib/geom</filename>).</para>
- </listitem>
-
- <listitem>
- <para>it &man.dlopen.3;-s the library, extracts the
- definitions of command-line parameters and helper
- functions.</para>
- </listitem>
- </itemizedlist>
-
- <para>In the case of creating/labeling a new geom, this is what
- happens:</para>
-
- <itemizedlist>
- <listitem>
- <para>&man.geom.8; looks in the command-line argument for
- the command (usually <option>label</option>), and calls a
- helper function.</para>
- </listitem>
-
- <listitem>
- <para>The helper function checks parameters and gathers
- metadata, which it proceeds to write to all concerned
- providers.</para>
- </listitem>
-
- <listitem>
- <para>This <quote>spoils</quote> existing geoms (if any) and
- initiates a new round of <quote>tasting</quote> of the
- providers. The intended geom class recognizes the
- metadata and brings the geom up.</para>
- </listitem>
- </itemizedlist>
-
- <para>(The above sequence of events is implementation-dependent
- but all existing code works like that, and it is supported by
- libraries.)</para>
- </sect2>
-
- <sect2 xml:id="geom-command">
- <title>GEOM Command Structure</title>
-
- <para>The helper <filename>geom_CLASSNAME.so</filename> library
- exports <varname remap="structname">class_commands</varname>
- structure, which is an array of <varname
- remap="structname">struct g_command</varname> elements.
- Commands are of uniform format and look like:</para>
-
- <programlisting> verb [-options] geomname [other]</programlisting>
-
- <para>Common verbs are:</para>
-
- <itemizedlist>
- <listitem>
- <para>label &mdash; to write metadata to devices so they can
- be recognized at tasting and brought up in
- geoms</para>
- </listitem>
-
- <listitem>
- <para>destroy &mdash; to destroy metadata, so the geoms get
- destroyed</para>
- </listitem>
- </itemizedlist>
-
- <para>Common options are:</para>
-
- <itemizedlist>
- <listitem>
- <para><literal>-v</literal> : be verbose</para>
- </listitem>
-
- <listitem>
- <para><literal>-f</literal> : force</para>
- </listitem>
- </itemizedlist>
-
- <para>Many actions, such as labeling and destroying metadata, can
- be performed in userland. For this, <varname
- remap="structname">struct g_command</varname> provides the field
- <varname>gc_func</varname> that can be set to a function (in
- the same <filename>.so</filename>) that will be called to
- process a verb. If <varname>gc_func</varname> is NULL, the
- command will be passed to the kernel module, to the
- <function>.ctlreq</function> function of the geom
- class.</para>
- </sect2>
-
- <sect2 xml:id="geom-geoms">
- <title>Geoms</title>
-
- <para>Geoms are instances of GEOM classes. They have internal
- data (a softc structure) and some functions with which they
- respond to external events.</para>
-
- <para>The event functions are:</para>
-
- <itemizedlist>
- <listitem>
- <para><function>.access</function> : calculates permissions
- (read/write/exclusive)</para>
- </listitem>
-
- <listitem>
- <para><function>.dumpconf</function> : returns XML-formatted
- information about the geom</para>
- </listitem>
-
- <listitem>
- <para><function>.orphan</function> : called when some
- underlying provider gets disconnected</para>
- </listitem>
-
- <listitem>
- <para><function>.spoiled</function> : called when some
- underlying provider gets written to</para>
- </listitem>
-
- <listitem>
- <para><function>.start</function> : handles I/O</para>
- </listitem>
- </itemizedlist>
-
- <para>These functions are called from the
- <function>g_down</function> kernel thread and there can be no
- sleeping in this context (see the definition of sleeping
- elsewhere), which limits what can be done quite a bit, but
- forces the handling to be fast.</para>
-
- <para>Of these, the most important function for doing actual
- useful work is the <function>.start</function>() function,
- which is called when a BIO request arrives for a provider
- managed by an instance of the geom class.</para>
- </sect2>
-
- <sect2 xml:id="geom-threads">
- <title>GEOM Threads</title>
-
- <para>There are three kernel threads created and run by the GEOM
- framework:</para>
-
- <itemizedlist>
- <listitem>
- <para><literal>g_down</literal> : Handles requests coming
- from high-level entities (such as a userland request) on
- the way to physical devices</para>
- </listitem>
-
- <listitem>
- <para><literal>g_up</literal> : Handles responses from
- device drivers to requests made by higher-level
- entities</para>
- </listitem>
-
- <listitem>
- <para><literal>g_event</literal> : Handles all other cases:
- creation of geom instances, access counting,
- <quote>spoil</quote> events, etc.</para>
- </listitem>
- </itemizedlist>
-
- <para>When a user process issues <quote>read data X at offset Y
- of a file</quote> request, this is what happens:</para>
-
- <itemizedlist>
- <listitem>
- <para>The filesystem converts the request into a struct bio
- instance and passes it to the GEOM subsystem. It knows
- what geom instance should handle it because filesystems
- are hosted directly on a geom instance.</para>
- </listitem>
-
- <listitem>
- <para>The request ends up as a call to the
- <function>.start</function>() function made on the g_down
- thread and reaches the top-level geom
- instance.</para>
- </listitem>
-
- <listitem>
- <para>This top-level geom instance (for example the
- partition slicer) determines that the request should be
- routed to a lower-level instance (for example the disk
- driver). It makes a copy of the bio request (bio requests
- <emphasis>ALWAYS</emphasis> need to be copied between
- instances, with <function>g_clone_bio</function>()!),
- modifies the data offset and target provider fields and
- executes the copy with
- <function>g_io_request</function>().</para>
- </listitem>
-
- <listitem>
- <para>The disk driver gets the bio request also as a call to
- <function>.start</function>() on the
- <literal>g_down</literal> thread. It talks to hardware,
- gets the data back, and calls
- <function>g_io_deliver</function>() on the
- bio.</para>
- </listitem>
-
- <listitem>
- <para>Now, the notification of bio completion <quote>bubbles
- up</quote> in the <literal>g_up</literal> thread. First
- the partition slicer gets <function>.done</function>()
- called in the <literal>g_up</literal> thread. It uses
- information stored in the bio to free the cloned <varname
- remap="structname">bio</varname> structure (with
- <function>g_destroy_bio</function>()) and calls
- <function>g_io_deliver</function>() on the original
- request.</para>
- </listitem>
-
- <listitem>
- <para>The filesystem gets the data and transfers it to
- userland.</para>
- </listitem>
- </itemizedlist>
-
- <para>See the &man.g.bio.9; man page for information on how the data is
- passed back and forth in the <varname
- remap="structname">bio</varname> structure (note in
- particular the <varname>bio_parent</varname> and
- <varname>bio_children</varname> fields and how they are
- handled).</para>
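-
- <para>The clone, translate, and complete sequence can be modeled in
- plain userland C. The sketch below is deliberately
- <emphasis>not</emphasis> the kernel API:
- <varname>struct demo_bio</varname> and the
- <function>demo_*</function>() functions are hypothetical
- stand-ins that only mimic the roles of
- <function>g_clone_bio</function>(),
- <function>g_io_deliver</function>() and a
- <function>.done</function>() callback.</para>
-

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct bio; NOT the kernel definition. */
struct demo_bio {
	uint64_t	 offset;	/* byte offset within the target */
	uint64_t	 length;	/* request length in bytes */
	int		 error;		/* completion status, 0 on success */
	int		 completed;	/* set once the request is delivered */
	struct demo_bio	*parent;	/* original request, for clones */
	void		(*done)(struct demo_bio *);	/* completion hook */
};

/* Finish a request: record the status and notify whoever issued it. */
static void
demo_deliver(struct demo_bio *bp, int error)
{
	assert(bp != NULL);
	bp->error = error;
	bp->completed = 1;
	if (bp->done != NULL)
		bp->done(bp);	/* bp may be freed by the callback */
}

/* Completion hook of the upper layer: free the clone, finish the parent. */
static void
demo_slice_done(struct demo_bio *cbp)
{
	struct demo_bio *pbp = cbp->parent;
	int error = cbp->error;

	free(cbp);
	demo_deliver(pbp, error);
}

/*
 * Upper layer "start": clone the request and translate its offset.
 * Returns the clone that would be handed to the lower layer, or NULL
 * after failing the original request with ENOMEM.
 */
static struct demo_bio *
demo_slice_start(struct demo_bio *bp, uint64_t slice_offset)
{
	struct demo_bio *cbp;

	cbp = malloc(sizeof(*cbp));
	if (cbp == NULL) {
		demo_deliver(bp, ENOMEM);
		return (NULL);
	}
	*cbp = *bp;
	cbp->parent = bp;
	cbp->done = demo_slice_done;
	cbp->offset = bp->offset + slice_offset;
	return (cbp);
}
```

-
- <para>The original request is never modified on the way down. Only
- the clone carries the translated offset, and the original is
- completed once the clone's completion hook has run.</para>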
-
- <para>One important feature is: <emphasis>THERE CAN BE NO
- SLEEPING IN G_UP AND G_DOWN THREADS</emphasis>. This means
- that none of the following things can be done in those threads
- (the list is of course not complete, but only
- informative):</para>
-
- <itemizedlist>
- <listitem>
- <para>Calls to <function>msleep</function>() and
- <function>tsleep</function>(),
- obviously.</para>
- </listitem>
-
- <listitem>
- <para>Calls to <function>g_write_data</function>() and
- <function>g_read_data</function>(), because these sleep
- between passing the data to consumers and
- returning.</para>
- </listitem>
-
- <listitem>
- <para>Waiting for I/O.</para>
- </listitem>
-
- <listitem>
- <para>Calls to &man.malloc.9; and
- <function>uma_zalloc</function>() with
- <varname>M_WAITOK</varname> flag set</para>
- </listitem>
-
- <listitem>
- <para>sx and other sleepable locks</para>
- </listitem>
- </itemizedlist>
-
- <para>This restriction is here to stop GEOM code clogging the
- I/O request path, since sleeping is usually not time-bound and
- there can be no guarantees on how long it will take (there are
- some other, more technical reasons also). It also means that
- there is not much that can be done in those threads; for
- example, almost any complex thing requires memory allocation.
- Fortunately, there is a way out: creating additional kernel
- threads.</para>
- </sect2>
-
- <sect2 xml:id="geom-kernelthreads">
- <title>Kernel Threads for Use in GEOM Code</title>
-
- <para>Kernel threads are created with the &man.kthread.create.9;
- function, and they are similar to userland threads in
- behavior, except that they cannot return to the caller to signify
- termination, but must call &man.kthread.exit.9;.</para>
-
- <para>In GEOM code, the usual use of threads is to offload
- processing of requests from <literal>g_down</literal> thread
- (the <function>.start</function>() function). These threads
- look like <quote>event handlers</quote>: they have a linked
- list of events associated with them (on which events can be
- posted by various functions in various threads so it must be
- protected by a mutex), take the events from the list one by
- one and process them in a big <literal>switch</literal>()
- statement.</para>
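-
- <para>The shape of such an event handler can be sketched in
- userland C. All names here (<varname>demo_softc</varname>,
- <varname>demo_event</varname>, the
- <function>demo_*</function>() functions and the event types) are
- hypothetical. Locking and sleeping are only marked in comments,
- on the assumption that a kernel version would guard the queue
- with a mutex and sleep while the queue is empty.</para>
-

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical event types handled by the worker. */
enum demo_event_type { DEMO_EV_IO, DEMO_EV_STOP };

struct demo_event {
	enum demo_event_type	 type;
	int			 payload;
	TAILQ_ENTRY(demo_event)	 link;
};

struct demo_softc {
	TAILQ_HEAD(, demo_event) queue;	/* kernel: protected by a mutex */
	int	handled;		/* sum of processed I/O payloads */
	int	running;		/* cleared by DEMO_EV_STOP */
};

static void
demo_init(struct demo_softc *sc)
{
	TAILQ_INIT(&sc->queue);
	sc->handled = 0;
	sc->running = 1;
}

/* Post an event; may be called from any context that can allocate. */
static int
demo_post(struct demo_softc *sc, enum demo_event_type type, int payload)
{
	struct demo_event *ev;

	ev = malloc(sizeof(*ev));
	if (ev == NULL)
		return (-1);
	ev->type = type;
	ev->payload = payload;
	/* kernel: lock the queue mutex here */
	TAILQ_INSERT_TAIL(&sc->queue, ev, link);
	/* kernel: unlock, then wake up the worker thread */
	return (0);
}

/* Worker body: take events one by one and dispatch on their type. */
static void
demo_worker(struct demo_softc *sc)
{
	struct demo_event *ev;

	assert(sc != NULL);
	/* kernel: an empty queue would mean sleeping, not returning */
	while (sc->running && (ev = TAILQ_FIRST(&sc->queue)) != NULL) {
		TAILQ_REMOVE(&sc->queue, ev, link);
		switch (ev->type) {
		case DEMO_EV_IO:
			sc->handled += ev->payload;
			break;
		case DEMO_EV_STOP:
			sc->running = 0;
			break;
		}
		free(ev);
	}
}
```

-
- <para>In this model a <function>.start</function>() function would
- only call <function>demo_post</function>() and return, leaving
- the real work to the worker.</para>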
-
- <para>The main benefit of using a thread to handle I/O requests
- is that it can sleep when needed. This sounds good, but
- should be carefully thought out. Sleeping is very
- convenient but can very effectively destroy the performance of the
- geom transformation. Extremely performance-sensitive classes
- probably should do all the work in the
- <function>.start</function>() function call, taking great care
- to handle out-of-memory and similar errors.</para>
-
- <para>The other benefit of having an event-handler thread like
- that is to serialize all the requests and responses coming
- from different geom threads into one thread. This is also
- very convenient but can be slow. In most cases, handling of
- <function>.done</function>() requests can be left to the
- <literal>g_up</literal> thread.</para>
-
- <para>Mutexes in the FreeBSD kernel (see &man.mutex.9;) have one
- distinction from their more common userland cousins:
- the code cannot sleep while holding a mutex. If the code
- needs to sleep a lot, &man.sx.9; locks may be more
- appropriate. On the other hand, if you do almost everything
- in a single thread, you may get away with no mutexes at
- all.</para>
- </sect2>
- </sect1>
-</article>