<feed xmlns='http://www.w3.org/2005/Atom'>
<title>src/module, branch zfs-0.6.0-rc3</title>
<subtitle>FreeBSD source tree</subtitle>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/'/>
<entry>
<title>Linux 2.6.29 compat, credentials</title>
<updated>2011-04-07T21:27:09+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-04-07T21:23:45+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=0d3ac5e7356d29fbb7d2880c0a0c457656355ca0'/>
<id>0d3ac5e7356d29fbb7d2880c0a0c457656355ca0</id>
<content type='text'>
The .sync_fs fix as applied did not use the updated SPL credential
API.  This broke builds on Debian Lenny, this change applies the
needed fix to use the portable API.  The original credential changes
are part of commit 81e97e21872a9c38ad66c37fafe1436ee25abee3.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The .sync_fs fix as applied did not use the updated SPL credential
API.  This broke builds on Debian Lenny, this change applies the
needed fix to use the portable API.  The original credential changes
are part of commit 81e97e21872a9c38ad66c37fafe1436ee25abee3.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix ASSERTION(!dsl_pool_sync_context(tx-&gt;tx_pool))</title>
<updated>2011-04-07T16:52:16+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-31T17:05:58+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=eec8164771bee067c3cd55ed0a16dadeeba276de'/>
<id>eec8164771bee067c3cd55ed0a16dadeeba276de</id>
<content type='text'>
Disable the normal reclaim path for the txg_sync thread.  This
ensures the thread will never enter dmu_tx_assign() which can
otherwise occur due to direct reclaim.  If this is allowed to
happen the system can deadlock.  Direct reclaim call path:

  -&gt;shrink_icache_memory-&gt;prune_icache-&gt;dispose_list-&gt;
  clear_inode-&gt;zpl_clear_inode-&gt;zfs_inactive-&gt;dmu_tx_assign
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Disable the normal reclaim path for the txg_sync thread.  This
ensures the thread will never enter dmu_tx_assign() which can
otherwise occur due to direct reclaim.  If this is allowed to
happen the system can deadlock.  Direct reclaim call path:

  -&gt;shrink_icache_memory-&gt;prune_icache-&gt;dispose_list-&gt;
  clear_inode-&gt;zpl_clear_inode-&gt;zfs_inactive-&gt;dmu_tx_assign
</pre>
</div>
</content>
</entry>
<entry>
<title>Add direct+indirect ARC reclaim</title>
<updated>2011-04-07T16:52:10+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-30T01:08:59+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=7cb67b45f33fd7a61af24c675c7347eb5264b38c'/>
<id>7cb67b45f33fd7a61af24c675c7347eb5264b38c</id>
<content type='text'>
Under OpenSolaris all memory reclaim is done asyncronously.  Under
Linux memory reclaim is done asynchronously _and_ synchronously.
When a process allocates memory with GFP_KERNEL it explicitly allows
the kernel to do reclaim on its behalf to satify the allocation.
If that GFP_KERNEL allocation fails the kernel may take more drastic
measures to reclaim the memory such as killing user space processes.

This was observed to happen with ZFS because the ARC could consume
a large fraction of the system memory but no synchronous reclaim
could be performed on it.  The result was GFP_KERNEL allocations
could fail resulting in OOM events, and only moments latter the
arc_reclaim thread would free unused memory from the ARC.

This change leaves the arc_thread in place to manage the fundamental
ARC behavior.  But it adds a synchronous (direct) reclaim path for
the ARC which can be called when memory is badly needed.  It also
adds an asynchronous (indirect) reclaim path which is called
much more frequently to prune the ARC slab caches.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under OpenSolaris all memory reclaim is done asyncronously.  Under
Linux memory reclaim is done asynchronously _and_ synchronously.
When a process allocates memory with GFP_KERNEL it explicitly allows
the kernel to do reclaim on its behalf to satify the allocation.
If that GFP_KERNEL allocation fails the kernel may take more drastic
measures to reclaim the memory such as killing user space processes.

This was observed to happen with ZFS because the ARC could consume
a large fraction of the system memory but no synchronous reclaim
could be performed on it.  The result was GFP_KERNEL allocations
could fail resulting in OOM events, and only moments latter the
arc_reclaim thread would free unused memory from the ARC.

This change leaves the arc_thread in place to manage the fundamental
ARC behavior.  But it adds a synchronous (direct) reclaim path for
the ARC which can be called when memory is badly needed.  It also
adds an asynchronous (indirect) reclaim path which is called
much more frequently to prune the ARC slab caches.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add missing arcstats</title>
<updated>2011-04-07T16:52:05+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-24T19:13:55+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=1834f2d8b715d25bafbb0e4a099994f45c3211ae'/>
<id>1834f2d8b715d25bafbb0e4a099994f45c3211ae</id>
<content type='text'>
The following useful values were missing the arcstats.  This change
adds them in to provide greater visibility in to the arcs behavior.

arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_meta_used                   4    624774592
arc_meta_limit                  4    400785408
arc_meta_max                    4    625594176
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following useful values were missing the arcstats.  This change
adds them in to provide greater visibility in to the arcs behavior.

arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_meta_used                   4    624774592
arc_meta_limit                  4    400785408
arc_meta_max                    4    625594176
</pre>
</div>
</content>
</entry>
<entry>
<title>Call d_instantiate before unlocking inode</title>
<updated>2011-04-07T16:51:57+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-30T06:04:39+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=c85b224fafacafa49817101f26f0a9760a8d05d1'/>
<id>c85b224fafacafa49817101f26f0a9760a8d05d1</id>
<content type='text'>
Under Linux a dentry referencing an inode must be instantiated before
the inode is unlocked.  To accomplish this without overly modifing
the core ZFS code the dentry it passed via the vattr_t.  There are
cases such as replay when a dentry is not available.  In which case
it is obviously not initialized at inode creation time, if a dentry
is needed it will be spliced as when required via d_lookup().
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under Linux a dentry referencing an inode must be instantiated before
the inode is unlocked.  To accomplish this without overly modifing
the core ZFS code the dentry it passed via the vattr_t.  There are
cases such as replay when a dentry is not available.  In which case
it is obviously not initialized at inode creation time, if a dentry
is needed it will be spliced as when required via d_lookup().
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix `make distclean` for `./configure --with-config=user</title>
<updated>2011-04-05T20:33:28+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-04-05T20:13:01+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=d433c206515e567c52ce09589033405a0ae3716e'/>
<id>d433c206515e567c52ce09589033405a0ae3716e</id>
<content type='text'>
    Making distclean in module
    make[1]: Entering directory `/zfs/module'
    make -C  SUBDIRS=`pwd`  clean
    make: Entering an unknown directory
    make: *** SUBDIRS=/zfs/module: No such file or directory.  Stop.

When using --with-config=user the 'distclean' target would fail
because it assumes the kernel configuration infrastrure is set up.
This is not the case, nor does it need to be, because the
'--with-config=user' option will prune the entire ./module subtree
from SUBDIRS.  This prevents most build rules from operating in the
./module directory.

However, the 'dist*' rules will still traverse this directory
because it is listed in DIST_SUBDIRS.  This is correct because we
need to ensure the dist rules package the directory contents
regardless of the configuration for the 'dist' rule.  The correct
way to handle this is to only invoke the kernel build system as
part of the 'clean' rule when CONFIG_KERNEL_TRUE is set.

Initial fix provided by Darik Horn &lt;dajhorn@vanadac.com&gt;.
This commit is a slightly refined form of the original.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
    Making distclean in module
    make[1]: Entering directory `/zfs/module'
    make -C  SUBDIRS=`pwd`  clean
    make: Entering an unknown directory
    make: *** SUBDIRS=/zfs/module: No such file or directory.  Stop.

When using --with-config=user the 'distclean' target would fail
because it assumes the kernel configuration infrastrure is set up.
This is not the case, nor does it need to be, because the
'--with-config=user' option will prune the entire ./module subtree
from SUBDIRS.  This prevents most build rules from operating in the
./module directory.

However, the 'dist*' rules will still traverse this directory
because it is listed in DIST_SUBDIRS.  This is correct because we
need to ensure the dist rules package the directory contents
regardless of the configuration for the 'dist' rule.  The correct
way to handle this is to only invoke the kernel build system as
part of the 'clean' rule when CONFIG_KERNEL_TRUE is set.

Initial fix provided by Darik Horn &lt;dajhorn@vanadac.com&gt;.
This commit is a slightly refined form of the original.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix inflated load average</title>
<updated>2011-04-01T00:07:12+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-04-01T00:07:12+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=bfd214af01dd360417b1331d903655244979fe0f'/>
<id>bfd214af01dd360417b1331d903655244979fe0f</id>
<content type='text'>
Kernel threads which sleep uninterruptibly on Linux are marked in the (D)
state.  These threads are usually in the process of performing IO and are
thus counted against the load average.  The txg_quiesce and txg_sync threads
were always sleeping uninterruptibly and thus inflating the load average.

This change makes them sleep interruptibly.  Some care is required however
because these threads may now be woken early by signals.  In this case the
callers are all careful to check that the required conditions are met after
waking up.  If we're woken early due to a signal they will simply go back
to sleep.  In this case these changes are safe.

Closes #175
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kernel threads which sleep uninterruptibly on Linux are marked in the (D)
state.  These threads are usually in the process of performing IO and are
thus counted against the load average.  The txg_quiesce and txg_sync threads
were always sleeping uninterruptibly and thus inflating the load average.

This change makes them sleep interruptibly.  Some care is required however
because these threads may now be woken early by signals.  In this case the
callers are all careful to check that the required conditions are met after
waking up.  If we're woken early due to a signal they will simply go back
to sleep.  In this case these changes are safe.

Closes #175
</pre>
</div>
</content>
</entry>
<entry>
<title>Linux 2.6.29 compat, .freeze_fs/.unfreeze_fs</title>
<updated>2011-03-22T19:17:24+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-22T18:22:49+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=7a1cdc0775aa4405501e64ebf0bfd998e723f2d7'/>
<id>7a1cdc0775aa4405501e64ebf0bfd998e723f2d7</id>
<content type='text'>
The .freeze_fs/.unfreeze_fs hooks were not added until Linux 2.6.29
Since these hooks are currently unused they are being removed to
allow support of older kernels.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The .freeze_fs/.unfreeze_fs hooks were not added until Linux 2.6.29
Since these hooks are currently unused they are being removed to
allow support of older kernels.
</pre>
</div>
</content>
</entry>
<entry>
<title>Linux 2.6.29 compat, credentials</title>
<updated>2011-03-22T19:15:54+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-22T18:13:41+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=81e97e21872a9c38ad66c37fafe1436ee25abee3'/>
<id>81e97e21872a9c38ad66c37fafe1436ee25abee3</id>
<content type='text'>
As of Linux 2.6.29 a clean credential API was added to the Linux kernel.
Previously the credential was embedded in the task_struct.  Because the
SPL already has considerable support for handling this API change the
ZPL code has been updated to use the Solaris credential API.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As of Linux 2.6.29 a clean credential API was added to the Linux kernel.
Previously the credential was embedded in the task_struct.  Because the
SPL already has considerable support for handling this API change the
ZPL code has been updated to use the Solaris credential API.
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix evict() deadlock</title>
<updated>2011-03-22T19:14:55+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-03-21T17:19:30+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=d6bd8eaae4bdbce8e162414bb6c84ac95fd456b4'/>
<id>d6bd8eaae4bdbce8e162414bb6c84ac95fd456b4</id>
<content type='text'>
Now that KM_SLEEP is not defined as GFP_NOFS there is the possibility
of synchronous reclaim deadlocks.  These deadlocks never existed in the
original OpenSolaris code because all memory reclaim on Solaris is done
asyncronously.  Linux does both synchronous (direct) and asynchronous
(indirect) reclaim.

This commit addresses a deadlock caused by inode eviction.  A KM_SLEEP
allocation may trigger direct memory reclaim and shrink the inode cache.
This can occur while a mutex in the array of ZFS_OBJ_HOLD mutexes is
held.  Through the -&gt;shrink_icache_memory()-&gt;evict()-&gt;zfs_inactive()-&gt;
zfs_zinactive() call path the same mutex may be reacquired resulting
in a deadlock.  To avoid this deadlock the process must not reacquire
the mutex when it is already holding it.

This is a reasonable fix for now but longer term the ZFS_OBJ_HOLD
mutex locking should be reevaluated.  This infrastructure already
prevents us from ever using the Linux lock dependency analysis tools,
and it may limit scalability.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that KM_SLEEP is not defined as GFP_NOFS there is the possibility
of synchronous reclaim deadlocks.  These deadlocks never existed in the
original OpenSolaris code because all memory reclaim on Solaris is done
asyncronously.  Linux does both synchronous (direct) and asynchronous
(indirect) reclaim.

This commit addresses a deadlock caused by inode eviction.  A KM_SLEEP
allocation may trigger direct memory reclaim and shrink the inode cache.
This can occur while a mutex in the array of ZFS_OBJ_HOLD mutexes is
held.  Through the -&gt;shrink_icache_memory()-&gt;evict()-&gt;zfs_inactive()-&gt;
zfs_zinactive() call path the same mutex may be reacquired resulting
in a deadlock.  To avoid this deadlock the process must not reacquire
the mutex when it is already holding it.

This is a reasonable fix for now but longer term the ZFS_OBJ_HOLD
mutex locking should be reevaluated.  This infrastructure already
prevents us from ever using the Linux lock dependency analysis tools,
and it may limit scalability.
</pre>
</div>
</content>
</entry>
</feed>
