<feed xmlns='http://www.w3.org/2005/Atom'>
<title>src/module, branch zfs-0.6.2</title>
<subtitle>FreeBSD source tree</subtitle>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/'/>
<entry>
<title>Use directory xattrs for symlinks</title>
<updated>2013-08-22T20:30:44+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2013-08-22T20:06:33+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=6a7c0ccca44ad02c476a111d8f7911fc8b12fff7'/>
<id>6a7c0ccca44ad02c476a111d8f7911fc8b12fff7</id>
<content type='text'>
There is currently a subtle bug in the SA implementation which
prevents us from safely using multiple variable length SAs in
one object.

Fortunately, the only existing use case for this is symlinks with
SA based xattrs.  Therefore, until the root cause in the SA code
can be identified and fixed, we prevent adding SA xattrs to symlinks.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Issue #1468
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is currently a subtle bug in the SA implementation which
prevents us from safely using multiple variable length SAs in
one object.

Fortunately, the only existing use case for this is symlinks with
SA based xattrs.  Therefore, until the root cause in the SA code
can be identified and fixed, we prevent adding SA xattrs to symlinks.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Issue #1468
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "Evict meta data from ghost lists + l2arc headers"</title>
<updated>2013-08-22T19:15:37+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2013-08-22T19:14:26+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=c273d60d80958dea8edc3c6f5702c9c81ffbd8ea'/>
<id>c273d60d80958dea8edc3c6f5702c9c81ffbd8ea</id>
<content type='text'>
This reverts commit fadd0c4da1e2ccd6014800d8b1a0fd117dd323e8 which
introduced a regression in honoring the meta limit.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Close #1660
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit fadd0c4da1e2ccd6014800d8b1a0fd117dd323e8 which
introduced a regression in honoring the meta limit.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Close #1660
</pre>
</div>
</content>
</entry>
<entry>
<title>Linux 3.11 compat: fops-&gt;iterate()</title>
<updated>2013-08-15T23:19:07+00:00</updated>
<author>
<name>Richard Yao</name>
<email>ryao@gentoo.org</email>
</author>
<published>2013-08-07T12:53:45+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=0f37d0c8bed442dd0d2c1b1dddd68653fa6eec66'/>
<id>0f37d0c8bed442dd0d2c1b1dddd68653fa6eec66</id>
<content type='text'>
Commit torvalds/linux@2233f31aade393641f0eaed43a71110e629bb900
replaced -&gt;readdir() with -&gt;iterate() in struct file_operations.
All filesystems must now use the new -&gt;iterate method.

To handle this the code was reworked to use the new -&gt;iterate
interface.  Care was taken to keep the majority of changes
confined to the ZPL layer which is already Linux specific.
However, minor changes were required to the common zfs_readdir()
function.

Compatibility with older kernels was accomplished by adding
versions of the trivial dir_emit* helper functions.  Also the
various *_readdir() functions were reworked in to wrappers
which create a dir_context structure to pass to the new
*_iterate() functions.

Unfortunately, the new dir_emit* functions prevent us from
passing a private pointer to the filldir function.  The xattr
directory code leveraged this ability through zfs_readdir()
to generate the list of xattr names.  Since we can no longer
use zfs_readdir() a simplified zpl_xattr_readdir() function
was added to perform the same task.

Signed-off-by: Richard Yao &lt;ryao@cs.stonybrook.edu&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1653
Issue #1591
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit torvalds/linux@2233f31aade393641f0eaed43a71110e629bb900
replaced -&gt;readdir() with -&gt;iterate() in struct file_operations.
All filesystems must now use the new -&gt;iterate method.

To handle this the code was reworked to use the new -&gt;iterate
interface.  Care was taken to keep the majority of changes
confined to the ZPL layer which is already Linux specific.
However, minor changes were required to the common zfs_readdir()
function.

Compatibility with older kernels was accomplished by adding
versions of the trivial dir_emit* helper functions.  Also the
various *_readdir() functions were reworked in to wrappers
which create a dir_context structure to pass to the new
*_iterate() functions.

Unfortunately, the new dir_emit* functions prevent us from
passing a private pointer to the filldir function.  The xattr
directory code leveraged this ability through zfs_readdir()
to generate the list of xattr names.  Since we can no longer
use zfs_readdir() a simplified zpl_xattr_readdir() function
was added to perform the same task.

Signed-off-by: Richard Yao &lt;ryao@cs.stonybrook.edu&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1653
Issue #1591
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix z_wr_iss_h zio_execute() import hang</title>
<updated>2013-08-15T22:20:36+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2013-08-14T23:18:58+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=34e143323e359b42bc9d06dd19cc4b1f13091283'/>
<id>34e143323e359b42bc9d06dd19cc4b1f13091283</id>
<content type='text'>
Because we need to be more frugal about our stack usage under
Linux, the __zio_execute() function was modified to re-dispatch
zios to a ZIO_TASKQ_ISSUE thread when we're in a context which
is known to be stack heavy.  Those two contexts are the sync
thread and whatever thread is performing spa initialization.

Unfortunately, this change introduced an unlikely bug which can
result in a zio being re-dispatched indefinitely and never being
executed.  If during spa initialization we handle a zio with
ZIO_PRIORITY_NOW it will be moved to the high priority queue.
When __zio_execute() is called again for the zio it will
misinterpret the context and re-dispatch it again.  The system
will get stuck spinning re-dispatching the zio and making no
forward progress.

To fix this rare issue __zio_execute() has been updated not
to re-dispatch zios on either the ZIO_TASKQ_ISSUE or
ZIO_TASKQ_ISSUE_HIGH task queues.

In practice this issue was rarely reported and can usually
be fixed by rebooting the system and importing the pool again.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1455
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Because we need to be more frugal about our stack usage under
Linux, the __zio_execute() function was modified to re-dispatch
zios to a ZIO_TASKQ_ISSUE thread when we're in a context which
is known to be stack heavy.  Those two contexts are the sync
thread and whatever thread is performing spa initialization.

Unfortunately, this change introduced an unlikely bug which can
result in a zio being re-dispatched indefinitely and never being
executed.  If during spa initialization we handle a zio with
ZIO_PRIORITY_NOW it will be moved to the high priority queue.
When __zio_execute() is called again for the zio it will
misinterpret the context and re-dispatch it again.  The system
will get stuck spinning re-dispatching the zio and making no
forward progress.

To fix this rare issue __zio_execute() has been updated not
to re-dispatch zios on either the ZIO_TASKQ_ISSUE or
ZIO_TASKQ_ISSUE_HIGH task queues.

In practice this issue was rarely reported and can usually
be fixed by rebooting the system and importing the pool again.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1455
</pre>
</div>
</content>
</entry>
<entry>
<title>Illumos #3618 ::zio dcmd does not show timestamp data</title>
<updated>2013-08-12T23:46:50+00:00</updated>
<author>
<name>Matthew Ahrens</name>
<email>mahrens@delphix.com</email>
</author>
<published>2013-03-21T22:47:36+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=cb682a173a84813b2aeb5d18f58cff1a07531fb3'/>
<id>cb682a173a84813b2aeb5d18f58cff1a07531fb3</id>
<content type='text'>
3618 ::zio dcmd does not show timestamp data
Reviewed by: Adam Leventhal &lt;ahl@delphix.com&gt;
Reviewed by: George Wilson &lt;gwilson@zfsmail.com&gt;
Reviewed by: Christopher Siden &lt;christopher.siden@delphix.com&gt;
Reviewed by: Garrett D'Amore &lt;garrett@damore.org&gt;
Approved by: Dan McDonald &lt;danmcd@nexenta.com&gt;

References:
  http://www.illumos.org/issues/3618
  illumos/illumos-gate@c55e05cb35da47582b7afd38734d2f0d9c6deb40

Notes on porting to ZFS on Linux:

The original changeset mostly deals with mdb ::zio dcmd.
However, in order to provide the requested functionality
it modifies vdev and zio structures to keep the timing data
in nanoseconds instead of ticks. It is these changes that
are ported over in this commit.

One visible change of this commit is that the default value
of the 'zfs_vdev_time_shift' tunable is changed:

    zfs_vdev_time_shift = 6
        to
    zfs_vdev_time_shift = 29

The original value of 6 was inherited from OpenSolaris and
was suboptimal: since it shifted the raw tick value, it
didn't compensate for different tick frequencies on Linux and
OpenSolaris. The former has HZ=1000, while the latter has HZ=100.

(Which itself led to other interesting performance anomalies
under non-trivial load. The deadline scheduler delays the IO
according to its priority - the lower priority the further
the deadline is set. The delay is measured in units of
"shifted ticks". Since the HZ value was 10 times higher,
the delay units were 10 times shorter. Thus really low
priority IO like resilver (delay is 10 units) and scrub
(delay is 20 units) were scheduled much sooner than intended.
The overall effect is that resilver and scrub IO consumed
more bandwidth at the expense of the other IO.)

Now that the bookkeeping is done in nanoseconds the shift
behaves correctly for any tick frequency (HZ).

Ported-by: Cyril Plisko &lt;cyril.plisko@mountall.com&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1643
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
3618 ::zio dcmd does not show timestamp data
Reviewed by: Adam Leventhal &lt;ahl@delphix.com&gt;
Reviewed by: George Wilson &lt;gwilson@zfsmail.com&gt;
Reviewed by: Christopher Siden &lt;christopher.siden@delphix.com&gt;
Reviewed by: Garrett D'Amore &lt;garrett@damore.org&gt;
Approved by: Dan McDonald &lt;danmcd@nexenta.com&gt;

References:
  http://www.illumos.org/issues/3618
  illumos/illumos-gate@c55e05cb35da47582b7afd38734d2f0d9c6deb40

Notes on porting to ZFS on Linux:

The original changeset mostly deals with mdb ::zio dcmd.
However, in order to provide the requested functionality
it modifies vdev and zio structures to keep the timing data
in nanoseconds instead of ticks. It is these changes that
are ported over in this commit.

One visible change of this commit is that the default value
of the 'zfs_vdev_time_shift' tunable is changed:

    zfs_vdev_time_shift = 6
        to
    zfs_vdev_time_shift = 29

The original value of 6 was inherited from OpenSolaris and
was suboptimal: since it shifted the raw tick value, it
didn't compensate for different tick frequencies on Linux and
OpenSolaris. The former has HZ=1000, while the latter has HZ=100.

(Which itself led to other interesting performance anomalies
under non-trivial load. The deadline scheduler delays the IO
according to its priority - the lower priority the further
the deadline is set. The delay is measured in units of
"shifted ticks". Since the HZ value was 10 times higher,
the delay units were 10 times shorter. Thus really low
priority IO like resilver (delay is 10 units) and scrub
(delay is 20 units) were scheduled much sooner than intended.
The overall effect is that resilver and scrub IO consumed
more bandwidth at the expense of the other IO.)

Now that the bookkeeping is done in nanoseconds the shift
behaves correctly for any tick frequency (HZ).

Ported-by: Cyril Plisko &lt;cyril.plisko@mountall.com&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1643
</pre>
</div>
</content>
</entry>
<entry>
<title>Linux 3.8 compat: Support CONFIG_UIDGID_STRICT_TYPE_CHECKS</title>
<updated>2013-08-09T22:31:52+00:00</updated>
<author>
<name>Richard Yao</name>
<email>ryao@gentoo.org</email>
</author>
<published>2013-07-14T16:59:24+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=570d6edf1d94917aab49c5755027d05b3c7bcd43'/>
<id>570d6edf1d94917aab49c5755027d05b3c7bcd43</id>
<content type='text'>
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/gid_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.

We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.

Signed-off-by: Richard Yao &lt;ryao@gentoo.org&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1589
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/gid_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.

We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.

Signed-off-by: Richard Yao &lt;ryao@gentoo.org&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1589
</pre>
</div>
</content>
</entry>
<entry>
<title>Evict meta data from ghost lists + l2arc headers</title>
<updated>2013-08-09T17:06:12+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2013-07-25T17:39:31+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=fadd0c4da1e2ccd6014800d8b1a0fd117dd323e8'/>
<id>fadd0c4da1e2ccd6014800d8b1a0fd117dd323e8</id>
<content type='text'>
When the meta limit is exceeded the ARC evicts some meta data
buffers from the mfu+mru lists.  Unfortunately, for meta data
heavy workloads it's possible for these buffers to accumulate
on the ghost lists if arc_c doesn't exceed arc_size.

To handle this case arc_adjust_meta() has been extended to
explicitly evict meta data buffers from the ghost lists in
proportion to what was evicted from the mfu+mru lists.

If this is insufficient we request that the VFS release
some inodes and dentries.  This will result in the release
of some dnodes which are counted as 'other' metadata.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the meta limit is exceeded the ARC evicts some meta data
buffers from the mfu+mru lists.  Unfortunately, for meta data
heavy workloads it's possible for these buffers to accumulate
on the ghost lists if arc_c doesn't exceed arc_size.

To handle this case arc_adjust_meta() has been extended to
explicitly evict meta data buffers from the ghost lists in
proportion to what was evicted from the mfu+mru lists.

If this is insufficient we request that the VFS release
some inodes and dentries.  This will result in the release
of some dnodes which are counted as 'other' metadata.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Allow arc_evict_ghost() to only evict meta data</title>
<updated>2013-08-09T17:06:08+00:00</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2013-07-25T17:28:45+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=68121a03daf58a7d5b9351f110196b8ce806e1fa'/>
<id>68121a03daf58a7d5b9351f110196b8ce806e1fa</id>
<content type='text'>
The default behavior of arc_evict_ghost() is to start by evicting
data buffers.  Only if the requested number of bytes to evict
cannot be satisfied by data buffers does it move on to meta data buffers.

This is ideal for honoring arc_c since it's preferable to keep the
meta data cached.  However, if we're trying to free memory from the
arc to honor the meta limit it's a problem because we will need to
discard all the data to get to the meta data.

To avoid this issue arc_evict_ghost() is now passed a fourth
argument describing which buffer type to start with.  The
arc_evict() function already behaves exactly like this for the
same reason so this is consistent with the existing code.

All existing callers have been updated to pass ARC_BUFC_DATA so
this patch introduces no functional change.  New callers may
pass ARC_BUFC_METADATA to skip immediately to evicting meta
data leaving the normal data untouched.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The default behavior of arc_evict_ghost() is to start by evicting
data buffers.  Only if the requested number of bytes to evict
cannot be satisfied by data buffers does it move on to meta data buffers.

This is ideal for honoring arc_c since it's preferable to keep the
meta data cached.  However, if we're trying to free memory from the
arc to honor the meta limit it's a problem because we will need to
discard all the data to get to the meta data.

To avoid this issue arc_evict_ghost() is now passed a fourth
argument describing which buffer type to start with.  The
arc_evict() function already behaves exactly like this for the
same reason so this is consistent with the existing code.

All existing callers have been updated to pass ARC_BUFC_DATA so
this patch introduces no functional change.  New callers may
pass ARC_BUFC_METADATA to skip immediately to evicting meta
data leaving the normal data untouched.

Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Illumos #3137 L2ARC compression</title>
<updated>2013-08-08T20:27:21+00:00</updated>
<author>
<name>Saso Kiselkov</name>
<email>skiselkov@gmail.com</email>
</author>
<published>2013-08-01T20:02:10+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=3a17a7a99a1a6332d0999f9be68e2b8dc3933de1'/>
<id>3a17a7a99a1a6332d0999f9be68e2b8dc3933de1</id>
<content type='text'>
3137 L2ARC compression
Reviewed by: George Wilson &lt;george.wilson@delphix.com&gt;
Reviewed by: Matthew Ahrens &lt;mahrens@delphix.com&gt;
Approved by: Dan McDonald &lt;danmcd@nexenta.com&gt;

References:
  illumos/illumos-gate@aad02571bc59671aa3103bb070ae365f531b0b62
  https://www.illumos.org/issues/3137
  http://wiki.illumos.org/display/illumos/L2ARC+Compression

Notes for Linux port:

A l2arc_nocompress module option was added to prevent the
compression of l2arc buffers regardless of how a dataset's
compression property is set.  This allows the legacy behavior
to be preserved.

Ported by: James H &lt;james@kagisoft.co.uk&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1379
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
3137 L2ARC compression
Reviewed by: George Wilson &lt;george.wilson@delphix.com&gt;
Reviewed by: Matthew Ahrens &lt;mahrens@delphix.com&gt;
Approved by: Dan McDonald &lt;danmcd@nexenta.com&gt;

References:
  illumos/illumos-gate@aad02571bc59671aa3103bb070ae365f531b0b62
  https://www.illumos.org/issues/3137
  http://wiki.illumos.org/display/illumos/L2ARC+Compression

Notes for Linux port:

A l2arc_nocompress module option was added to prevent the
compression of l2arc buffers regardless of how a dataset's
compression property is set.  This allows the legacy behavior
to be preserved.

Ported by: James H &lt;james@kagisoft.co.uk&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1379
</pre>
</div>
</content>
</entry>
<entry>
<title>Return -1 from arc_shrinker_func()</title>
<updated>2013-08-08T16:20:56+00:00</updated>
<author>
<name>Richard Yao</name>
<email>ryao@gentoo.org</email>
</author>
<published>2013-08-04T23:13:15+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=c11a12bc3b2e5ee9a6bd74e26f1a396b6025fbd4'/>
<id>c11a12bc3b2e5ee9a6bd74e26f1a396b6025fbd4</id>
<content type='text'>
This is analogous to SPL commit zfsonlinux/spl@b9b3715.  While
we don't have clear evidence of systems getting caught here
indefinitely like in the SPL this ensures that it will never
happen.

Signed-off-by: Richard Yao &lt;ryao@gentoo.org&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1579
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is analogous to SPL commit zfsonlinux/spl@b9b3715.  While
we don't have clear evidence of systems getting caught here
indefinitely like in the SPL this ensures that it will never
happen.

Signed-off-by: Richard Yao &lt;ryao@gentoo.org&gt;
Signed-off-by: Brian Behlendorf &lt;behlendorf1@llnl.gov&gt;
Closes #1579
</pre>
</div>
</content>
</entry>
</feed>
