<feed xmlns='http://www.w3.org/2005/Atom'>
<title>src/sys/bsm, branch stable/14</title>
<subtitle>FreeBSD source tree</subtitle>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/'/>
<entry>
<title>New setcred() system call and associated MAC hooks</title>
<updated>2025-04-03T19:31:03+00:00</updated>
<author>
<name>Olivier Certner</name>
<email>olce@FreeBSD.org</email>
</author>
<published>2024-07-18T20:47:43+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=c1d7552dddb5276c8d1cfe2b8c533646164e2f7a'/>
<id>c1d7552dddb5276c8d1cfe2b8c533646164e2f7a</id>
<content type='text'>
This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label.  Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.

Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges).  This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones.  For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials.  Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them.  It
could be started and ended by a special system call.  Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).

We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often.  It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.

The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves.  The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible.  MAC/do is
going to leverage these in a subsequent commit.  A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.

While here, add missing includes and declarations for standalone
inclusion of &lt;sys/ucred.h&gt; both from kernel and userspace (for the
latter, it has been working thanks to &lt;bsm/audit.h&gt; already including
&lt;sys/types.h&gt;).

Reviewed by:    brooks
Approved by:    markj (mentor)
Relnotes:       yes
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D47618

(cherry picked from commit ddb3eb4efe55e57c206f3534263c77b837aff1dc)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label.  Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.

Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges).  This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones.  For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials.  Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them.  It
could be started and ended by a special system call.  Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).

We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often.  It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.

The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves.  The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible.  MAC/do is
going to leverage these in a subsequent commit.  A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.

While here, add missing includes and declarations for standalone
inclusion of &lt;sys/ucred.h&gt; both from kernel and userspace (for the
latter, it has been working thanks to &lt;bsm/audit.h&gt; already including
&lt;sys/types.h&gt;).

Reviewed by:    brooks
Approved by:    markj (mentor)
Relnotes:       yes
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D47618

(cherry picked from commit ddb3eb4efe55e57c206f3534263c77b837aff1dc)
</pre>
</div>
</content>
</entry>
<entry>
<title>timerfd: Move implementation from linux compat to sys/kern</title>
<updated>2023-08-24T20:28:56+00:00</updated>
<author>
<name>Jake Freeland</name>
<email>jfree@freebsd.org</email>
</author>
<published>2023-08-24T04:39:54+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=af93fea710385b2b11f0cabd377e7ed6f3d97c34'/>
<id>af93fea710385b2b11f0cabd377e7ed6f3d97c34</id>
<content type='text'>
Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459
</pre>
</div>
</content>
</entry>
<entry>
<title>sys: Remove $FreeBSD$: two-line .h pattern</title>
<updated>2023-08-16T17:54:11+00:00</updated>
<author>
<name>Warner Losh</name>
<email>imp@FreeBSD.org</email>
</author>
<published>2023-08-16T17:54:11+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=95ee2897e98f5d444f26ed2334cc7c439f9c16c6'/>
<id>95ee2897e98f5d444f26ed2334cc7c439f9c16c6</id>
<content type='text'>
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
</pre>
</div>
</content>
</entry>
<entry>
<title>Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).</title>
<updated>2021-08-05T15:20:42+00:00</updated>
<author>
<name>Ka Ho Ng</name>
<email>khng@FreeBSD.org</email>
</author>
<published>2021-08-05T15:20:42+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=0dc332bff200c940edc36c4715b629a2e1e9f9ae'/>
<id>0dc332bff200c940edc36c4715b629a2e1e9f9ae</id>
<content type='text'>
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
</pre>
</div>
</content>
</entry>
<entry>
<title>Add aio_writev and aio_readv</title>
<updated>2021-01-03T02:57:58+00:00</updated>
<author>
<name>Alan Somers</name>
<email>asomers@FreeBSD.org</email>
</author>
<published>2021-01-02T23:34:20+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=022ca2fc7fe08d51f33a1d23a9be49e6d132914e'/>
<id>022ca2fc7fe08d51f33a1d23a9be49e6d132914e</id>
<content type='text'>
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by:    jhb, kib, bcr
Relnotes:       yes
Differential Revision: https://reviews.freebsd.org/D27743
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by:    jhb, kib, bcr
Relnotes:       yes
Differential Revision: https://reviews.freebsd.org/D27743
</pre>
</div>
</content>
</entry>
<entry>
<title>Expose eventfd in the native API/ABI using a new __specialfd syscall</title>
<updated>2020-12-27T10:57:26+00:00</updated>
<author>
<name>Konstantin Belousov</name>
<email>kib@FreeBSD.org</email>
</author>
<published>2020-12-23T14:14:04+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=7a202823aa54ba18c485bdbcf355269bcfee1ab9'/>
<id>7a202823aa54ba18c485bdbcf355269bcfee1ab9</id>
<content type='text'>
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues.  Unfortunately, kqueues
are not passable between processes.  And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow.  This came up when debugging strange HW video decode
failures in Firefox.  A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues.  Unfortunately, kqueues
are not passable between processes.  And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow.  This came up when debugging strange HW video decode
failures in Firefox.  A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
</pre>
</div>
</content>
</entry>
<entry>
<title>bsm: add AUE_CLOSERANGE</title>
<updated>2020-04-24T01:27:25+00:00</updated>
<author>
<name>Kyle Evans</name>
<email>kevans@FreeBSD.org</email>
</author>
<published>2020-04-24T01:27:25+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=ca9cba33a5202d8b9fb638796fe37b2a974184ee'/>
<id>ca9cba33a5202d8b9fb638796fe37b2a974184ee</id>
<content type='text'>
AUE_CLOSERANGE has been accepted upstream as 43265; AUE_REALPATHAT has now
been upstreamed.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
AUE_CLOSERANGE has been accepted upstream as 43265; AUE_REALPATHAT has now
been upstreamed.
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: add realpathat syscall</title>
<updated>2020-02-20T16:58:19+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mjg@FreeBSD.org</email>
</author>
<published>2020-02-20T16:58:19+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=0573d0a9b8ec478348cbf2a7ec370c875eb57dff'/>
<id>0573d0a9b8ec478348cbf2a7ec370c875eb57dff</id>
<content type='text'>
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
</pre>
</div>
</content>
</entry>
<entry>
<title>Jail and capability mode for shm_rename; add audit support for shm_rename</title>
<updated>2019-11-18T13:31:16+00:00</updated>
<author>
<name>David Bright</name>
<email>dab@FreeBSD.org</email>
</author>
<published>2019-11-18T13:31:16+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=2d5603fe6507ec4d773de4f37d61bd6187e42208'/>
<id>2d5603fe6507ec4d773de4f37d61bd6187e42208</id>
<content type='text'>
Co-mingling two things here:

  * Addressing some feedback from Konstantin and Kyle re: jail,
    capability mode, and a few other things
  * Adding audit support as promised.

The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.

Submitted by:	Matthew Bryan &lt;matthew.bryan@isilon.com&gt;
Reviewed by:	kib
Relnotes:	Yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22083
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-mingling two things here:

  * Addressing some feedback from Konstantin and Kyle re: jail,
    capability mode, and a few other things
  * Adding audit support as promised.

The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.

Submitted by:	Matthew Bryan &lt;matthew.bryan@isilon.com&gt;
Reviewed by:	kib
Relnotes:	Yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22083
</pre>
</div>
</content>
</entry>
<entry>
<title>Create new EINTEGRITY error with message "Integrity check failed".</title>
<updated>2019-01-17T06:35:45+00:00</updated>
<author>
<name>Kirk McKusick</name>
<email>mckusick@FreeBSD.org</email>
</author>
<published>2019-01-17T06:35:45+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=88640c0e8b6f503426cce9ea1337098c241d3801'/>
<id>88640c0e8b6f503426cce9ea1337098c241d3801</id>
<content type='text'>
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An integrity check such as a check-hash or a cross-correlation failed.
The integrity error falls between EINVAL that identifies errors in
parameters to a system call and EIO that identifies errors with the
underlying storage media. EINTEGRITY is typically raised by intermediate
kernel layers such as a filesystem or an in-kernel GEOM subsystem when
they detect inconsistencies. Uses include allowing the mount(8) command
to return a different exit value to automate the running of fsck(8)
during a system boot.

These changes make no use of the new error, they just add it. Later
commits will be made for the use of the new error number and it will
be added to additional manual pages as appropriate.

Reviewed by:    gnn, dim, brueffer, imp
Discussed with: kib, cem, emaste, ed, jilles
Differential Revision: https://reviews.freebsd.org/D18765
</pre>
</div>
</content>
</entry>
</feed>
