path: root/sys/kern/capabilities.conf
Commit message | Author | Age | Files | Lines
* Add aio_writev and aio_readv | Alan Somers | 2021-01-03 | 1 | -0/+2
  POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but they're an obvious extension. They work just like their synchronous equivalents pwritev and preadv.

  It isn't yet possible to use vectored aiocbs with lio_listio, but that could be added in the future.

  Reviewed by: jhb, kib, bcr
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D27743
* Expose eventfd in the native API/ABI using a new __specialfd syscall | Konstantin Belousov | 2020-12-27 | 1 | -0/+5
  eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox.

  A native implementation would avoid these problems and help with porting Linux software. Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does.

  Submitted by: greg@unrelenting.technology
  Reviewed by: markj (previous version)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D26668
* Permit cpuset_(get|set)domain() in capability mode. | Mark Johnston | 2020-07-06 | 1 | -0/+2
  These system calls already perform validation of their parameters when called in capability mode, identical to cpuset_(get|set)affinity().

  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=362970
* Implement a close_range(2) syscall | Kyle Evans | 2020-04-12 | 1 | -0/+1
  close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2): they (or someone else) may have special fds dup'd higher in the range, so they can't necessarily call closefrom(min) because they don't want to hit the upper range, and relocating those fds lower isn't necessarily feasible.

  sys_closefrom has been rewritten to use kern_close_range(), using ~0U to indicate closing to the end of the range. For simplicity, this was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call.

  The flags argument of close_range(2) is currently unused, so setting any flag currently returns EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature.

  This patch is based on a syscall of the same design that is expected to be merged into Linux.

  Reviewed by: kib, markj, vangyzen (all slightly earlier revisions)
  Differential Revision: https://reviews.freebsd.org/D21627
  Notes: svn path=/head/; revision=359836
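  The range semantics described above can be sketched in portable C. This is only a userspace model that loops over close(2); it is not the kernel's kern_close_range(), and the helper name close_fd_window is hypothetical. The clamp to sysconf(_SC_OPEN_MAX) stands in for the "~0U means to the end of the range" convention, and the EINVAL case for lowfd > highfd is an assumption about the real call's behaviour.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close every descriptor in [lowfd, highfd].
 * A plain loop over close(2) that models what close_range(lowfd, highfd, 0)
 * would do in a single call.
 */
static int
close_fd_window(unsigned int lowfd, unsigned int highfd)
{
	long open_max = sysconf(_SC_OPEN_MAX);
	unsigned int limit, fd;

	if (lowfd > highfd) {
		errno = EINVAL;		/* assumed to match the real syscall */
		return (-1);
	}
	/* Clamp so that a huge upper bound means "to the end of the table". */
	limit = (open_max > 0) ? (unsigned int)(open_max - 1) : 1023U;
	if (highfd > limit)
		highfd = limit;
	for (fd = lowfd; fd <= highfd; fd++) {
		(void)close((int)fd);	/* descriptors that are not open are skipped */
		if (fd == highfd)
			break;		/* avoid wrap-around at the top of the range */
	}
	return (0);
}
```

  Calling close_fd_window(20, 30) leaves a descriptor parked at 40 untouched, which is the case closefrom(2) cannot express.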
* capabilities.conf: provide information about capmode permitted syscalls | Ed Maste | 2020-03-30 | 1 | -0/+5
  Reviewed by: jhb (earlier)
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D24118
  Notes: svn path=/head/; revision=359451
* Allow getloginclass in capability mode | Ed Maste | 2020-02-12 | 1 | -0/+1
  As with e.g. getgroups and getlogin it allows querying current process credential state.

  Reported by: sigsys@gmail.com via kevans
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=357838
* Allow fdatasync in capability mode | Ed Maste | 2020-02-12 | 1 | -0/+1
  fdatasync is essentially a subset of fsync (and may be exactly fsync, depending on filesystem and development effort) and operates only on a provided fd.

  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=357831
* Add a way to manage thread signal mask using shared word, instead of syscall. | Konstantin Belousov | 2020-02-09 | 1 | -0/+1
  A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by the kernel on each syscall entry and on AST processing; a non-zero count of blocks is interpreted the same as a signal mask blocking all signals.

  The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL).

  With consumers (rtld and libthr) added, benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by the compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half.

  The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals.

  Tested by: pho
  Discussed with: cem, emaste, jilles
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D12773
  Notes: svn path=/head/; revision=357693
* Add a shm_open2 syscall to support upcoming memfd_create | Kyle Evans | 2019-09-25 | 1 | -0/+1
  shm_open2 allows a little more flexibility than the original shm_open. shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate shmflag argument that can be expanded later. Currently the only shmflag is to allow file sealing on the returned fd.

  shm_open and memfd_create will both be implemented in libc to use this new syscall.

  __FreeBSD_version is bumped to indicate the presence.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D21393
  Notes: svn path=/head/; revision=352700
* Add sysctlbyname system call | Mateusz Guzik | 2019-09-03 | 1 | -0/+1
  Previously userspace would issue one syscall to resolve the sysctl and then another one to actually use it. Do it all in one trip.

  Fallback is provided in case newer libc happens to be running on an older kernel.

  Submitted by: Pawel Biernacki
  Reported by: kib, brooks
  Differential Revision: https://reviews.freebsd.org/D17282
  Notes: svn path=/head/; revision=351729
* Enable copy_file_range(2) in capability mode. | Mark Johnston | 2019-07-30 | 1 | -0/+5
  copy_file_range() operates on a pair of file descriptors; it requires CAP_READ for the source descriptor and CAP_WRITE for the destination descriptor.

  Reviewed by: kevans, oshogbo
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21113
  Notes: svn path=/head/; revision=350447
* Introduce funlinkat syscall that allows us to check if we are removing | Mariusz Zaborski | 2019-04-06 | 1 | -0/+1
  the file associated with the given file descriptor.

  Reviewed by: kib, asomers
  Reviewed by: cem, jilles, brooks (they reviewed previous version)
  Discussed with: pjd, and many others
  Differential Revision: https://reviews.freebsd.org/D14567
  Notes: svn path=/head/; revision=345982
* capsicum: allow ppoll(2) in capability mode | Mariusz Zaborski | 2018-11-04 | 1 | -1/+1
  We already allow poll(2). There is no reason to disallow ppoll(2).

  PR: 232495
  Submitted by: Stefan Grundmann <sg2342@googlemail.com>
  Reviewed by: cem, oshogbo
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=340129
* getrandom(2) should not be restricted in capability mode. | Xin LI | 2018-08-18 | 1 | -0/+5
  Notes: svn path=/head/; revision=337998
* Name the implementation of brk and sbrk sys_break(). | Brooks Davis | 2018-06-14 | 1 | -1/+1
  The break() system call was renamed (several times) starting in v3 AT&T UNIX when C was invented and break was a language keyword. The last vestige of a need for it to be called something else (e.g. obreak) was removed in r225617, which consistently prefixed all syscall implementations.

  Reviewed by: emaste, kib (older version)
  Sponsored by: DARPA, AFRL
  Differential Revision: https://reviews.freebsd.org/D15638
  Notes: svn path=/head/; revision=335177
* allow posix_fallocate in capability mode | Ed Maste | 2017-10-12 | 1 | -0/+1
  posix_fallocate is logically equivalent to writing zero blocks to the desired file size and there is no reason to prevent calling it in capability mode. posix_fallocate already checked for the CAP_WRITE right, so we merely need to list it in capabilities.conf.

  Reviewed by: allanjude
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D12640
  Notes: svn path=/head/; revision=324560
* Correct sysent flags for dynamically loaded syscalls. | Konstantin Belousov | 2017-07-14 | 1 | -0/+4
  Using the https://github.com/google/capsicum-test/ suite, the PosixMqueue.CapModeForked test was failing due to an ECAPMODE after calling kmq_notify(). On further inspection, the dynamically loaded syscall entry was initialized with sy_flags zeroed out, since SYSCALL_INIT_HELPER() left sysent.sy_flags with the default value.

  Add a new helper SYSCALL{,32}_INIT_HELPER_F() which takes an additional argument to specify the sy_flags value.

  Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D11576
  Notes: svn path=/head/; revision=320982
* Allow cpuset_{get,set}affinity in capabilities mode | Allan Jude | 2017-05-24 | 1 | -4/+3
  bhyve was recently sandboxed with capsicum, and needs to be able to control the CPU sets of its vcpu threads.

  Reviewed by: emaste, oshogbo, rwatson
  MFC after: 2 weeks
  Sponsored by: ScaleEngine Inc.
  Differential Revision: https://reviews.freebsd.org/D10170
  Notes: svn path=/head/; revision=318765
* Commit the 64-bit inode project. | Konstantin Belousov | 2017-05-23 | 1 | -0/+5
  Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024.

  ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways.

  The Kinfo sysctl MIBs ABI is changed in a backward-compatible way, but there is no general mechanism to handle other sysctl MIBs which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important.

  Struct xvnode changed layout, no compat shims are provided.

  For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat.

  Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world.

  Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion was done by Konstantin Belousov (kib).

  Sponsored by: The FreeBSD Foundation (emaste, kib)
  Differential revision: https://reviews.freebsd.org/D10439
  Notes: svn path=/head/; revision=318736
* disallow open(2) in capability mode | Ed Maste | 2017-05-22 | 1 | -8/+0
  Previously open(2) was allowed in capability mode, with a comment that suggested this was likely the case to facilitate debugging. The system call would still fail later on, but it's better to disallow the syscall altogether.

  We now have the kern.trap_enotcap sysctl or PROC_TRAPCAP_CTL proccontrol to aid in debugging.

  In any case libc has translated open() to the openat syscall since r277032.

  Reviewed by: kib, rwatson
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D10850
  Notes: svn path=/head/; revision=318634
* Update capabilities.conf comment | Ed Maste | 2016-09-08 | 1 | -0/+4
  getdtablesize is per-process state, not global state

  Notes: svn path=/head/; revision=305611
* Allow getdtablesize in capability mode | Ed Maste | 2016-08-31 | 1 | -0/+1
  getdtablesize is "trivial global state" and is similar to getrlimit(RLIMIT_NOFILE), so should be permitted in capability mode.

  Reviewed by: oshogbo
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D7719
  Notes: svn path=/head/; revision=305140
* Remove unused and obsolete openbsd_poll system call. (Phase 1) | George V. Neville-Neil | 2016-08-18 | 1 | -8/+0
  Reported by: brooks
  Reviewed by: brooks, jhb
  Differential Revision: https://reviews.freebsd.org/D7548
  Notes: svn path=/head/; revision=304395
* Garbage collect _umtx_lock(2)/_umtx_unlock(2) references removed in r263318. | Bryan Drewery | 2016-08-17 | 1 | -2/+0
  This has no real impact on the resulting libc.so file.

  MFC after: 3 days
  Sponsored by: EMC / Isilon Storage Division
  Notes: svn path=/head/; revision=304288
* Add futimens and utimensat system calls. | Jilles Tjoelker | 2015-01-23 | 1 | -1/+3
  The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support.

  A new UTIME_* constant might allow setting birthtimes in future.

  Differential Revision: https://reviews.freebsd.org/D1426
  Submitted by: pluknet (partially)
  Reviewed by: delphij, pluknet, rwatson
  Relnotes: yes
  Notes: svn path=/head/; revision=277610
* Allow sigwait(2) in capabilities mode. | Christian S.J. Peron | 2014-01-28 | 1 | -0/+1
  It's common for multi-threaded processes to create a thread for the purpose of synchronously processing signals. Allow such processes to utilize a capabilities sandbox.

  Discussed with: rwatson, pjd
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=261220
* Allow for pselect(2) in capability mode. | Pawel Jakub Dawidek | 2013-12-15 | 1 | -1/+2
  Noticed by: David Drysdale <drysdale@google.com>
  Notes: svn path=/head/; revision=259436
* - Remove mac_get_fd/mac_set_fd - those are not syscalls. The __mac_get_fd() and | Pawel Jakub Dawidek | 2013-11-06 | 1 | -8/+1
    __mac_set_fd() syscalls are listed earlier.
  - Correct typo in syscall name. It should be sched_rr_get_interval, not sched_rr_getinterval.

  Submitted by: David Drysdale <drysdale@google.com>
  MFC after: 3 days
  Notes: svn path=/head/; revision=257736
* Sort properly. | Pawel Jakub Dawidek | 2013-09-07 | 1 | -1/+1
  Notes: svn path=/head/; revision=255374
* Change the cap_rights_t type from uint64_t to a structure that we can extend | Pawel Jakub Dawidek | 2013-09-05 | 1 | -2/+1
  in the future in a backward compatible (API and ABI) way.

  The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.

  The structure definition looks like this:

      struct cap_rights {
              uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
      };

  The initial CAP_RIGHTS_VERSION is 0.

  The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.

  The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain the array index. Only one bit is used and the bit position in this five-bit range defines the array index. This means there can be at most five array elements in the future.

  To define a new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, e.g.

      #define CAP_PDKILL      CAPRIGHT(1, 0x0000000000000800ULL)

  We still support aliases that combine a few rights, but the rights have to belong to the same array element, e.g.:

      #define CAP_LOOKUP      CAPRIGHT(0, 0x0000000000000400ULL)
      #define CAP_FCHMOD      CAPRIGHT(0, 0x0000000000002000ULL)
      #define CAP_FCHMODAT    (CAP_FCHMOD | CAP_LOOKUP)

  There is a new API to manage the new cap_rights_t structure:

      cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
      void cap_rights_set(cap_rights_t *rights, ...);
      void cap_rights_clear(cap_rights_t *rights, ...);
      bool cap_rights_is_set(const cap_rights_t *rights, ...);
      bool cap_rights_is_valid(const cap_rights_t *rights);
      void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
      void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
      bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

  Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, e.g.:

      cap_rights_t rights;
      cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

  There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, e.g.:

      #define cap_rights_set(rights, ...) \
              __cap_rights_set((rights), __VA_ARGS__, 0ULL)
      void __cap_rights_set(cap_rights_t *rights, ...);

  Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:

      cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

  Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions.

  This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.

  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=255219
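  The index-bit encoding and the self-terminating variadic macro described in this commit can be modelled in a few lines of standalone C. This is a sketch under assumptions, not the code in sys/capsicum.h: every demo_*/DEMO_* name is hypothetical, the index bits are assumed to start at bit 57 (the five bits below the top two), and where the commit says the real functions assert, this model returns false instead.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VERSION	0
#define DEMO_NELEMS	(DEMO_VERSION + 2)

/* Assumed layout: bits 62-63 hold the version, bits 57-61 a one-hot index. */
#define DEMO_CAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))

#define DEMO_CAP_LOOKUP	DEMO_CAPRIGHT(0, 0x0000000000000400ULL)
#define DEMO_CAP_FCHMOD	DEMO_CAPRIGHT(0, 0x0000000000002000ULL)
#define DEMO_CAP_PDKILL	DEMO_CAPRIGHT(1, 0x0000000000000800ULL)

struct demo_rights {
	uint64_t cr_rights[DEMO_NELEMS];
};

/* Return the array index encoded in a right, or -1 if indexes are mixed. */
static int
demo_right_index(unsigned long long right)
{
	unsigned long long idxbits = (right >> 57) & 0x1f;
	int i;

	/* Exactly one index bit must be set. */
	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	for (i = 0; (idxbits >> i) != 1; i++)
		;
	return (i);
}

/* Start with no rights; each element carries its own index bit. */
static void
demo_rights_init(struct demo_rights *r)
{
	int i;

	for (i = 0; i < DEMO_NELEMS; i++)
		r->cr_rights[i] = DEMO_CAPRIGHT(i, 0ULL);
	r->cr_rights[0] |= (uint64_t)DEMO_VERSION << 62;
}

/* Consume rights until the 0ULL terminator appended by the macro below. */
static bool
demo_rights_vset(struct demo_rights *r, ...)
{
	va_list ap;
	unsigned long long right;
	bool ok = true;

	va_start(ap, r);
	while ((right = va_arg(ap, unsigned long long)) != 0) {
		int i = demo_right_index(right);

		if (i < 0 || i >= DEMO_NELEMS) {
			ok = false;	/* rights from different elements were OR-ed */
			break;
		}
		r->cr_rights[i] |= right;
	}
	va_end(ap);
	return (ok);
}

/* Callers never write the terminator themselves. */
#define demo_rights_set(r, ...)	demo_rights_vset((r), __VA_ARGS__, 0ULL)
```

  With this model, demo_rights_set(&r, DEMO_CAP_LOOKUP, DEMO_CAP_PDKILL) succeeds because each argument carries a single index bit, while demo_rights_set(&r, DEMO_CAP_LOOKUP | DEMO_CAP_PDKILL) is rejected because the OR-ed value carries two.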
* Add pipe2() system call. | Jilles Tjoelker | 2013-05-01 | 1 | -0/+1
  The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function.

  If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper.

  Reviewed by: kan, kib
  Notes: svn path=/head/; revision=250159
* Add accept4() system call. | Jilles Tjoelker | 2013-05-01 | 1 | -0/+1
  The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().)

  The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code.

  Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.

  Notes: svn path=/head/; revision=250154
* Implement chflagsat(2) system call, similar to fchmodat(2), but operates on | Pawel Jakub Dawidek | 2013-03-21 | 1 | -0/+1
  file flags.

  Reviewed by: kib, jilles
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=248599
* Sort syscalls properly. | Pawel Jakub Dawidek | 2013-03-15 | 1 | -1/+1
  Notes: svn path=/head/; revision=248359
* - Implement two new system calls: | Pawel Jakub Dawidek | 2013-03-02 | 1 | -8/+4

      int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
      int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

    which allow binding and connecting respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'.
  - Add manual pages for the new syscalls.
  - Make the new syscalls available for processes in capability mode sandbox.
  - Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on the directory descriptor for the syscalls to work.
  - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor.
  - Update procstat(1) to recognize the new capability rights.
  - Document the new capability rights in cap_rights_limit(2).

  Sponsored by: The FreeBSD Foundation
  Discussed with: rwatson, jilles, kib, des
  Notes: svn path=/head/; revision=247667
* Merge Capsicum overhaul: | Pawel Jakub Dawidek | 2013-03-02 | 1 | -8/+10
  - Capability is no longer a separate descriptor type. Now every descriptor has a set of its own capability rights.
  - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
  - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
  - The cap_getrights(2) syscall is renamed to cap_rights_get(2).
  - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall.
  - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
  - To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
  - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
  - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:

      CAP_CREATE old behaviour:
      - Allow for openat(2)+O_CREAT.
      - Allow for linkat(2).
      - Allow for symlinkat(2).
      CAP_CREATE new behaviour:
      - Allow for openat(2)+O_CREAT.

      Added CAP_LINKAT:
      - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
      - Allow to be target for renameat(2).

      Added CAP_SYMLINKAT:
      - Allow for symlinkat(2).

      Removed CAP_DELETE. Old behaviour:
      - Allow for unlinkat(2) when removing non-directory object.
      - Allow to be source for renameat(2).

      Removed CAP_RMDIR. Old behaviour:
      - Allow for unlinkat(2) when removing directory.

      Added CAP_RENAMEAT:
      - Required for source directory for the renameat(2) syscall.

      Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
      - Allow for unlinkat(2) on any object.
      - Required if target of renameat(2) exists and will be removed by this call.

      Removed CAP_MAPEXEC.

      CAP_MMAP old behaviour:
      - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE.
      CAP_MMAP new behaviour:
      - Allow for mmap(2)+PROT_NONE.

      Added CAP_MMAP_R:
      - Allow for mmap(PROT_READ).
      Added CAP_MMAP_W:
      - Allow for mmap(PROT_WRITE).
      Added CAP_MMAP_X:
      - Allow for mmap(PROT_EXEC).
      Added CAP_MMAP_RW:
      - Allow for mmap(PROT_READ | PROT_WRITE).
      Added CAP_MMAP_RX:
      - Allow for mmap(PROT_READ | PROT_EXEC).
      Added CAP_MMAP_WX:
      - Allow for mmap(PROT_WRITE | PROT_EXEC).
      Added CAP_MMAP_RWX:
      - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

      Renamed CAP_MKDIR to CAP_MKDIRAT.
      Renamed CAP_MKFIFO to CAP_MKFIFOAT.
      Renamed CAP_MKNOD to CAP_MKNODAT.

      CAP_READ old behaviour:
      - Allow pread(2).
      - Disallow read(2), readv(2) (if there is no CAP_SEEK).
      CAP_READ new behaviour:
      - Allow read(2), readv(2).
      - Disallow pread(2) (CAP_SEEK was also required).

      CAP_WRITE old behaviour:
      - Allow pwrite(2).
      - Disallow write(2), writev(2) (if there is no CAP_SEEK).
      CAP_WRITE new behaviour:
      - Allow write(2), writev(2).
      - Disallow pwrite(2) (CAP_SEEK was also required).

      Added convenient defines:

      #define CAP_PREAD       (CAP_SEEK | CAP_READ)
      #define CAP_PWRITE      (CAP_SEEK | CAP_WRITE)
      #define CAP_MMAP_R      (CAP_MMAP | CAP_SEEK | CAP_READ)
      #define CAP_MMAP_W      (CAP_MMAP | CAP_SEEK | CAP_WRITE)
      #define CAP_MMAP_X      (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
      #define CAP_MMAP_RW     (CAP_MMAP_R | CAP_MMAP_W)
      #define CAP_MMAP_RX     (CAP_MMAP_R | CAP_MMAP_X)
      #define CAP_MMAP_WX     (CAP_MMAP_W | CAP_MMAP_X)
      #define CAP_MMAP_RWX    (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
      #define CAP_RECV        CAP_READ
      #define CAP_SEND        CAP_WRITE

      #define CAP_SOCK_CLIENT \
              (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
               CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
      #define CAP_SOCK_SERVER \
              (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
               CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
               CAP_SETSOCKOPT | CAP_SHUTDOWN)

      Added defines for backward API compatibility:

      #define CAP_MAPEXEC     CAP_MMAP_X
      #define CAP_DELETE      CAP_UNLINKAT
      #define CAP_MKDIR       CAP_MKDIRAT
      #define CAP_RMDIR       CAP_UNLINKAT
      #define CAP_MKFIFO      CAP_MKFIFOAT
      #define CAP_MKNOD       CAP_MKNODAT
      #define CAP_SOCK_ALL    (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

  Sponsored by: The FreeBSD Foundation
  Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
  Many aspects discussed with: rwatson, benl, jonathan
  ABI compatibility discussed with: kib
  Notes: svn path=/head/; revision=247602
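  The new CAP_READ/CAP_SEEK split can be illustrated with a small containment check. This is a sketch using a flat 64-bit mask, as rights were represented at the time of this commit; the bit values and all demo_*/DEMO_* names are placeholders chosen for illustration, and only the composition rules (CAP_PREAD = CAP_SEEK | CAP_READ, CAP_PWRITE = CAP_SEEK | CAP_WRITE) come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real constants live in the kernel headers. */
#define DEMO_CAP_READ	0x01ULL
#define DEMO_CAP_WRITE	0x02ULL
#define DEMO_CAP_SEEK	0x04ULL

/* Composite rights follow the "convenient defines" listed above. */
#define DEMO_CAP_PREAD	(DEMO_CAP_SEEK | DEMO_CAP_READ)
#define DEMO_CAP_PWRITE	(DEMO_CAP_SEEK | DEMO_CAP_WRITE)

/* A descriptor may perform an operation only if it holds every needed bit. */
static bool
demo_rights_contain(uint64_t have, uint64_t need)
{
	return ((have & need) == need);
}
```

  A descriptor limited to DEMO_CAP_READ passes the check for read(2)-style access but fails it for DEMO_CAP_PREAD until DEMO_CAP_SEEK is also present, mirroring the "new behaviour" entries above.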
* Allow to use kill(2) in capability mode, but process can send a signal only | Pawel Jakub Dawidek | 2012-11-27 | 1 | -0/+5
  to itself. For example abort(3) at first tries to do kill(getpid(), SIGABRT), which was failing in capability mode, so the code was falling back to exit(1).

  Reviewed by: rwatson
  Obtained from: WHEEL Systems
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=243610
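  The policy in this commit amounts to a single comparison, sketched here as a standalone predicate. The function name and the locally defined DEMO_ECAPMODE constant are hypothetical (94 is assumed to match FreeBSD's ECAPMODE); the actual check lives in the kernel's kill(2) implementation and may differ in detail.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Assumed to mirror FreeBSD's ECAPMODE; defined locally for portability. */
#define DEMO_ECAPMODE	94

/*
 * Model of the capability-mode rule for kill(2): a sandboxed process may
 * signal only itself. Returns 0 if the call may proceed, else an errno.
 */
static int
demo_capmode_kill_check(bool in_capmode, pid_t self, pid_t target)
{
	if (in_capmode && target != self)
		return (DEMO_ECAPMODE);
	return (0);
}
```

  Under this rule kill(getpid(), SIGABRT) from abort(3) passes the check, while a sandboxed attempt to signal any other PID is refused.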
* Add missing system calls. | Pawel Jakub Dawidek | 2012-05-31 | 1 | -0/+5
  MFC after: 3 days
  Notes: svn path=/head/; revision=236361
* There is no rmdirat system call. Weird, I know. | Pawel Jakub Dawidek | 2012-05-31 | 1 | -1/+0
  MFC after: 3 days
  Notes: svn path=/head/; revision=236360
* Add experimental support for process descriptors | Jonathan Anderson | 2011-08-18 | 1 | -1/+1
  A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable.

  New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination.

  When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style.

  This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0.

  Reviewed by: jhb, kib
  Approved by: re (kib), mentor (rwatson)
  Sponsored by: Google Inc
  Notes: svn path=/head/; revision=224987
* Trim some warnings and notes from capabilities.conf -- these are left over | Robert Watson | 2011-08-13 | 1 | -6/+1
  from Capsicum development, and no longer apply.

  Approved by: re (kib)
  Sponsored by: Google Inc
  Notes: svn path=/head/; revision=224852
* Allow openat(2), fstatat(2), etc. in capability mode. | Jonathan Anderson | 2011-08-13 | 1 | -25/+19
  namei() and lookup() can now perform "strictly relative" lookups. Such lookups, performed when in capability mode or when looking up relative to a directory capability, enforce two policies:
  - absolute paths are disallowed (including symlinks to absolute paths)
  - paths containing '..' components are disallowed

  These constraints make it safe to enable openat() and friends. These system calls are instrumental in supporting Capsicum components such as the capability-mode-aware runtime linker.

  Finally, adjust comments in capabilities.conf to reflect the actual state of the world (e.g. shm_open(2) already has the appropriate constraints, getdents(2) already requires CAP_SEEK).

  Approved by: re (bz), mentor (rwatson)
  Sponsored by: Google Inc.
  Notes: svn path=/head/; revision=224812
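  The two "strictly relative" policies can be approximated by a string-level check, sketched below. This is a userspace illustration only: the function name is hypothetical, and unlike namei() it cannot see symlinks, so it does not cover the "symlinks to absolute paths" case that the kernel rejects during lookup.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: accept a path only if it is not absolute and has
 * no ".." component. Components are separated by one or more slashes.
 */
static bool
demo_path_is_strictly_relative(const char *path)
{
	const char *p = path;

	if (*p == '/')
		return (false);		/* absolute paths are disallowed */
	while (*p != '\0') {
		size_t len = strcspn(p, "/");	/* length of this component */

		if (len == 2 && p[0] == '.' && p[1] == '.')
			return (false);	/* '..' components are disallowed */
		p += len;
		while (*p == '/')
			p++;		/* skip the separator(s) */
	}
	return (true);
}
```

  A name such as "..b" or "..." is still accepted, since only a component that is exactly ".." escapes the starting directory.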
* Continue to introduce Capsicum Capability Mode support: | Robert Watson | 2011-03-01 | 1 | -0/+756
  Add a new system call flag, SYF_CAPENABLED, which indicates that a particular system call is available in capability mode.

  Add a new configuration file, kern/capabilities.conf (similar files may be introduced for other ABIs in the future), which enumerates system calls that are available in capability mode. When a new system call is added to syscalls.master, it will also need to be added here (if needed). Teach sysent parts to use this file to set values for SYF_CAPENABLED for the native ABI.

  Reviewed by: anderson
  Discussed with: benl, kris, pjd
  Obtained from: Capsicum Project
  MFC after: 3 months
  Notes: svn path=/head/; revision=219131
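  How a per-syscall flag gates entry in capability mode can be sketched with a toy table. Everything here is a model: the struct, the table contents, the flag value and the DEMO_ECAPMODE constant are hypothetical stand-ins for the kernel's sysent array, SYF_CAPENABLED and ECAPMODE. The three entries reflect statuses mentioned elsewhere in this log (openat allowed, open disallowed).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SYF_CAPENABLED	0x1	/* placeholder flag value */
#define DEMO_ECAPMODE		94	/* assumed to mirror FreeBSD's ECAPMODE */

struct demo_sysent {
	const char	*name;
	unsigned int	 flags;
};

/* Stand-in for the flags that capabilities.conf feeds into sysent. */
static const struct demo_sysent demo_table[] = {
	{ "read",   DEMO_SYF_CAPENABLED },
	{ "open",   0 },
	{ "openat", DEMO_SYF_CAPENABLED },
};

/* Returns 0 if the call may proceed, otherwise an errno-style value. */
static int
demo_syscall_enter(const char *name, bool in_capmode)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (strcmp(demo_table[i].name, name) != 0)
			continue;
		if (in_capmode &&
		    (demo_table[i].flags & DEMO_SYF_CAPENABLED) == 0)
			return (DEMO_ECAPMODE);
		return (0);
	}
	return (ENOSYS);	/* not in the toy table */
}
```

  Outside capability mode every listed call proceeds; inside it, only entries carrying the flag do, which is why a syscall missing from capabilities.conf fails even if its implementation would otherwise be safe.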