path: root/lib/libc/sys/Symbol.map
Commit message | Author | Age | Files | Lines
* Add kcmp(2) userspace bits | Konstantin Belousov | 2024-02-11 | 1 | -0/+4
    (cherry picked from commit 211bdd601ee51f90da9b123807ef68ac122116b9)
* Add membarrier(2) | Konstantin Belousov | 2023-10-26 | 1 | -0/+1
    (cherry picked from commit 4a69fc16a583face922319c476f3e739d9ce9140)
* Remove $FreeBSD$: one-line .h pattern | Warner Losh | 2023-08-23 | 1 | -1/+0
    Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
    Similar commit in main:
    (cherry picked from commit 42b388439bd3)
* Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD | Konstantin Belousov | 2023-04-16 | 1 | -1/+1
    (cherry picked from commit dac310248826c37b60306c1b25fb94c35802196d)
* kqueue1(2): export the symbol from libc | Konstantin Belousov | 2023-04-16 | 1 | -0/+1
    (cherry picked from commit 375732cc6e462ca160654886f0411d2950768a8b)
* Export _mmap and __sys_mmap from libc.so | Alex Richardson | 2022-05-07 | 1 | -0/+2
    Unlike the other syscalls these two symbols were missing from the version script. I noticed this while looking into the compiler-rt runtime libraries for CHERI.
    Reviewed by: brooks
    Obtained from: https://github.com/CTSRD-CHERI/cheribsd/pull/1063
    MFC after: 3 days
    (cherry picked from commit 395db99f32bc615a3df2cd469e9537938d022c88)
* swapoff: add one more variant of the syscall | Konstantin Belousov | 2021-12-20 | 1 | -1/+1
    For MFC, COMPAT_FREEBSD13 braces were removed.
    (cherry picked from commit 5346570276a5ddfd5f530201fcbf24ddcc53033d)
* Add _Fork() | Konstantin Belousov | 2021-08-12 | 1 | -0/+4
    (cherry picked from commit 49ad342cc10cba14b3a40ba26cf8bb2150e2925a)
* libthr: wrap pdfork(2), same as fork(2). | Konstantin Belousov | 2021-01-11 | 1 | -0/+1
    Without wrapping, rtld services and malloc(3) are not guaranteed to operate correctly in the forked child.
    Reviewed by: markj
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D28088
* Add aio_writev and aio_readv | Alan Somers | 2021-01-03 | 1 | -0/+2
    POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but they're an obvious extension. They work just like their synchronous equivalents pwritev and preadv.
    It isn't yet possible to use vectored aiocbs with lio_listio, but that could be added in the future.
    Reviewed by: jhb, kib, bcr
    Relnotes: yes
    Differential Revision: https://reviews.freebsd.org/D27743
* Add shm_create_largepage(3) helper for creation and configuration of | Konstantin Belousov | 2020-09-09 | 1 | -0/+2
    largepage shm objects.
    And since we can, add memfd_create(MFD_HUGETLB) support, hopefully close enough to the Linux feature.
    Reviewed by: markj
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D24652
    Notes: svn path=/head/; revision=365524
* Add an entry to Symbol.map for the rpctls_syscall added by r361599. | Rick Macklem | 2020-05-28 | 1 | -0/+1
    Reviewed by: brooks
    Differential Revision: https://reviews.freebsd.org/D24949
    Notes: svn path=/head/; revision=361603
* Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range | Kyle Evans | 2020-04-14 | 1 | -2/+0
    Include a temporary compatibility shim as well for kernels predating close_range, since closefrom is used in some critical areas.
    Reviewed by: markj (previous version), kib
    Differential Revision: https://reviews.freebsd.org/D24399
    Notes: svn path=/head/; revision=359930
* Implement a close_range(2) syscall | Kyle Evans | 2020-04-12 | 1 | -0/+1
    close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2): they or someone else may have special fds dup'd higher in the range, so they can't necessarily closefrom(min) without hitting the upper range, and relocating those fds lower isn't necessarily feasible.
    sys_closefrom has been rewritten to use kern_close_range(), using ~0U to indicate closing to the end of the range. This was chosen for simplicity, rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call.
    The flags argument of close_range(2) is currently unused, so setting any flag currently returns EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature.
    This patch is based on a syscall of the same design that is expected to be merged into Linux.
    Reviewed by: kib, markj, vangyzen (all slightly earlier revisions)
    Differential Revision: https://reviews.freebsd.org/D21627
    Notes: svn path=/head/; revision=359836
* Add a way to manage thread signal mask using shared word, instead of syscall. | Konstantin Belousov | 2020-02-09 | 1 | -0/+1
    A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by the kernel on each syscall entry and on AST processing; a non-zero count of blocks is interpreted the same as a signal mask blocking all signals.
    The biggest downside of the feature that I see is that memory corruption affecting the registered fast sigblock location would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL).
    With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by the compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half.
    The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals.
    Tested by: pho
    Discussed with: cem, emaste, jilles
    Sponsored by: The FreeBSD Foundation
    Differential revision: https://reviews.freebsd.org/D12773
    Notes: svn path=/head/; revision=357693
* Add an shm_rename syscall | David Bright | 2019-09-26 | 1 | -0/+1
    Add an atomic shm rename operation, similar in spirit to a file rename. Atomically unlink an shm from a source path and link it to a destination path. If an existing shm is linked at the destination path, unlink it as part of the same atomic operation. The caller needs the same permissions as shm_unlink to the shm being renamed, and the same permissions for the shm at the destination which is being unlinked, if it exists. If those fail, EACCES is returned, as with the other shm_* syscalls.
    truss support is included; audit support will come later.
    This commit includes only the implementation; the sysent-generated bits will come in a follow-on commit.
    Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
    Reviewed by: jilles (earlier revision)
    Reviewed by: brueffer (manpages, earlier revision)
    Relnotes: yes
    Sponsored by: Dell EMC Isilon
    Differential Revision: https://reviews.freebsd.org/D21423
    Notes: svn path=/head/; revision=352747
* Add linux-compatible memfd_create | Kyle Evans | 2019-09-25 | 1 | -0/+1
    memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional CLOEXEC and file sealing support. This is used by some mesa parts, some linux libs, and qemu can also take advantage of it and uses the sealing to prevent resizing the region.
    This reimplements shm_open in terms of shm_open2(2) at the same time.
    shm_open(2) will be moved to COMPAT12 shortly.
    Reviewed by: markj, kib
    Differential Revision: https://reviews.freebsd.org/D21393
    Notes: svn path=/head/; revision=352703
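The sealing use case mentioned in this entry (preventing a region from being resized) can be sketched roughly as follows. This is an illustration only, not code from the commit: `make_fixed_region` is a hypothetical helper name, and the sketch assumes a libc that provides `memfd_create()` together with `F_ADD_SEALS`.

```c
#define _GNU_SOURCE     /* glibc declares memfd_create() only with this */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous memory file of `size` bytes
 * and seal its size, so that no holder of the descriptor can grow or
 * shrink it afterwards. Returns the descriptor, or -1 with errno set.
 */
static int
make_fixed_region(const char *name, off_t size)
{
    int fd;

    assert(name != NULL && size > 0);
    fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return (-1);
    /* Size the object first; the seals forbid any later resize. */
    if (ftruncate(fd, size) == -1 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) == -1) {
        close(fd);
        return (-1);
    }
    return (fd);
}
```

After the seals are in place, a later `ftruncate()` to a different size on the returned descriptor is expected to fail, which is the property a consumer such as qemu would rely on.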
* Add sysctlbyname system call | Mateusz Guzik | 2019-09-03 | 1 | -0/+1
    Previously userspace would issue one syscall to resolve the sysctl and then another one to actually use it. Do it all in one trip.
    Fallback is provided in case newer libc happens to be running on an older kernel.
    Submitted by: Pawel Biernacki
    Reported by: kib, brooks
    Differential Revision: https://reviews.freebsd.org/D17282
    Notes: svn path=/head/; revision=351729
* Add libc support for the copy_file_range(2) syscall added by r350315. | Rick Macklem | 2019-07-25 | 1 | -0/+1
    copy_file_range.2 is a new man page (content change).
    Reviewed by: kib, asomers
    Relnotes: yes
    Differential Revision: https://reviews.freebsd.org/D20584
    Notes: svn path=/head/; revision=350317
* Introduce funlinkat syscall that allows us to check if we are removing | Mariusz Zaborski | 2019-04-06 | 1 | -0/+1
    the file associated with the given file descriptor.
    Reviewed by: kib, asomers
    Reviewed by: cem, jilles, brooks (they reviewed previous version)
    Discussed with: pjd, and many others
    Differential Revision: https://reviews.freebsd.org/D14567
    Notes: svn path=/head/; revision=345982
* Add new file handle system calls. | Konstantin Belousov | 2018-12-07 | 1 | -0/+7
    Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2). The syscalls are provided for a NFS userspace server (nfs-ganesha).
    Submitted by: Jack Halford <jack@gandi.net>
    Sponsored by: Gandi.net
    Tested by: pho
    Feedback from: brooks, markj
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D18359
    Notes: svn path=/head/; revision=341689
* Get rid of netbsd_lchown and netbsd_msync syscall entries. | Brooks Davis | 2018-07-10 | 1 | -6/+0
    No valid FreeBSD binary ever called them (they would call lchown and msync directly) and we haven't supported NetBSD binaries in ages.
    This is a respin of r335983 with a workaround for the ancient BFD linker in the libc stubs.
    Reviewed by: kib
    Sponsored by: DARPA, AFRL
    Differential Revision: https://reviews.freebsd.org/D16193
    Notes: svn path=/head/; revision=336171
* Revert r335983. | Brooks Davis | 2018-07-05 | 1 | -0/+6
    The bfd linker in tree doesn't support multiple names for the same symbol (at least with current flags).
    Notes: svn path=/head/; revision=335990
* Get rid of netbsd_lchown and netbsd_msync syscall entries. | Brooks Davis | 2018-07-05 | 1 | -6/+0
    No valid FreeBSD binary ever called them (they would call lchown and msync directly) and we haven't supported NetBSD binaries in ages.
    Reviewed by: kib
    Sponsored by: DARPA, AFRL
    Differential Revision: https://reviews.freebsd.org/D15814
    Notes: svn path=/head/; revision=335983
* Make vadvise compat freebsd11. | Brooks Davis | 2018-05-25 | 1 | -2/+0
    The vadvise syscall (aka ovadvise) is undocumented and has always been implemented as returning EINVAL. Put the syscall under COMPAT11 and provide a userspace implementation.
    Reviewed by: kib
    Sponsored by: DARPA, AFRL
    Differential Revision: https://reviews.freebsd.org/D15557
    Notes: svn path=/head/; revision=334223
* getentropy(3): Fallback to kern.arandom sysctl on older kernels | Conrad Meyer | 2018-03-21 | 1 | -2/+0
    On older kernels, when a userspace program disables SIGSYS, catch ENOSYS and emulate the getrandom(2) syscall with the kern.arandom sysctl (via the existing arc4_sysctl wrapper).
    Special care is taken to faithfully emulate EFAULT on NULL pointers, because sysctl(3) as used by kern.arandom ignores NULL oldp. (This was caught by getentropy(3) ATF tests.)
    Reported by: kib
    Reviewed by: kib
    Discussed with: delphij
    Sponsored by: Dell EMC Isilon
    Differential Revision: https://reviews.freebsd.org/D14785
    Notes: svn path=/head/; revision=331334
* Implement getrandom(2) and getentropy(3) | Conrad Meyer | 2018-03-21 | 1 | -0/+3
    The general idea here is to provide userspace programs with well-defined sources of entropy, in a fashion that doesn't require opening a new file descriptor (ulimits) or accessing paths (/dev/urandom may be restricted by chroot or capsicum).
    getrandom(2) is the more general API, and comes from the Linux world. Since our urandom and random devices are identical, the GRND_RANDOM flag is ignored.
    getentropy(3) is added as a compatibility shim for the OpenBSD API.
    truss(1) support is included.
    Tests for both system calls are provided. Coverage is believed to be at least as comprehensive as LTP getrandom(2) test coverage. Additionally, instructions for running the LTP tests directly against FreeBSD are provided in the "Test Plan" section of the Differential revision linked below. (They pass, of course.)
    PR: 194204
    Reported by: David CARLIER <david.carlier AT hardenedbsd.org>
    Discussed with: cperciva, delphij, jhb, markj
    Relnotes: maybe
    Differential Revision: https://reviews.freebsd.org/D14500
    Notes: svn path=/head/; revision=331279
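As a rough illustration of the two interfaces named in this entry, the sketch below fills a buffer without opening a descriptor or touching /dev/urandom. The helper names `fill_random` and `make_key` are hypothetical, and the 256-byte ceiling noted for getentropy(3) is the limit documented by the implementations I am aware of rather than something this commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len random bytes using
 * getrandom(2), retrying after short reads and EINTR.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    ssize_t n;

    assert(buf != NULL);
    while (len > 0) {
        n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        p += n;
        len -= (size_t)n;
    }
    return (0);
}

/*
 * Hypothetical helper: getentropy(3) is all-or-nothing and meant for
 * small requests (documented limit: 256 bytes), e.g. a 32-byte key.
 */
static int
make_key(unsigned char key[32])
{
    assert(key != NULL);
    return (getentropy(key, 32));
}
```

getrandom(2) suits arbitrary lengths because it reports how many bytes it produced, while getentropy(3) is convenient for fixed-size seeds and keys.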
* Implement 'domainset', a cpuset based NUMA policy mechanism. This allows | Jeff Roberson | 2018-01-12 | 1 | -0/+6
    userspace to control NUMA policy administratively and programmatically.
    Implement domainset based iterators in the page layer.
    Remove the now legacy numa_* syscalls.
    Clean up some header pollution created by having seq.h in proc.h.
    Reviewed by: markj, kib
    Discussed with: alc
    Tested by: pho
    Sponsored by: Netflix, Dell/EMC Isilon
    Differential Revision: https://reviews.freebsd.org/D13403
    Notes: svn path=/head/; revision=327895
* Remove some private symbols from librt | Alan Somers | 2017-07-20 | 1 | -9/+0
    Private functions like __aio_read and _aio_read were exposed in FBSDprivate_1.0 by r169090, even though they've never been used outside of librt. Also, remove some weak references from r156136 that have never resolved.
    Reviewed by: kib
    MFC after: 3 weeks
    Sponsored by: Spectra Logic Corp
    Differential Revision: https://reviews.freebsd.org/D11649
    Notes: svn path=/head/; revision=321295
* Add abstime kqueue(2) timers and expand struct kevent members. | Konstantin Belousov | 2017-06-17 | 1 | -1/+1
    This change implements the NOTE_ABSTIME flag for EVFILT_TIMER, which specifies that the data field contains the absolute time to fire the event.
    To make this useful, the data member of struct kevent must be extended to 64 bits. Using the opportunity, I also added ext members. This changes struct kevent almost to Apple's struct kevent64, except that I did not change the type of ident and udata; the latter would cause serious API incompatibilities.
    The type of ident was kept uintptr_t since EVFILT_AIO returns a pointer in this field, and e.g. CHERI is sensitive to the type (discussed with brooks, jhb).
    Unlike Apple kevent64, symbol versioning allows us to claim ABI compatibility and still name the new syscall kevent(2). Compat shims are provided for both host native and compat32.
    Requested by: bapt
    Reviewed by: bapt, brooks, ngie (previous version)
    Sponsored by: The FreeBSD Foundation
    Differential revision: https://reviews.freebsd.org/D11025
    Notes: svn path=/head/; revision=320043
* Commit the 64-bit inode project. | Konstantin Belousov | 2017-05-23 | 1 | -28/+13
    Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024.
    ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways.
    Kinfo sysctl MIBs ABI is changed in a backward-compatible way, but there is no general mechanism to handle other sysctl MIBs which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important.
    Struct xvnode changed layout, no compat shims are provided.
    For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat.
    Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world.
    Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion was done by Konstantin Belousov (kib).
    Sponsored by: The FreeBSD Foundation (emaste, kib)
    Differential revision: https://reviews.freebsd.org/D10439
    Notes: svn path=/head/; revision=318736
* Make space style consistent with earlier entries. | Xin LI | 2017-03-20 | 1 | -2/+2
    X-MFC with: r315526
    Notes: svn path=/head/; revision=315615
* Add clock_nanosleep() | Eric van Gyzen | 2017-03-19 | 1 | -0/+2
    Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it.
    Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.)
    Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page.
    Reviewed by: kib, ngie, jilles
    MFC after: 3 weeks
    Relnotes: yes
    Sponsored by: Dell EMC
    Differential Revision: https://reviews.freebsd.org/D10020
    Notes: svn path=/head/; revision=315526
* Garbage collect _umtx_lock(2)/_umtx_unlock(2) references removed in r263318. | Bryan Drewery | 2016-08-17 | 1 | -6/+0
    This has no real impact on the resulting libc.so file.
    MFC after: 3 days
    Sponsored by: EMC / Isilon Storage Division
    Notes: svn path=/head/; revision=304288
* Add an implementation of fdatasync(2). | Konstantin Belousov | 2016-08-15 | 1 | -0/+6
    The syscall is a trivial wrapper around the new VOP_FDATASYNC(), sharing code with fsync(2). For all filesystems, this commit provides the implementation which delegates the work of VOP_FDATASYNC() to VOP_FSYNC(). This is functionally correct but not efficient.
    This is not yet a POSIX-compliant implementation, because it does not ensure that queued AIO requests are completed before returning.
    Reviewed by: mckusick
    Discussed with: avg (ZFS), jhb (AIO part)
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
    MFC after: 2 weeks
    Differential revision: https://reviews.freebsd.org/D7471
    Notes: svn path=/head/; revision=304176
* Remove Symbol.map entries for old AIO system calls for FreeBSD 6 compat. | John Baldwin | 2016-03-12 | 1 | -9/+0
    These entries should have never been present since they only exist for compat with FreeBSD 6.x (and older) binaries. This was missed in r296572.
    Technically this breaks the ABI by removing versioned symbols. However, no binaries should be linked against these symbols. No release has shipped with a header that contained a prototype for these functions.
    Reviewed by: kib
    Differential Revision: https://reviews.freebsd.org/D5615
    Notes: svn path=/head/; revision=296714
* Add implementations of sendmmsg(3) and recvmmsg(3) functions which | Konstantin Belousov | 2016-01-29 | 1 | -0/+2
    wrap sendmsg(2) and recvmsg(2) into batch send and receive operations. The goal of this implementation is only to provide API compatibility with Linux.
    The cancellation behaviour of the functions is not quite right, but due to the relatively rare use of cancellation it is considered acceptable compared with the complexity of a correct implementation. If the functions are reimplemented as syscalls, the fix would become almost trivial. The direct use of the syscall trampolines instead of libc wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on cancellation.
    Submitted by: Boris Astardzhiev <boris.astardzhiev@gmail.com>
    Discussed with: jilles (cancellation behaviour)
    MFC after: 1 month
    Notes: svn path=/head/; revision=295039
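To show what the batch interface looks like from the caller's side, here is a small sketch that sends and receives two datagrams with one call each. The helper names, `NMSG`, and the 64-byte buffer size are all hypothetical choices made for the example; nothing here is taken from the commit itself.

```c
#define _GNU_SOURCE     /* glibc declares sendmmsg()/recvmmsg() only with this */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define NMSG 2          /* hypothetical batch size for this sketch */

/* Send NMSG NUL-terminated payloads as separate datagrams in one call. */
static long
send_batch(int s, char *payload[NMSG])
{
    struct iovec iov[NMSG];
    struct mmsghdr msgs[NMSG];
    int i;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < NMSG; i++) {
        assert(payload[i] != NULL);
        iov[i].iov_base = payload[i];
        iov[i].iov_len = strlen(payload[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* Returns the number of messages sent, or -1 on error. */
    return ((long)sendmmsg(s, msgs, NMSG, 0));
}

/* Receive up to NMSG queued datagrams in one call; lens[i] gets each size. */
static long
recv_batch(int s, char bufs[NMSG][64], long lens[NMSG])
{
    struct iovec iov[NMSG];
    struct mmsghdr msgs[NMSG];
    long i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < NMSG; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = sizeof(bufs[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    n = (long)recvmmsg(s, msgs, NMSG, MSG_DONTWAIT, NULL);
    for (i = 0; i < n; i++)
        lens[i] = (long)msgs[i].msg_len;
    return (n);
}
```

The return values are widened to `long` because the declared return and `msg_len` types differ between systems.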
* Remove a stale comment and clarify the original where it was taken from | Pedro F. Giffuni | 2015-08-14 | 1 | -2/+2
    The comment in the libc/sys symbol map referenced the generated symbols for the syscall trampolines. Such comment was out of place in the secure symbol map so remove the stale comment and attempt to clarify the old one to avoid risks of confusion.
    Pointed out by: kib
    Notes: svn path=/head/; revision=286782
* Move the stack protector to a new "secure" directory | Pedro F. Giffuni | 2015-08-14 | 1 | -3/+0
    As part of the code refactoring to support FORTIFY_SOURCE we want a new subdirectory "secure" to keep the files related to security. Move the stack protector functions to this new directory.
    No functional change.
    Differential Review: https://reviews.freebsd.org/D3333
    Notes: svn path=/head/; revision=286760
* Add an initial NUMA affinity/policy configuration for threads and processes. | Adrian Chadd | 2015-07-11 | 1 | -0/+2
    This is based on work done by jeff@ and jhb@, as well as the numa.diff patch that has been circulating when someone asks for first-touch NUMA on -10 or -11.
      * Introduce a simple set of VM policy and iterator types.
      * tie the policy types into the vm_phys path for now, mirroring how the initial first-touch allocation work was enabled.
      * add syscalls to control changing thread and process defaults.
      * add a global NUMA VM domain policy.
      * implement a simple cascade policy order - if a thread policy exists, use it; if a process policy exists, use it; use the default policy.
      * processes inherit policies from their parent processes, threads inherit policies from their parent threads.
      * add a simple tool (numactl) to query and modify default thread/process policies.
      * add documentation for the new syscalls, for numa and for numactl.
      * re-enable first touch NUMA again by default, as now policies can be set in a variety of methods.
    This is only relevant for very specific workloads. This doesn't pretend to be a final NUMA solution.
    The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by 'sysctl vm.default_policy=rr'.
    This is only relevant if MAXMEMDOM is set to something other than 1. I.e., if you're using GENERIC or a modified kernel with non-NUMA, then this is a glorified no-op for you.
    Thank you to Norse Corp for giving me access to rather large (for FreeBSD!) NUMA machines in order to develop and verify this.
    Thank you to Dell for providing me with dual socket sandybridge and westmere v3 hardware to do NUMA development with.
    Thank you to Scott Long at Netflix for providing me with access to the two-socket, four-domain haswell v3 hardware.
    Thank you to Peter Holm for running the stress testing suite against the NUMA branch during various stages of development!
    Tested:
      * MIPS (regression testing; non-NUMA)
      * i386 (regression testing; non-NUMA GENERIC)
      * amd64 (regression testing; non-NUMA GENERIC)
      * westmere, 2 socket (thank you norse!)
      * sandy bridge, 2 socket (thank you dell!)
      * ivy bridge, 2 socket (thank you norse!)
      * westmere-EX, 4 socket / 1TB RAM (thank you norse!)
      * haswell, 2 socket (thank you norse!)
      * haswell v3, 2 socket (thank you dell)
      * haswell v3, 2x18 core (thank you scott long / netflix!)
      * Peter Holm ran a stress test suite on this work and found one issue, but has not been able to verify it (it doesn't look NUMA related, and he only saw it once over many testing runs.)
      * I've tested bhyve instances running in fixed NUMA domains and cpusets; all seems to work correctly.
    Verified:
      * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different NUMA policies for processes under test.
    Review:
    This was reviewed through phabricator (https://reviews.freebsd.org/D2559) as well as privately and via emails to freebsd-arch@. The git history with specific attributes is available at https://github.com/erikarn/freebsd/ in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
    This has been reviewed by a number of people (stas, rpaulo, kib, ngie, wblock) but not achieved a clear consensus. My hope is that with further exposure and testing more functionality can be implemented and evaluated.
    Notes:
      * The VM doesn't handle unbalanced domains very well, and if you have an overly unbalanced memory setup whilst under high memory pressure, VM page allocation may fail leading to a kernel panic. This was a problem in the past, but it's much more easily triggered now with these tools.
      * This work only controls the path through vm_phys; it doesn't yet strongly/predictably affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory isn't really guaranteed in any way. That's next on my plate.
    Sponsored by: Norse Corp, Inc.; Dell
    Notes: svn path=/head/; revision=285387
* Add futimens and utimensat system calls. | Jilles Tjoelker | 2015-01-23 | 1 | -0/+2
    The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support.
    A new UTIME_* constant might allow setting birthtimes in future.
    Differential Revision: https://reviews.freebsd.org/D1426
    Submitted by: pluknet (partially)
    Reviewed by: delphij, pluknet, rwatson
    Relnotes: yes
    Notes: svn path=/head/; revision=277610
* Avoid calling internal libc function through PLT or accessing data | Konstantin Belousov | 2015-01-05 | 1 | -2/+1
    through GOT, by staticizing and hiding. Add setter for __error_selector to hide it as well.
    Suggested and reviewed by: jilles
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Notes: svn path=/head/; revision=276681
* Fix known issues which blow up the process after dlopen("libthr.so") | Konstantin Belousov | 2015-01-03 | 1 | -3/+3
    (or loading a dso linked to libthr.so into process which was not linked against threading library).
      - Remove libthr interposers of the libc functions, including __error(). Instead, function calls are indirected through the interposing table, similar to how pthread stubs in libc are already done. Libc by default points either to syscall trampolines or to existing libc implementations. On libthr load, libthr rewrites the pointers to the cancellable implementations already in libthr. The interposition table is separate from pthreads stubs indirection table to not pull pthreads stubs into static binaries.
      - Postpone the malloc(3) internal mutexes initialization until libthr is loaded. This avoids recursion between calloc(3) and static pthread_mutex_t initialization.
      - Reinstall signal handlers with wrapper on libthr load. The _rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2) when libthr is statically referenced from the main binary.
    In the process, fix openat(2), swapcontext(2) and setcontext(2) interposing. The libc symbols were exported at different versions than libthr interposers. Export both libc and libthr versions from libc now, with default set to the higher version from libthr.
    Remove unused and disconnected swapcontext(3) userspace implementation from libc/gen.
    No objections from: deischen
    Tested by: pho, antoine (exp-run) (previous versions)
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Notes: svn path=/head/; revision=276630
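The interposing-table mechanism described in this entry can be modelled in a few lines. This is a toy sketch only: every name in it (`demo_interpos`, `demo_threadlib_init`, the slot enum, the stand-in functions) is hypothetical and does not reflect the actual libc or libthr symbols.

```c
#include <assert.h>

/* All names here are hypothetical; they only model the mechanism. */
typedef int (*demo_fn)(int);

enum { DEMO_SLOT_READ, DEMO_SLOT_WRITE, DEMO_SLOT_MAX };

/* Stand-ins for the default targets (syscall trampolines in a real libc). */
static int plain_read(int x)  { return (x); }
static int plain_write(int x) { return (x + 1); }

/* Stand-in for a cancellable implementation owned by the threading library. */
static int cancellable_read(int x) { return (x + 100); }

/* The table starts out pointing at the default implementations. */
static demo_fn demo_interpos[DEMO_SLOT_MAX] = {
    [DEMO_SLOT_READ] = plain_read,
    [DEMO_SLOT_WRITE] = plain_write,
};

/* Accessor so that the table itself never has to be exported. */
static demo_fn *
demo_interpos_slot(int slot)
{
    assert(slot >= 0 && slot < DEMO_SLOT_MAX);
    return (&demo_interpos[slot]);
}

/* Public entry points always call through the table. */
static int demo_read(int x)  { return ((*demo_interpos_slot(DEMO_SLOT_READ))(x)); }
static int demo_write(int x) { return ((*demo_interpos_slot(DEMO_SLOT_WRITE))(x)); }

/* What the threading library would do once, at load time. */
static void
demo_threadlib_init(void)
{
    *demo_interpos_slot(DEMO_SLOT_READ) = cancellable_read;
}
```

Because callers only ever reach the implementation through the table, the same mechanism works whether the threading library is linked at startup or loaded later with dlopen().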
* Add the ppoll() system call. | Dmitry Chagin | 2014-11-13 | 1 | -0/+6
    Export kern_poll() needed by an upcoming Linuxulator change.
    Differential Revision: https://reviews.freebsd.org/D1133
    Reviewed by: kib, wblock
    MFC after: 1 month
    Notes: svn path=/head/; revision=274462
* Extend the support for exempting processes from being killed when swap is | John Baldwin | 2013-09-19 | 1 | -0/+3
    exhausted.
      - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants.
      - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6().
      - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility.
      - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes.
    Reviewed by: kib, jilles (earlier version)
    Approved by: re (delphij)
    MFC after: 1 month
    Notes: svn path=/head/; revision=255708
* Change the cap_rights_t type from uint64_t to a structure that we can extend | Pawel Jakub Dawidek | 2013-09-05 | 1 | -2/+1
    in the future in a backward compatible (API and ABI) way.
    The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
    The structure definition looks like this:
        struct cap_rights {
            uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
        };
    The initial CAP_RIGHTS_VERSION is 0.
    The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
    The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
    To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, e.g.
        #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
    We still support aliases that combine a few rights, but the rights have to belong to the same array element, e.g.:
        #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
        #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
        #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
    There is new API to manage the new cap_rights_t structure:
        cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
        void cap_rights_set(cap_rights_t *rights, ...);
        void cap_rights_clear(cap_rights_t *rights, ...);
        bool cap_rights_is_set(const cap_rights_t *rights, ...);
        bool cap_rights_is_valid(const cap_rights_t *rights);
        void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
        void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
        bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
    Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, e.g.:
        cap_rights_t rights;
        cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
    There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, e.g.:
        #define cap_rights_set(rights, ...) \
            __cap_rights_set((rights), __VA_ARGS__, 0ULL)
        void __cap_rights_set(cap_rights_t *rights, ...);
    Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
        cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
    Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for aliases definition.
    This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=255219
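The bit layout described in this entry (two size bits at the top, then five one-hot index bits, leaving 57 bits per element for rights, which matches the quoted 114 = 2 x 57) can be modelled with a small self-contained sketch. The `DEMO_`/`demo_` names are hypothetical stand-ins chosen to avoid clashing with the real Capsicum headers, and the shift of 57 is inferred from the description rather than quoted from it.

```c
#include <assert.h>
#include <stdint.h>

/* The one-hot array index occupies bits 57..61, below the two size bits. */
#define DEMO_CAPRIGHT(idx, bit) ((1ULL << (57 + (idx))) | (bit))
#define DEMO_IDXBITS(right)     (((right) >> 57) & 0x1fULL)

/* Same values as the examples quoted in the commit message. */
#define DEMO_CAP_LOOKUP DEMO_CAPRIGHT(0, 0x0000000000000400ULL)
#define DEMO_CAP_FCHMOD DEMO_CAPRIGHT(0, 0x0000000000002000ULL)
#define DEMO_CAP_PDKILL DEMO_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array index encoded in `right`, or -1 when no index bit or
 * more than one index bit is set (rights from different elements OR'ed).
 */
static int
demo_right_to_index(uint64_t right)
{
    uint64_t idxbits = DEMO_IDXBITS(right);
    int idx = 0;

    if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
        return (-1);
    while ((idxbits >>= 1) != 0)
        idx++;
    return (idx);
}

/* Add one right (or a same-element alias) to a two-element rights array. */
static void
demo_rights_set(uint64_t rights[2], uint64_t right)
{
    int idx = demo_right_to_index(right);

    assert(idx >= 0 && idx < 2);    /* catches mixed-element combinations */
    rights[idx] |= right;
}
```

With this encoding, an alias such as `DEMO_CAP_FCHMOD | DEMO_CAP_LOOKUP` still decodes to index 0, whereas `DEMO_CAP_LOOKUP | DEMO_CAP_PDKILL` sets two index bits and is rejected, which is the detection the message describes.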
* Add new system call - aio_mlock(). The name speaks for itself. It allows | Gleb Smirnoff | 2013-06-08 | 1 | -0/+1
    the mlock(2) operation, which can consume a lot of time, to be performed under control of aio(4).
    Reviewed by: kib, jilles
    Sponsored by: Nginx, Inc.
    Notes: svn path=/head/; revision=251526
* Add pipe2() system call. | Jilles Tjoelker | 2013-05-01 | 1 | -0/+1
    The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function.
    If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).
    If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper.
    Reviewed by: kan, kib
    Notes: svn path=/head/; revision=250159
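A short sketch of the flag-setting behaviour described in this entry follows. The helper names `make_private_pipe` and `has_both_flags` are hypothetical; the calls themselves are the standard pipe2() and fcntl() interfaces.

```c
#define _GNU_SOURCE     /* glibc declares pipe2() only with this */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a pipe whose two ends are close-on-exec
 * and non-blocking from the moment they exist, with no window in which
 * another thread could fork and exec with the descriptors inherited.
 * Returns 0 on success, -1 with errno set.
 */
static int
make_private_pipe(int fds[2])
{
    assert(fds != NULL);
    return (pipe2(fds, O_CLOEXEC | O_NONBLOCK));
}

/* Return 1 if fd has FD_CLOEXEC and O_NONBLOCK set, 0 if not, -1 on error. */
static int
has_both_flags(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    int flflags = fcntl(fd, F_GETFL);

    if (fdflags == -1 || flflags == -1)
        return (-1);
    return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```

Doing the same with pipe() followed by two fcntl() calls per descriptor would leave exactly the race that setting the flags "as part of the function" avoids.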
* Add accept4() system call. | Jilles Tjoelker | 2013-05-01 | 1 | -0/+3
    The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().)
    The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code.
    Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
    Notes: svn path=/head/; revision=250154
* Implement chflagsat(2) system call, similar to fchmodat(2), but operates on | Pawel Jakub Dawidek | 2013-03-21 | 1 | -0/+1
    file flags.
    Reviewed by: kib, jilles
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=248599