path: root/sys/sys/proc.h
Commit message (Collapse)AuthorAgeFilesLines
* Use atomic loads/stores when updating td->td_stateAlex Richardson2021-02-181-11/+23
| | | | | | | | | | | | | | | KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569
* jobc: rework detection of orphaned groups.Konstantin Belousov2021-01-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Instead of trying to maintain pg_jobc counter on each process group update (and sometimes before), just calculate the counter when needed. Still, for the benefit of the signal delivery code, explicitly mark orphaned groups as such with the new process group flag. This way we prevent bugs in the corner cases where updates to the counter were missed due to complicated configuration of p_pptr/p_opptr/real_parent (debugger). Since we need to iterate over all children of the process on exit, this change mostly affects the process group entry and leave, where we need to iterate all process group members to detect orpaned status. (For MFC, keep pg_jobc around but unused). Reported by: jhb Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
* pgrp: Prevent use after free.Konstantin Belousov2021-01-101-1/+1
| | | | | | | | | | | | | | | | Often, we have a process locked and need to get locked process group. In this case, because progress group lock is before process lock, unlocking process allows the group to be freed. See for instance tty_wait_background(). Make pgrp structures allocated from nofree zone, and ensure type stability of the pgrp mutex. Reviewed by: jilles Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27871
* proc.h: Reformat P_ and P2_ definitions.Konstantin Belousov2020-12-181-45/+66
| | | | | | | | | | | | Use traditional explicit leading zero format for hex numbers. Align P2_ hex values. Wrap long lines by splitting comments. Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=368771
* Add a kstack_contains() helper function.John Baldwin2020-12-011-0/+7
| | | | | | | | | | | | | This is useful for stack unwinders which need to avoid out-of-bounds reads of a kernel stack which can trigger kernel faults. Reviewed by: kib, markj Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D27356 Notes: svn path=/head/; revision=368240
* thread: staticize thread_reap and move td_allocdomainMateusz Guzik2020-11-261-1/+0
| | | | | | | | thread_init is a much better fit as the the value is constant after initialization. Notes: svn path=/head/; revision=368048
* thread: stash domain id to work around vtophys problems on ppc64Mateusz Guzik2020-11-231-0/+1
| | | | | | | | | | | | | Adding to zombie list can be perfomed by idle threads, which on ppc64 leads to panics as it requires a sleepable lock. Reported by: alfredo Reviewed by: kib, markj Fixes: r367842 ("thread: numa-aware zombie reaping") Differential Revision: https://reviews.freebsd.org/D27288 Notes: svn path=/head/; revision=367961
* linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILESConrad Meyer2020-11-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | The two flags are distinct and it is impossible to correctly handle clone(2) without the assistance of fork1(). This change depends on the pwddesc split introduced in r367777. I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd should be treated the opposite way p_fd is (based on RFFDG flag). This is a little ugly, but the benefit is that existing RFFDG API is preserved. Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are copied, while !RFFDG indicates both should be cloned. In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects independent fd tables. The previous conflation of CLONE_FS and CLONE_FILES was introduced in r163371 (2006). Discussed with: markj, trasz (earlier version) Differential Revision: https://reviews.freebsd.org/D27016 Notes: svn path=/head/; revision=367778
* Split out cwd/root/jail, cmask state from filedesc tableConrad Meyer2020-11-171-0/+1
| | | | | | | | | | | | | | | | No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037 Notes: svn path=/head/; revision=367777
* sys/proc.h: improve comment for new TDP2 flagKyle Evans2020-11-171-1/+1
| | | | | | | | This was suggested by kib and integrated locally, but somehow did not make it into the committed version. Notes: svn path=/head/; revision=367745
* umtx_op: reduce redundancy required for compat32Kyle Evans2020-11-171-0/+1
| | | | | | | | | | | | | | | | | | All of the compat32 variants are substantially the same, save for copyin/copyout (mostly). Apply the same kind of technique used with kevent here by having the syscall routines supply a umtx_copyops describing the operations needed. umtx_copyops carries the bare minimum needed- size of timespec and _umtx_time are used for determining if copyout is needed in the sem2_wait case. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27222 Notes: svn path=/head/; revision=367744
* thread: lockless zombie list manipulationMateusz Guzik2020-11-111-1/+4
| | | | | | | | | | This gets rid of the most contended spinlock seen when creating/destroying threads in a loop. (modulo kstack) Tested by: alfredo (ppc64), bdragon (ppc64) Notes: svn path=/head/; revision=367597
* thread: rework tidhash vs proc lock interactionMateusz Guzik2020-11-111-4/+0
| | | | | | | | Apart from minor clean up this gets rid of proc unlock/lock cycle on thread exit to work around LOR against tidhash lock. Notes: svn path=/head/; revision=367584
* thread: fix thread0 tid allocationMateusz Guzik2020-11-111-0/+1
| | | | | | | | | | Startup code hardcodes the value instead of allocating it. The first spawned thread would then be a duplicate. Pointy hat: mjg Notes: svn path=/head/; revision=367583
* thread: retire thread_findMateusz Guzik2020-11-101-1/+0
| | | | | | | tdfind should be used instead. Notes: svn path=/head/; revision=367544
* sys: clean up empty lines in .c and .h filesMateusz Guzik2020-09-011-1/+0
| | | | Notes: svn path=/head/; revision=365223
* Fix several issues with process group orphanage.Konstantin Belousov2020-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt of adding assertions that pgrp->pg_jobc counters do not underflow in r361967, reverted in r362910, points out bugs in the handling of job control. Peter Holm was able to narrow down the problem to very easy reproduction with timeout(1) which uses reaping. The following list of problems with calculation of pg_jobs which directs SIGHUP/SIGCONT delivery for orphaned process group was identified: - Re-calculation of the orphaned status for children of exiting parent was wrong, but mostly unnoticed when all children were reparented to init(8). When child can be reparented to a different process which could affect the child' job control state, it was not properly accounted for in pg_jobc. - Lockless check for exiting process' parent process group is racy because nothing prevents the parent from changing its group membership. - Exited process is left in the process group, until waited. This affects other calculations of pg_jobc. Split handling of job control status on process changing its process group, and process exiting. Calculate increments and decrements for pg_jobs by exact checking the orphanage instead of assuming process group membership for children and parent. Move the call to killjobc() later under the proctree_lock. Mark exiting process in killjobc() with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc calculations after the flag is set. Add checker that independently recalculates pg_jobc value and compares it with the memoized process group state. This is enabled under INVARIANTS. Reviewed by: jilles Discussed with: kevans Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D26116 Notes: svn path=/head/; revision=364495
* cred: distribute reference count per threadMateusz Guzik2020-06-091-1/+3
| | | | | | | | | | | This avoids dirtying creds in the common case, see the comment in kern_prot.c for details. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D24007 Notes: svn path=/head/; revision=361993
* Use a single VM object for kernel stacks.Mark Johnston2020-04-261-1/+0
| | | | | | | | | | | | | | | | | | | | Previously we allocated a separate VM object for each kernel stack. However, fully constructed kernel stacks are cached by UMA, so there is no harm in using a single global object for all stacks. This reduces memory consumption and makes it easier to define a memory allocation policy for kernel stack pages, with the aim of reducing physical memory fragmentation. Add a global kstack_object, and use the stack KVA address to index into the object like we do with kernel_object. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24473 Notes: svn path=/head/; revision=360354
* Remove the old NFS lock device driver that uses Giant.Rick Macklem2020-04-091-2/+0
| | | | | | | | | | | | | | | | | This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933 Notes: svn path=/head/; revision=359745
* Retire procfs-based process debugging.John Baldwin2020-04-011-21/+0
| | | | | | | | | | | | | | | | | | Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837 Notes: svn path=/head/; revision=359530
* Fix NFS client deadlock when read reports truncated node.Konstantin Belousov2020-02-221-0/+2
| | | | | | | | | | | | | | | | | | | | | If node attribute returned in the reply for read rpc indicate truncation, and it happens that the vnode is exclusively locked, update of the node attributes would try to shrink vnode size. Since during the read some vnode pages were busied by the reading thread, vnode_pager_setsize() deadlocks waiting for the busy state owned by the caller. Use a thread-local flag to indicate that NFS read owns some (s)busy pages states and postpone the call to vnode_pager_setsize() until the thread relinguishes the ownership. Diagnosed by: rlibby Tested by: pho, rlibby Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=358252
* Add td_pflags2, yet another thread-private flags word.Konstantin Belousov2020-02-221-0/+20
| | | | | | | | | | | There is no more free bits in td_pflags. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=358251
* Add a way to manage thread signal mask using shared word, instead of syscall.Konstantin Belousov2020-02-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Disscussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773 Notes: svn path=/head/; revision=357693
* vfs: prealloc vnodes in getnewvnode_reserveMateusz Guzik2020-01-111-1/+1
| | | | | | | | | | | | Having a reserved vnode count does not guarantee that getnewvnodes wont block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instaed. The only consumer was always passing "1" as count and never nesting reservations. Notes: svn path=/head/; revision=356643
* Rename umtxq_check_susp() to thread_check_susp()Konstantin Belousov2020-01-021-0/+1
| | | | | | | | | | | | and make it usable outside of kern_umtx.c. To be used in several future changes. Discussed with: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=356293
* sleep(9), sleepqueue(9): const'ify wchan pointersConrad Meyer2019-12-241-1/+1
| | | | | | | | | | | | | | | | | | _sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the channel pointers provided in any way; they are merely used as intptrs into a dictionary structure to match waiters with wakers. Correctly annotate this such that _sleep() and wakeup() may be used on const pointers without invoking ugly patterns like __DECONST(). Plumb const through all of the underlying sleepqueue bits. No functional change. Reviewed by: rlibby Discussed with: kib, markj Differential Revision: https://reviews.freebsd.org/D22914 Notes: svn path=/head/; revision=356057
* schedlock 4/4Jeff Roberson2019-12-151-1/+1
| | | | | | | | | | | | | | | | | | | | | Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778 Notes: svn path=/head/; revision=355784
* schedlock 2/4Jeff Roberson2019-12-151-1/+1
| | | | | | | | | | | Do all sleepqueue post-processing in sleepq_remove_thread() so that we do not require the thread lock after a context switch. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D22745 Notes: svn path=/head/; revision=355781
* schedlock 1/4Jeff Roberson2019-12-151-4/+21
| | | | | | | | | | | | | | | Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626 Notes: svn path=/head/; revision=355779
* Augment macros that manipulate td_no_sleeping with assertions to checkGleb Smirnoff2019-10-291-5/+10
| | | | | | | | | underleak and overflow of the counter. Reviewed by: kib Notes: svn path=/head/; revision=354147
* Remove epoch tracker from struct thread. It was an ugly crutch to emulateGleb Smirnoff2019-10-211-1/+0
| | | | | | | locking semantics for if_addr_rlock() and if_maddr_rlock(). Notes: svn path=/head/; revision=353869
* Since EPOCH_TRACE had been moved to opt_global.h, we don't need to wasteGleb Smirnoff2019-10-141-0/+2
| | | | | | | extra space in struct thread. Notes: svn path=/head/; revision=353485
* rfork(2): add RFSPAWN flagKyle Evans2019-09-251-0/+2
| | | | | | | | | | | | | | When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058 Notes: svn path=/head/; revision=352711
* Add debugging facility EPOCH_TRACE that checks that epochs entered areGleb Smirnoff2019-09-251-0/+1
| | | | | | | | | | | | | | properly nested and warns about recursive entrances. Unlike with locks, there is nothing fundamentally wrong with such use, the intent of tracer is to help to review complex epoch-protected code paths, and we mean the network stack here. Reviewed by: hselasky Sponsored by: Netflix Pull Request: https://reviews.freebsd.org/D21610 Notes: svn path=/head/; revision=352707
* Add procctl(PROC_STACKGAP_CTL)Konstantin Belousov2019-09-031-0/+2
| | | | | | | | | | | | | | It allows a process to request that stack gap was not applied to its stacks, retroactively. Also it is possible to control the gaps in the process after exec. PR: 239894 Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21352 Notes: svn path=/head/; revision=351773
* proc: eliminate the zombproc listMateusz Guzik2019-08-281-2/+0
| | | | | | | | | | | | It is not needed by anything in the kernel and it slightly drives up contention on both proctree and allproc locks. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21447 Notes: svn path=/head/; revision=351572
* proc: remove zpfindMateusz Guzik2019-08-281-1/+0
| | | | | | | | | | It is not used by anything. If someone wants it back it should be reimplemented to use the proc hash. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=351559
* proc: introduce the proc_add_orphan functionMariusz Zaborski2019-08-051-0/+1
| | | | | | | | | | This API allows adding the process to its parent orphan list. Reviewed by: kib, markj MFC after: 1 month Notes: svn path=/head/; revision=350611
* proc: make clear_orphan an public APIMariusz Zaborski2019-07-291-0/+1
| | | | | | | | | | This will be useful for other patches with process descriptors. Change its name as well. Reviewed by: markj, kib Notes: svn path=/head/; revision=350429
* Always set td_errno to the error value of a system call.John Baldwin2019-07-151-2/+1
| | | | | | | | | | | | | | Early errors prior to a system call did not set td_errno. This commit sets td_errno for all errors during syscallenter(). As a result, syscallret() can now always use td_errno without checking TDP_NERRNO. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D20898 Notes: svn path=/head/; revision=350012
* Control implicit PROT_MAX() using procctl(2) and the FreeBSD noteKonstantin Belousov2019-07-021-0/+2
| | | | | | | | | | | | | | | | | feature bit. In particular, allocate the bit to opt-out the image from implicit PROTMAX enablement. Provide procctl(2) verbs to set and query implicit PROTMAX handling. The knobs mimic the same per-image flag and per-process controls for ASLR. Reviewed by: emaste, markj (previous version) Discussed with: brooks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D20795 Notes: svn path=/head/; revision=349609
* Extract eventfilter declarations to sys/_eventfilter.hConrad Meyer2019-05-201-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped. Notes: svn path=/head/; revision=347984
* Remove p_code from struct proc.John Baldwin2019-04-251-1/+0
| | | | | | | | | | | | | Contrary to the comments, it was never used by core dumps or debuggers. Instead, it used to hold the signal code of a pending signal, but that was replaced by the 'ksi_code' member of ksiginfo_t when signal information was reworked in 7.0. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20047 Notes: svn path=/head/; revision=346695
* Fix the NFSv4 client to safely find processes.Rick Macklem2019-04-151-0/+3
| | | | | | | | | | | | | | | | | | r340744 broke the NFSv4 client, because it replaced pfind_locked() with a call to pfind(), since pfind() acquires the sx lock for the pid hash and the NFSv4 already holds a mutex when it does the call. The patch fixes the problem by recreating a pfind_any_locked() and adding the functions pidhash_slockall() and pidhash_sunlockall to acquire/release all of the pid hash locks. These functions are then used by the NFSv4 client instead of acquiring the allproc_lock and calling pfind(). Reviewed by: kib, mjg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19887 Notes: svn path=/head/; revision=346217
* amd64 KPTI: add control from procctl(2).Konstantin Belousov2019-03-161-0/+2
| | | | | | | | | | | | | | | | | Add the infrastructure to allow MD procctl(2) commands, and use it to introduce amd64 PTI control and reporting. PTI mode cannot be modified for existing pmap, the knob controls PTI of the new vmspace created on exec. Requested by: jhb Reviewed by: jhb, markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19514 Notes: svn path=/head/; revision=345228
* amd64: Add md process flags and first P_MD_PTI flag.Konstantin Belousov2019-03-161-0/+1
| | | | | | | | | | | | | | | | | | | | PTI mode for the process pmap on exec is activated iff P_MD_PTI is set. On exec, the existing vmspace can be reused only if pti mode of the pmap matches the P_MD_PTI flag of the process. Add MD cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can vetoed reuse of the existing vmspace. MFC note: md_flags change struct proc KBI. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19514 Notes: svn path=/head/; revision=345227
* Deanonymize thread and proc state enums, so that a userland app canGleb Smirnoff2019-03-151-2/+2
| | | | | | | | | | | | use them without redefining the value names. New clang no longer allows to redefine a enum value name to the same value. Bump __FreeBSD_version, since ports depend on that. Discussed with: jhb Notes: svn path=/head/; revision=345196
* Implement Address Space Layout Randomization (ASLR)Konstantin Belousov2019-02-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminshing over time as exploit authors work out simple ASLR bypass techniques, it elimintates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintaince burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitive change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386. This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603 Notes: svn path=/head/; revision=343964
* Add support for the Clang Coverage Sanitizer in the kernel (KCOV).Andrew Turner2019-01-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building with KCOV enabled the compiler will insert function calls to probes allowing us to trace the execution of the kernel from userspace. These probes are on function entry (trace-pc) and on comparison operations (trace-cmp). Userspace can enable the use of these probes on a single kernel thread with an ioctl interface. It can allocate space for the probe with KIOSETBUFSIZE, then mmap the allocated buffer and enable tracing with KIOENABLE, with the trace mode being passed in as the int argument. When complete KIODISABLE is used to disable tracing. The first item in the buffer is the number of trace event that have happened. Userspace can write 0 to this to reset the tracing, and is expected to do so on first use. The format of the buffer depends on the trace mode. When in PC tracing just the return address of the probe is stored. Under comparison tracing the comparison type, the two arguments, and the return address are traced. The former method uses on entry per trace event, while the later uses 4. As such they are incompatible so only a single mode may be enabled. KCOV is expected to help fuzzing the kernel, and while in development has already found a number of issues. It is required for the syzkaller system call fuzzer [1]. Other kernel fuzzers could also make use of it, either with the current interface, or by extending it with new modes. A man page is currently being worked on and is expected to be committed soon, however having the code in the kernel now is useful for other developers to use. [1] https://github.com/google/syzkaller Submitted by: Mitchell Horne <mhorne063@gmail.com> (Earlier version) Reviewed by: kib Testing by: tuexen Sponsored by: DARPA, AFRL Sponsored by: The FreeBSD Foundation (Mitchell Horne) Differential Revision: https://reviews.freebsd.org/D14599 Notes: svn path=/head/; revision=342962