path: root/sys/kern/subr_trap.c
Commit message | Author | Age | Files | Lines
* Deinline racct throttling out of syscall exit path. | Mateusz Guzik | 2018-11-29 | 1 | -10/+2
  racct is not enabled by default and even when it is enabled processes are typically not throttled. The order of checks is left unchanged since racct_enable will be annotated as __read_frequently, while checking for the flag in the processes would probably require an extra fetch.
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341181
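A minimal sketch of the shape this commit describes, assuming a simplified model: the cheap global knob is tested first, and the rarely needed throttling work lives in a separate, out-of-line function so it does not bloat the syscall exit path. `racct_enable_model`, `struct proc_model` and the other names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a process with a "throttled" flag. */
struct proc_model {
    bool throttled;
};

/* Hypothetical stand-in for the global racct_enable knob. */
static bool racct_enable_model = false;
static int throttle_calls = 0;

/* Out-of-line slow path: only reached when accounting is enabled
   and this process is actually throttled. */
static void racct_throttle_slowpath(struct proc_model *p)
{
    (void)p;
    throttle_calls++;   /* stands in for sleeping until the limit clears */
}

/* Fast path run on every syscall exit: global check first,
   per-process check second, no throttling logic inlined here. */
static inline void syscall_exit_hook(struct proc_model *p)
{
    if (racct_enable_model && p->throttled)
        racct_throttle_slowpath(p);
}
```

With accounting disabled, the hook costs a single load and branch; the per-process flag is never even read.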
* hwpmc: support sampling both kernel and user stacks when interrupted in kernel | Matt Macy | 2018-06-04 | 1 | -0/+5
  This adds the -U option to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together.
  Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney.
  Obtained from: jtl
  Sponsored by: Juniper Networks, Limelight Networks
  Notes: svn path=/head/; revision=334595
* sx: port over writer starvation prevention measures from rwlock | Mateusz Guzik | 2018-05-22 | 1 | -0/+3
  A constant stream of readers could completely starve writers and this is not a hypothetical scenario. The 'poll2_threads' test from the will-it-scale suite reliably starves writers even with concurrency < 10 threads.
  The problem was run into and diagnosed by dillon@backplane.com
  There was next to no change in lock contention profile during -j 128 pkg build, despite an sx lock being at the top.
  Tested by: pho
  Notes: svn path=/head/; revision=334024
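The rule being ported can be illustrated with a simplified, hypothetical lock model (not the kernel's sx implementation): once a writer is queued, new shared acquisitions back off, but a thread that already holds shared locks is let through so it cannot deadlock against that queued writer. All struct and function names below are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a shared/exclusive lock word. */
struct sx_model {
    bool write_owned;        /* an exclusive holder exists */
    bool writers_waiting;    /* at least one writer is queued */
    unsigned readers;        /* current shared holders */
};

/* Per-thread count of shared locks already held (a similar per-thread
   count is what can be asserted to be zero on return to user mode). */
struct thread_model {
    unsigned shared_held;
};

/* New readers back off once a writer is waiting, so a steady stream of
   readers cannot starve writers. A thread that already holds shared
   locks is still admitted: blocking it behind a writer that is waiting
   for existing readers (including this thread) to drain would deadlock. */
static bool sx_can_share(const struct sx_model *sx, const struct thread_model *td)
{
    if (sx->write_owned)
        return false;
    if (!sx->writers_waiting)
        return true;
    return td->shared_held > 0;
}
```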
* Add simple preempt safe epoch API | Matt Macy | 2018-05-10 | 1 | -0/+2
  Read locking is overused in the kernel to guarantee liveness. This API makes it easy to provide liveness guarantees without atomics.
  Includes epoch_test kernel module to stress test the API.
  Documentation will follow initial use case.
  Test case and improvements to preemption handling in response to discussion with mjg@
  Reviewed by: imp@, shurd@
  Approved by: sbruno@
  Notes: svn path=/head/; revision=333466
* Account the size of the vslock-ed memory by the thread. | Konstantin Belousov | 2018-03-24 | 1 | -0/+2
  Assert that all such memory is unwired on return to usermode.
  The count of the wired memory will be used to detect the copyout mode.
  Tested by: pho (as part of the larger patch)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=331490
* spdx: initial adoption of licensing ID tags. | Pedro F. Giffuni | 2017-11-18 | 1 | -0/+2
  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
  Initially, only tag files that use BSD 4-Clause "Original" license.
  RelNotes: yes
  Differential Revision: https://reviews.freebsd.org/D13133
  Notes: svn path=/head/; revision=325966
* - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter | Gleb Smirnoff | 2017-04-17 | 1 | -1/+1
    in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9).
  - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism:
    o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
    o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to.
    o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
  Further related changes:
  - Don't include vmmeter.h into pcpu.h.
  - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation.
  - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
  This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
  Reviewed by: kib, gallatin, marius, lidl
  Differential Revision: https://reviews.freebsd.org/D10156
  Notes: svn path=/head/; revision=317061
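A minimal, single-CPU sketch of the early-counter idea, assuming statistics are reached through pointers: until the allocator is ready, every field points at one dummy slot, so early writes are harmless. The names are hypothetical, and real counter(9) storage is per-CPU, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One spare slot that early-boot writes can safely land in. */
static uint64_t early_dummy_counter;

/* Hypothetical statistics block whose fields are counter pointers. */
struct vmstats_model {
    uint64_t *v_swtch;
    uint64_t *v_trap;
};

/* Before allocation, all fields alias the dummy slot; never NULL. */
static struct vmstats_model vm = { &early_dummy_counter, &early_dummy_counter };

static void counter_add(uint64_t *c, uint64_t n) { *c += n; }

/* Once the allocator works, give each field real storage. In this model,
   whatever was written to the dummy slot during early boot is discarded. */
static int vmstats_startup(void)
{
    uint64_t *a = calloc(1, sizeof(*a));
    uint64_t *b = calloc(1, sizeof(*b));
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    vm.v_swtch = a;
    vm.v_trap = b;
    return 0;
}
```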
* Do not leak mount references for dying threads. | Konstantin Belousov | 2017-02-25 | 1 | -3/+3
  Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed.
  Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context.
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=314253
* The assertion re-added in r302614 was triggered when stopping signal | Konstantin Belousov | 2016-07-18 | 1 | -10/+18
  is delivered to vforked child. Issue is that we avoid stopping such children in issignal() to not block parents. But executed AST, which ignored stops, leaves the child with the signal pending but no AST pending.
  On first exec after vfork(), call signotify() to handle pending reenabled signals. Adjust the assert to not check vfork children until exec.
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=302999
* Revive the check, disabled in r197963. | Konstantin Belousov | 2016-07-12 | 1 | -10/+37
  Despite the implication (process has pending signals -> the current thread marked for AST and has TDF_NEEDSIGCHK set) is not true due to other thread might manipulate its signal blocking mask, it should still hold for the single-threaded processes.
  Enable check for the condition for single-threaded case, and replicate it from userret() to ast() as well, where we check that ast indeed has no signal to deliver.
  Note that the check is under DIAGNOSTIC, it is not enabled for INVARIANTS but !DIAGNOSTIC since it imposes too heavy-weight locking for day-to-day used debugging kernel.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=302614
* Add assert to complement r302328. | Konstantin Belousov | 2016-07-12 | 1 | -1/+3
  AST must not execute with TDF_SBDRY or TDF_SEINTR/TDF_SERESTART thread flags set, which is asserted in userret(). As the consequence, -1 return from cursig() must not be possible.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=302613
* Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible | Konstantin Belousov | 2016-06-26 | 1 | -1/+1
  framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART.
  Reviewed by: jilles
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (gjb)
  Notes: svn path=/head/; revision=302215
* Add four new RCTL resources - readbps, readiops, writebps and writeiops, | Edward Tomasz Napierala | 2016-04-07 | 1 | -3/+7
  for limiting disk (actually filesystem) IO.
  Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds.
  Testing - and review of the code, in particular the VFS and VM parts - is very welcome.
  MFC after: 1 month
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D5080
  Notes: svn path=/head/; revision=297633
* racct: perform a lockless check for p_throttled | Mateusz Guzik | 2015-07-13 | 1 | -1/+1
  This reduces proc lock contention.
  Reviewed by: trasz
  Notes: svn path=/head/; revision=285511
* Generalised support for copy-on-write structures shared by threads. | Mateusz Guzik | 2015-06-10 | 1 | -2/+2
  Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed.
  This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary.
  Notes: svn path=/head/; revision=284214
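A rough sketch of the generation-counter scheme, assuming a single-threaded model with no locking: the process bumps one counter whenever a shared structure changes, and a thread crossing the boundary compares that counter before doing any per-structure work. `cowgen`, `thread_cow_sync()` and the structs are illustrative names, not necessarily the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted credential. */
struct cred_model { int refs; int uid; };

struct proc_model {
    struct cred_model *cred;
    unsigned cowgen;          /* bumped whenever any shared structure changes */
};

struct thread_model {
    struct proc_model *proc;
    struct cred_model *cred;  /* thread's own reference */
    unsigned cowgen;          /* generation the thread last synced with */
};

static void cred_hold(struct cred_model *c) { c->refs++; }
static void cred_drop(struct cred_model *c) { if (--c->refs == 0) free(c); }

/* Writer side: install new creds (taking over the caller's reference),
   release the process's old reference, and advance the generation. */
static void proc_set_cred(struct proc_model *p, struct cred_model *nc)
{
    struct cred_model *old = p->cred;
    p->cred = nc;
    p->cowgen++;
    cred_drop(old);
}

/* Boundary check: a single integer compare in the common case; the
   per-structure pointer comparisons only run when something changed. */
static void thread_cow_sync(struct thread_model *td)
{
    if (td->cowgen == td->proc->cowgen)
        return;
    if (td->cred != td->proc->cred) {
        cred_hold(td->proc->cred);
        cred_drop(td->cred);
        td->cred = td->proc->cred;
    }
    td->cowgen = td->proc->cowgen;
}
```

Adding another shared structure to this model means one more `if` inside the slow path, while the fast-path cost stays a single compare.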
* Currently, softupdate code detects overstepping on the workitems | Konstantin Belousov | 2015-05-27 | 1 | -0/+7
  limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread.
  Instead of synchronously waiting for the worker, perform the worker's tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast.
  There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitly checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nfsd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup.
  Reviewed by: jhb, mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=283600
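The deferral pattern can be sketched as follows, assuming hypothetical flag names and a fixed backlog limit: the deep call site only records that cleanup is needed, and the actual wait happens in an AST handler on the way back to user mode. Kernel threads, which receive no ASTs, are simply skipped here; the NFS daemon special case is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only. */
#define TDF_ASTPENDING_MODEL 0x1
#define TDA_SU_CLEANUP_MODEL 0x2

struct thread_model {
    unsigned flags;
    bool is_kernel_thread;
};

static unsigned su_backlog;                  /* queued work items */
static const unsigned su_backlog_limit = 4;  /* hypothetical limit */

/* Deep in the filesystem, possibly with vnode locks held: do NOT wait.
   Only note that cleanup is needed and request an AST. */
static void su_enqueue_work(struct thread_model *td)
{
    su_backlog++;
    if (su_backlog > su_backlog_limit && !td->is_kernel_thread)
        td->flags |= TDF_ASTPENDING_MODEL | TDA_SU_CLEANUP_MODEL;
}

/* Runs on the way back to user mode, where the thread holds no vnode
   locks, so waiting for the flusher can no longer deadlock. The wait is
   modelled here by draining the backlog directly. */
static void ast_model(struct thread_model *td)
{
    if (td->flags & TDA_SU_CLEANUP_MODEL) {
        su_backlog = 0;
        td->flags &= ~(unsigned)TDA_SU_CLEANUP_MODEL;
    }
    td->flags &= ~(unsigned)TDF_ASTPENDING_MODEL;
}
```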
* Remove support for Xen PV domU kernels. Support for HVM domU kernels | John Baldwin | 2015-04-30 | 1 | -9/+0
  remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code.
  - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64.
  - Remove !XENHVM bits from PV drivers.
  - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.)
  - Remove duplicate copy of <xen/features.h>.
  - Remove unused, i386-only xenstored.h.
  Differential Revision: https://reviews.freebsd.org/D2362
  Reviewed by: royger
  Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0)
  Relnotes: yes
  Notes: svn path=/head/; revision=282274
* Add kern.racct.enable tunable and RACCT_DISABLED config option. | Edward Tomasz Napierala | 2015-04-29 | 1 | -5/+8
  The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8).
  Differential Revision: https://reviews.freebsd.org/D2369
  Reviewed by: kib@
  MFC after: 1 month
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=282213
* Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem | Konstantin Belousov | 2015-01-12 | 1 | -2/+0
  does not access kernel mappings directly.
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=277055
* Fix two issues with /dev/mem access on amd64, both causing kernel page | Konstantin Belousov | 2014-03-21 | 1 | -0/+2
  faults.
  First, for accesses to direct map region should check for the limit by which direct map is instantiated.
  Second, for accesses to the kernel map, success returned from the kernacc(9) does not guarantee that consequent attempt to read or write to the checked address succeed, since other thread might invalidate the address meantime.
  Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicking. The trap handler would then see a page fault from access, and recover in normal way, making /dev/mem access safer.
  Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64.
  Note that at least the second issue exists on other architectures, and requires similar patching for md code.
  Reported and tested by: clusteradm (gjb, sbruno)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=263475
* Update kernel inclusions of capability.h to use capsicum.h instead; some | Robert Watson | 2014-03-16 | 1 | -1/+1
  further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=263233
* - Assert for not leaking readers rw locks counter on userland return. | Attilio Rao | 2013-12-17 | 1 | -0/+3
  - Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock.
  Sponsored by: EMC / Isilon storage division
  Notes: svn path=/head/; revision=259509
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging | Attilio Rao | 2013-11-25 | 1 | -1/+0
    option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking.
  - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0].
  [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1].
  Sponsored by: EMC / Isilon storage division
  Discussed with: rstone
  [0] Reported by: rstone
  [1] Discussed with: philip
  Notes: svn path=/head/; revision=258541
* Partially revert r195702. Deferring stops is now implemented via a set of | John Baldwin | 2013-03-18 | 1 | -1/+1
  calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls.
  - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly.
  - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
  Notes: svn path=/head/; revision=248470
* When throttling a process to enforce RACCT limits, do not use neither | Edward Tomasz Napierala | 2013-03-14 | 1 | -9/+2
  PBDRY (which simply doesn't make any sense) nor PCATCH (which could be used by a malicious process to work around the PCPU limit).
  Submitted by: Rudo Tomori
  Reviewed by: kib
  Notes: svn path=/head/; revision=248300
* Replace the TDP_NOSLEEPING flag with a counter so that the | John Baldwin | 2013-03-01 | 1 | -1/+1
  THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest.
  Reviewed by: attilio
  Notes: svn path=/head/; revision=247588
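Why a counter rather than a flag: with a single flag, an inner `THREAD_SLEEPING_OK()` would re-enable sleeping while an outer no-sleep section is still active. A minimal model of the counter approach, with hypothetical function names standing in for the macros:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state: a nesting counter instead of a flag. */
struct thread_model { unsigned no_sleeping; };

/* Enter a no-sleep section (stand-in for THREAD_NO_SLEEPING()). */
static void thread_no_sleeping(struct thread_model *td) { td->no_sleeping++; }

/* Leave a no-sleep section (stand-in for THREAD_SLEEPING_OK()). */
static void thread_sleeping_ok(struct thread_model *td)
{
    assert(td->no_sleeping > 0);   /* catches unbalanced calls */
    td->no_sleeping--;
}

/* Sleeping is allowed only when no section is active at any level. */
static bool thread_can_sleep(const struct thread_model *td)
{
    return td->no_sleeping == 0;
}
```

After two enters and one exit, sleeping is still forbidden; a boolean flag would already have been cleared at that point.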
* Further refine the handling of stop signals in the NFS client. The | John Baldwin | 2013-02-21 | 1 | -0/+2
  changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_*() and VOP_*() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET().
  A few other related changes:
  - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
  - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount.
  PR: kern/176179
  Reported by: Russell Cattelan (sigdeferstop() recursion)
  Reviewed by: kib
  MFC after: 1 month
  Notes: svn path=/head/; revision=247116
* Fixup r240246: hwpmc needs to retain the pinning until ASTs are not | Attilio Rao | 2012-10-30 | 1 | -1/+6
  executed. This means past the point where userret() is generally executed. Skip the td_pinned check if a callchain tracing is currently happening and add a more robust check to pmc_capture_user_callchain() in order to catch td_pinned leak past ast() in hwpmc case.
  Reported and tested by: fabient
  MFC after: 1 week
  X-MFC: r240246
  Notes: svn path=/head/; revision=242361
* Add CPU percentage limit enforcement to RCTL. The resource name is "pcpu". | Edward Tomasz Napierala | 2012-10-26 | 1 | -0/+13
  It was implemented by Rudolf Tomori during Google Summer of Code 2012.
  Notes: svn path=/head/; revision=242139
* Add a KPI to allow to reserve some amount of space in the numvnodes | Konstantin Belousov | 2012-10-14 | 1 | -0/+2
  counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9).
  Reviewed by: avg
  Tested by: pho
  MFC after: 1 week
  Notes: svn path=/head/; revision=241556
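A simplified, single-threaded model of the reserve, consume and drop lifecycle described above. The names and the fixed limit are hypothetical; the point where the real KPI would reclaim vnodes is only marked by a comment.

```c
#include <assert.h>

/* Hypothetical global vnode count and limit. */
static unsigned long numvnodes;
static const unsigned long desiredvnodes = 8;

/* Hypothetical per-thread reserve. */
struct thread_model { unsigned long vn_reserved; };

/* Called before taking any locks: charge `count` vnodes to the global
   counter up front. Reclaiming free vnodes would happen here, while it
   is still safe to do so. */
static void vnode_reserve(struct thread_model *td, unsigned long count)
{
    numvnodes += count;
    td->vn_reserved += count;
}

/* Allocation inside the critical block: consume the reserve first, so the
   global counter is untouched and no reclamation is needed. */
static int vnode_alloc(struct thread_model *td)
{
    if (td->vn_reserved > 0) {
        td->vn_reserved--;
        return 0;
    }
    if (numvnodes >= desiredvnodes)
        return -1;   /* unreserved path would have to reclaim here */
    numvnodes++;
    return 0;
}

/* After the critical block: give back whatever was not used. */
static void vnode_drop_reserve(struct thread_model *td)
{
    numvnodes -= td->vn_reserved;
    td->vn_reserved = 0;
}
```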
* Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and | Attilio Rao | 2012-09-08 | 1 | -1/+14
  TDP_NOSLEEPING leaking from syscallret() to userret() so that also trap handling is covered. Also, the check on td_locks is not duplicated between the two functions.
  Reported by: avg
  Reviewed by: kib
  MFC after: 1 week
  Notes: svn path=/head/; revision=240246
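A sketch of the consolidated check, assuming a hypothetical subset of thread state: placing the test in the single return-to-user path covers both syscall and trap exits without duplicating it. The field names mirror those in the commit message, but the struct itself is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of per-thread state that must be balanced before
   any return to user mode, whether from a syscall or from a trap. */
struct thread_model {
    int locks;        /* like td_locks */
    int pinned;       /* like td_pinned */
    int critnest;     /* like td_critnest */
    bool nofaulting;  /* like TDP_NOFAULTING */
    bool nosleeping;  /* like TDP_NOSLEEPING */
};

/* One shared predicate used by the common return path. */
static bool userret_invariants_ok(const struct thread_model *td)
{
    return td->locks == 0 && td->pinned == 0 && td->critnest == 0 &&
        !td->nofaulting && !td->nosleeping;
}

/* Both syscall and trap exits funnel through here, so one assertion
   covers them all (the kernel uses its own assertion macros instead). */
static void userret_model(const struct thread_model *td)
{
    assert(userret_invariants_ok(td));
}
```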
* Move PT_UPDATED_FLUSH() before td_locks check in order to have more | Attilio Rao | 2012-09-08 | 1 | -3/+3
  coverage also in the XEN case.
  Reviewed by: kib
  MFC after: 1 week
  Notes: svn path=/head/; revision=240245
* userret() already checks for td_locks when INVARIANTS is enabled, so | Attilio Rao | 2012-09-08 | 1 | -1/+0
  there is no need to check if Giant is acquired after it.
  Reviewed by: kib
  MFC after: 1 week
  Notes: svn path=/head/; revision=240244
* Remove redundant include. | Pawel Jakub Dawidek | 2012-06-10 | 1 | -1/+0
  MFC after: 1 month
  Notes: svn path=/head/; revision=236859
* Include the associated wait channel message for context switch ktrace | John Baldwin | 2012-04-20 | 1 | -2/+2
  records. kdump supports both the old and new messages.
  Submitted by: Andrey Zonov andrey zonov org
  MFC after: 1 week
  Notes: svn path=/head/; revision=234494
* Add software PMC support. | Fabien Thomas | 2012-03-28 | 1 | -0/+10
  New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8).
  Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions.
  Sponsored by: NETASQ
  MFC after: 1 month
  Notes: svn path=/head/; revision=233628
* Assert that exiting process does not return to usermode. | Konstantin Belousov | 2011-10-03 | 1 | -0/+2
  Reviewed by: avg, jhb
  MFC after: 1 week
  Notes: svn path=/head/; revision=225942
* In order to maximize the re-usability of kernel code in user space this | Kip Macy | 2011-09-16 | 1 | -2/+2
  patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
  Reviewed by: rwatson
  Approved by: re (bz)
  Notes: svn path=/head/; revision=225617
* Inline the syscallenter() and syscallret(). This reduces the time measured | Konstantin Belousov | 2011-09-11 | 1 | -162/+0
  by the syscall entry speed microbenchmarks by ~10% on amd64.
  Submitted by: jhb
  Approved by: re (bz)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=225474
* We may split today's CAPABILITIES into CAPABILITY_MODE (which has | Jonathan Anderson | 2011-06-29 | 1 | -2/+2
  to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way.
  Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future.
  Approved by: rwatson
  Sponsored by: Google UK Ltd
  Notes: svn path=/head/; revision=223668
* Continue introducing Capsicum capability mode support: | Robert Watson | 2011-03-01 | 1 | -0/+15
  If a system call wasn't listed in capabilities.conf, return ECAPMODE at syscall entry.
  Reviewed by: anderson
  Discussed with: benl, kris, pjd
  Sponsored by: Google, Inc.
  Obtained from: Capsicum Project
  MFC after: 3 months
  Notes: svn path=/head/; revision=219133
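A minimal sketch of the entry-time check, assuming each syscall table entry carries a flag saying whether it is allowed in capability mode. The flag, the table layout and `ECAPMODE_MODEL` (a placeholder value for this sketch) are hypothetical; only `ENOSYS` comes from the standard `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define ECAPMODE_MODEL 94          /* placeholder errno for this sketch */
#define SYF_CAPENABLED_MODEL 0x1   /* hypothetical per-syscall flag */

struct sysent_model {
    int (*call)(void);
    unsigned flags;   /* set for syscalls allowed in capability mode */
};

static int sys_getpid_model(void) { return 42; }
static int sys_open_model(void) { return 3; }

static const struct sysent_model sysent_table_model[] = {
    { sys_getpid_model, SYF_CAPENABLED_MODEL },
    { sys_open_model, 0 },   /* global-namespace access: not allowed */
};

#define NSYSENT_MODEL (sizeof(sysent_table_model) / sizeof(sysent_table_model[0]))

/* Check performed at syscall entry, before the handler runs. */
static int syscall_enter_model(bool in_capability_mode, unsigned code, int *retval)
{
    if (code >= NSYSENT_MODEL)
        return ENOSYS;
    const struct sysent_model *se = &sysent_table_model[code];
    if (in_capability_mode && !(se->flags & SYF_CAPENABLED_MODEL))
        return ECAPMODE_MODEL;
    *retval = se->call();
    return 0;
}
```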
* Mfp4 CH=177256: | Bjoern A. Zeeb | 2011-02-14 | 1 | -0/+11
  Catch a set vnet upon return to user space. This usually means return paths with CURVNET_RESTORE() missing. If VNET_DEBUG is turned on we can even tell the function that did the CURVNET_SET() which is really helpful; else we print "N/A".
  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by: jhb
  MFC after: 11 days
  Notes: svn path=/head/; revision=218688
* Allow debugger to specify that children of the traced process should be | Konstantin Belousov | 2011-01-25 | 1 | -2/+2
  automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked.
  Reviewed by: davidxu, jhb
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=217819
* Remove extra braces for style(9) (found while cleaning up an old work tree). | Ed Maste | 2010-09-28 | 1 | -2/+1
  Notes: svn path=/head/; revision=213236
* Call the systrace_probe_func() when the error value. | Rui Paulo | 2010-08-22 | 1 | -2/+2
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=211617
* Retire td_syscalls now that it is no longer needed. | John Baldwin | 2010-07-15 | 1 | -1/+0
  Notes: svn path=/head/; revision=210138
* Obey sv_syscallnames bounds in syscallname(). | Konstantin Belousov | 2010-07-04 | 1 | -2/+4
  Reported and tested by: pho
  Notes: svn path=/head/; revision=209697
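A minimal sketch of a bounds-checked name lookup, with hypothetical struct and field names: an out-of-range code, or a missing table, yields a fixed string instead of indexing past the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ABI descriptor: a name table and its length. */
struct sysentvec_model {
    const char *const *syscallnames;
    unsigned nsyscalls;
};

static const char *const names_model[] = { "syscall", "exit", "fork" };
static const struct sysentvec_model sv_model = { names_model, 3 };

/* Never read past the table: out-of-range codes map to a fallback. */
static const char *syscallname_model(const struct sysentvec_model *sv, unsigned code)
{
    if (sv->syscallnames == NULL || code >= sv->nsyscalls)
        return "unknown";
    return sv->syscallnames[code];
}
```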
* Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to | John Baldwin | 2010-06-30 | 1 | -0/+1
  <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
  Notes: svn path=/head/; revision=209613
* Count number of threads that enter and leave dynamically registered | Konstantin Belousov | 2010-06-28 | 1 | -0/+4
  syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading.
  Reviewed by: jhb
  Tested by: pho
  MFC after: 1 month
  Notes: svn path=/head/; revision=209579
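A single-threaded sketch of the counting scheme, with hypothetical names. Real code must update the count atomically and sleep until it drains; this model reduces that wait to a boolean "safe to unload yet?" result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dynamically registered syscall slot. */
struct dyn_syscall_model {
    bool registered;
    unsigned thrcnt;   /* threads currently executing this syscall */
};

/* Entry: refuse if the slot is gone, otherwise count ourselves in. */
static bool dyn_enter(struct dyn_syscall_model *s)
{
    if (!s->registered)
        return false;
    s->thrcnt++;
    return true;
}

/* Exit: count ourselves out. */
static void dyn_leave(struct dyn_syscall_model *s)
{
    assert(s->thrcnt > 0);
    s->thrcnt--;
}

/* Deregistration: stop new entries, then report whether the module's
   code can be unloaded yet (the kernel would wait until thrcnt == 0). */
static bool dyn_deregister(struct dyn_syscall_model *s)
{
    s->registered = false;
    return s->thrcnt == 0;
}
```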
* Remove the support for int13 FPU exception reporting on i386. It is | Konstantin Belousov | 2010-06-23 | 1 | -21/+0
  believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF.
  Reviewed by: bde
  Tested by: pho (previous version)
  Notes: svn path=/head/; revision=209461