path: root/sys/kern
Commit message | Author | Age | Files | Lines
* lockmgr: add adaptive spinning | Mateusz Guzik | 2020-07-22 | 1 | -18/+111
    It is very conservative: it spins only when LK_ADAPTIVE is passed, only on exclusive locks, and never when any waiters are present. The buffer cache remains non-spinning.
    This reduces total sleep times during buildworld etc., but it does not shorten total real time (culprits are contention in the vm subsystem along with slock + upgrade, which is not covered).
    For microbenchmarks: open3_processes -t 52 (open/close of the same file for writing), ops/s:
      before: 258845
      after:  801638
    Reviewed by: kib
    Tested by: pho
    Differential Revision: https://reviews.freebsd.org/D25753
    Notes: svn path=/head/; revision=363415
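The policy described in the commit above can be illustrated with a small userspace sketch. It is not the lockmgr code: the lock-word layout, the `SK_*` bits, the `adaptive` argument (standing in for LK_ADAPTIVE) and the bounded spin count are all assumptions made for illustration. It only shows the three conditions from the message: spin on request only, on an exclusive lock only, and never when waiters are present.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-word bits; not the real lockmgr layout. */
#define SK_LOCKED	((uintptr_t)0x1)	/* held exclusively */
#define SK_WAITERS	((uintptr_t)0x2)	/* some thread sleeps on the lock */

struct sketch_lock {
	_Atomic uintptr_t word;
};

/* Try to take a free lock with a single compare-and-swap. */
bool
sketch_try_xlock(struct sketch_lock *lk)
{
	uintptr_t expected = 0;

	return (atomic_compare_exchange_strong(&lk->word, &expected,
	    SK_LOCKED));
}

/*
 * Returns true if the lock was acquired, false if the caller should fall
 * back to sleeping. *spins reports how many times the word was polled.
 */
bool
sketch_xlock_adaptive(struct sketch_lock *lk, bool adaptive, int max_spins,
    int *spins)
{
	*spins = 0;
	for (;;) {
		if (sketch_try_xlock(lk))
			return (true);
		/* Conservative policy: spin only when asked to ... */
		if (!adaptive)
			return (false);
		/* ... never when someone is already sleeping ... */
		if (atomic_load(&lk->word) & SK_WAITERS)
			return (false);
		/* ... and give up after a bounded number of polls. */
		if (++(*spins) >= max_spins)
			return (false);
	}
}
```

A real implementation would typically also check whether the lock owner is currently running before deciding to keep spinning; that detail is left out of this sketch.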
* INTRNG: only shuffle for !EARLY_AP_STARTUP | Mitchell Horne | 2020-07-21 | 1 | -3/+9
    During device attachment, all interrupt sources will bind to the BSP, as it is the only processor online. This means interrupts must be redistributed ("shuffled") later, during SI_SUB_SMP.
    For the EARLY_AP_STARTUP case, this is no longer true. SI_SUB_SMP will execute much earlier, meaning APs will be online and available before devices begin attachment, and there will therefore be nothing to shuffle.
    All PIC-conforming interrupt controllers will handle this early distribution properly, except for RISC-V's PLIC. Make the necessary tweak to the PLIC driver.
    While here, convert irq_assign_cpu from a boolean_t to a bool.
    Reviewed by: markj
    Differential Revision: https://reviews.freebsd.org/D25693
    Notes: svn path=/head/; revision=363404
* lockmgr: denote recursion with a bit in lock value | Mateusz Guzik | 2020-07-21 | 1 | -4/+6
    This reduces excessive reads from the lock.
    Tested by: pho
    Notes: svn path=/head/; revision=363394
* lockmgr: rewrite upgrade to stop always dropping the lock | Mateusz Guzik | 2020-07-21 | 1 | -36/+34
    This matches rw and sx locks.
    Notes: svn path=/head/; revision=363393
* lockmgr: add a helper for reading the lock value | Mateusz Guzik | 2020-07-21 | 1 | -17/+17
    Notes: svn path=/head/; revision=363392
* [net80211] Add new privileges; restrict what can be done in a jail. | Adrian Chadd | 2020-07-19 | 1 | -4/+2
    Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.
    + VAP_MANAGE is everything but setting the MAC and creating a VAP.
    + VAP_SETMAC is setting the MAC address of the VAP. Typically you wouldn't want the jail to be able to modify this.
    + CREATE_VAP is to create a new VAP. Again, you don't want to be doing this in a jail, but this DOES stop being able to run some corner cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this bit out later.
    This allows me to run wpa_supplicant in a jail after transferring a STA VAP into it. I unfortunately can't currently set the wlan debugging inside the jail; that would be super useful!
    Reviewed by: bz
    Differential Revision: https://reviews.freebsd.org/D25630
    Notes: svn path=/head/; revision=363325
* Short-circuit tdfind when looking for the calling thread. | Mateusz Guzik | 2020-07-18 | 1 | -0/+8
    This is a common occurrence with cpuset and other places.
    Notes: svn path=/head/; revision=363297
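The idea of the short-circuit above can be sketched in plain C. Everything here is a simplified stand-in: `struct sk_thread`, the `sk_threads` table and the `sk_slow_lookups` counter are hypothetical, and the real function also deals with locking, which is omitted. The point is only that a lookup for the caller's own TID never has to consult the global structure.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct thread and the TID table. */
struct sk_thread {
	int tid;
	int pid;
};

struct sk_thread sk_threads[] = {
	{ 100, 10 }, { 101, 10 }, { 200, 20 },
};
unsigned sk_slow_lookups;	/* counts walks of the global table */

/* Find a thread by TID; pid == -1 means "any process". */
struct sk_thread *
sk_tdfind(struct sk_thread *curthread, int tid, int pid)
{
	size_t i;

	/* Fast path: the caller is asking about itself. */
	if (curthread->tid == tid) {
		if (pid != -1 && curthread->pid != pid)
			return (NULL);
		return (curthread);
	}

	/* Slow path: consult the global structure. */
	sk_slow_lookups++;
	for (i = 0; i < sizeof(sk_threads) / sizeof(sk_threads[0]); i++) {
		if (sk_threads[i].tid != tid)
			continue;
		if (pid != -1 && sk_threads[i].pid != pid)
			return (NULL);
		return (&sk_threads[i]);
	}
	return (NULL);
}
```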
* vfs: fix vn_poll performance with either MAC or AUDIT | Mateusz Guzik | 2020-07-16 | 1 | -6/+8
    The code would unconditionally lock the vnode to audit or call the mac hook, even if neither wants to do anything. Pre-check the state to avoid locking in the common case of nothing to do.
    Note this code should not normally be executed anyway, as vnodes always return ready. However, poll1/2 from will-it-scale use regular files for benchmarking, presumably to focus on the interface itself, as the vnode handler is supposed to do almost nothing.
    This in particular fixes poll2, which passes 128 fds.
    $ ./poll2_processes -s 10
      before: 134411
      after:  271572
    Notes: svn path=/head/; revision=363249
* vfs: fix MAC/AUDIT mismatch in vn_poll | Mateusz Guzik | 2020-07-16 | 1 | -3/+3
    Auditing would not be performed without MAC compiled in.
    Notes: svn path=/head/; revision=363247
* poll: factor fd lookup out of scan and rescan | Mateusz Guzik | 2020-07-15 | 1 | -43/+70
    Notes: svn path=/head/; revision=363215
* fd: remove fd_lastfile | Mateusz Guzik | 2020-07-15 | 5 | -82/+83
    It keeps getting recalculated way more often than is needed.
    Provide a routine (fdlastfile) to get it if necessary.
    Consumers may be better off with a bitmap iterator instead.
    Notes: svn path=/head/; revision=363214
* fd: add obvious branch predictions to fdalloc | Mateusz Guzik | 2020-07-15 | 1 | -2/+2
    Notes: svn path=/head/; revision=363213
* cache: make negative shrinker round robin on all lists every time | Mateusz Guzik | 2020-07-14 | 1 | -12/+8
    Previously it would check 4, 3, 2, 1 lists. In practice by the time it is getting called all lists have some elements and consequently this does not result in new evictions. Nonetheless, the code is clearer.
    Tested by: pho
    Notes: svn path=/head/; revision=363202
* cache: remove numcalls | Mateusz Guzik | 2020-07-14 | 1 | -3/+0
    The counter is not very useful and if necessary the value can be found by summing up other counters.
    Notes: svn path=/head/; revision=363201
* cache: count dropped entries | Mateusz Guzik | 2020-07-14 | 1 | -0/+2
    Notes: svn path=/head/; revision=363200
* cache: remove neg_locked argument from cache_zap_locked | Mateusz Guzik | 2020-07-14 | 1 | -28/+31
    Tested by: pho
    Notes: svn path=/head/; revision=363199
* cache: remove a useless argument from cache_negative_insert | Mateusz Guzik | 2020-07-14 | 1 | -9/+4
    Notes: svn path=/head/; revision=363198
* cache: create a dedicated struct for negative entries | Mateusz Guzik | 2020-07-14 | 1 | -17/+49
    .. and stuff it into the unused target vnode field.
    This gets rid of concurrent nc_flag modifications racing with the shrinker and consequently fixes a bug where such a change could have been missed when cache_ncp_invalidate was being issued.
    Reported by: zeising
    Tested by: pho, zeising
    Fixes: r362828 ("cache: lockless forward lookup with smr")
    Notes: svn path=/head/; revision=363196
* fd: stop looping in pwd_hold | Mateusz Guzik | 2020-07-11 | 1 | -5/+9
    We don't expect to fail acquiring the reference unless running into a corner case. Just in case, ensure forward progress by taking the lock.
    Reviewed by: kib, markj
    Differential Revision: https://reviews.freebsd.org/D25616
    Notes: svn path=/head/; revision=363112
* vfs: fix early termination of kern_getfsstat | Mateusz Guzik | 2020-07-10 | 1 | -1/+1
    The kernel would unlock an already unlocked mutex if the buffer got filled up before the mount list ended.
    Reported by: pho
    Fixes: r363069 ("vfs: depessimize getfsstat when only the count is requested")
    Notes: svn path=/head/; revision=363072
* vfs: fix trivial whitespace issues which don't interfere with blame | Mateusz Guzik | 2020-07-10 | 10 | -20/+19
    .. even without the -w switch
    Notes: svn path=/head/; revision=363071
* vfs: depessimize getfsstat when only the count is requested | Mateusz Guzik | 2020-07-10 | 1 | -38/+79
    This avoids relocking mountlist_mtx for each entry.
    Notes: svn path=/head/; revision=363069
* vfs: avoid spurious memcpy in vfs_statfs | Mateusz Guzik | 2020-07-10 | 1 | -1/+2
    It is quite often called for the very same buffer.
    Notes: svn path=/head/; revision=363068
* shm_open2: Implement SHM_GROW_ON_WRITE | Kyle Evans | 2020-07-10 | 1 | -5/+32
    Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests, so go ahead and implement it. A future change will make memfd_create always set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests on -CURRENT.
    Reviewed by: kib
    Differential Revision: https://reviews.freebsd.org/D25502
    Notes: svn path=/head/; revision=363065
* Apply the logic from r363051 to semctl(2) and __sem_base field. | Mark Johnston | 2020-07-09 | 1 | -0/+7
    Reported by: Jeffball <jeffball@grimm-co.com>
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D25600
    Notes: svn path=/head/; revision=363055
* Avoid copying out kernel pointers from msgctl(IPC_STAT). | Mark Johnston | 2020-07-09 | 1 | -0/+7
    While this behaviour is harmless, it is really just an artifact of the fact that the msgctl(2) implementation uses a user-visible structure as part of the internal implementation, so it is not deliberate and these pointers are not useful to userspace. Thus, NULL them out before copying out, and remove references to them from the manual page.
    Reported by: Jeffball <jeffball@grimm-co.com>
    Reviewed by: emaste, kib
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D25600
    Notes: svn path=/head/; revision=363051
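The "NULL them out before copying out" step described above follows a common pattern: copy the internal structure to a temporary, scrub the fields userspace should not see, and export only the temporary. A minimal sketch follows; the structure and field names are hypothetical and `memcpy` merely stands in for copyout(9).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure shared between the implementation and userspace. */
struct sk_msqid_ds {
	void *first;		/* internal queue head pointer */
	void *last;		/* internal queue tail pointer */
	unsigned long qnum;	/* number of queued messages */
};

/* Export a copy of *kern with the internal pointers scrubbed. */
void
sk_stat_copyout(const struct sk_msqid_ds *kern, struct sk_msqid_ds *user_out)
{
	struct sk_msqid_ds tmp = *kern;

	/* The live structure is left alone; only the copy is scrubbed. */
	tmp.first = NULL;
	tmp.last = NULL;
	memcpy(user_out, &tmp, sizeof(tmp));	/* stand-in for copyout(9) */
}
```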
* Regenerate. | Mark Johnston | 2020-07-06 | 1 | -2/+2
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=362971
* Permit cpuset_(get|set)domain() in capability mode. | Mark Johnston | 2020-07-06 | 1 | -0/+2
    These system calls already perform validation of their parameters when called in capability mode, identical to cpuset_(get|set)affinity().
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=362970
* kern.tty_info_kstacks: set compact format as default | Pawel Biernacki | 2020-07-06 | 1 | -1/+1
    Notes: svn path=/head/; revision=362969
* Allow accesses of the caller's CPU and domain sets in capability mode. | Mark Johnston | 2020-07-06 | 1 | -1/+3
    cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling thread or process' CPU and domain set in capability mode, but only when the thread or process ID is specified as -1. Extend this to cover the case where the ID actually matches the caller's TID or PID, since some code, such as our pthread_attr_get_np() implementation, always provides an explicit ID.
    It was not and still is not permitted to access CPU and domain sets for other threads in the same process when the process is in capability mode. This might change in the future.
    Submitted by: Greg V <greg@unrelenting.technology> (original version)
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D25552
    Notes: svn path=/head/; revision=362968
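The ID rule described above can be written down as a small predicate. This is a sketch of the rule as stated in the message, not the kernel function: `struct sk_caller`, `enum sk_which` and `SK_ECAPMODE` are hypothetical names, and checks on other arguments are left out.

```c
#include <stdbool.h>

enum sk_which { SK_WHICH_TID, SK_WHICH_PID };

/* Hypothetical view of the calling thread. */
struct sk_caller {
	bool in_capability_mode;
	long tid;
	long pid;
};

#define SK_ECAPMODE	1	/* placeholder for the real error value */

/* Returns 0 if the access is permitted, SK_ECAPMODE otherwise. */
int
sk_cpuset_cap_check(const struct sk_caller *c, enum sk_which which, long id)
{
	if (!c->in_capability_mode)
		return (0);
	/* -1 has always meant "the caller itself". */
	if (id == -1)
		return (0);
	/* New: an explicit ID is fine when it names the caller. */
	if (which == SK_WHICH_TID && id == c->tid)
		return (0);
	if (which == SK_WHICH_PID && id == c->pid)
		return (0);
	/* Other threads, even in the same process, stay off limits. */
	return (SK_ECAPMODE);
}
```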
* kern.tty_info_kstacks: add a compact format | Pawel Biernacki | 2020-07-06 | 2 | -12/+53
    Add a more compact display format for kern.tty_info_kstacks inspired by procstat -kk. Set it as a default one.

    # sysctl kern.tty_info_kstacks=1
    kern.tty_info_kstacks: 0 -> 1
    # sleep 2
    ^T
    load: 0.17  cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
    #0 0xffffffff80c4443e at mi_switch+0xbe
    #1 0xffffffff80c98044 at sleepq_catch_signals+0x494
    #2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
    #3 0xffffffff80c43af3 at _sleep+0x193
    #4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
    #5 0xffffffff80c5119b at sys_nanosleep+0x3b
    #6 0xffffffff810ffc69 at amd64_syscall+0x119
    #7 0xffffffff810d5520 at fast_syscall_common+0x101
    sleep: about 1 second(s) left out of the original 2
    ^C

    # sysctl kern.tty_info_kstacks=2
    kern.tty_info_kstacks: 1 -> 2
    # sleep 2
    ^T
    load: 0.24  cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
    mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12 sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b amd64_syscall+0x119 fast_syscall_common+0x101
    sleep: about 1 second(s) left out of the original 2
    ^C

    Suggested by: avg
    Reviewed by: mjg
    Relnotes: yes
    Sponsored by: Mysterious Code Ltd.
    Differential Revision: https://reviews.freebsd.org/D25487
    Notes: svn path=/head/; revision=362967
* Lift cpuset Capsicum checks into a subroutine. | Mark Johnston | 2020-07-06 | 1 | -36/+31
    Otherwise the same checks are duplicated across four different system call implementations, cpuset_(get|set)(affinity|domain)().
    No functional change intended.
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Notes: svn path=/head/; revision=362966
* vfs: expand on vhold_smr comment | Mateusz Guzik | 2020-07-06 | 1 | -1/+14
    Notes: svn path=/head/; revision=362951
* lockf: elide avoidable locking in lf_advlockasync | Mateusz Guzik | 2020-07-05 | 1 | -8/+9
    While here assert on ls_threads state.
    Notes: svn path=/head/; revision=362950
* Fix typo. | Konstantin Belousov | 2020-07-05 | 1 | -1/+1
    Sponsored by: The FreeBSD Foundation
    MFC after: 3 days
    Notes: svn path=/head/; revision=362948
* Rerun kernel ifunc resolvers after all CPUs have started | Andrew Turner | 2020-07-05 | 1 | -0/+14
    On architectures that use RELA relocations it is safe to rerun the ifunc resolvers after all CPUs have started, but while they are still parked. On arm64 with big.LITTLE this is needed as some SoCs have shipped with different ID register values for the big and little clusters, meaning we were unable to rely on the register values from the boot CPU.
    Add support for rerunning the resolvers on arm64 and amd64 as these are both RELA using architectures.
    Reviewed by: kib
    Sponsored by: Innovate UK
    Differential Revision: https://reviews.freebsd.org/D25455
    Notes: svn path=/head/; revision=362944
* Add char and short types to kcsan | Mateusz Guzik | 2020-07-04 | 1 | -0/+32
    Notes: svn path=/head/; revision=362921
* ifdef out pg_jobc assertions added in r361967 | Mateusz Guzik | 2020-07-03 | 1 | -0/+4
    They trigger for some people, the bug is not obvious, there are no takers for fixing it, the issue already had to be there for years beforehand and is low priority.
    Notes: svn path=/head/; revision=362910
* cred: add a prediction to crfree for td->td_realucred == cr | Mateusz Guzik | 2020-07-02 | 1 | -1/+1
    This matches crhold and eliminates an assembly maze in the common case.
    Notes: svn path=/head/; revision=362890
* cache: add missing call to cache_ncp_invalid for negative hits | Mateusz Guzik | 2020-07-02 | 1 | -14/+14
    Note the dtrace probe can fire even if the entry is gone, but I don't think that's worth fixing.
    Notes: svn path=/head/; revision=362889
* cache: fix misplaced fence in cache_ncp_invalidate | Mateusz Guzik | 2020-07-02 | 1 | -9/+20
    The intent was to mark the entry as invalid before cache_zap starts messing with it.
    While here add some comments.
    Notes: svn path=/head/; revision=362888
* Use tdfind() in pget(). | Konstantin Belousov | 2020-07-02 | 1 | -25/+4
    Reviewed by: jhb, hselasky
    Sponsored by: Mellanox Technologies
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D25532
    Notes: svn path=/head/; revision=362885
* Simplify the flow when getting/setting an isrc | Andrew Turner | 2020-07-01 | 1 | -10/+6
    Rather than unlocking and returning we can just perform the needed action only when the interrupt source is valid and reuse the unlock in both the valid irq and invalid irq cases.
    Sponsored by: Innovate UK
    Notes: svn path=/head/; revision=362834
* cache: lockless forward lookup with smr | Mateusz Guzik | 2020-07-01 | 1 | -35/+113
    This eliminates the need to take bucket locks in the common case.
    Concurrent lookup utilizing the same vnodes is still bottlenecked on referencing and locking path components; this will be taken care of separately.
    Reviewed by: kib
    Tested by: pho
    Differential Revision: https://reviews.freebsd.org/D23913
    Notes: svn path=/head/; revision=362828
* vfs: protect vnodes with smr | Mateusz Guzik | 2020-07-01 | 1 | -11/+92
    vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr section, allowing consumers to get away without locking.
    See vhold_smr and vdropl for comments explaining caveats.
    Reviewed by: kib
    Tested by: pho
    Differential Revision: https://reviews.freebsd.org/D23913
    Notes: svn path=/head/; revision=362827
* Fix a panic when unloading firmware | Andrew Gallatin | 2020-06-29 | 1 | -8/+7
    LIST_FOREACH_SAFE() is not safe in the presence of other threads removing list entries when a mutex is released.
    This is not in the critical path, so just restart the scan each time we drop the lock, rather than using a marker.
    Reviewed by: jhb, markj
    Sponsored by: Netflix
    Notes: svn path=/head/; revision=362789
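The "restart the scan each time we drop the lock" approach above can be sketched on a toy list. The types and function names below are hypothetical, no real mutex is involved, and `sk_fw_unload_racy` deliberately removes the successor as well to mimic another thread modifying the list while the lock is released. A cached "next" pointer, which is what a SAFE-style iterator relies on, would be stale in exactly that situation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified firmware image list (singly linked). */
struct sk_fw {
	struct sk_fw *next;
	bool unload_pending;
};

struct sk_fw_list {
	struct sk_fw *head;
};

/* Unlink fw from the list if it is still there. */
void
sk_fw_remove(struct sk_fw_list *l, struct sk_fw *fw)
{
	struct sk_fw **pp;

	for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == fw) {
			*pp = fw->next;
			fw->next = NULL;
			return;
		}
	}
}

/*
 * Stand-in for work done with the lock dropped. It removes the entry and
 * also its successor, mimicking another thread racing with the scan.
 */
void
sk_fw_unload_racy(struct sk_fw_list *l, struct sk_fw *fw)
{
	struct sk_fw *succ = fw->next;

	sk_fw_remove(l, fw);
	if (succ != NULL)
		sk_fw_remove(l, succ);
}

/* Returns the number of entries this scan unloaded itself. */
int
sk_fw_unload_all(struct sk_fw_list *l)
{
	struct sk_fw *fw;
	int n = 0;

restart:
	for (fw = l->head; fw != NULL; fw = fw->next) {
		if (!fw->unload_pending)
			continue;
		fw->unload_pending = false;
		/* The real code would drop the mutex here ... */
		sk_fw_unload_racy(l, fw);
		/* ... and retake it here. No cached pointer is trusted. */
		n++;
		goto restart;
	}
	return (n);
}
```

Restarting makes the scan quadratic in the worst case, which is the trade-off the message accepts because the path is not performance critical.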
* Use zfree() instead of explicit_bzero() and free(). | John Baldwin | 2020-06-25 | 3 | -23/+8
    In addition to reducing lines of code, this also ensures that the full allocation is always zeroed avoiding possible bugs with incorrect lengths passed to explicit_bzero().
    Suggested by: cem
    Reviewed by: cem, delphij
    Approved by: csprng (cem)
    Sponsored by: Chelsio Communications
    Differential Revision: https://reviews.freebsd.org/D25435
    Notes: svn path=/head/; revision=362624
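The benefit named above, that the full allocation is zeroed regardless of what length the caller would have passed, can be shown with a userspace sketch. This is not the kernel zfree(9) implementation: the `sk_*` names and the size header are assumptions, chosen only so the allocator itself knows how much to wipe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator header that remembers the full allocation size. */
union sk_hdr {
	size_t size;
	max_align_t align;	/* keeps the payload suitably aligned */
};

void *
sk_malloc(size_t size)
{
	union sk_hdr *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return (NULL);
	h->size = size;
	return (h + 1);
}

/* Zero the whole allocation, whatever length the caller had in mind. */
size_t
sk_wipe(void *p)
{
	union sk_hdr *h = (union sk_hdr *)p - 1;
	volatile unsigned char *b = p;
	size_t i;

	/* volatile stores so the compiler cannot elide the wipe */
	for (i = 0; i < h->size; i++)
		b[i] = 0;
	return (h->size);
}

/* Wipe, then free; callers no longer pass a length at all. */
void
sk_zfree(void *p)
{
	if (p == NULL)
		return;
	(void)sk_wipe(p);
	free((union sk_hdr *)p - 1);
}
```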
* Call swap_pager_freespace() from vm_object_page_remove(). | Mark Johnston | 2020-06-25 | 1 | -5/+0
    All vm_object_page_remove() callers, except linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when removing a range of pages from an object. The LinuxKPI case appears to be an unintentional omission that could result in leaked swap blocks, so unconditionally free swap space in vm_object_page_remove() to protect against similar bugs in the future.
    Reviewed by: alc, kib
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D25329
    Notes: svn path=/head/; revision=362613
* Add `kern.features.witness` | Enji Cooper | 2020-06-24 | 1 | -0/+2
    Adding `kern.features.witness` helps expose whether or not the kernel has `options WITNESS` enabled, so the `feature_present(3)` API can be used to query whether or not witness(9) is built into the kernel.
    This support is helpful with userspace applications (generally speaking, tests), as it can be queried to determine whether or not tests related to WITNESS should be run.
    MFC after: 1 week
    Reviewed by: cem, darrick.freebsd_gmail.com
    Differential Revision: https://reviews.freebsd.org/D25302
    Sponsored by: DellEMC Isilon
    Notes: svn path=/head/; revision=362591
* vfs: track sequential reads and writes separately | Thomas Munro | 2020-06-21 | 3 | -20/+29
    For software like PostgreSQL and SQLite that sometimes reads sequentially while also writing sequentially some distance behind with interleaved syscalls on the same fd, performance is better on UFS if we do sequential access heuristics separately for reads and writes.
    Patch originally by Andrew Gierth in 2008, updated and proposed by me with his permission.
    Reviewed by: mjg, kib, tmunro
    Approved by: mjg (mentor)
    Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk>
    Differential Revision: https://reviews.freebsd.org/D25024
    Notes: svn path=/head/; revision=362460
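To make the idea above concrete, here is a deliberately simplified sketch of a per-direction sequential-access heuristic. The structure, field names and the `SK_SEQMAX` cap are hypothetical, and the real heuristic is more involved; the sketch only shows why keeping separate state for reads and writes lets both streams be recognised as sequential when the calls are interleaved on one descriptor.

```c
#include <stdint.h>

enum sk_rw { SK_READ = 0, SK_WRITE = 1 };

#define SK_SEQMAX	127	/* hypothetical cap on the run length */

/* Hypothetical per-file state, kept separately for each direction. */
struct sk_file {
	int64_t nextoff[2];	/* expected next offset */
	int seqcount[2];	/* current run of sequential requests */
};

/* Update and return the sequential run length for one request. */
int
sk_seq_heuristic(struct sk_file *fp, enum sk_rw rw, int64_t off, int64_t len)
{
	if (off == fp->nextoff[rw]) {
		/* Continues where the last request of this kind ended. */
		if (fp->seqcount[rw] < SK_SEQMAX)
			fp->seqcount[rw]++;
	} else {
		/* Not contiguous: start over for this direction only. */
		fp->seqcount[rw] = 0;
	}
	fp->nextoff[rw] = off + len;
	return (fp->seqcount[rw]);
}
```

With a single shared offset and counter, every write in an interleaved read/write pattern would look like a seek and reset the run that the reads had built up.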