path: root/sys/sys
Commit message | Author | Age | Files | Lines
* Add Chacha20-Poly1305 as a KTLS cipher suite. | John Baldwin | 2021-02-18 | 1 | -0/+1
  Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the server and client IVs as implicit nonces xored with the record sequence number to generate the per-record nonce matching the construction used with AES-GCM for TLS 1.3.
  Reviewed by: gallatin
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27839
* Use atomic loads/stores when updating td->td_state | Alex Richardson | 2021-02-18 | 1 | -11/+23
  KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU.
  Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering.
  Reported by: GENERIC-KCSAN
  Test Plan: KCSAN no longer complains, kernel still runs fine.
  Reviewed By: markj, mjg (earlier version)
  Differential Revision: https://reviews.freebsd.org/D28569
* mips: Don't set __NO_TLS to disable some uses of TLS. | John Baldwin | 2021-02-18 | 1 | -2/+1
  __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. At the time libc did not support TLS on MIPS (I believe), but TLS support was added to libc (at least _set_tp.c) for MIPS about a month after __NO_TLS was added, but __NO_TLS was still left around.
  Reviewed by: imp
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D28713
* riscv: Don't set __NO_TLS to disable some uses of TLS. | John Baldwin | 2021-02-18 | 1 | -1/+1
  __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. The initial RISC-V import set this for RISC-V presumably due to immaturity in the toolchains at the time. However, TLS via _Thread works fine in both GCC and clang on RISC-V.
  Reviewed by: mhorne, imp
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D28712
* lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK) | Konstantin Belousov | 2021-02-17 | 2 | -0/+3
  or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make progress until the first open finishes.
  Requested by: mckusick
  Reviewed by: markj, mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28697
* Bump __FreeBSD_version after f2583be110ca | Mitchell Horne | 2021-02-17 | 1 | -1/+1
  Provide a compatibility point around the ABI-breaking change.
  Sponsored by: The FreeBSD Foundation
* Update the LRO processing code so that we can support | Randall Stewart | 2021-02-17 | 1 | -0/+1
  further CPU enhancements for compressed acks. These are acks that are compressed into an mbuf. The transport has to be aware of how to process these, and an upcoming update to rack will do so. You need the rack changes to actually test and validate these since if the transport does not support mbuf compression, then the old code paths stay in place. We do in this commit take out the concept of logging if you don't have a lock (which was quite dangerous and was only for some early debugging but has been left in the code).
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D28374
* jail: Handle a possible race between jail_remove(2) and fork(2) | Jamie Gritton | 2021-02-16 | 1 | -0/+1
  jail_remove(2) includes a loop that sends SIGKILL to all processes in a jail, but skips processes in PRS_NEW state. Thus it is possible that a process in mid-fork(2) during jail removal can survive the jail being removed.
  Add a prison flag PR_REMOVE, which is checked before the new process returns. If the jail is being removed, the process will then exit. Also check this flag in jail_attach(2) which has a similar issue.
  Reported by: trasz
  Approved by: kib
  MFC after: 3 days
* efirt: add hooks for diverging EFI implementations | Roger Pau Monné | 2021-02-16 | 1 | -11/+95
  Introduce a set of hooks for MI EFI public functions, so that a new implementation can be done. This will be used to implement the Xen PV EFI interface that's used when running FreeBSD as a Xen dom0 from UEFI firmware.
  Also make the efi_status_to_errno non-static since it will be used to evaluate status return values from the PV interface.
  No functional change intended.
  Sponsored by: Citrix Systems R&D
  Reviewed by: kib, imp
  Differential revision: https://reviews.freebsd.org/D28620
* stand/multiboot2: add support for booting a Xen dom0 in UEFI mode | Roger Pau Monné | 2021-02-16 | 1 | -0/+1
  Add some basic multiboot2 infrastructure to the EFI loader in order to be capable of booting a FreeBSD/Xen dom0 when booted from UEFI.
  Only a very limited subset of the multiboot2 protocol is implemented in order to support enough to boot into Xen, the implementation doesn't intend to be a full multiboot2 capable implementation. Such multiboot2 functionality is hooked up into the amd64 EFI loader, which is the only architecture that supports Xen dom0 on FreeBSD.
  The options to boot a FreeBSD/Xen dom0 system are exactly the same as on BIOS, and require setting the xen_kernel and xen_cmdline options in loader.conf.
  Sponsored by: Citrix Systems R&D
  Reviewed by: tsoome, imp
  Differential revision: https://reviews.freebsd.org/D28497
* lockmgr: shrink struct lock by 8 bytes on LP64 | Mateusz Guzik | 2021-02-15 | 2 | -5/+6
  Currently the struct has a 4 byte padding stemming from 3 ints.
  1. prio comfortably fits in short, unfortunately there is no dedicated type for it and plumbing it throughout the codebase is not worth it right now, instead an assert is added which covers also flags for safety
  2. lk_exslpfail can in principle exceed u_short, but the count is already not considered reliable and it only ever gets modified straight to 0. In other words it can be incrementing with an upper bound of USHRT_MAX
  With these in place struct lock shrinks from 48 to 40 bytes.
  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D28680
* procstat: distinguish vm map guards in procstat vm output. | Konstantin Belousov | 2021-02-14 | 1 | -0/+1
  Requested and reviewed by: rwatson (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28658
* Stop ignoring ERELOOKUP from VOP_INACTIVE() | Konstantin Belousov | 2021-02-12 | 1 | -1/+2
  When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying.
  This is a part of the efforts to ensure that unlinked not referenced vnode does not prevent inode from reusing.
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* buf SU hooks: track buf_start() calls with B_IOSTARTED flag | Konstantin Belousov | 2021-02-12 | 1 | -5/+11
  and only call buf_complete() if previously started. Some error paths, like CoW failure, might skip buf_start() and do bufdone(), which itself calls buf_complete().
  Various SU handle_written_XXX() functions check that io was started and incomplete parts of the buffer data reverted before restoring them. This is a useful invariant that B_IO_STARTED on buffer layer allows to keep instead of changing check and panic into check and return.
  Reported by: pho
  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* loader: remove BORDER_PIXELS | Toomas Soome | 2021-02-09 | 1 | -1/+0
  BORDER_PIXELS is left over from picking up the source from illumos port. Since FreeBSD VT does not use border in terminal size calculation, there is no reason why loader should use it.
  MFC after: 1 week
* Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." | Alexander V. Chernikov | 2021-02-08 | 2 | -6/+1
  Wrong version of the change was pushed inadvertently.
  This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
* SO_RERROR indicates that receive buffer overflows should be handled as errors. | Alexander V. Chernikov | 2021-02-08 | 2 | -1/+6
  Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default.
  This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.
* libkern: use compiler builtins for strcpy, strcmp and strlen | Mateusz Guzik | 2021-02-08 | 1 | -0/+4
* opencrypto: Introduce crypto_dispatch_async() | Mark Johnston | 2021-02-08 | 1 | -1/+1
  Currently, OpenCrypto consumers can request asynchronous dispatch by setting a flag in the cryptop. (Currently only IPSec may do this.) I think this is a bit confusing: we (conditionally) set cryptop flags to request async dispatch, and then crypto_dispatch() immediately examines those flags to see if the consumer wants async dispatch. The flag names are also confusing since they don't specify what "async" applies to: dispatch or completion.
  Add a new KPI, crypto_dispatch_async(), rather than encoding the requested dispatch type in each cryptop. crypto_dispatch_async() falls back to crypto_dispatch() if the session's driver provides asynchronous dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().
  Similarly, add crypto_dispatch_batch() to request processing of a tailq of cryptops, rather than encoding the scheduling policy using cryptop flags. Convert GELI, the only user of this interface (disabled by default) to use the new interface.
  Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine whether crypto requests will be dispatched synchronously. This is just a helper macro. Use it instead of looking at cap flags directly.
  Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and just check the relevant queues directly. This could result in some unnecessary wakeups but I think it's very uncommon to be using more than one queue per worker in a given workload, so checking all three queues is a waste of cycles.
  Reviewed by: jhb
  Sponsored by: Ampere Computing
  Submitted by: Klara, Inc.
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D28194
* Add a VM flag to prevent reclaim on a failed contig allocation | Ryan Stone | 2021-02-03 | 1 | -0/+1
  If a M_WAITOK contig alloc fails, the VM subsystem will try to reclaim contiguous memory twice before actually failing the request. On a system with 64GB of RAM I've observed this take 400-500ms before it finally gives up, and I believe that this will only be worse on systems with even more memory.
  In certain contexts this delay is extremely harmful, so add a flag that will skip reclaim for allocation requests to allow those paths to opt-out of doing an expensive reclaim.
  Sponsored by: Dell Inc
  Differential Revision: https://reviews.freebsd.org/D28422
  Reviewed by: markj, kib
* Expose clang's alignment builtins and use them for roundup2/rounddown2 | Alex Richardson | 2021-02-03 | 2 | -2/+21
  This makes roundup2/rounddown2 type- and const-preserving and allows using it on pointer types without casting to uintptr_t first. Not performing pointer-to-integer conversions also helps the compiler's optimization passes and can therefore result in better code generation. When using it with integer values there should be no change other than the compiler checking that the alignment value is a valid power-of-two.
  I originally implemented these builtins for CHERI a few years ago and they have been very useful for CheriBSD. However, they are also useful for non-CHERI code so I was able to upstream them for Clang 10.0.
  Rationale from the clang documentation: Clang provides builtins to support checking and adjusting alignment of pointers and integers. These builtins can be used to avoid relying on implementation-defined behavior of arithmetic on integers derived from pointers. Additionally, these builtins retain type information and, unlike bitwise arithmetic, they can perform semantic checking on the alignment value.
  There is also a feature request for GCC, so GCC may also support it in the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641
  Reviewed By: brooks, jhb, imp
  Differential Revision: https://reviews.freebsd.org/D28332
* tests/sys/kern/crc32: Check for SSE4.2 before using it | Alex Richardson | 2021-02-02 | 1 | -1/+6
  This avoids a SIGILL when running these tests on QEMU (which defaults to a basic amd64 CPU without SSE4.2).
  This commit also tests the table-based implementations in addition to testing the hw-accelerated crc32 versions.
  Reviewed By: cem, kib, markj
  Differential Revision: https://reviews.freebsd.org/D28395
* fd: add fget_only_user | Mateusz Guzik | 2021-01-29 | 1 | -0/+12
  This can be used by single-threaded processes which don't share a file descriptor table to access their file objects without having to reference them.
  For example select consumers tend to match the requirement and have several file descriptors to inspect.
* __FreeBSD_version: update the references to the doc tree | Bjoern A. Zeeb | 2021-01-29 | 1 | -1/+1
  Update the reference of which file to update in the doc tree when bumping __FreeBSD_version.
* Bump __FreeBSD_version for multiple LinuxKPI updates conflicting | Bjoern A. Zeeb | 2021-01-28 | 1 | -1/+1
  with DRM. Be sure to update your drm-kmod port to after the update.
* firmware(9): extend firmware_get() by a "no warn" flag. | Bjoern A. Zeeb | 2021-01-27 | 1 | -0/+5
  With the upcoming usage from LinuxKPI but also from drivers ported natively we are seeing more probing of various firmware (names).
  Add the ability to firmware(9) to silence the "firmware image loading/registering errors" by adding a new firmware_get_flags() function extending firmware_get() and taking a flags argument as firmware_put() already does.
  Requested-by: zeising (for future LinuxKPI/DRM)
  Sponsored-by: The FreeBSD Foundation
  Sponsored-by: Rubicon Communications, LLC ("Netgate")
  MFC after: 3 days
  Reviewed-by: markj
  Differential Revision: https://reviews.freebsd.org/D27413
* KDB: remove obsolete KDB_WHY_NDIS | Marius Strobl | 2021-01-26 | 1 | -1/+0
  ndis(4) has been removed in bfc99943b04b46a6c1c885ce7bcc6f235b7422aa.
* Bump __FreeBSD_version after e63539f3059728ff58328ac0ecb2a7bf4e2f08e8 | Dimitry Andric | 2021-01-26 | 1 | -1/+1
  This is to allow ports to detect the clang fix applied in the above commit.
  PR: 252892
  MFC after: 3 days
* vfs: use atomic_load_consume_ptr in vn_load_v_data_smr | Mateusz Guzik | 2021-01-25 | 1 | -1/+1
* atomic: add stub atomic_load_consume_ptr | Mateusz Guzik | 2021-01-25 | 2 | -0/+13
* atomic: make atomic_store_ptr type-aware | Mateusz Guzik | 2021-01-25 | 2 | -2/+5
* Rename kern_mmap_req to kern_mmap | Brooks Davis | 2021-01-25 | 1 | -3/+1
  Replace all uses of kern_mmap with kern_mmap_req move the old kern_mmap. Reand rename kern_mmap_req to kern_mmap.
  The helper saved some code churn initially, but having multiple interfaces is sub-optimal.
  Obtained from: CheriBSD
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D28292
* queue.h: Add {SLIST,STAILQ,LIST,TAILQ}_END() | Alex Richardson | 2021-01-25 | 1 | -0/+10
  We provide these for compat with other queue.h headers since some software assumes it exists (e.g. the libevent contrib code), but we are not encouraging their use (NULL should be used instead).
  This fixes the following warning (which should arguably be an error since it results in a function call to an undefined function):
  .../contrib/libevent/buffer.c:495:16: warning: implicit declaration of function 'LIST_END' is invalid in C99 [-Wimplicit-function-declaration]
      cbent != LIST_END(&buffer->callbacks);
      ^
  .../contrib/libevent/buffer.c:495:13: warning: comparison between pointer and integer ('struct evbuffer_cb_entry *' and 'int') [-Wpointer-integer-compare]
      cbent != LIST_END(&buffer->callbacks);
      ~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Reviewed By: jhb
  Differential Revision: https://reviews.freebsd.org/D27151
* cache: add symlink support to lockless lookup | Mateusz Guzik | 2021-01-23 | 3 | -3/+12
  Reviewed by: kib (previous version)
  Tested by: pho (previous version)
  Differential Revision: https://reviews.freebsd.org/D27488
* Bump CURRENT to 14.0 | Glen Barber | 2021-01-22 | 1 | -1/+1
  This one goes to 14.
  Approved by: re (implicit)
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* elf: add some definitions for i386 and amd64 relocations | Konstantin Belousov | 2021-01-21 | 1 | -0/+14
  I believe that rtld does not need to implement them, they are mostly for the static linker. 'Mostly' because for amd64 our kernel linker loads object files, and amd64 relocation types could be observed.
  Defines were taken from glibc sources.
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D28205
* jail: Use refcount(9) for prison references. | Jamie Gritton | 2021-01-20 | 1 | -4/+5
  Use refcount(9) for both pr_ref and pr_uref in struct prison. This allows prisons to be held and freed without requiring the prison mutex. An exception to this is that dropping the last reference will still lock the prison, to keep the guarantee that a locked prison remains valid and alive (provided it was at the time it was locked).
  Among other things, this honors the promise made in a comment in crcopy(9), that it will not block, which hasn't been true for two decades.
* aio: micro-optimize the lio_opcode assignments | Alan Somers | 2021-01-20 | 1 | -5/+6
  This allows slightly more efficient opcode testing in-kernel. It is transparent to userland, except to applications that sneakily submit aio fsync or aio mlock operations via lio_listio, which has never been documented, requires the use of deliberately undefined constants (LIO_SYNC and LIO_MLOCK), and is arguably a bug.
  Reviewed by: jhb
  Differential Revision: https://reviews.freebsd.org/D27942
* make maximum interrupt number tunable on ARM, ARM64, MIPS, and RISC-V | Oleksandr Tymoshenko | 2021-01-19 | 2 | -5/+3
  Use a machdep.nirq tunable instead of compile-time constant NIRQ as a value for maximum number of interrupts. It allows keeping a system footprint small by default with an option to increase the limit for large systems like server-grade ARM64.
  Reviewed by: mhorne
  Differential Revision: https://reviews.freebsd.org/D27844
  Submitted by: Klara, Inc.
  Sponsored by: Ampere Computing
* jail: Add prison_isvalid() and prison_isalive() | Jamie Gritton | 2021-01-18 | 1 | -0/+2
  prison_isvalid() checks if a prison record can be used at all, i.e. pr_ref > 0. This filters out prisons that aren't fully created, and those that are either in the process of being dismantled, or will be at the next opportunity. While the check for pr_ref > 0 is simple enough to make without a convenience function, this prepares the way for other measures of prison validity.
  prison_isalive() checks not only validity as far as the usability of the prison structure, but also whether the prison is visible to user space. It replaces a test for pr_uref > 0, which is currently only used within kern_jail.c, and not often there.
  Both of these functions also assert that either the prison mutex or allprison_lock is held, since it's generally the case that unlocked prisons aren't guaranteed to remain useable for any length of time. This isn't entirely true, for example a thread can assume its own prison is good, but most exceptions will exist inside of kern_jail.c.
* Implement malloc_domainset_aligned(9). | Konstantin Belousov | 2021-01-17 | 1 | -0/+3
  Change the power-of-two malloc zones to require alignment equal to the size [*]. Current uma allocator already provides such alignment, so in fact this change does not change anything except providing future-proof setup.
  Suggested by: markj [*]
  Reviewed by: andrew, jah, markj
  Tested by: pho
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D28147
* Bump __FreeBSD_version after linuxkpi changes | Emmanuel Vadot | 2021-01-17 | 1 | -1/+1
* fd: add refcount argument to falloc_noinstall | Mateusz Guzik | 2021-01-13 | 1 | -1/+3
  This lets callers avoid atomic ops by initializing the count to required value from the get go.
  While here add falloc_abort to backpedal from this without having to fdrop.
* fd: add finstall_refed | Mateusz Guzik | 2021-01-13 | 1 | -0/+2
  Can be used to consume an already existing reference and consequently avoid atomic ops.
* fd: provide a dedicated closef variant for unix socket code | Mateusz Guzik | 2021-01-13 | 1 | -0/+1
  This avoids testing for td != NULL.
* vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers | Mateusz Guzik | 2021-01-12 | 1 | -0/+2
  Check the comment above the routine for reasoning.
* Bump __FreeBSD_version after linuxkpi changes | Emmanuel Vadot | 2021-01-12 | 1 | -1/+1
* lio_listio: validate aio_lio_opcode | Alan Somers | 2021-01-12 | 1 | -1/+1
  Previously, we would accept any kind of LIO_* opcode, including ones that were intended for in-kernel use only like LIO_SYNC (which is not defined in userland). The situation became more serious with 022ca2fc7fe08d51f33a1d23a9be49e6d132914e. After that revision, setting aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.
  Note that POSIX does not specify what should happen if aio_lio_opcode is invalid.
  MFC-with: 022ca2fc7fe08d51f33a1d23a9be49e6d132914e
  Reviewed by: jhb, tmunro, 0mp
  Differential Revision: https://reviews.freebsd.org/D28078
* jobc: rework detection of orphaned groups. | Konstantin Belousov | 2021-01-10 | 1 | -1/+3
  Instead of trying to maintain pg_jobc counter on each process group update (and sometimes before), just calculate the counter when needed. Still, for the benefit of the signal delivery code, explicitly mark orphaned groups as such with the new process group flag.
  This way we prevent bugs in the corner cases where updates to the counter were missed due to complicated configuration of p_pptr/p_opptr/real_parent (debugger).
  Since we need to iterate over all children of the process on exit, this change mostly affects the process group entry and leave, where we need to iterate all process group members to detect orphaned status.
  (For MFC, keep pg_jobc around but unused).
  Reported by: jhb
  Reviewed by: jilles
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D27871
* pgrp: Prevent use after free. | Konstantin Belousov | 2021-01-10 | 1 | -1/+1
  Often, we have a process locked and need to get locked process group. In this case, because process group lock is before process lock, unlocking process allows the group to be freed. See for instance tty_wait_background().
  Make pgrp structures allocated from nofree zone, and ensure type stability of the pgrp mutex.
  Reviewed by: jilles
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D27871