path: root/sys/sys
Commit message (author, date; files changed, lines removed/added)
* proc: remove zpfind (Mateusz Guzik, 2019-08-28; 1 file, -1/+0 lines)
  It is not used by anything. If someone wants it back, it should be reimplemented to use the proc hash.
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=351559
* Introduce <sys/qmath.h>, a fixed-point math library from Netflix. (Edward Tomasz Napierala, 2019-08-27; 1 file, -0/+632 lines)
  This makes it possible to perform mathematical operations on fractional values without using floating point. It operates on Q numbers, which are integer-sized, opaque structures initialized to hold a chosen number of integer and fractional bits.
  For a general description of the Q number system, see the "Fixed Point Representation & Fractional Math" whitepaper [1]; for the actual API, see the qmath(3) man page. This is one of the dependencies for the upcoming stats(3) framework [2] that will be applied to the TCP stack in a later commit.
  1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
  2. https://reviews.freebsd.org/D20477
  Reviewed by: bcr (man pages, earlier version), sef (earlier version)
  Discussed with: cem, dteske, imp, lstewart
  Sponsored By: Klara Inc, Netflix
  Obtained from: Netflix
  Differential Revision: https://reviews.freebsd.org/D20116
  Notes: svn path=/head/; revision=351544
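A rough sketch of Q-format arithmetic, assuming a plain Q16.16 layout in an int32_t (16 integer bits, 16 fractional bits). The q16_* names are hypothetical; qmath(3) defines its own opaque Q types and functions, so consult that man page for the real API. The key idea is that multiplication and division widen to 64 bits so the intermediate result cannot overflow before the fractional scaling is undone.

```c
#include <stdint.h>

/* Hypothetical Q16.16 helpers for illustration only; not qmath(3). */
typedef int32_t q16_t;

#define Q16_FRAC_BITS 16
#define Q16_ONE ((q16_t)1 << Q16_FRAC_BITS) /* 1.0 in Q16.16 */

/* Valid for |i| < 32768, the integer range of Q16.16. */
static inline q16_t q16_from_int(int32_t i) { return (q16_t)(i * Q16_ONE); }

/* Build num/den in Q16.16, truncating toward zero. */
static inline q16_t q16_from_ratio(int32_t num, int32_t den)
{
	return (q16_t)(((int64_t)num * Q16_ONE) / den);
}

static inline q16_t q16_add(q16_t a, q16_t b) { return a + b; }

/* Widen to 64 bits, multiply, then remove one factor of 2^16. */
static inline q16_t q16_mul(q16_t a, q16_t b)
{
	return (q16_t)(((int64_t)a * b) / Q16_ONE);
}

/* Pre-scale the dividend by 2^16 so the quotient keeps 16 fractional bits. */
static inline q16_t q16_div(q16_t a, q16_t b)
{
	return (q16_t)(((int64_t)a * Q16_ONE) / b);
}

/* Integer part, truncating toward zero. */
static inline int32_t q16_to_int(q16_t q) { return q / Q16_ONE; }
```

For example, q16_mul(q16_from_ratio(3, 2), q16_from_int(4)) yields the Q16.16 encoding of 6.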
* Add kernel-side support for in-kernel TLS. (John Baldwin, 2019-08-27; 4 files, -3/+217 lines)
  KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotiation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys.
  Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type.
  At present, rekeying is not supported, though the in-kernel framework should support rekeying.
  KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session.
  KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS.
  Software TLS marks mbufs holding socket data as not ready via M_NOTREADY, similar to sendfile(2), when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile(2), sendfile_iodone() calls ktls_enqueue() instead of pru_ready(), leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), PRUS_NOTREADY is set when invoking pru_send(), along with invoking ktls_enqueue().
  A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.)
  KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer, as it can make use of hardware crypto accelerators.
  Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs.
  ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted.
  The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.)
  ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled.
  Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes.
  In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS.
  Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
  KTLS is enabled via the KERN_TLS kernel option.
  This patch is the culmination of years of work by several folks, including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations, including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload.
  Reviewed by: gallatin, hselasky, rrs
  Obtained from: Netflix
  Sponsored by: Netflix, Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D21277
  Notes: svn path=/head/; revision=351522
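The framing step described above (ktls_frame() attaching a record header and, for encrypted records, a trailer) can be illustrated with a small userland sketch. It assumes TLS 1.2 with AES-GCM, where each record carries a 5-byte header, an 8-byte explicit nonce and a 16-byte authentication tag (RFC 5246 record layer, RFC 5288 cipher suites). The helper name and constants are hypothetical and are not taken from the kernel's ktls code.

```c
#include <stddef.h>
#include <stdint.h>

/* Record sizes for TLS 1.2 AES-GCM (assumed cipher suite for this sketch). */
#define TLS_HDR_LEN         5     /* type(1) + version(2) + length(2) */
#define TLS12_GCM_NONCE_LEN 8     /* explicit per-record nonce */
#define TLS_GCM_TAG_LEN     16    /* AES-GCM authentication tag */
#define TLS_RLTYPE_APP      23    /* application_data content type */
#define TLS_MAX_PLAINTEXT   16384 /* 2^14 bytes of plaintext per record */

/*
 * Hypothetical helper: fill in the 5-byte record header for 'payload_len'
 * bytes of plaintext and return the total on-the-wire record size, or 0 if
 * the payload does not fit in a single record. The length field covers
 * everything after the header: explicit nonce, ciphertext and tag.
 */
static size_t tls12_gcm_frame_header(uint8_t hdr[TLS_HDR_LEN], uint8_t type,
    size_t payload_len)
{
	size_t body;

	if (payload_len > TLS_MAX_PLAINTEXT)
		return 0;
	body = TLS12_GCM_NONCE_LEN + payload_len + TLS_GCM_TAG_LEN;
	hdr[0] = type;
	hdr[1] = 3;                       /* TLS 1.2 is encoded as version 3.3 */
	hdr[2] = 3;
	hdr[3] = (uint8_t)(body >> 8);    /* length is big-endian */
	hdr[4] = (uint8_t)(body & 0xff);
	return TLS_HDR_LEN + body;
}
```

A 100-byte application write therefore becomes a 129-byte record (5 + 8 + 100 + 16), with 124 in the header's length field.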
* vfs: swap vop_unlock_post and vop_unlock_pre definitions to the logical order (Mateusz Guzik, 2019-08-25; 1 file, -2/+2 lines)
  The change is a no-op.
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=351499
* Remove zlib 1.0.4 from kernel. (Xin LI, 2019-08-25; 3 files, -1249/+1 lines)
  PR: 229763
  Reviewed by: emaste, Yoshihiro Ota <ota j email ne jp>
  Differential Revision: https://reviews.freebsd.org/D21375
  Notes: svn path=/head/; revision=351480
* vfs: add vholdnz (for already held vnodes) (Mateusz Guzik, 2019-08-25; 1 file, -0/+1 lines)
  Reviewed by: kib (previous version)
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21358
  Notes: svn path=/head/; revision=351471
* vfs: assert the lock held in MNT_REF/MNT_REL (Mateusz Guzik, 2019-08-23; 1 file, -1/+5 lines)
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=351438
* De-commission the MNTK_NOINSMNTQ kernel mount flag. (Konstantin Belousov, 2019-08-23; 1 file, -8/+5 lines)
  After all the changes, its dynamic scope is the same as for MNTK_UNMOUNT, except that it allowed the syncer vnode to be re-installed on unmount failure. But the syncer case has already been handled by the VV_FORCEINSMQ flag for quite some time.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=351435
* Add lockmgr(9) probes to the lockstat DTrace provider. (Mark Johnston, 2019-08-21; 1 file, -0/+7 lines)
  They follow the conventions set by rw and sx lock probes. There is an additional lockstat:::lockmgr-disown probe.
  Update lockstat(1) to report on contention and hold events for lockmgr locks. Document the new probes in dtrace_lockstat.4, and deduplicate some of the existing probe descriptions.
  Reviewed by: mjg
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21355
  Notes: svn path=/head/; revision=351361
* seqc: predict false for _in_modify and type fixes for _consistent_* (Mateusz Guzik, 2019-08-21; 1 file, -3/+3 lines)
  seqc_consistent_* return bool, not seqc. [0]
  While here, annotate the rarely true condition: it is expected to be hit only on rare occasions (compared to the other case).
  Reported by: oshogbo [0]
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=351323
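The in-modify and consistency checks belong to a sequence-counter (seqlock) protocol: a writer makes the counter odd while it updates the data and even again when done, and a reader retries if it saw an odd value or if the counter changed while it was reading. The following is a userland sketch with C11 atomics, assuming writers are already serialized by some other lock; the names are hypothetical and this is not the kernel's seqc(9) implementation. __builtin_expect() is the GCC/Clang builtin that FreeBSD's __predict_false() wraps.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical seqlock protecting a pair of ints. */
struct pair_seq {
	atomic_uint seq;   /* odd while a writer is modifying the data */
	atomic_int a, b;   /* protected data, accessed with relaxed atomics */
};

static inline bool seq_in_modify(unsigned s) { return (s & 1) != 0; }

/* Writer: callers must serialize writers themselves (e.g. with a mutex). */
static void pair_write(struct pair_seq *p, int a, int b)
{
	unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release); /* even */
}

/* Reader: retry until a snapshot was taken with no writer in between. */
static void pair_read(struct pair_seq *p, int *a, int *b)
{
	unsigned s1, s2;

	for (;;) {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		if (__builtin_expect(seq_in_modify(s1), 0))
			continue;   /* writer active: expected to be rare */
		*a = atomic_load_explicit(&p->a, memory_order_relaxed);
		*b = atomic_load_explicit(&p->b, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
		if (s1 == s2)       /* the "consistent" check is a plain bool */
			return;
	}
}
```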
* libkern: Implement strchrnul(3) (Conrad Meyer, 2019-08-19; 1 file, -0/+1 lines)
  Notes: svn path=/head/; revision=351237
* Fix an issue with executing tmpfs binary. (Konstantin Belousov, 2019-08-18; 2 files, -0/+2 lines)
  Suppose that a binary was executed from a tmpfs mount, and the text vnode was reclaimed while the binary was still running. This is possible even during normal operation, since the tmpfs vnode's vm_object has swap type and no reference on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since the vnode went through a recycle/instantiation cycle, no reference is kept, and the assertion in VOP_UNSET_TEXT_CHECKED() triggers.
  Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents vnode reclamation while an executable map entry is active. Do it by adding a per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add a use ref on the first vnode text use, and a per-vnode VI_TEXT_REF flag to record the need to unref in vop_stdunset_text() when the last vnode text use goes away.
  Set MNTK_TEXT_REFS for tmpfs mounts.
  Reported by: bdrewery
  Tested by: sbruno, pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=351195
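The reference scheme can be modelled in a few lines. This is a single-threaded sketch in which hypothetical structures and flags stand in for MNTK_TEXT_REFS, VI_TEXT_REF and the vnode use count; the real vop_stdset_text()/vop_stdunset_text() also handle locking and exclusion of writers, which are omitted here.

```c
#include <stdbool.h>

#define MNT_TEXT_REFS_SKETCH 0x1 /* hypothetical stand-in for MNTK_TEXT_REFS */
#define VI_TEXT_REF_SKETCH   0x1 /* hypothetical stand-in for VI_TEXT_REF */

struct mount_sketch {
	int kern_flag;
};

struct vnode_sketch {
	struct mount_sketch *mp;
	int text_uses; /* active executable (text) mappings */
	int iflag;
	int usecount;  /* use references keeping the vnode from reclamation */
};

/* First text use on a flagged mount takes one extra use reference. */
static void set_text(struct vnode_sketch *vp)
{
	if (vp->text_uses++ == 0 &&
	    (vp->mp->kern_flag & MNT_TEXT_REFS_SKETCH) != 0) {
		vp->usecount++;                  /* pin the vnode */
		vp->iflag |= VI_TEXT_REF_SKETCH; /* remember to drop it later */
	}
}

/* Last text use drops the reference, but only if one was taken. */
static void unset_text(struct vnode_sketch *vp)
{
	if (--vp->text_uses == 0 && (vp->iflag & VI_TEXT_REF_SKETCH) != 0) {
		vp->iflag &= ~VI_TEXT_REF_SKETCH;
		vp->usecount--;
	}
}
```

Recording the per-vnode flag, rather than re-checking the mount flag on release, means the release path drops a use reference only when the acquire path actually took one.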
* Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. (Jeff Roberson, 2019-08-18; 1 file, -26/+46 lines)
  Reviewed by: markj, kib
  Discussed with: jhb
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21254
  Notes: svn path=/head/; revision=351188
* Delete sys/dir.h, which has been deprecated since 1997. (Xin LI, 2019-08-16; 2 files, -54/+1 lines)
  PR: 21519
  Submitted by: Yoshihiro Ota <ota j email ne jp>
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D20479
  Notes: svn path=/head/; revision=351140
* md(4): remove the unused and unusable MDIOCLIST ioctl. (Brooks Davis, 2019-08-16; 1 file, -2/+1 lines)
  It is unused, the ABI was broken in r322969, and it is broken by design (more than MDNPAD md devices can exist and there is no way to retrieve them with this interface).
  mdconfig(8) was converted to use libgeom to obtain this information in r157160, and any other consumers of MDIOCLIST should likewise be converted.
  Reviewed by: emaste
  Relnotes: yes
  Sponsored by: DARPA, AFRL
  Differential Revision: https://reviews.freebsd.org/D18936
  Notes: svn path=/head/; revision=351132
* Remove deprecated GEOM classes (Conrad Meyer, 2019-08-13; 2 files, -128/+0 lines)
  Follow up on r322318 and r322319 and remove the deprecated modules. Shift some now-unused kernel files into userspace utilities that incorporate them. Remove references to removed GEOM classes in userspace utilities.
  Reviewed by: imp (earlier version)
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D21249
  Notes: svn path=/head/; revision=351001
* Move scheduler state into the per-cpu area where it can be allocated on the correct NUMA domain. (Jeff Roberson, 2019-08-13; 1 file, -0/+1 lines)
  Reviewed by: markj, gallatin
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D19315
  Notes: svn path=/head/; revision=350972
* sbuf(9): Add sbuf_nl_terminate() API (Conrad Meyer, 2019-08-07; 1 file, -0/+2 lines)
  The API is used to gracefully terminate text line(s) with a single \n. If the formatted buffer was empty or already ended in \n, it is unmodified. Otherwise, a newline character is appended to it. The API, like other sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
  Reviewed by: rlibby
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D21030
  Notes: svn path=/head/; revision=350693
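The termination rule is easy to state in code. This userland sketch applies it to a hypothetical fixed-size buffer; struct mini_sbuf and its helper are illustrations only, not the sbuf(9) structure or implementation, and the "out of space" return value is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a struct sbuf. */
struct mini_sbuf {
	char buf[64];
	size_t len; /* bytes used, excluding the NUL terminator */
};

/*
 * Leave an empty buffer, or one already ending in '\n', unmodified;
 * otherwise append exactly one '\n'. Returns 0 on success, -1 if there
 * is no room for the newline plus the terminating NUL.
 */
static int mini_sbuf_nl_terminate(struct mini_sbuf *sb)
{
	if (sb->len == 0 || sb->buf[sb->len - 1] == '\n')
		return 0;
	if (sb->len + 1 >= sizeof(sb->buf))
		return -1;
	sb->buf[sb->len++] = '\n';
	sb->buf[sb->len] = '\0';
	return 0;
}
```

Calling it twice is harmless: the second call sees the trailing newline and leaves the buffer alone.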
* sbuf(9): Add NOWAIT dynamic buffer extension mode (Conrad Meyer, 2019-08-07; 1 file, -0/+1 lines)
  The goal is to avoid some kinds of low-memory deadlock when formatting heap-allocated buffers.
  Reviewed by: vangyzen
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D21015
  Notes: svn path=/head/; revision=350691
* fusefs: merge from projects/fuse2 (Alan Somers, 2019-08-07; 1 file, -1/+1 lines)
  This commit imports the new fusefs driver. It raises the protocol level from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and adds many new features. New features include:
    * Optional kernel-side permissions checks (-o default_permissions)
    * Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
    * Allow interrupting FUSE operations
    * Support named pipes and unix-domain sockets in fusefs file systems
    * Forward UTIME_NOW during utimensat(2) to the daemon
    * kqueue support for /dev/fuse
    * Allow updating mounts with "mount -u"
    * Allow exporting fusefs file systems over NFS
    * Server-initiated invalidation of the name cache or data cache
    * Respect RLIMIT_FSIZE
    * Try to support servers as old as protocol 7.4
  Performance enhancements include:
    * Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
    * Cache file attributes
    * Cache lookup entries, both positive and negative
    * Server-selectable cache modes: writethrough, writeback, or uncached
    * Write clustering
    * Readahead
    * Use counter(9) for statistical reporting
  PR: 199934 216391 233783 234581 235773 235774 235775
  PR: 236226 236231 236236 236291 236329 236381 236405
  PR: 236327 236466 236472 236473 236474 236530 236557
  PR: 236560 236844 237052 237181 237588 238565
  Reviewed by: bcr (man pages)
  Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit review on project branch)
  MFC after: 3 weeks
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Pull Request: https://reviews.freebsd.org/D21110
  Notes: svn path=/head/; revision=350665
| * Bump __FreeBSD_version (Alan Somers, 2019-07-30; 1 file, -1/+1 lines)
  r350437 presents a merge conflict with r350115, which raised __FreeBSD_version due to the addition of fusefs's intr/nointr mount options.
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/projects/fuse2/; revision=350456
* | Cache kernel stacks in UMA. This gives us NUMA support, better concurrency, and more statistics. (Jeff Roberson, 2019-08-06; 1 file, -49/+0 lines)
  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20931
  Notes: svn path=/head/; revision=350663
* | proc: introduce the proc_add_orphan function (Mariusz Zaborski, 2019-08-05; 1 file, -0/+1 lines)
  This API allows adding the process to its parent's orphan list.
  Reviewed by: kib, markj
  MFC after: 1 month
  Notes: svn path=/head/; revision=350611
* | Add necessary bits for Linux KPI to work correctly on powerpc (Justin Hibbits, 2019-08-04; 1 file, -0/+2 lines)
  PowerPC, and possibly other architectures, use different address ranges for PCI space vs. physical address space, which is only mapped at resource activation time, when the BAR gets written. The DRM kernel modules do not activate the rman resources, so as not to waste KVA, instead only mapping parts of the PCI memory at a time. This introduces a BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI driver, to perform this necessary translation without activating the resource.
  In addition to system KPI changes, LinuxKPI is updated to handle a big-endian host, by adding proper endian swaps to the I/O functions.
  Submitted by: mmacy
  Reported by: hselasky
  Differential Revision: https://reviews.freebsd.org/D21096
  Notes: svn path=/head/; revision=350570
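PCI device registers are little-endian, so a Linux-style readl()/writel() accessor on a big-endian host must swap bytes around a single 32-bit access. The following minimal sketch assumes a GCC or Clang compiler (it relies on the __BYTE_ORDER__ macro and __builtin_bswap32(), and treats the host as little-endian if __BYTE_ORDER__ is not defined). sketch_readl() and sketch_writel() are hypothetical names, not the LinuxKPI functions, and here they operate on ordinary memory rather than a mapped BAR.

```c
#include <stdint.h>
#include <string.h>

/* Convert between little-endian (device) order and host order. */
static inline uint32_t le32_to_host(uint32_t raw)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(raw);  /* big-endian host: swap */
#else
	return raw;                     /* little-endian host: no-op */
#endif
}

/* One 32-bit access, then a swap if the host is big-endian. */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
	return le32_to_host(*reg);
}

/* The swap is its own inverse, so the same helper converts host to LE. */
static inline void sketch_writel(volatile uint32_t *reg, uint32_t val)
{
	*reg = le32_to_host(val);
}
```

On a little-endian host both helpers compile down to plain loads and stores, which is why the missing swaps only showed up on big-endian powerpc.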
* | Allow the kernel to link in both legacy libkern/zlib and new sys/contrib/zlib (Xin LI, 2019-08-01; 1 file, -32/+32 lines)
  The eventual goal is to convert all legacy zlib callers to the new zlib version:
    * Move generic zlib shims that are not specific to zlib 1.0.4 to sys/dev/zlib.
    * Connect new zlib (1.2.11) to the zlib kernel module, currently built with Z_SOLO.
    * Prefix the legacy zlib (1.0.4) with the 'zlib104_' namespace.
    * Convert sys/opencrypto/cryptodeflate.c to use new zlib.
    * Remove bundled zlib 1.2.3 from ZFS, adapt it to new zlib, and make it depend on the zlib module.
    * Fix Z_SOLO build of new zlib.
  PR: 229763
  Submitted by: Yoshihiro Ota <ota j email ne jp>
  Reviewed by: markm (sys/dev/zlib/zlib_kmod.c)
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D19706
  Notes: svn path=/head/; revision=350496
* | Make randomized stack gap between strings and pointers to argv/envs. (Konstantin Belousov, 2019-07-31; 3 files, -0/+3 lines)
  This effectively makes the stack base on the csu _start entry randomized. The gap is enabled if ASLR for the ABI is enabled, and then kern.elf{64,32}.aslr.stack_gap specifies the maximum percentage of the initial stack size that can be wasted for the gap. Setting it to zero disables the gap, and the maximum is capped at 50%.
  Only amd64 for now.
  Reviewed by: cem, markj
  Discussed with: emaste
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D21081
  Notes: svn path=/head/; revision=350484
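The sizing rule can be expressed as a small function. This is a sketch under stated assumptions: the percentage semantics (0 disables the gap, values above 50 are clamped) follow the description above, while the caller-supplied random value and the rounding down to a power-of-two alignment are assumptions of the sketch, not details taken from the commit. stack_gap() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: choose a gap in [0, stack_size * pct / 100] using the
 * caller-supplied random value 'rnd', then round it down to 'align'
 * bytes ('align' must be a power of two) to keep the stack aligned.
 */
static size_t stack_gap(size_t stack_size, unsigned pct, uint32_t rnd,
    size_t align)
{
	size_t max_gap, gap;

	if (pct == 0)
		return 0;      /* a zero percentage disables the gap */
	if (pct > 50)
		pct = 50;      /* never waste more than half the stack */
	max_gap = stack_size / 100 * pct;
	gap = rnd % (max_gap + 1);
	return gap & ~(align - 1);
}
```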
* | Use VNASSERT() in checked VOP wrappers. (Mark Johnston, 2019-07-30; 1 file, -4/+7 lines)
  Reviewed by: kib
  MFC after: 3 days
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21120
  Notes: svn path=/head/; revision=350458
* Handle refcount(9) wraparound. (Mark Johnston, 2019-07-30; 1 file, -9/+40 lines)
  Attempt to mitigate the security risks around refcount overflows by introducing a "saturated" state for the counter. Once a counter reaches INT_MAX+1, subsequent acquire and release operations will blindly set the counter value to INT_MAX + INT_MAX/2, ensuring that the protected resource will not be freed; instead, it will merely be leaked.
  The approach introduces a small race: if a refcount value reaches INT_MAX+1, a subsequent release will cause the releasing thread to set the counter to the saturation value after performing the decrement. If in the intervening window INT_MAX refcount releases are performed by a different thread, a use-after-free is possible. This is very difficult to trigger in practice, and any situation where it could be triggered would likely be vulnerable to reference count wraparound problems to begin with. An alternative would be to use atomic_cmpset to acquire and release references, but this would introduce a larger performance penalty, particularly when the counter is contended.
  Note that refcount_acquire_checked(9) maintains its previous behaviour; code which must accurately track references should use it instead of refcount_acquire(9).
  Reviewed by: kib, mjg
  MFC after: 3 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21089
  Notes: svn path=/head/; revision=350446
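A userland sketch of the saturation scheme, using C11 atomics on an unsigned counter. The helper names are hypothetical and this is not the refcount(9) source; it follows the rules described above: a counter whose top bit is set (INT_MAX + 1 or more) is reset to INT_MAX + INT_MAX/2 by both acquire and release, so the object is leaked rather than freed. Note the fetch-then-store window between the atomic update and the reset, which is the small race discussed above.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* INT_MAX + INT_MAX/2 == 0xbffffffe: far from both 0 and UINT_MAX. */
#define REF_SATURATION_VALUE ((unsigned)INT_MAX + (unsigned)(INT_MAX / 2))
/* Saturated once the top bit is set, i.e. the count reached INT_MAX + 1. */
#define REF_SATURATED(v) (((v) & (1u << 31)) != 0)

static void ref_acquire(atomic_uint *cnt)
{
	unsigned old = atomic_fetch_add_explicit(cnt, 1, memory_order_relaxed);

	if (REF_SATURATED(old + 1))
		atomic_store_explicit(cnt, REF_SATURATION_VALUE,
		    memory_order_relaxed);
}

/* Returns true only when the caller dropped the last reference. */
static bool ref_release(atomic_uint *cnt)
{
	unsigned old = atomic_fetch_sub_explicit(cnt, 1, memory_order_release);

	if (old == 1) {
		/* Order the caller's later teardown after other releases. */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	if (REF_SATURATED(old))
		/* Pinned: restore the saturation value; the object leaks. */
		atomic_store_explicit(cnt, REF_SATURATION_VALUE,
		    memory_order_relaxed);
	return false;
}
```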
* Bump __FreeBSD_version after removal of gzip'ed a.out support. (Xin LI, 2019-07-30; 1 file, -1/+1 lines)
  Notes: svn path=/head/; revision=350437
* Remove gzip'ed a.out support. (Xin LI, 2019-07-30; 1 file, -53/+0 lines)
  The current implementation of gzipped a.out support was based on a very old version of InfoZIP, which ships with an ancient modified version of zlib, and was removed from the GENERIC kernel in 1999 when we moved to an ELF world.
  PR: 205822
  Reviewed by: imp, kib, emaste, Yoshihiro Ota <ota at j.email.ne.jp>
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D21099
  Notes: svn path=/head/; revision=350436
* seqc: add man page (Mariusz Zaborski, 2019-07-29; 1 file, -49/+0 lines)
  Reviewed by: markj
  Earlier version reviewed by: emaste, mjg, bcr, 0mp
  Differential Revision: https://reviews.freebsd.org/D16744
  Notes: svn path=/head/; revision=350430
* proc: make clear_orphan a public API (Mariusz Zaborski, 2019-07-29; 1 file, -0/+1 lines)
  This will be useful for other patches with process descriptors. Change its name as well.
  Reviewed by: markj, kib
  Notes: svn path=/head/; revision=350429
* Decode some more IDENTIFY DEVICE bits. (Alexander Motin, 2019-07-28; 1 file, -2/+5 lines)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=350393
* Add v_inval_buf_range, like vtruncbuf but for a range of a file (Alan Somers, 2019-07-28; 1 file, -0/+2 lines)
  v_inval_buf_range invalidates all buffers within a certain LBA range of a file. It will be used by fusefs(5). This commit is a partial merge of r346162, r346606, and r346756 from projects/fuse2.
  Reviewed by: kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21032
  Notes: svn path=/head/; revision=350386
* Make `camcontrol sanitize` also support ATA devices. (Alexander Motin, 2019-07-25; 1 file, -0/+3 lines)
  ATA sanitize is functionally identical to SCSI sanitize; it just uses different initiation commands and a different status reporting mechanism. While there, make the kernel better handle sanitize commands and statuses.
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=350331
* r350320 committed the wrong version of generated syscall.mk. (Rick Macklem, 2019-07-25; 1 file, -2/+2 lines)
  This commit is for the correct version. (The incorrect one had the order of the last two entries reversed, due to my testing with copy_file_range at 568 instead of 569.) This misordering should not have been a problem, but is now fixed.
  Notes: svn path=/head/; revision=350321
* Update the generated syscall.mk for copy_file_range(2). (Rick Macklem, 2019-07-25; 1 file, -0/+1 lines)
  I missed this file for commit r350316.
  Notes: svn path=/head/; revision=350320
* Update the generated syscall files for copy_file_range(2) added by r350315. (Rick Macklem, 2019-07-25; 2 files, -1/+12 lines)
  Notes: svn path=/head/; revision=350316
* Add kernel support for a Linux-compatible copy_file_range(2) syscall. (Rick Macklem, 2019-07-25; 2 files, -0/+10 lines)
  This patch adds support to the kernel for a Linux-compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also be able to use the VOP_COPY_FILE_RANGE() method.
  vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values.
  Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later.
  Reviewed by: kib, asomers (plus comments by brooks, jilles)
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D20584
  Notes: svn path=/head/; revision=350315
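From userland, a caller should loop, because copy_file_range(2) may copy fewer bytes than requested and returns 0 at end of file. The sketch below assumes the call may be missing or unsupported for a given pair of files, in which case it falls back to read(2)/write(2); the set of errno values that trigger the fallback is an assumption, and copy_all()/copy_rw() are hypothetical helper names. Passing NULL offsets makes the call use and advance each descriptor's file offset. The _GNU_SOURCE definition is needed for glibc to declare copy_file_range().

```c
#define _GNU_SOURCE /* glibc: expose copy_file_range() */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: plain read(2)/write(2) loop, handling short writes. */
static ssize_t copy_rw(int in, int out, size_t len)
{
	char buf[8192];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(in, buf, want);
		if (n < 0)
			return -1;
		if (n == 0)
			break;                  /* EOF */
		for (ssize_t off = 0; off < n;) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w < 0)
				return -1;
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Copy up to 'len' bytes between the descriptors' current offsets. */
static ssize_t copy_all(int in, int out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, len - done, 0);
		if (n < 0) {
			/* Fall back only if nothing has been copied yet. */
			if (done == 0 && (errno == ENOSYS || errno == EXDEV ||
			    errno == EOPNOTSUPP || errno == EINVAL))
				return copy_rw(in, out, len);
			return -1;
		}
		if (n == 0)
			break;                  /* EOF on the input */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```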
* Fix the turnstile_lock() KPI. (Mark Johnston, 2019-07-24; 1 file, -1/+2 lines)
  turnstile_{lock,unlock}() were added for use in epoch. turnstile_lock() returned NULL to indicate that the calling thread had lost a race and the turnstile was no longer associated with the given lock, or the lock owner. However, reader-writer locks may not have a designated owner, in which case turnstile_lock() would return NULL and epoch_block_handler_preempt() would leak spinlocks as a result.
  Apply a minimal fix: return the lock owner as a separate return value.
  Reviewed by: kib
  MFC after: 3 days
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21048
  Notes: svn path=/head/; revision=350310
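The underlying problem is an overloaded NULL return: NULL meant both "lost the race" and "no owner". A hypothetical before/after sketch (these are not the real turnstile signatures, and the structures are simplified stand-ins) shows why returning the owner through a separate out parameter removes the ambiguity for reader-owned locks.

```c
#include <stdbool.h>
#include <stddef.h>

struct thread_sketch {
	int tid;
};

struct lock_sketch {
	bool has_turnstile;           /* false: the caller lost the race */
	struct thread_sketch *owner;  /* NULL for reader-owned rwlocks */
};

/* Old shape: a NULL result cannot distinguish the two cases. */
static struct thread_sketch *ts_lock_old(struct lock_sketch *lk)
{
	if (!lk->has_turnstile)
		return NULL;
	return lk->owner;
}

/* New shape: success is the return value; the owner may legitimately be NULL. */
static bool ts_lock_new(struct lock_sketch *lk, struct thread_sketch **ownerp)
{
	if (!lk->has_turnstile)
		return false;
	*ownerp = lk->owner;
	return true;
}
```

With the old shape, a caller that treats NULL as "lost the race" skips its cleanup even though it succeeded, which mirrors the spinlock leak described above.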
* Remove cap_random(3). (Mark Johnston, 2019-07-24; 1 file, -1/+1 lines)
  Now that we have a way to obtain entropy in capability mode (getrandom(2)), libcap_random is obsolete. Remove it.
  Bump __FreeBSD_version in case anything happens to use it, though I've found no consumers.
  Reviewed by: delphij, emaste, oshogbo
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21033
  Notes: svn path=/head/; revision=350307
* Switch the rest of the refcount(9) functions to bool return type. (Konstantin Belousov, 2019-07-21; 1 file, -9/+9 lines)
  There are some explicit comparisons of refcount_release(9) results with 0/1, which are fine.
  Reviewed by: markj, mjg
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D21014
  Notes: svn path=/head/; revision=350204
* Fix userspace build after r350199. (Konstantin Belousov, 2019-07-21; 1 file, -0/+1 lines)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=350200
* Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). (Konstantin Belousov, 2019-07-21; 2 files, -2/+20 lines)
  On a sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup the same fd up to the per-process file descriptor limit, and then fork as much as it can. On a smaller machine, I see
    kern.maxfilesperproc: 939132
    kern.maxprocperuid: 34203
  which already overflows u_int. Moreover, the malicious code can create transient references by sending fds over unix sockets.
  I realized that this check was missing after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html
  Reviewed by: markj (previous version), mjg
  Tested by: pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D20947
  Notes: svn path=/head/; revision=350199
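With the limits quoted above, 939132 descriptors times 34203 processes is 32121131796 potential references, far above UINT_MAX (4294967295), so an unchecked 32-bit increment can wrap. A sketch of an overflow-checked acquire using a C11 compare-and-swap loop follows; checked_ref_acquire() is a hypothetical name and this is not the kernel's fhold() or fget_unlocked() code.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the increment cannot wrap the counter.
 * Returns false, leaving the counter untouched, when it is already at
 * UINT_MAX; the caller must then fail the operation.
 */
static bool checked_ref_acquire(atomic_uint *cnt)
{
	unsigned old = atomic_load_explicit(cnt, memory_order_relaxed);

	do {
		if (old == UINT_MAX)
			return false;   /* incrementing would overflow */
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(cnt, &old, old + 1,
	    memory_order_relaxed, memory_order_relaxed));
	return true;
}
```

A compare-and-swap loop costs more than a blind fetch-and-add under contention, which is the same trade-off the refcount(9) wraparound commit above weighs.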
* Add Accessible Max Address Configuration support to camcontrol. (Alexander Motin, 2019-07-19; 1 file, -0/+7 lines)
  AMA replaced HPA in the ACS-3 specification. Like HPA, it allows limiting the size of the disk, but it declares the inaccessible data as indeterminate. One of its practical use cases is to under-provision SATA SSDs for better reliability and performance.
  While there, fix HPA Security detection/reporting.
  MFC after: 2 weeks
  Relnotes: yes
  Sponsored by: iXsystems, Inc.
  Notes: svn path=/head/; revision=350149
* Add ptrace op PT_GET_SC_RET. (John Baldwin, 2019-07-15; 1 file, -0/+7 lines)
  This ptrace operation returns a structure containing the error and return values from the current system call. It is only valid when a thread is stopped during a system call exit (PL_FLAG_SCX is set).
  The sr_error member holds the error value from the system call. Note that this error value is the native FreeBSD error value that has _not_ been translated to an ABI-specific error value, similar to the values logged to ktrace.
  If sr_error is zero, then the return values of the system call will be set in sr_retval[0] and sr_retval[1].
  Reviewed by: kib
  MFC after: 1 month
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D20901
  Notes: svn path=/head/; revision=350017
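A tracer's handling of the returned structure boils down to one rule: look at sr_retval[] only when sr_error is zero. The sketch below uses a local stand-in structure whose field names follow the commit text; its types and layout are assumptions, so on FreeBSD use the real definition from <sys/ptrace.h> and issue the request through ptrace(2), checking its man page for the exact argument conventions. format_sc_ret() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel structure; types are assumptions. */
struct sc_ret_sketch {
	intptr_t sr_retval[2]; /* meaningful only when sr_error == 0 */
	int sr_error;          /* native FreeBSD errno, not ABI-translated */
};

/* Format a syscall result roughly the way a tracer might print it. */
static int format_sc_ret(const struct sc_ret_sketch *r, char *buf, size_t len)
{
	if (r->sr_error != 0)
		return snprintf(buf, len, "err#%d", r->sr_error);
	return snprintf(buf, len, "= %jd", (intmax_t)r->sr_retval[0]);
}
```

Because sr_error is the untranslated native value, a tracer following a Linux-ABI process would still need to map it to that ABI's errno numbering before displaying it in Linux terms.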
* Always set td_errno to the error value of a system call. (John Baldwin, 2019-07-15; 1 file, -2/+1 lines)
  Early errors prior to a system call did not set td_errno. This commit sets td_errno for all errors during syscallenter(). As a result, syscallret() can now always use td_errno without checking TDP_NERRNO.
  Reviewed by: kib
  MFC after: 1 month
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D20898
  Notes: svn path=/head/; revision=350012
* Add arm_sync_icache() and arm_drain_writebuf() sysarch syscall wrappers. (Ian Lepore, 2019-07-13; 1 file, -1/+1 lines)
  NetBSD and OpenBSD have libc wrapper functions for the ARM_SYNC_ICACHE and ARM_DRAIN_WRITEBUF sysarch operations. This change adds compatible functions to our library. This should make it easier for various upstream sources to support *BSD operating systems with a single variation of cache maintenance code in tools like interpreters and JIT compilers.
  I consider the argument types passed to arm_sync_icache() to be especially unfortunate, but this is intended to match the other BSDs.
  Differential Revision: https://reviews.freebsd.org/D20906
  Notes: svn path=/head/; revision=349972
* This commit updates rack to what is basically being used at NF, as well as laying some of the groundwork for committing BBR. (Randall Stewart, 2019-07-10; 1 file, -0/+1 lines)
  The hpts system is updated, as well as some other utilities needed for the entrance of BBR. This is actually part 1 of 3 more needed commits, which will finally conclude with BBRv1 being added as a new TCP stack.
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D20834
  Notes: svn path=/head/; revision=349893
* Merge the vm_page hold and wire mechanisms. (Mark Johnston, 2019-07-08; 1 file, -1/+1 lines)
  The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last-unhold semantic, whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released.
  This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions.
  No functional change is intended. __FreeBSD_version is bumped.
  Reviewed by: alc, kib
  Discussed with: jeff
  Discussed with: jhb, np (cxgbe)
  Tested by: pho (previous version)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D19247
  Notes: svn path=/head/; revision=349846