path: root/sys
Commit message (Collapse)AuthorAgeFilesLines
* Save context switch per I/O for iSCSI and IOCTL frontends.Alexander Motin2021-02-196-25/+65
| | | | | | | | | | | | | | | Introduce new CTL core KPI ctl_run(), preprocessing I/Os in the caller context instead of scheduling another thread just for that. This call may sleep, that is not acceptable for some frontends like the original CAM/FC one, but iSCSI already has separate sleepable per-connection RX threads, and another thread scheduling is mostly just a waste of time. IOCTL frontend actually waits for the I/O completion in the caller thread, so the use of another thread for this has even less sense. With this change I can measure ~5% IOPS improvement on 4KB iSCSI I/Os to ZFS. MFC after: 1 month
* nvdimm(4): Export NVDIMM health flags via sysctlRavi Pokala2021-02-183-1/+77
| | | | | | | | | | | | | | | The ACPI NFIT specification defines a set of "NVDIMM State Flags". These flags are already reported by `acpidump -t', but this change makes them available on a per-device basis, in a format that is more easily parsed. To simplify this, introduce acpi_nfit_get_memory_maps_by_dimm(), which locates the (ACPI_NFIT_MEMORY_MAP)s associated with a given (nfit_handle_t). Reviewed by: mav, cem Tested by: mav, rpokala (version for stable/12) MFC after: 3 days Sponsored by: Panasas
* Move XPT_IMMEDIATE_NOTIFY handling out of periph lock.Alexander Motin2021-02-181-1/+2
| | | | | | It is a rare, but still better to not have lock dependencies. MFC after: 1 month
* cgem: improve usage of busdma(9) KPIMitchell Horne2021-02-181-8/+4
| | | | | | | | | | | | | | | BUS_DMA_NOCACHE should only be used when one needs to guarantee the created mapping has uncached memory attributes, usually as a result of buggy hardware. Normal use cases should pass BUS_DMA_COHERENT, to create an appropriate mapping based on the flags passed to bus_dma_tag_create(). This should have no functional change, since the DMA tags in this driver are created without the BUS_DMA_COHERENT flag. Reported by: mmel Reviewed by: mmel, Thomas Skibo <thomas-bsd@skibo.net> MFC after: 3 days
* cryptosoft: Support per-op keys for AES-GCM and AES-CCM.John Baldwin2021-02-181-0/+6
| | | | | | Reviewed by: cem Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D28752
* Add Chacha20-Poly1305 support in the OCF backend for KTLS.John Baldwin2021-02-181-21/+95
| | | | | | | | | This supports Chacha20-Poly1305 for both send and receive for TLS 1.2 and for send in TLS 1.3. Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27841
* Add Chacha20-Poly1305 as a KTLS cipher suite.John Baldwin2021-02-182-14/+63
| | | | | | | | | | | | Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the server and client IVs as implicit nonces xored with the record sequence number to generate the per-record nonce matching the construction used with AES-GCM for TLS 1.3. Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27839
* Add an implementation of CHACHA20_POLY1305 to cryptosoft.John Baldwin2021-02-185-5/+335
| | | | | | | | | | This uses the chacha20 IETF and poly1305 implementations from libsodium. A seperate auth_hash is created for the auth side whose Setkey method derives the poly1305 key from the AEAD key and nonce as described in RFC 8439. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27837
* Add an OCF algorithm for ChaCha20-Poly1305 AEAD.John Baldwin2021-02-187-7/+65
| | | | | | | | Note that this algorithm implements the mode defined in RFC 8439. Reviewed by: cem Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27836
* ddb: fix show devmap output on 32-bit armThomas Skibo2021-02-181-1/+3
| | | | | | | | | The output has been broken since 1b6dd6d772ca. Casting to uintmax_t before the call to printf is necessary to ensure that 32-bit addresses are interpreted correctly. PR: 243236 MFC after: 3 days
* arm64: Include NUMA locality info in the CPU topologyMark Johnston2021-02-181-1/+28
| | | | | | | | | | | | | The scheduler uses this topology to try and preserve locality when migrating threads between CPUs and when performing work stealing. Ensure that on NUMA systems it will at least take the NUMA topology into account. Reviewed by: mmel Submitted by: Klara, Inc. Sponsored by: Ampere Computing MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28579
* ipmi_ssif: Fix inverted for the end of multi-part readsAllan Jude2021-02-181-1/+1
| | | | | | | | | | As per Intelligent Platform Management Interface Specification v2.0 rev. 1.1, section 12.5: SSIF Multi-part Read Transactions Sponsored by: Ampere Computing LLC Submitted by: Klara Inc. Reviewed by: manu Differential Revision: https://reviews.freebsd.org/D28749
* ig4(4): Increase timeout to about 1 secondAllan Jude2021-02-181-1/+1
| | | | | | | | | | | | | | | | | | Per the i2c spec, a slave device can stretch SCL idefinitely, so 25ms is a bit arbitrary in general. smbus does specify an optional timeout recovery mechanism to be done at about 25~35ms, but the IPMI SSIF spec says that BMCs don't have any obligation to implement that. The BMC on Altra seems to mostly respond within 25ms, but occasionally will stretch SCL for ~300 msec. Also, the count_us mechanism seems to actually timeout around 25% earlier than it would claim (timeout really happening around 19ms instead of 25ms). Sponsored by: Ampere Computing LLC Submitted by: Klara Inc. Reviewed by: manu, imp Differential Revision: https://reviews.freebsd.org/D28747
* vn_printf: handle VI_FOPENINGKonstantin Belousov2021-02-181-1/+3
| | | | | | | Noted by: mjg Sponsored by: The FreeBSD Foundation MFC after: 6 days Fixes: fa3bd463cee
* zfs: bump version and install new share filesMartin Matuska2021-02-181-2/+2
| | | | | | | | - bump version to 2.0.0-FreeBSD_gbf156c966 - install definition files for the new "-o compatibility" option to "zpool create" MFC after: 2 weeks
* zfs: merge OpenZFS master-bf156c966Martin Matuska2021-02-1853-2855/+5510
| | | | | | | | | | | | | | | | | | Notable upstream changes: bf156c966 Remove unused abd_alloc_scatter_offset_chunkcnt 658fb8020 Add "compatibility" property for zpool feature sets This update introduces a new pool property called "compatibility" that can be used to enable a limited set of pool features on pool creation and "stick" to it, so the "zpool upgrade" does not accidentally enable features that are not desired. The value of this property may then be changed later. See zpool-features(5) for more information about the "compatibility" pool property. Obtained from: OpenZFS MFC after: 2 weeks
* Use atomic loads/stores when updating td->td_stateAlex Richardson2021-02-1814-37/+49
| | | | | | | | | | | | | | | KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569
* Allocate BAR for ENA MSIx vector tableMichal Krawczyk2021-02-182-1/+24
| | | | | | | | | | | | | | | | | | | | In the new ENA-based instances like c6gn, the vector table moved to a new PCIe bar - BAR1. Previously it was always located on the BAR0, so the resources were already allocated together with the registers. As the FreeBSD isn't doing any resource allocation behind the scenes, the driver is responsible to allocate them explicitly, before other parts of the OS (like the PCI code allocating MSIx) will be able to access them. To determine dynamically BAR on which the MSIx vector table is present the pci_msix_table_bar() is being used and the new BAR is allocated if needed. Submitted by: Michal Krawczyk <mk@semihalf.com> Obtained from: Semihalf Sponsored by: Amazon, Inc MFC after: 3 days
* fix Navdeeps LINT_NOINET error.Randall Stewart2021-02-181-0/+2
* cxgbe(4): Break up t4_read_chip_settings.Navdeep Parhar2021-02-186-44/+62
| | | | | | | | | | Read the PF-only hardware settings directly in get_params__post_init. Split the rest into two routines used by both the PF and VF drivers: one that reads the SGE rx buffer configuration and another that verifies miscellaneous hardware configuration. MFC after: 1 week Sponsored by: Chelsio Communications
* pf: Fix osfp configurationKristof Provost2021-02-181-1/+1
| | | | | | | | pf_rule_to_krule() incorrectly converted the rule osfp configuration to the krule structure. Reported by: delphij@ MFC after: 3 days
* Fix another pesky missing #ifdef TCPHPTSRandall Stewart2021-02-181-0/+2
* mips: Don't set __NO_TLS to disable some uses of TLS.John Baldwin2021-02-181-2/+1
| | | | | | | | | | | | __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. At the time libc did not support TLS on MIPS (I believe), but TLS support was added to libc (at least _set_tp.c) for MIPS about a month after __NO_TLS was added, but __NO_TLS was still left around. Reviewed by: imp Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D28713
* riscv: Don't set __NO_TLS to disable some uses of TLS.John Baldwin2021-02-181-1/+1
| | | | | | | | | | | | __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. The initial RISC-V import set this for RISC-V presumably due to immaturity in the toolchains at the time. However, TLS via _Thread works fine in both GCC and clang on RISC-V. Reviewed by: mhorne, imp Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D28712
* Add a VA_IS_CLEANMAP() macro.John Baldwin2021-02-188-17/+16
| | | | | | | | | | | | | | This macro returns true if a provided virtual address is contained in the kernel's clean submap. In CHERI kernels, the buffer cache and transient I/O map are allocated as separate regions. Abstracting this check reduces the diff relative to FreeBSD. It is perhaps slightly more readable as well. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D28710
* lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK)Konstantin Belousov2021-02-174-3/+38
| | | | | | | | | | | | | or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make a progress until the first open finishes. Requested by: mckusick Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28697
* uart: only use MSI on devices that advertise 1 MSI vectorWarner Losh2021-02-171-3/+3
| | | | | | | | | | This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be used. Since we're just attaching one instance, the meaning of more than one vector is not clear and seems to cause problems. Fall back to old methods for these cards. PR: 235016 Submitted by: David Cross
* gicv3_its: Don't restrict target CPUs based on SRATD Scott Phillips2021-02-171-17/+34
| | | | | | | | | | | | | | | | | | | | | | ACPI Sec (SRAT, GIC Interrupt Translation Service (ITS) Affinity Structure) says: > The GIC ITS Affinity Structure provides the association between > a GIC ITS and a proximity domain. This enables the OSPM to > discover the memory that is closest to the ITS, and use that in > allocating its management tables and command queue. Previously the ITS driver was using the proximity domain to restrict which CPUs can be targeted by an LPI. We keep that logic just for the original dual socket ThunderX which cannot forward LPIs between sockets. We also use the SRAT entry for its intended purpose of attempting to allocate ITS table structures near the ITS. Reviewed by: andrew Sponsored by: Ampere Computing LLC Differential Revision: https://reviews.freebsd.org/D28340
* Giant: move back Giant removal until 14Warner Losh2021-02-171-1/+1
| | | | | | | Update the Giant Lock warning message to FreeBSD 14. It's growing increasling clear that this won't be done before 13.0. MFC: Insta (re@'s request)
* Handle negative return values from syncache_expand().John Baldwin2021-02-171-5/+15
| | | | | | | | | | | | | | | These errors do not clear so to NULL, so the existing check was treating these failures as success. The rest of do_pass_establish() then tried to use the listen socket as if it was a connection socket newly created by syncache_expand(). In addition, for negative return values, do not send a RST to the peer. Reported by: Sony Arpita Das @ Chelsio Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D28243
* fwohci: Cast bitfield to uint32_t before passing it to roundup2().John Baldwin2021-02-171-1/+1
| | | | | | | | | | The fallback for __align_up() used by roundup2() uses __typeof__() which doesn't work for bitfields. This fixes the build on GCC which uses the fallback. Reviewed by: arichardson, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D28599
* gicv3_its: Leave LPI interrupts enabled during handlingD Scott Phillips2021-02-171-2/+0
| | | | | | | | | | | | This follows the behavior on x86 where edge triggered interrupts are not disabled when executing the handler. Because the ITS is a shared resource, contention for the command queue lock can be substantial. Suggested by: gallatin Reviewed by: andrew Tested by: gallatin Sponsored by: Ampere Computing LLC Differential Revision: https://reviews.freebsd.org/D28709
* Add ifdef TCPHPTS around build_ack_entry and do_bpf_and_csum to avoidRandall Stewart2021-02-171-0/+2
| | | | | | warnings when HPTS is not included Thanks to Gary Jennejohn for pointing this out.
* arm64: use macros to access special register valuesMitchell Horne2021-02-171-2/+4
* Bump __FreeBSD_version after f2583be110caMitchell Horne2021-02-171-1/+1
| | | | | | Provide a compatibility point around the ABI-breaking change. Sponsored by: The FreeBSD Foundation
* arm64: extend struct db_reg to include watchpoint registersMitchell Horne2021-02-174-16/+92
| | | | | | | | | | | | | | | | | | | The motivation is to provide access to these registers from userspace via ptrace(2) requests PT_GETDBREGS and PT_SETDBREGS. This change breaks the ABI of these particular requests, but is justified by the fact that the intended consumers (debuggers) have not been taught to use them yet. Making this change now enables active upstream work on lldb to begin using this interface, and take advantage of the hardware debugging registers available on the platform. PR: 252860 Reported by: Michał Górny (mgorny@gentoo.org) Reviewed by: andrew, markj (earlier version) Tested by: Michał Górny (mgorny@gentoo.org) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28415
* arm64: handle watchpoint exceptions from EL0Mitchell Horne2021-02-172-0/+7
| | | | | | | | | | | | | | | | | This is a prerequisite to allowing the use of hardware watchpoints for userspace debuggers. This is also a slight departure from the x86 behaviour, since `si_addr` returns the data address that triggered the watchpoint, not the address of the instruction that was executed. Otherwise, there is no straightforward way for the application to determine which watchpoint was triggered. Make a note of this in the siginfo(3) man page. Reviewed by: jhb, markj (earlier version) Tested by: Michał Górny (mgorny@gentoo.org) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28561
* arm64: validate breakpoint registersMitchell Horne2021-02-172-4/+50
| | | | | | | | | | | In particular, we want to disallow setting breakpoints on kernel addresses from userspace. The control register fields are validated or ignored as appropriate. Reviewed by: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28560
* Update the LRO processing code so that we can supportRandall Stewart2021-02-176-123/+829
| | | | | | | | | | | | | | | | a further CPU enhancements for compressed acks. These are acks that are compressed into an mbuf. The transport has to be aware of how to process these, and an upcoming update to rack will do so. You need the rack changes to actually test and validate these since if the transport does not support mbuf compression, then the old code paths stay in place. We do in this commit take out the concept of logging if you don't have a lock (which was quite dangerous and was only for some early debugging but has been left in the code). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28374
* pf: Assert that pfil_link() calls succeedKristof Provost2021-02-171-4/+9
| | | | | | | | These should only fail if we use them incorrectly, so assert that they succeed. MFC after: 1 week Sponsored by: Rubicon Communications, LLC (“Netgate”’)
* arm64: rpi4: gpio: Add brcm,bcm2711-gpio compatibleEmmanuel Vadot2021-02-171-0/+1
| | | | | | | Looks like we never enabled the main gpio controller on the RPI4 board. Now gpio are usable. MFC after: 3 days
* arm64: rpi4: firmware: Attach at BUS_PASS_BUS + BUS_PASS_ORDER_LATEEmmanuel Vadot2021-02-171-1/+1
| | | | | | | | The node have now a compatible with simple-mfd so we need to attach at the same pass so the specific driver will be used. MFC after: 3 days PR: 252971
* pf: Remove unused return value from (de)hook_pf()Kristof Provost2021-02-171-31/+9
| | | | | | | | | | | These functions always return 0, which is good, because the code calling them doesn't handle this error gracefully. As the functions always succeed remove their return value, and the code handling their errors (because it was never executed anyway). MFC after: 1 week Sponsored by: Rubicon Communications, LLC (“Netgate”’)
* OpenSSL: Regen assembly files for OpenSSL 1.1.1jJung-uk Kim2021-02-171-4/+7
* cxgbe(4): Save proper zone index on low memory in refill_fl().Alexander Motin2021-02-171-5/+6
| | | | | | | | | | | | | | | | When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it falls back to safe (4KB) ones. But it still saved into sd->zidx the original fl->zidx instead of fl->safe_zidx. It caused problems with the later use of that cluster, including memory and/or data corruption. While there, make refill_fl() to use the safe zone for all following clusters for the call, since it is unlikely that large succeed. MFC after: 3 days Sponsored by: iXsystems, Inc. Reviewed by: np, jhb Differential Revision: https://reviews.freebsd.org/D28716
* linux: Update the i386/linux vdso deinitialization routineMark Johnston2021-02-161-1/+2
| | | | | | | | This was missed in commit 0fc8a796722 ("linux: Unmap the VDSO page when unloading"). Reported by: Mark Millard MFC with: 0fc8a796722
* Fix NOINET6 build broken by 2fe5a79425c7.Alexander V. Chernikov2021-02-161-0/+8
| | | | Reported by: mjg
* Fix dst/netmask handling in routing socket code.Alexander V. Chernikov2021-02-161-6/+195
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally routing socket code did almost zero checks on the input message except for the most basic size checks. This resulted in the unclear KPI boundary for the routing system code (`rtrequest*` and now `rib_action()`) w.r.t message validness. Multiple potential problems and nuances exists: * Host bits in RTAX_DST sockaddr. Existing applications do send prefixes with hostbits uncleared. Even `route(8)` does this, as they hope the kernel would do the job of fixing it. Code inside `rib_action()` needs to handle it on its own (see `rt_maskedcopy()` ugly hack). * There are multiple way of adding the host route: it can be DST without netmask or DST with /32(/128) netmask. Also, RTF_HOST has to be set correspondingly. Currently, these 2 options create 2 DIFFERENT routes in the kernel. * no sockaddr length/content checking for the "secondary" fields exists: nothing stops rtsock application to send sockaddr_in with length of 25 (instead of 16). Kernel will accept it, install to RIB as is and propagate to all rtsock consumers, potentially triggering bugs in their code. Same goes for sin_port, sin_zero, etc. The goal of this change is to make rtsock verify all sockaddr and prefix consistency. Said differently, `rib_action()` or internals should NOT require to change any of the sockaddrs supplied by `rt_addrinfo` structure due to incorrectness. To be more specific, this change implements the following: * sockaddr cleanup/validation check is added immediately after getting sockaddrs from rtm. * Per-family dst/netmask checks clears host bits in dst and zeros all dst/netmask "secondary" fields. * The same netmask checking code converts /32(/128) netmasks to "host" route case (NULL netmask, RTF_HOST), removing the dualism. * Instead of allowing ANY "known" sockaddr families (0<..<AF_MAX), allow only actually supported ones (inet, inet6, link). * Automatically convert `sockaddr_sdl` (AF_LINK) gateways to `sockaddr_sdl_short`. Reported by: Guy Yur <guyyur at gmail.com> Reviewed By: donner Differential Revision: https://reviews.freebsd.org/D28668 MFC after: 3 days
* Add ifa_try_ref() to simplify ifa handling inside epoch.Alexander V. Chernikov2021-02-162-1/+12
| | | | | | | | | | | | | | | | | | | | | | | More and more code migrates from lock-based protection to the NET_EPOCH umbrella. It requires some logic changes, including, notably, refcount handling. When we have an `ifa` pointer and we're running inside epoch we're guaranteed that this pointer will not be freed. However, the following case can still happen: * in thread 1 we drop to 0 refcount for ifa and schedule its deletion. * in thread 2 we use this ifa and reference it * destroy callout kicks in * unhappy user reports bug To address it, new `ifa_try_ref()` function is added, allowing to return failure when we try to reference `ifa` with 0 refcount. Additionally, existing `ifa_ref()` is enforced with `KASSERT` to provide cleaner error in such scenarious. Reviewed By: rstone, donner Differential Revision: https://reviews.freebsd.org/D28639 MFC after: 1 week
* Make in_localip_more() fib-aware.Alexander V. Chernikov2021-02-161-12/+12
| | | | | | | | | | It fixes loopback route installation for the interfaces in the different fibs using the same prefix. Reviewed By: donner PR: 189088 Differential Revision: https://reviews.freebsd.org/D28673 MFC after: 1 week