* jail: Improve locking when removing prisons (Jamie Gritton, 2021-02-20, 1 file, -28/+41)
  Change the flow of prison_deref() so it doesn't let go of allprison_lock until it's completely done using it (except for a possible drop as part of an upgrade on its first try).
  Differential Revision: https://reviews.freebsd.org/D28458
  MFC after: 3 days
* PRR: use accurate rfc6675_pipe when enabled (Richard Scheffenegger, 2021-02-20, 1 file, -3/+9)
  Reviewed By: #transport, tuexen
  MFC after: 2 weeks
  Sponsored by: NetApp, Inc.
  Differential Revision: https://reviews.freebsd.org/D28816
* Fix setting static entries for arp/ndp. (Alexander V. Chernikov, 2021-02-20, 2 files, -0/+27)
  rtsock message validation changes committed in 2fe5a79425c7 did not take llinfo messages into account. Add a special validation case for RTA_GATEWAY llinfo messages.
  MFC after: 2 days
* cxgbe(4): Use the correct filter width for T5+. (Navdeep Parhar, 2021-02-19, 3 files, -1/+6)
  T5 and above have extra bits for the optional filter fields. This is a correctness issue and not just a waste, because a filter mode valid on a T4 (36b) may not be valid on a T5+ (40b).
  MFC after: 2 weeks
  Sponsored by: Chelsio Communications
* cxgbe(4): Add a driver ioctl to set the filter mask. (Navdeep Parhar, 2021-02-19, 4 files, -0/+48)
  Allow the filter mask (aka the hashfilter mode when hashfilters are in use) to be set any time it is safe to do so. The requested mask must already be a subset of the filter mode. The driver will not change the mode or ingress config just to support a new mask.
  MFC after: 2 weeks
  Sponsored by: Chelsio Communications
* cxgbe(4): Use firmware commands to get/set filter configuration. (Navdeep Parhar, 2021-02-19, 6 files, -129/+287)
  1. Query the firmware for filter mode, mask, and related ingress config instead of trying to figure them out from hardware registers. Read configuration from the registers only when the firmware does not support this query.
  2. Use the firmware to set the filter mode. This is the correct way to do it and is more flexible as well. The filter mode (and associated ingress config) can now be changed any time it is safe to do so.
     The user can specify a subset of a valid mode and the driver will enable enough bits to make sure that the mode is maxed out -- that is, it is not possible to set another bit without exceeding the total width for optional filter fields. This is a hardware requirement that was not enforced by the driver previously.
  MFC after: 2 weeks
  Sponsored by: Chelsio Communications
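  The "maxed out" rule above can be sketched as a greedy pass: start from the fields the user asked for, then add every remaining field that still fits in the total width. Because the running total only grows, a field skipped along the way still cannot fit at the end, so the result is maximal. This is a minimal sketch, not the cxgbe code; the per-field widths and the name maximize_filter_mode() are hypothetical, and only the 36-bit (T4) and 40-bit (T5+) totals come from this log.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-field widths (in bits) for the optional filter fields.
 * The real widths depend on the chip and are not given in the log.
 */
static const unsigned field_width[] = { 16, 12, 9, 8, 3, 1 };
#define NFIELDS (sizeof(field_width) / sizeof(field_width[0]))

/*
 * Extend a requested field mask until no further field fits within
 * max_width bits. Returns UINT32_MAX if the request itself is too wide.
 */
static uint32_t
maximize_filter_mode(uint32_t requested, unsigned max_width)
{
	uint32_t mode = requested;
	unsigned total = 0;
	size_t i;

	/* Width already consumed by the user's fields. */
	for (i = 0; i < NFIELDS; i++)
		if (mode & (1u << i))
			total += field_width[i];
	if (total > max_width)
		return (UINT32_MAX);

	/* The total only grows, so any field rejected here stays rejected. */
	for (i = 0; i < NFIELDS; i++) {
		if (mode & (1u << i))
			continue;
		if (total + field_width[i] <= max_width) {
			mode |= 1u << i;
			total += field_width[i];
		}
	}
	return (mode);
}
```

  With these made-up widths, a request for field 0 alone grows to fields 0, 1 and 3 (mask 0xB, exactly 36 bits) under a 36-bit budget.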
* jail: Change both root and working directories in jail_attach(2) (Jamie Gritton, 2021-02-19, 3 files, -4/+44)
  jail_attach(2) performs an internal chroot operation, leaving it up to the calling process to ensure the working directory is inside the jail. Add a matching internal chdir operation to the jail's root.
  Also ignore kern.chroot_allow_open_directories, and always disallow the operation if there are any directory descriptors open.
  Reported by: mjg
  Approved by: markj, kib
  MFC after: 3 days
* iflib: Fix detach of pseudo interfaces (Mark Johnston, 2021-02-19, 1 file, -5/+3)
  In commit 38bfc6dee33b we added an IFDI_DETACH() call to iflib_pseudo_deregister() since it looked like it was missing. One is present in the error-handling path of iflib_pseudo_register(). However, the detach actually comes from the DEVICE_DETACH() method for the above-mentioned device_t, so now we're calling IFDI_DETACH() twice when destroying a pseudo interface.
  Fix the problem by not calling IFDI_DETACH() from the device detach routine. This way we can ensure that iflib de-initialization always happens in a consistent order. It also ensures that you can't do silly things like "devctl detach <pseudo ifnet>", which would previously detach the driver without tearing down the corresponding ifnet.
  PR: 253541
  Reviewed by: erj
  MFC after: 1 week
  Fixes: 38bfc6dee33b
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D28774
* Fix arp/ndp deletion broken by 2fe5a79425c7. (Alexander V. Chernikov, 2021-02-19, 1 file, -10/+21)
  Changes in 2fe5a79425c7 moved dst sockaddr masking from the routing control plane to the rtsock code. This broke arp/ndp deletion.
  It turns out that arp/ndp perform an RTM_GET request first to get the interface index necessary for the deletion. Then they simply stamp the reply with RTF_LLDATA and set the command to RTM_DELETE. As a result, the kernel receives a request with a non-empty RTA_NETMASK and clears the RTA_DST host bits before passing the message to the lla code.
  De facto, the only needed bits are RTA_DST, RTA_GATEWAY and the subset of rtm_flags. With that in mind, fix the interface by clearing RTA_NETMASK for every message with RTF_LLDATA.
  While here, clean up the arp/ndp code a bit.
  MFC after: 1 day
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D28804
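  To see why a stray RTA_NETMASK breaks deletion: masking a host address with, for example, a /24 netmask zeroes its host bits, so the lla code is handed the wrong key. The sketch below models the fix of ignoring the netmask for RTF_LLDATA messages. It is hypothetical (plain host-order IPv4 integers and an invented rtsock_dst_for_lla() helper), not the rtsock code.

```c
#include <stdint.h>

/*
 * Hypothetical model: return the destination the lla code would see.
 * Without the fix, a netmask present in the message is applied to dst;
 * with the fix, RTF_LLDATA messages have their netmask cleared first.
 */
static uint32_t
rtsock_dst_for_lla(uint32_t dst, uint32_t mask, int has_netmask,
    int is_lldata)
{
	if (is_lldata)
		has_netmask = 0;	/* clear RTA_NETMASK for RTF_LLDATA */
	return (has_netmask ? (dst & mask) : dst);
}
```

  For 192.0.2.55 (0xC0000237) with a /24 mask, the unmasked path keeps 192.0.2.55, while applying the mask would yield 192.0.2.0, an address with no matching ARP entry.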
* fbio: Use appropriate types for the physical and virtual framebuffer address (Alfredo Dal'Ava Junior, 2021-02-19, 1 file, -2/+2)
  Use appropriate types for the physical and virtual framebuffer address. This fixes framebuffers mapped above 4G physical on 32-bit systems that support physical address extensions, like i386 and Book-E powerpc.
  Patch developed by bdragon
  Reviewed by: bdragon, luporl
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D28604
* iflib: Cast the result of iflib_netmap_txq_init() to void. (John Baldwin, 2021-02-19, 1 file, -1/+1)
  This fixes a warning from GCC for kernels without netmap since the return value is never used.
  Reviewed by: vmaffione, erj
  Differential Revision: https://reviews.freebsd.org/D28598
* Microoptimize CTL I/O queues. (Alexander Motin, 2021-02-19, 6 files, -80/+81)
  Switch the OOA queue from TAILQ to LIST and change its direction, so that we traverse it forward, not backward. There is only one place where we really need the other direction, and it is not critical.
  Use STAILQ_REMOVE_HEAD() instead of STAILQ_REMOVE() in backends.
  Replace a few impossible conditions with assertions.
  MFC after: 1 month
* Reimplement the arm64 dtrace_gethrtime(), which provides the high-resolution nanosecond timestamp used for the DTrace 'timestamp' built-in variable. (Robert Watson, 2021-02-19, 1 file, -10/+13)
  The new implementation uses the EL0 cycle counter and frequency registers in ARMv8-A. This replaces a previous implementation that relied on an instrumentation-safe implementation of getnanotime(), which provided only timer resolution.
  MFC after: 3 days
  Reviewed by: andrew, bsdimp (older version)
  Useful comments appreciated: jrtc27, emaste
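  Turning a cycle count and a counter frequency into nanoseconds means computing count * 10^9 / freq, but the naive product overflows 64 bits after a fairly short uptime. One common way around that, shown here as a sketch and not as the kernel's code, is to split the count into whole seconds and a remainder. Reading the registers is omitted, and cycles_to_ns() is a hypothetical name.

```c
#include <stdint.h>

#define NANOSEC 1000000000ULL

/*
 * Convert a cycle count to nanoseconds without overflowing the
 * multiplication: whole seconds are scaled separately from the
 * remainder. Since the remainder is < freq, remainder * 1e9 fits in
 * 64 bits as long as freq is below roughly 18 GHz.
 */
static uint64_t
cycles_to_ns(uint64_t count, uint64_t freq)
{
	return ((count / freq) * NANOSEC + ((count % freq) * NANOSEC) / freq);
}
```

  For example, with a 24 MHz counter, 2^40 cycles converts to 45812984490666 ns, whereas 2^40 * 10^9 alone would not fit in 64 bits.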
* ofwfb: fix incorrect colors on powerpc* and add new tunable parameters (Alfredo Dal'Ava Junior, 2021-02-19, 1 file, -42/+123)
  - Implements little-endian support (powerpc64le)
  - Adds 'hw.ofwfb.physaddr' kernel parameter so the user can manually provide the correct address if it's not detected correctly
  - Adds 'hw.ofwfb.argb32_pixel' so the user can set it manually if colors are inverted due to an incorrect pixel format (default = 1)
  - Automatically selects the RGBA32 pixel format if an NVidia graphics adapter is detected (sets hw.ofwfb.argb32_pixel=0)
  Machines equipped with NVidia graphics adapters tend to use the RGBA32 pixel format. By default the ARGB32 pixel format is used, which has proved to work on machines equipped with ATI graphics adapters and on the onboard adapter used on Talos II and Blackbird machines from Raptor Computing Systems.
  Original patch developed by bdragon
  Reviewed by: bdragon, luporl
  MFC after: 3 days
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D28604
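  Inverted colors are what a channel-order mismatch looks like. The sketch below packs the same channels as ARGB32 and as RGBA32 and selects between them the way a flag like hw.ofwfb.argb32_pixel might. The helper names are hypothetical, and in-memory byte order (the little-endian part of the change) is not modeled.

```c
#include <stdint.h>

/* ARGB32: alpha in the top byte, then red, green, blue. */
static uint32_t
pack_argb32(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)a << 24 | (uint32_t)r << 16 | (uint32_t)g << 8 | b);
}

/* RGBA32: red in the top byte, alpha in the bottom byte. */
static uint32_t
pack_rgba32(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)r << 24 | (uint32_t)g << 16 | (uint32_t)b << 8 | a);
}

/* Hypothetical selector: nonzero argb32_pixel means ARGB32 (the default). */
static uint32_t
pack_pixel(int argb32_pixel, uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return (argb32_pixel ? pack_argb32(a, r, g, b) :
	    pack_rgba32(a, r, g, b));
}
```

  Opaque pure red packed as RGBA32 is 0xff0000ff; hardware that decodes it as ARGB32 sees alpha 0xff, red 0, green 0, blue 0xff, i.e. pure blue.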
* Remove __XSCALE__ checks from the arm code (Andrew Turner, 2021-02-19, 2 files, -11/+0)
  XScale support was removed over 2 years ago, so remove the last __XSCALE__ checks from the arm MD code.
  Sponsored by: Innovate UK
* Ensure cwnd doesn't shrink to zero with PRR (Richard Scheffenegger, 2021-02-19, 1 file, -2/+2)
  Under some circumstances, PRR may end up with a fully collapsed cwnd when finalizing the loss recovery.
  Reviewed By: #transport, kbowling
  Reported by: Liang Tian
  MFC after: 1 week
  Sponsored by: NetApp, Inc.
  Differential Revision: https://reviews.freebsd.org/D28780
* kern: net: remove TCP_LINGERTIME (Kyle Evans, 2021-02-19, 3 files, -6/+0)
  TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in exactly the same form that it appears here modulo slightly different context. It used to be the case that there was a single pr_usrreq method with requests dispatched to it; these exact two lines appeared in tcp_usrreq's PRU_ATTACH handling.
  The only purpose of this that I can find is to cause surprising behavior on accepted connections. Newly-created sockets will never hit these paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is set on a listening socket and inherited, one would expect the timeout to be inherited rather than changed arbitrarily like this -- noting that SO_LINGER is nonsense on a listening socket beyond inheritance, since they cannot be 'connected' by definition.
  Based on testing and inspection of Illumos, and testing of Linux, neither of them resets the timer like this.
  Reviewed by: rscheff, tuexen
  Differential Revision: https://reviews.freebsd.org/D28265
* Save context switch per I/O for iSCSI and IOCTL frontends. (Alexander Motin, 2021-02-19, 6 files, -25/+65)
  Introduce a new CTL core KPI, ctl_run(), which preprocesses I/Os in the caller context instead of scheduling another thread just for that. This call may sleep, which is not acceptable for some frontends like the original CAM/FC one, but iSCSI already has separate sleepable per-connection RX threads, and scheduling another thread is mostly just a waste of time. The IOCTL frontend actually waits for the I/O completion in the caller thread, so using another thread for this makes even less sense.
  With this change I can measure a ~5% IOPS improvement on 4KB iSCSI I/Os to ZFS.
  MFC after: 1 month
* nvdimm(4): Export NVDIMM health flags via sysctl (Ravi Pokala, 2021-02-18, 3 files, -1/+77)
  The ACPI NFIT specification defines a set of "NVDIMM State Flags". These flags are already reported by `acpidump -t', but this change makes them available on a per-device basis, in a format that is more easily parsed.
  To simplify this, introduce acpi_nfit_get_memory_maps_by_dimm(), which locates the (ACPI_NFIT_MEMORY_MAP)s associated with a given (nfit_handle_t).
  Reviewed by: mav, cem
  Tested by: mav, rpokala (version for stable/12)
  MFC after: 3 days
  Sponsored by: Panasas
* Move XPT_IMMEDIATE_NOTIFY handling out of periph lock. (Alexander Motin, 2021-02-18, 1 file, -1/+2)
  It is rare, but it is still better not to have lock dependencies.
  MFC after: 1 month
* cgem: improve usage of busdma(9) KPI (Mitchell Horne, 2021-02-18, 1 file, -8/+4)
  BUS_DMA_NOCACHE should only be used when one needs to guarantee the created mapping has uncached memory attributes, usually as a result of buggy hardware. Normal use cases should pass BUS_DMA_COHERENT, to create an appropriate mapping based on the flags passed to bus_dma_tag_create().
  This should have no functional change, since the DMA tags in this driver are created without the BUS_DMA_COHERENT flag.
  Reported by: mmel
  Reviewed by: mmel, Thomas Skibo <thomas-bsd@skibo.net>
  MFC after: 3 days
* cryptosoft: Support per-op keys for AES-GCM and AES-CCM. (John Baldwin, 2021-02-18, 1 file, -0/+6)
  Reviewed by: cem
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D28752
* Add Chacha20-Poly1305 support in the OCF backend for KTLS. (John Baldwin, 2021-02-18, 1 file, -21/+95)
  This supports Chacha20-Poly1305 for both send and receive for TLS 1.2 and for send in TLS 1.3.
  Reviewed by: gallatin
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27841
* Add Chacha20-Poly1305 as a KTLS cipher suite. (John Baldwin, 2021-02-18, 2 files, -14/+63)
  Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the server and client IVs as implicit nonces xored with the record sequence number to generate the per-record nonce, matching the construction used with AES-GCM for TLS 1.3.
  Reviewed by: gallatin
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27839
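  The per-record nonce described above follows RFC 8446, section 5.3 (RFC 7905 uses the same construction for TLS 1.2): the 64-bit record sequence number is written big-endian into the low-order 8 bytes of a 12-byte block, left-padded with zeros, and XORed with the static write IV. A minimal sketch; tls_record_nonce() is a hypothetical name, not the KTLS function.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_AEAD_NONCE_LEN 12	/* ChaCha20-Poly1305 and TLS 1.3 AES-GCM */

/*
 * Build the per-record nonce: copy the static IV, then XOR the record
 * sequence number into its last 8 bytes in network (big-endian) order.
 * The first 4 bytes are XORed with zero padding, i.e. left unchanged.
 */
static void
tls_record_nonce(const uint8_t iv[TLS_AEAD_NONCE_LEN], uint64_t seqno,
    uint8_t nonce[TLS_AEAD_NONCE_LEN])
{
	size_t i;

	for (i = 0; i < TLS_AEAD_NONCE_LEN; i++)
		nonce[i] = iv[i];
	for (i = 0; i < 8; i++)
		nonce[TLS_AEAD_NONCE_LEN - 1 - i] ^= (uint8_t)(seqno >> (8 * i));
}
```

  Because the sequence number increments per record, each record gets a distinct nonce under the same key without transmitting any explicit nonce bytes.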
* Add an implementation of CHACHA20_POLY1305 to cryptosoft. (John Baldwin, 2021-02-18, 5 files, -5/+335)
  This uses the chacha20 IETF and poly1305 implementations from libsodium.
  A separate auth_hash is created for the auth side whose Setkey method derives the poly1305 key from the AEAD key and nonce as described in RFC 8439.
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27837
* Add an OCF algorithm for ChaCha20-Poly1305 AEAD. (John Baldwin, 2021-02-18, 7 files, -7/+65)
  Note that this algorithm implements the mode defined in RFC 8439.
  Reviewed by: cem
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D27836
* ddb: fix show devmap output on 32-bit arm (Thomas Skibo, 2021-02-18, 1 file, -1/+3)
  The output has been broken since 1b6dd6d772ca. Casting to uintmax_t before the call to printf is necessary to ensure that 32-bit addresses are interpreted correctly.
  PR: 243236
  MFC after: 3 days
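  Why the cast matters: %jx reads a uintmax_t from the variadic arguments, so passing a 32-bit value without widening it is undefined behavior and can print garbage on 32-bit arm. A sketch with a hypothetical 32-bit address type and an illustrative format string (not ddb's actual output format):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 32-bit address type on 32-bit arm. */
typedef uint32_t sketch_vm_offset_t;

/*
 * Each argument is widened to uintmax_t so it matches %jx exactly,
 * whatever the width of the underlying address type.
 */
static int
format_devmap_entry(char *buf, size_t len, sketch_vm_offset_t pa,
    sketch_vm_offset_t va, sketch_vm_offset_t size)
{
	return (snprintf(buf, len, "0x%08jx - 0x%08jx mapped at VA 0x%08jx",
	    (uintmax_t)pa, (uintmax_t)(pa + size - 1), (uintmax_t)va));
}
```

  The same source then formats correctly on both 32-bit and 64-bit platforms, which is why the cast is preferred over a width-specific conversion like %x.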
* arm64: Include NUMA locality info in the CPU topology (Mark Johnston, 2021-02-18, 1 file, -1/+28)
  The scheduler uses this topology to try to preserve locality when migrating threads between CPUs and when performing work stealing. Ensure that on NUMA systems it will at least take the NUMA topology into account.
  Reviewed by: mmel
  Submitted by: Klara, Inc.
  Sponsored by: Ampere Computing
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28579
* ipmi_ssif: Fix inverted check for the end of multi-part reads (Allan Jude, 2021-02-18, 1 file, -1/+1)
  As per Intelligent Platform Management Interface Specification v2.0 rev. 1.1, section 12.5: SSIF Multi-part Read Transactions
  Sponsored by: Ampere Computing LLC
  Submitted by: Klara Inc.
  Reviewed by: manu
  Differential Revision: https://reviews.freebsd.org/D28749
* ig4(4): Increase timeout to about 1 second (Allan Jude, 2021-02-18, 1 file, -1/+1)
  Per the i2c spec, a slave device can stretch SCL indefinitely, so 25ms is a bit arbitrary in general. smbus does specify an optional timeout recovery mechanism to be done at about 25~35ms, but the IPMI SSIF spec says that BMCs don't have any obligation to implement that.
  The BMC on Altra seems to mostly respond within 25ms, but occasionally will stretch SCL for ~300 msec.
  Also, the count_us mechanism seems to actually time out around 25% earlier than it would claim (timeout really happening around 19ms instead of 25ms).
  Sponsored by: Ampere Computing LLC
  Submitted by: Klara Inc.
  Reviewed by: manu, imp
  Differential Revision: https://reviews.freebsd.org/D28747
* vn_printf: handle VI_FOPENING (Konstantin Belousov, 2021-02-18, 1 file, -1/+3)
  Noted by: mjg
  Sponsored by: The FreeBSD Foundation
  MFC after: 6 days
  Fixes: fa3bd463cee
* zfs: bump version and install new share files (Martin Matuska, 2021-02-18, 1 file, -2/+2)
  - bump version to 2.0.0-FreeBSD_gbf156c966
  - install definition files for the new "-o compatibility" option to "zpool create"
  MFC after: 2 weeks
* zfs: merge OpenZFS master-bf156c966 (Martin Matuska, 2021-02-18, 53 files, -2855/+5510)
  Notable upstream changes:
  bf156c966 Remove unused abd_alloc_scatter_offset_chunkcnt
  658fb8020 Add "compatibility" property for zpool feature sets
  This update introduces a new pool property called "compatibility" that can be used to enable a limited set of pool features on pool creation and "stick" to it, so that "zpool upgrade" does not accidentally enable features that are not desired. The value of this property may then be changed later. See zpool-features(5) for more information about the "compatibility" pool property.
  Obtained from: OpenZFS
  MFC after: 2 weeks
* Use atomic loads/stores when updating td->td_state (Alex Richardson, 2021-02-18, 14 files, -37/+49)
  KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU.
  Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering.
  Reported by: GENERIC-KCSAN
  Test Plan: KCSAN no longer complains, kernel still runs fine.
  Reviewed By: markj, mjg (earlier version)
  Differential Revision: https://reviews.freebsd.org/D28569
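  The idea of marking a shared state word with relaxed atomic accesses can be sketched in portable C11. This uses <stdatomic.h> in place of the kernel's own atomic primitives, and the struct, enum and accessor names are hypothetical, not the td_state macros. Relaxed order gives no ordering guarantee relative to other memory accesses; it only makes each load and store a single, intentional, untorn access that a race detector will not flag.

```c
#include <stdatomic.h>

/* Hypothetical subset of thread states. */
enum sketch_td_state { TDS_INACTIVE, TDS_CAN_RUN, TDS_RUNQ, TDS_RUNNING };

struct sketch_thread {
	_Atomic int td_state;	/* read and written by several CPUs */
};

/* Relaxed store: one indivisible write, no ordering implied. */
static inline void
sketch_td_set_state(struct sketch_thread *td, enum sketch_td_state s)
{
	atomic_store_explicit(&td->td_state, (int)s, memory_order_relaxed);
}

/* Relaxed load: the compiler must actually perform this read. */
static inline enum sketch_td_state
sketch_td_get_state(struct sketch_thread *td)
{
	return ((enum sketch_td_state)atomic_load_explicit(&td->td_state,
	    memory_order_relaxed));
}
```

  A spin loop that polls sketch_td_get_state() re-reads the variable on every iteration, whereas a plain int read could legally be hoisted out of the loop by the compiler.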
* Allocate BAR for ENA MSIx vector table (Michal Krawczyk, 2021-02-18, 2 files, -1/+24)
  In the new ENA-based instances like c6gn, the vector table moved to a new PCIe BAR, BAR1. Previously it was always located in BAR0, so the resources were already allocated together with the registers.
  As FreeBSD doesn't do any resource allocation behind the scenes, the driver is responsible for allocating them explicitly before other parts of the OS (like the PCI code allocating MSIx) are able to access them.
  To dynamically determine which BAR holds the MSIx vector table, pci_msix_table_bar() is used, and the new BAR is allocated if needed.
  Submitted by: Michal Krawczyk <mk@semihalf.com>
  Obtained from: Semihalf
  Sponsored by: Amazon, Inc
  MFC after: 3 days
* fix Navdeep's LINT_NOINET error. (Randall Stewart, 2021-02-18, 1 file, -0/+2)
* cxgbe(4): Break up t4_read_chip_settings. (Navdeep Parhar, 2021-02-18, 6 files, -44/+62)
  Read the PF-only hardware settings directly in get_params__post_init. Split the rest into two routines used by both the PF and VF drivers: one that reads the SGE rx buffer configuration and another that verifies miscellaneous hardware configuration.
  MFC after: 1 week
  Sponsored by: Chelsio Communications
* pf: Fix osfp configuration (Kristof Provost, 2021-02-18, 1 file, -1/+1)
  pf_rule_to_krule() incorrectly converted the rule osfp configuration to the krule structure.
  Reported by: delphij@
  MFC after: 3 days
* Fix another pesky missing #ifdef TCPHPTS (Randall Stewart, 2021-02-18, 1 file, -0/+2)
* mips: Don't set __NO_TLS to disable some uses of TLS. (John Baldwin, 2021-02-18, 1 file, -2/+1)
  __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. At the time libc did not support TLS on MIPS (I believe), but TLS support was added to libc (at least _set_tp.c) for MIPS about a month after __NO_TLS was added, yet __NO_TLS was still left around.
  Reviewed by: imp
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D28713
* riscv: Don't set __NO_TLS to disable some uses of TLS. (John Baldwin, 2021-02-18, 1 file, -1/+1)
  __NO_TLS was originally added to disable use of _Thread in the locale code in libc in 82dd5016bd749d1d9e1531bd1703aebeecceab34. The initial RISC-V import set this for RISC-V presumably due to immaturity in the toolchains at the time. However, TLS via _Thread works fine in both GCC and clang on RISC-V.
  Reviewed by: mhorne, imp
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D28712
* Add a VA_IS_CLEANMAP() macro. (John Baldwin, 2021-02-18, 8 files, -17/+16)
  This macro returns true if a provided virtual address is contained in the kernel's clean submap.
  In CHERI kernels, the buffer cache and transient I/O map are allocated as separate regions. Abstracting this check reduces the diff relative to FreeBSD. It is perhaps slightly more readable as well.
  Reviewed by: kib
  Obtained from: CheriBSD
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D28710
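  A sketch of what such a range-check macro can look like, assuming the clean submap is a single contiguous half-open range [start, end). The struct, the global and the SKETCH_ prefix are hypothetical stand-ins, not the kernel's definitions. Hiding the comparison behind a macro is what lets a CHERI kernel, with separately allocated regions, change the body without touching callers.

```c
#include <stdint.h>

/* Hypothetical bounds of the clean submap. */
struct sketch_kmi {
	uintptr_t clean_sva;	/* first address in the clean submap */
	uintptr_t clean_eva;	/* one past the last address */
};

static struct sketch_kmi sketch_kmi;

/* True iff clean_sva <= va < clean_eva. */
#define SKETCH_VA_IS_CLEANMAP(va)			\
	((uintptr_t)(va) >= sketch_kmi.clean_sva &&	\
	 (uintptr_t)(va) < sketch_kmi.clean_eva)
```

  With the end bound exclusive, an address exactly at clean_eva is correctly reported as outside the submap.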
* lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK) or O_SHLOCK. (Konstantin Belousov, 2021-02-17, 4 files, -3/+38)
  Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK requests to make progress until the first open finishes.
  Requested by: mckusick
  Reviewed by: markj, mckusick
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D28697
* uart: only use MSI on devices that advertise 1 MSI vector (Warner Losh, 2021-02-17, 1 file, -3/+3)
  This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be used. Since we're just attaching one instance, the meaning of more than one vector is not clear and seems to cause problems. Fall back to old methods for these cards.
  PR: 235016
  Submitted by: David Cross
* gicv3_its: Don't restrict target CPUs based on SRAT (D Scott Phillips, 2021-02-17, 1 file, -17/+34)
  ACPI Sec (SRAT, GIC Interrupt Translation Service (ITS) Affinity Structure) says:
  > The GIC ITS Affinity Structure provides the association between
  > a GIC ITS and a proximity domain. This enables the OSPM to
  > discover the memory that is closest to the ITS, and use that in
  > allocating its management tables and command queue.
  Previously the ITS driver was using the proximity domain to restrict which CPUs can be targeted by an LPI. We keep that logic just for the original dual socket ThunderX which cannot forward LPIs between sockets.
  We also use the SRAT entry for its intended purpose of attempting to allocate ITS table structures near the ITS.
  Reviewed by: andrew
  Sponsored by: Ampere Computing LLC
  Differential Revision: https://reviews.freebsd.org/D28340
* Giant: move back Giant removal until 14 (Warner Losh, 2021-02-17, 1 file, -1/+1)
  Update the Giant Lock warning message to FreeBSD 14. It's growing increasingly clear that this won't be done before 13.0.
  MFC: Insta (re@'s request)
* Handle negative return values from syncache_expand(). (John Baldwin, 2021-02-17, 1 file, -5/+15)
  These errors do not set so to NULL, so the existing check was treating these failures as success. The rest of do_pass_establish() then tried to use the listen socket as if it was a connection socket newly created by syncache_expand().
  In addition, for negative return values, do not send a RST to the peer.
  Reported by: Sony Arpita Das @ Chelsio
  Reviewed by: np
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D28243
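  A sketch of the difference between checking the socket pointer and checking the return value. It assumes, as inferred from this log rather than taken from the code, that a positive value means success, zero means a failure where a RST is still sent, and a negative value means a failure that leaves so untouched and sends no RST. All names are hypothetical; this is not the do_pass_establish() code.

```c
#include <stddef.h>

enum expand_action { EXPAND_USE_SOCKET, EXPAND_FAIL_RST, EXPAND_FAIL_NO_RST };

/*
 * Old-style check: keyed on the socket pointer alone, so a negative
 * return that leaves it pointing at the listen socket looks like success.
 */
static int
old_check_ok(const void *so)
{
	return (so != NULL);
}

/* Fixed handling: decide from the return value alone. */
static enum expand_action
classify_expand(int rc)
{
	if (rc > 0)
		return (EXPAND_USE_SOCKET);
	if (rc == 0)
		return (EXPAND_FAIL_RST);
	return (EXPAND_FAIL_NO_RST);	/* negative: do not reset the peer */
}
```

  Keying the decision on the return value makes all three outcomes explicit instead of overloading the pointer as a success flag.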
* fwohci: Cast bitfield to uint32_t before passing it to roundup2(). (John Baldwin, 2021-02-17, 1 file, -1/+1)
  The fallback for __align_up() used by roundup2() uses __typeof__(), which doesn't work for bitfields. This fixes the build on GCC, which uses the fallback.
  Reviewed by: arichardson, markj
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D28599
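  A minimal reproduction of the problem, assuming a __typeof__-based power-of-two round-up like the fallback described above. SKETCH_ROUNDUP2 and the struct are hypothetical, not the actual sys/cdefs.h definitions. GCC rejects __typeof__ applied to a bit-field, so the field is cast to uint32_t first, the same fix this commit applies.

```c
#include <stdint.h>

/* Hypothetical typeof-based round-up of x to a multiple of y (a power of 2). */
#define SKETCH_ROUNDUP2(x, y) \
	((__typeof__(x))(((x) + ((y) - 1)) & ~((__typeof__(x))(y) - 1)))

struct sketch_desc {
	uint32_t len:13;	/* hypothetical bit-field */
	uint32_t flags:19;
};

static uint32_t
sketch_padded_len(const struct sketch_desc *d)
{
	/* SKETCH_ROUNDUP2(d->len, 4) fails on GCC: typeof on a bit-field. */
	return (SKETCH_ROUNDUP2((uint32_t)d->len, 4));
}
```

  The cast yields an ordinary uint32_t rvalue, so __typeof__ sees a complete type and the macro expands normally.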
* gicv3_its: Leave LPI interrupts enabled during handling (D Scott Phillips, 2021-02-17, 1 file, -2/+0)
  This follows the behavior on x86 where edge triggered interrupts are not disabled when executing the handler. Because the ITS is a shared resource, contention for the command queue lock can be substantial.
  Suggested by: gallatin
  Reviewed by: andrew
  Tested by: gallatin
  Sponsored by: Ampere Computing LLC
  Differential Revision: https://reviews.freebsd.org/D28709
* Add ifdef TCPHPTS around build_ack_entry and do_bpf_and_csum to avoid warnings when HPTS is not included (Randall Stewart, 2021-02-17, 1 file, -0/+2)
  Thanks to Gary Jennejohn for pointing this out.