path: root/sys
Commit message | Author | Age | Files | Lines
* kinst: fix kinst_probe_md field indentation | Christos Margiolis | 18 hours | 1 | -9/+9
    Reviewed by: markj
    Approved by: markj (mentor)
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40411
* kinst: use bool where appropriate | Christos Margiolis | 18 hours | 3 | -12/+12
    Reviewed by: markj
    Approved by: markj (mentor)
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40412
* kinst: simplify trampoline fill definitions | Christos Margiolis | 21 hours | 2 | -5/+6
    Centralize KINST_TRAMP_FILL_PATTERN and KINST_TRAMP_FILL_SIZE to reduce
    redefinitions, and use the architecture-dependent kinst_patchval_t as
    their size.

    Reviewed by: markj
    Approved by: markj (mentor)
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40406
* fbt: simplify arm64 function-prologue parsing | Christos Margiolis | 21 hours | 1 | -26/+16
    Reviewed by: markj
    Approved by: markj (mentor)
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40364
* dtrace: deduplicate arm64 breakpoint definition | Christos Margiolis | 21 hours | 2 | -4/+7
    Reviewed by: markj
    Approved by: markj (mentor)
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40363
* Add gve, the driver for Google Virtual NIC (gVNIC) | Shailend Chand | 40 hours | 15 | -0/+5296
    gVNIC is a virtual network interface designed specifically for Google
    Compute Engine (GCE). It is required to support per-VM Tier_1
    networking performance, and for using certain VM shapes on GCE.

    The NIC supports TSO, Rx and Tx checksum offloads, and RSS. It does not
    currently do hardware LRO, and thus the software-LRO in the host is
    used instead. It also supports jumbo frames.

    For each queue, the driver negotiates a set of pages with the NIC to
    serve as a fixed bounce buffer; this precludes the use of iflib.

    Reviewed-by: markj
    MFC-after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D39873
* ossl: Compile newly added files into the kernel if so requested | Mark Johnston | 44 hours | 1 | -0/+4
    Fixes: 9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
    Fixes: 9b1d87286c78 ("ossl: Add a fallback AES-GCM implementation using AES-NI")
* ipsec: Make algorithm tables read-only | Mark Johnston | 44 hours | 1 | -3/+3
    No functional change intended.

    MFC after: 1 week
* x86: Mark the CPU idle function table as const | Mark Johnston | 44 hours | 1 | -3/+4
    No functional change intended.

    MFC after: 1 week
* kevent: Make references to filter definitions const | Mark Johnston | 44 hours | 2 | -11/+10
    Follow-up revisions can make individual filter definitions const.

    No functional change intended.

    Reviewed by: kib
    MFC after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D35842
* <sys/memrange.h>: Include <sys/ioccom.h>. | John Baldwin | 44 hours | 1 | -0/+2
    This makes this header more self-contained.

    Reviewed by: imp, markj
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D40387
* Fix panic in nfs bootp/diskless after 0785c323f3. | Alexander Motin | 45 hours | 1 | -8/+3
    If there is no interface, count won't be initialized, while cnt is not
    even relevant. Check ifp, that really matters, and delete count.
* nlsysevent: Fix the EXPORT_SYMS definition | Mark Johnston | 46 hours | 1 | -1/+1
    EXPORT_SYMS=YES has a special meaning, EXPORT_SYMS=yes does not.

    Fixes: 8a2af0b469b6 ("nlsysevent: add a genetlink(4) module to report kernel events")
* ossl: Add a fallback AES-GCM implementation using AES-NI | Mark Johnston | 46 hours | 3 | -3/+481
    This lets one use ossl(4) for AES-GCM operations on contemporary amd64
    platforms. A kernel benchmark indicates that this gives roughly
    equivalent throughput to aesni(4) for various buffer sizes.

    Bulk processing is done in aesni-gcm-x86_64.S, the rest is handled in a
    C wrapper ported from OpenSSL's gcm128.c.

    Sponsored by: Stormshield
    Sponsored by: Klara, Inc.
    Reviewed by: jhb
    MFC after: 3 months
    Differential Revision: https://reviews.freebsd.org/D39967
* ossl: Add a VAES-based AES-GCM implementation for amd64 | Mark Johnston | 46 hours | 8 | -10/+136616
    aes-gcm-avx512.S is generated from OpenSSL 3.1 and implements AES-GCM.
    ossl_x86.c detects whether the CPU implements the required AVX512
    instructions; if not, the ossl(4) module does not provide an AES-GCM
    implementation. The VAES implementation increases throughput for all
    buffer sizes in both directions, up to 2x for sufficiently large
    buffers.

    The "process" implementation is in two parts: a generic OCF layer in
    ossl_aes.c that calls a set of MD functions to do the heavy lifting.
    The intent there is to make it possible to add other implementations
    for other platforms, e.g., to reduce the diff required for D37421.

    A follow-up commit will add a fallback path to legacy AES-NI, so that
    ossl(4) can be used in preference to aesni(4) on all amd64 platforms.
    In the long term we would like to replace aesni(4) and armv8crypto(4)
    with ossl(4).

    Note, currently this implementation will not be selected by default
    since aesni(4) and ossl(4) return the same probe priority for crypto
    sessions, and the opencrypto framework selects the first registered
    implementation to break a tie. Since aesni(4) is compiled into the
    kernel, aesni(4) wins. A separate change may modify ossl(4) to have
    priority.

    Sponsored by: Stormshield
    Sponsored by: Klara, Inc.
    Reviewed by: jhb
    MFC after: 3 months
    Differential Revision: https://reviews.freebsd.org/D39783
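The tie-break described in the note above can be illustrated with a minimal C sketch. The `crypto_driver` structure, the `select_driver()` function, and the convention that a larger priority value is preferred are all assumptions made for illustration; they are not the opencrypto framework's actual types or API.

```c
#include <stddef.h>

/* Hypothetical record for a registered driver; not the opencrypto API. */
struct crypto_driver {
	const char *name;
	int probe_priority;	/* assumption: larger value is preferred */
};

/*
 * Return the index of the selected driver, or -1 if none is registered.
 * Drivers are assumed to be stored in registration order. The strict '>'
 * comparison means that on equal priority the earlier (first registered)
 * driver keeps the slot.
 */
static int
select_driver(const struct crypto_driver *drv, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best < 0 ||
		    drv[i].probe_priority > drv[best].probe_priority)
			best = (int)i;
	}
	return (best);
}
```

With two entries of equal priority, the first one in the array is returned, which mirrors why a driver compiled into the kernel (and so registered first) would win over a later-loaded module.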
* ossl: Expose more CPUID bits in OPENSSL_ia32cap_P | Mark Johnston | 46 hours | 1 | -1/+2
    This is needed to let OpenSSL 3.1 routines detect VAES and VPCLMULQDQ
    extensions. The intent is to import ASM routines which implement
    AES-GCM using VEX-prefixed AES-NI instructions.

    No functional change intended.

    Sponsored by: Stormshield
    Sponsored by: Klara, Inc.
    MFC after: 3 months
    Differential Revision: https://reviews.freebsd.org/D39782
* netlink: fix compilation without INET6 | Gleb Smirnoff | 46 hours | 1 | -2/+0
    Fixes: a77facd27368f618520d25391cfce11149879a41
* arm64: Fix the definition of ID_AA64DFR1_EL1 | Andrew Turner | 46 hours | 1 | -1/+1
* Add more arm64 ID registers | Andrew Turner | 46 hours | 1 | -0/+16
    These will be used by bhyve to emulate these registers.

    Sponsored by: Arm Ltd
* arm64: Correct a pmap unlock in pmap_stage2_fault | Andrew Turner | 46 hours | 1 | -1/+1
    This is used by bhyve so was not an issue as it is still in development.

    Sponsored by: Arm Ltd
* pf: fix log message | Kristof Provost | 48 hours | 1 | -1/+1
    Use __func__ so we log the correct function name.

    Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: carry over rule actions from route-to rules | Kristof Provost | 48 hours | 3 | -11/+20
    If we route-to (or dup-to/reply-to) we re-run pf_test(), which will
    also create states for the connection. This means that we may end up
    matching a different (i.e. not the state that was created by the
    route-to rule) state, without the attributes (such as dummynet
    pipes/queues) set by the route-to rule.

    Address this by inheriting the pf_rule_actions from the route-to rule
    while evaluating the connection again in pf_test(). That is, we set
    default pf_rule_actions based on the route-to rule for the new
    evaluation. The new rule may still overrule these, but if it does not
    have such actions the route-to actions are applied.

    Do the same for IPv6 rules in pf_test6()/pf_route6().

    See also: https://redmine.pfsense.org/issues/14039
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D40340
* netlink: use netlink mbufs in the mbuf chains. | Alexander V. Chernikov | 2 days | 5 | -12/+290
    Continue D40356 and switch the remaining parts of mbuf-related code to
    the Netlink mbufs.

    Reviewed By: gallatin
    Differential Revision: https://reviews.freebsd.org/D40368
    MFC after: 2 weeks
* nlsysevent: add default command to the events | Baptiste Daroussin | 2 days | 2 | -5/+14
* nlsysevent: deduplicate the code and split into smaller functions | Baptiste Daroussin | 2 days | 1 | -15/+28
    No functional changes intended

    Suggested by: melifaro
* nlsysevent: rename variables for clarity of the code | Baptiste Daroussin | 2 days | 1 | -18/+18
    Suggested by: melifaro
* nlsysevent: specify all netlink headers the same way | Baptiste Daroussin | 2 days | 1 | -2/+1
* sysarch: Add includes required for ktrcapfail() calls to be compiled | Mark Johnston | 3 days | 3 | -0/+6
    Reported by: jfree
    MFC after: 1 week
* ktrace: Make the data lengths table const | Mark Johnston | 3 days | 1 | -1/+1
    No functional change intended.

    MFC after: 1 week
* signal: Make the signal disposition table const | Mark Johnston | 3 days | 1 | -1/+1
    No functional change intended.

    MFC after: 1 week
* ktrace: Make sys/ktrace.h self-contained | Mark Johnston | 3 days | 1 | -0/+2
    MFC after: 2 weeks
* nlsysevent: add a genetlink(4) module to report kernel events | Baptiste Daroussin | 3 days | 4 | -0/+220
    Hooked to devctl_notify, this allows consumers to receive events by
    subscribing to a system over a generic netlink protocol.

    Reviewed by: imp, melifaro
    Differential Revision: https://reviews.freebsd.org/D37574
* devctl: allow to register a hook to receive the events | Baptiste Daroussin | 3 days | 2 | -0/+66
    In preparation for netlink sysevent, add a function that allows
    registering a function to hook the events and also send them via
    another kernel module (nlsysevent will be that module).

    Prepare a static list of known existing events in the kernel that will
    be used to prepopulate the nlsysevent multicast groups (one per event).

    Reviewed by: imp
    Differential Revision: https://reviews.freebsd.org/D37573
* cc_cubic: Use units of microseconds (usecs) instead of ticks in rtt. | Cheng Cui | 3 days | 2 | -43/+50
    This improves TCP friendly cwnd in cases of low latency high drop rate
    networks. Tests show +42% and +37% better performance in 1Gbps and
    10Gbps cases.

    Reported by: Bhaskar Pardeshi from VMware.
    Reviewed By: rscheff, tuexen
    Approved by: rscheff (mentor), tuexen (mentor)
* netinet6: make IPv6 fragment TTL per-VNET configurable. | Alexander V. Chernikov | 3 days | 3 | -7/+61
    Having it configurable adds more flexibility, especially for systems
    with a low amount of memory. Additionally, it allows speeding up
    frag6/ tests execution.

    Reviewed by: kp, markj, bz
    Differential Revision: https://reviews.freebsd.org/D35755
    MFC after: 2 weeks
* ifnet: consistently call hooks when the interface gets up. | Alexander V. Chernikov | 3 days | 3 | -32/+22
    Some context on the current IPv6 interface setup & address management:

    There are two data paths for IPv6 initialisation in the context of
    assigning LL addresses:

    1) Userland explicitly requests IFF_UP for the interface w/o any
       addresses. if_up() then calls in6_if_up(), which calls
       in6_ifattach(). The latter sets up some initial ND/IN6 state and
       disables IPv6 for the interface if it’s not loopback. If the
       interface is loopback, then it adds ::1/128 and LL addresses via
       in6_ifattach_loopback(). Then, devd notification is generated (if
       the VNET is the default one), which triggers rc.network
       ifconfig_up(), causing ifdisabled to be removed via SIOCSIFINFO_IN6
       from ifconfig. The kernel SIOCSIFINFO_IN6 handler calls in6_if_up()
       once again and it assigns the interface link-local address.

    2) Userland adds IPv4 or IPv6 address to the interface.
       SIOCAIFADDR[_IN6] kernel handler calls IPv4/IPv6 protocol handler
       to add the address. Both then call if_ioctl() with SIOCSIFADDR.
       Ethernet/loopback ioctl handlers silently set IFF_UP for the
       interface. Finally, if.c:ifioctl() wrapper code compares old and
       new interface flags and, if IFF_UP is added, it explicitly calls
       in6_if_up(), which adds link-local address if either the original
       address is IPv6 or the interface is loopback.

    In the latter case, “formal” interface-up notifications are missing.
    The kernel does not trigger event handler event, does not call carp
    hook and does not provide any userland notification.

    This diff unifies the event handling in both scenarios, providing the
    necessary notifications to the kernel and userland.

    Reviewed By: kp
    Differential Revision: https://reviews.freebsd.org/D40332
    MFC after: 2 weeks
* bridge: fix lookup for untagged packets in bridge_transmit() | Ben Wilber | 3 days | 1 | -1/+2
    b0e38a1373 improved if_bridge's ability to cope with different VLANs,
    but it failed to update bridge_transmit() to cope with the new rule
    that untagged packets are treated as having VLAN ID 0 (rather than 1,
    as used to be the case). Fix that oversight.

    PR: 270559
    Reviewed by: kp
* netlink: use custom uma zone for the mbuf storage. | Alexander V. Chernikov | 3 days | 3 | -9/+81
    Netlink communicates with userland via sockets, utilising
    MCLBYTES-sized mbufs to append data to the socket buffers. These mbufs
    are never transmitted via logical or physical network.

    It may be possible that the 2k mbuf zone is temporarily exhausted due
    to DDoS-style traffic, leading to Netlink failure to respond to
    requests.

    To address it, this change introduces a custom Netlink-specific zone
    for the mbuf storage. It has the following benefits:
    * no precious memory from UMA_ZONE_CONTIG zones is utilized for Netlink
    * Netlink becomes (more) independent from the traffic spikes and other
      related network "corner" conditions.
    * Netlink allocations are now isolated within a specific zone, making
      it easier to track Netlink mbuf usage and attribute mbufs.

    Reviewed by: gallatin, adrian
    Differential Revision: https://reviews.freebsd.org/D40356
    MFC after: 2 weeks
* tcp: Refactor tcp_get_srtt() | Jonathan T. Looney | 4 days | 1 | -22/+27
    Refactor tcp_get_srtt() into its two component operations: unit
    conversion and shifting. No functional change is intended.

    Reviewed by: cc, tuexen
    Sponsored by: Netflix
    Differential Revision: https://reviews.freebsd.org/D40304
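As a rough illustration of splitting a smoothed-RTT read into "unit conversion" and "shifting", here is a minimal C sketch. It assumes the stored value is a fixed-point number in ticks scaled by a hypothetical shift of 5 bits and that the caller wants microseconds. The names `SKETCH_SRTT_SHIFT`, `ticks_to_usec()`, `srtt_unshift()`, and `sketch_get_srtt_usec()` are invented here and do not reflect the actual kernel code.

```c
#include <stdint.h>

#define SKETCH_SRTT_SHIFT 5	/* hypothetical fixed-point scale (2^5) */

/* Step 1: unit conversion, done in 64 bits to avoid overflow. */
static uint64_t
ticks_to_usec(uint64_t ticks, uint32_t hz)
{
	return (ticks * 1000000ULL / hz);
}

/* Step 2: drop the fixed-point scaling. */
static uint64_t
srtt_unshift(uint64_t scaled)
{
	return (scaled >> SKETCH_SRTT_SHIFT);
}

/* Compose the two steps: convert first, then shift, to keep precision. */
static uint32_t
sketch_get_srtt_usec(uint32_t scaled_ticks, uint32_t hz)
{
	return ((uint32_t)srtt_unshift(ticks_to_usec(scaled_ticks, hz)));
}
```

Converting before shifting is a design choice of this sketch: it preserves the fractional bits of the scaled value during the unit conversion.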
* pf: fix pf_nv##_array() size check | Kristof Provost | 4 days | 1 | -1/+1
    We want to set the maximum number of elements we'll accept, not the
    exact number we need.

    MFC after: 3 weeks
    Sponsored by: Orange Business Services
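The difference between an "exact count" check and a "maximum count" check can be sketched as follows. This is only an illustration: `check_array_len()` is a hypothetical helper, and the choice of `E2BIG` as the error code is an assumption rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Accept any array with at most maxelems entries. A check written as
 * 'got != maxelems' would instead reject shorter (but valid) arrays.
 */
static int
check_array_len(size_t got, size_t maxelems)
{
	if (got > maxelems)
		return (E2BIG);	/* assumed error code for this sketch */
	return (0);
}
```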
* tap(4): allow full-duplex and non-zero speed | Alexandre Snarskii | 4 days | 1 | -1/+1
    tap(4) devices advertise themselves as just 'ethernet autoselect',
    without duplex or speed capabilities. This advertisement makes them
    unable to be aggregated into lacp-based lagg(4):
    - lacp code requires underlying interfaces to be full-duplex, else
      interface will not participate in lacp at all
    - lacp code requires underlying interface to have non-zero speed,
      else this interface can not be selected as active aggregator

    PR: 217374
    Reported-by: Alexandre Snarskii <snar@snar.spb.ru>
    Co-authored-by: Mina Galić <freebsd@igalic.co>
    Reviewed-by: imp,karles
    Pull-request: https://github.com/freebsd/freebsd-src/pull/745
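The two LACP requirements listed above can be expressed as a pair of predicates. This is a minimal sketch under stated assumptions: `struct link_media` and both function names are hypothetical and stand in for whatever media state the real lacp code inspects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what an interface advertises. */
struct link_media {
	bool full_duplex;
	uint64_t speed_bps;	/* 0 means "no speed advertised" */
};

/* Requirement 1: only full-duplex links take part in LACP at all. */
static bool
lacp_can_participate(const struct link_media *m)
{
	return (m->full_duplex);
}

/* Requirement 2: an active aggregator also needs a non-zero speed. */
static bool
lacp_can_be_active_aggregator(const struct link_media *m)
{
	return (lacp_can_participate(m) && m->speed_bps > 0);
}
```

An interface that advertises neither duplex nor speed, as described for tap(4) before this change, fails both predicates.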
* gicv3: Use an offset to find the redist registers | Andrew Turner | 4 days | 3 | -23/+33
    To find the redistributor registers use the resource we have already
    found and add an offset. This removes the need to create a
    per-redistributor resource as it can now be a pointer to the resource
    found in attach.

    While here check the offset is within the bounds of the resource. Some
    ACPI tables list each redistributor as a separate memory range, even
    if they are physically contiguous. In this case we may not have each
    resource virtually contiguous with neighbouring resources. This can
    lead to a data abort when reading past the resource range.

    Reviewed by: kevans
    Sponsored by: Arm Ltd
    Differential Revision: https://reviews.freebsd.org/D40263
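A bounds check of the kind described above might look like the following C sketch. The function name and parameters are hypothetical; the point is only that an offset plus the size of the register frame must stay inside the resource, and that the comparison should be written so it cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a register frame of frame_size bytes starting at
 * 'offset' lies entirely within a resource of res_size bytes.
 * Written without computing offset + frame_size, which could wrap.
 */
static bool
redist_frame_in_bounds(uint64_t res_size, uint64_t offset,
    uint64_t frame_size)
{
	if (frame_size > res_size)
		return (false);
	return (offset <= res_size - frame_size);
}
```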
* netlink: fix ifconfig P2P inet ADDR ADDR netmask addition | Alexander V. Chernikov | 4 days | 1 | -56/+76
    Adding P2P addresses is complex in both ioctl and Netlink.
    In the ioctl interface, the "broadcast" field is the same field as the
    "peer". It is possible to specify a non-p2p address for the p2p
    interface in IPv6, but not in IPv4.
    In the Netlink interface, the "address" field means "peer" address. As
    a result, a common notion for the Netlink users is to submit the same
    address/peer for non-P2P interfaces.

    This change customises mapping the attribute on a per-family basis.
    Specifically,
    for IPv4 - if the interface is P2P, assume "address" is p2p and
    "local" is the address. If the interface is non-p2p, use the "local"
    attribute as the address. If it's not set, use the "address"
    attribute.
    for IPv6 - start with the "local" attribute as the address. If it's
    not set, use the "address" attribute. If both are set and both are the
    same, assume non p2p, otherwise add as p2p.

    MFC after: 2 weeks
    Reported by: jkim
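The per-family mapping rules above can be written down as a small C sketch. Everything here is illustrative: addresses are modelled as strings, and `struct addr_choice`, `map_inet4()`, and `map_inet6()` are hypothetical names, not the Netlink code's real types or functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of mapping the "local" and "address" attributes. */
struct addr_choice {
	const char *addr;	/* address to assign, NULL if none was given */
	const char *peer;	/* peer address, NULL when not added as P2P */
};

/* IPv4: the interface type decides how the attributes are read. */
static struct addr_choice
map_inet4(bool ifp_is_p2p, const char *local, const char *address)
{
	struct addr_choice c = { NULL, NULL };

	if (ifp_is_p2p) {
		c.addr = local;		/* "local" is the address */
		c.peer = address;	/* "address" is the peer */
	} else {
		/* Prefer "local"; fall back to "address" if unset. */
		c.addr = (local != NULL) ? local : address;
	}
	return (c);
}

/* IPv6: the attributes themselves decide whether this is P2P. */
static struct addr_choice
map_inet6(const char *local, const char *address)
{
	struct addr_choice c = { NULL, NULL };

	if (local == NULL) {
		c.addr = address;
		return (c);
	}
	c.addr = local;
	/* Both set and different: add as P2P with "address" as peer. */
	if (address != NULL && strcmp(local, address) != 0)
		c.peer = address;
	return (c);
}
```

For example, an IPv6 request carrying identical "local" and "address" values yields a plain (non-P2P) address in this sketch, while differing values yield an address/peer pair.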
* netinet*: Fix redirects for connections from localhost | Doug Rabson | 4 days | 4 | -1/+48
    Redirect rules use PFIL_IN and PFIL_OUT events to allow packet filter
    rules to change the destination address and port for a connection.
    Typically, the rule triggers on an input event when a packet is
    received by a router and the destination address and/or port is
    changed to implement the redirect. When a reply packet on this
    connection is output to the network, the rule triggers again,
    reversing the modification.

    When the connection is initiated on the same host as the packet
    filter, it is initially output via lo0 which queues it for input
    processing. This causes an input event on the lo0 interface, allowing
    redirect processing to rewrite the destination and create state for
    the connection. However, when the reply is received, no corresponding
    output event is generated; instead, the packet is delivered to the
    higher level protocol (e.g. tcp or udp) without reversing the
    redirect, the reply is not matched to the connection and the packet is
    dropped (for tcp, a connection reset is also sent).

    This commit fixes the problem by adding a second packet filter call in
    the input path. The second call happens right before the handoff to
    higher level processing and provides the missing output event to allow
    the redirect's reply processing to perform its rewrite. This extra
    processing is disabled by default and can be enabled using pfilctl:

        pfilctl link -o pf:default-out inet-local
        pfilctl link -o pf:default-out6 inet6-local

    PR: 268717
    Reviewed-by: kp, melifaro
    MFC-after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D40256
* pmc: Bump major version for just-committed breaking changes | Jessica Clarke | 5 days | 1 | -2/+2
    Reviewed by: jkoshy, mhorne, emaste
    Differential Revision: https://reviews.freebsd.org/D40050
* pmc: Rework PROCEXEC event to support PIEs | Jessica Clarke | 5 days | 5 | -11/+18
    Currently the PROCEXEC event only reports a single address, entryaddr,
    which is the entry point of the interpreter in the typical dynamic
    case, and used solely to calculate the base address of the
    interpreter. For PDEs this is fine, since the base address is known
    from the program headers, but for PIEs the base address varies at run
    time based on where the kernel chooses to load it, and so pmcstat has
    no way of knowing the real address ranges for the executable. This was
    less of an issue in the past since PIEs were rare, but now they're on
    by default on 64-bit architectures it's more of a problem.

    To solve this, pass through what was picked for et_dyn_addr by the
    kernel, and use that as the offset for the executable's start address
    just as is done for everything in the kernel. Since we're changing
    this interface, sanitise the way we determine the interpreter's base
    address by passing it through directly rather than indirectly via the
    entry point and having to subtract off whatever the ELF header's
    e_entry is (and anything that wants the entry point in future can
    still add that back on as needed; this merely changes the interface to
    directly provide the underlying variables involved).

    This will be followed up by a bump to the pmc major version.

    Reviewed by: jhb
    Differential Revision: https://reviews.freebsd.org/D39595
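The address arithmetic described above can be made concrete with a short C sketch. The function names are hypothetical and the values used in the usage note are made up; only the two relationships (base = run-time entry minus `e_entry`, and run-time address = load offset plus link-time address) come from the commit message.

```c
#include <stdint.h>

/*
 * Old scheme: the consumer received only the interpreter's run-time
 * entry point and had to subtract the ELF header's e_entry to recover
 * the interpreter's load base.
 */
static uint64_t
interp_base_from_entry(uint64_t entryaddr, uint64_t e_entry)
{
	return (entryaddr - e_entry);
}

/*
 * New scheme: the load offset chosen by the kernel (et_dyn_addr in the
 * commit text) is passed through, so a link-time address maps to a
 * run-time address by simple addition. For a PDE the offset is 0.
 */
static uint64_t
runtime_addr(uint64_t dyn_addr, uint64_t link_vaddr)
{
	return (dyn_addr + link_vaddr);
}
```

For instance, with a made-up entry point of 0x80001a000 and an `e_entry` of 0x1a000, the old calculation recovers a base of 0x800000000; with a made-up PIE offset of 0x1000000, a link-time address of 0x1000 maps to 0x1001000.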
* imgact: Make et_dyn_addr part of image_params | Jessica Clarke | 5 days | 2 | -18/+18
    This already gets passed around between various imgact_elf functions,
    so moving it removes an argument from all those places. A future
    commit will make use of this for hwpmc, though, to provide the load
    base for PIEs, which currently isn't available to tools like pmcstat.

    Reviewed by: kib, markj, jhb
    Differential Revision: https://reviews.freebsd.org/D39594
* pmc: Provide full path to modules from kernel linker | Jessica Clarke | 5 days | 2 | -2/+2
    This unifies the user object and kernel module paths in libpmcstat,
    allows modules loaded from non-standard locations (e.g. from a user's
    home directory when testing) to be found and, since buffer is what all
    the warnings here use (they were never updated when buffer_modules
    were added to pick based on where the file was found) has the
    side-effect of ensuring the messages are correct.

    This includes obsoleting the now-superfluous -k option in pmcstat.

    This change breaks the hwpmc ABI and will be followed by a bump to the
    pmc major version.

    Reviewed by: jhb, jkoshy, mhorne
    Differential Revision: https://reviews.freebsd.org/D40048
* pmc: Initialise and check the pm_flags field for CONFIGURELOG | Jessica Clarke | 5 days | 1 | -0/+6
    Whilst the former is not breaking, the latter is, and so this will be
    followed by a bump to the pmc major version. This will allow the flags
    to actually be usable in future, as otherwise we cannot distinguish
    uninitialised stack junk from a deliberately-initialised value.

    Reviewed by: jhb, mhorne
    Differential Revision: https://reviews.freebsd.org/D40049
* inpcb: Restore missing validation of local addresses for jailed sockets | Mark Johnston | 5 days | 2 | -4/+8
    When looking up a listening socket, the SMR-protected lookup routine
    may return a jailed socket with no local address. This happens when
    using classic jails with more than one IP address; in a single-IP
    classic jail, a bound socket's local address is always rewritten to be
    that of the jail.

    After commit 7b92493ab1d4, the lookup path failed to check whether the
    jail corresponding to a matched wildcard socket actually owns the
    address, and would return the match regardless. Restore the omitted
    checks.

    Fixes: 7b92493ab1d4 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
    Reported by: peter
    Reviewed by: bz
    Differential Revision: https://reviews.freebsd.org/D40268