path: root/sys/net
Commit message | Author | Age | Files | Lines
* bpf: add BIOCGETIFLIST ioctl that returns all available tap points
  Author: Gleb Smirnoff | Age: 45 hours | Files: 2 | Lines: -2/+73
  Differential Revision: https://reviews.freebsd.org/D53873
* bpf: modularize ifnet(9) part of bpf
  Author: Gleb Smirnoff | Age: 45 hours | Files: 3 | Lines: -466/+537
  Imagine that bpf(9) tapping can happen at any point in the network stack, not necessarily at interface transmit or receive. To achieve that, we need a thin layer of abstraction, defined by struct bif_methods, that specifies how the generic bpf layer works with a tap point of a given kind. Implement the ifnet(9)-specific methods in a separate file, bpf_ifnet.c.
  At this point there is 100% compatibility for all existing interfaces and no KPI change yet: the legacy attaching KPI is layered over the new ifnet-agnostic KPI. The new KPI may still change, though, as we can implement multiple DLTs per single tap point in a prettier fashion.
  The new abstraction layer allows us to move all the 802.11 radio injection hacks out of bpf.c into ieee80211_radiotap.c, so do that immediately as a good proof of concept.
  Reviewed by: bz
  Differential Revision: https://reviews.freebsd.org/D53872
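The method-table idea described above can be sketched in user-space C. Everything here is hypothetical: struct tap_methods, struct tap_point and struct fake_ifnet stand in for struct bif_methods and the real bpf/ifnet structures, whose members this log does not show. The sketch only shows that generic code reaches a tap point exclusively through function pointers, so a non-ifnet tap point (for example 802.11 radiotap) can supply its own table.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical method table; the real struct bif_methods may differ. */
struct tap_methods {
	const char *(*tm_name)(void *softc);	/* tap point name */
	int (*tm_mtu)(void *softc);		/* largest packet at this tap */
};

/* A tap point as seen by the generic layer: methods plus an opaque owner. */
struct tap_point {
	const struct tap_methods *tp_methods;
	void *tp_softc;
};

/* Hypothetical owner of the first kind: a network interface. */
struct fake_ifnet {
	const char *name;
	int mtu;
};

static const char *
ifnet_tap_name(void *sc)
{
	return (((struct fake_ifnet *)sc)->name);
}

static int
ifnet_tap_mtu(void *sc)
{
	return (((struct fake_ifnet *)sc)->mtu);
}

static const struct tap_methods ifnet_tap_methods = {
	.tm_name = ifnet_tap_name,
	.tm_mtu = ifnet_tap_mtu,
};

/* Generic code never looks inside tp_softc; it only calls the methods. */
static int
tap_describe(const struct tap_point *tp, char *buf, size_t len)
{
	return (snprintf(buf, len, "%s mtu %d",
	    tp->tp_methods->tm_name(tp->tp_softc),
	    tp->tp_methods->tm_mtu(tp->tp_softc)));
}
```

A second kind of tap point would define its own `static const struct tap_methods` and reuse tap_describe() unchanged.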
* lacp: Sort port map by interface index
  Author: Andrew Gallatin | Age: 2 days | Files: 1 | Lines: -1/+25
  This makes it easier to reason about system topology, and to potentially map applications to NIC queues by (ab)using the mbuf flowid to select egress NIC and queue in a predictable fashion.
  Differential Revision: https://reviews.freebsd.org/D54053
  Reviewed by: glebius, kbowling
  Sponsored by: Netflix
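The following minimal sketch shows why a sorted port map makes egress selection predictable. It assumes a hypothetical struct port that carries only an interface index. It also assumes that a port is chosen as flowid modulo the number of ports; the actual lacp selection arithmetic may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lagg port; only the ifindex matters here. */
struct port {
	int if_index;
};

/* qsort() comparator over an array of struct port pointers. */
static int
port_cmp_index(const void *a, const void *b)
{
	const struct port *pa = *(const struct port *const *)a;
	const struct port *pb = *(const struct port *const *)b;

	return ((pa->if_index > pb->if_index) - (pa->if_index < pb->if_index));
}

/* After sorting, slot i always holds the i-th lowest interface index. */
static void
port_map_sort(struct port **map, size_t n)
{
	qsort(map, n, sizeof(map[0]), port_cmp_index);
}

/* With a sorted map, a given flowid always lands on the same port. */
static struct port *
port_select(struct port **map, size_t n, uint32_t flowid)
{
	return (map[flowid % n]);
}
```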
* if_clone: don't overwrite dump_nl of an attaching cloner with default
  Author: Seyed Pouria Mousavizadeh Tehrani | Age: 2 days | Files: 1 | Lines: -3/+4
  Reviewed by: glebius
  Differential Revision: https://reviews.freebsd.org/D54190
* pfsync: Avoid zeroing the state export union
  Author: Mark Johnston | Age: 3 days | Files: 1 | Lines: -2/+6
  pfsync_state_export() takes a pointer to a union that is in reality a pointer to one of the three state formats (1301, 1400, 1500), and zeros the union. The three formats do not have the same size, so zeroing the whole union is wrong whenever the format in use is not the largest one. Refactor a bit so that the zeroing happens at the layer where we know which format we're dealing with.
  Reported by: CHERI
  Reviewed by: kp
  MFC after: 1 week
  Sponsored by: CHERI Research Centre (EPSRC grant UKRI3001)
  Differential Revision: https://reviews.freebsd.org/D54163
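The hazard and the fix can be illustrated with made-up layouts. The three structs below have different sizes, like the 1301/1400/1500 formats, but their contents are hypothetical, as are the names state_export_zero() and enum state_fmt. Zeroing sizeof(union) bytes through a pointer that really refers to the smallest format would write past its end. Zeroing only the selected member does not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical export formats of different sizes (not the real layouts). */
struct state_v1301 { uint8_t data[16]; };
struct state_v1400 { uint8_t data[24]; };
struct state_v1500 { uint8_t data[32]; };

union state_export {
	struct state_v1301 v1301;
	struct state_v1400 v1400;
	struct state_v1500 v1500;
};

enum state_fmt { FMT_1301, FMT_1400, FMT_1500 };

/*
 * memset(sp, 0, sizeof(*sp)) would clear 32 bytes even when sp really
 * points at a 16-byte v1301 record.  Zero only the member that matches
 * the format the caller is actually using.
 */
static void
state_export_zero(union state_export *sp, enum state_fmt fmt)
{
	switch (fmt) {
	case FMT_1301:
		memset(&sp->v1301, 0, sizeof(sp->v1301));
		break;
	case FMT_1400:
		memset(&sp->v1400, 0, sizeof(sp->v1400));
		break;
	case FMT_1500:
		memset(&sp->v1500, 0, sizeof(sp->v1500));
		break;
	}
}
```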
* altq(4): Fix a typo in a source code comment
  Author: Gordon Bergling | Age: 4 days | Files: 1 | Lines: -1/+1
  - s/backet/bucket/
  MFC after: 3 days
* bpf: convert several boolean natured fields of bpf_d to flags
  Author: Gleb Smirnoff | Age: 5 days | Files: 2 | Lines: -41/+42
  This shrinks the structure a bit. Should be no functional change.
  Differential Revision: https://reviews.freebsd.org/D53870
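Here is a sketch of the bool-to-flags conversion. The flag names are hypothetical, because this log does not list the real bpf_d flags. Four bool members typically occupy four bytes, while the same state fits in a single uint8_t bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor flags; the real bpf_d flag names may differ. */
#define	BD_PROMISC	0x01u
#define	BD_IMMEDIATE	0x02u
#define	BD_HDRCMPLT	0x04u
#define	BD_FEEDBACK	0x08u

/* Before: one bool per property. */
struct desc_bools {
	bool promisc, immediate, hdrcmplt, feedback;
};

/* After: all properties packed into one byte. */
struct desc_flags {
	uint8_t flags;
};

static inline void
desc_set(struct desc_flags *d, uint8_t f, bool on)
{
	if (on)
		d->flags |= f;
	else
		d->flags &= (uint8_t)~f;
}

static inline bool
desc_test(const struct desc_flags *d, uint8_t f)
{
	return ((d->flags & f) != 0);
}
```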
* pf: handle TTL expired during nat64
  Author: Kristof Provost | Age: 6 days | Files: 1 | Lines: -1/+0
  If the TTL (or hop limit) expires during nat64 translation we may need to send the error message in the original address family (i.e. pre-translation). We'd usually handle this in pf_route()/pf_route6(), but at that point we have already translated the packet, making it difficult to include it in the generated ICMP message. Check for this case in pf_translate_af() and send icmp errors directly from it.
  PR: 291527
  MFC after: 2 weeks
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Differential Revision: https://reviews.freebsd.org/D54166
* if_ovpn: use epoch to free peers
  Author: Kristof Provost | Age: 8 days | Files: 1 | Lines: -2/+12
  Avoid a possible use-after-free in the rx path. ovpn_decrypt_rx_cb() calls ovpn_finish_rx() which releases the lock, but continues to use the peer. Ensure that the peer cannot be freed until we're sure all potential users have stopped using it (i.e. have left net_epoch).
  Reported by: Kevin Day <kevin@your.org>
  MFC after: 1 week
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* lltable: use own lock
  Author: Gleb Smirnoff | Age: 9 days | Files: 2 | Lines: -27/+32
  Add struct mtx to struct lltable and stop using IF_AFDATA_LOCK, which was created for a completely different purpose. No functional change intended.
  Reviewed by: zlei, melifaro
  Differential Revision: https://reviews.freebsd.org/D54086
* linux: store Linux Ethernet interface number in struct ifnet
  Author: Gleb Smirnoff | Age: 9 days | Files: 1 | Lines: -0/+1
  The old approach, where we go through the list of interfaces and count them, has bugs. One obvious bug with this dynamic translation is that once an Ethernet interface in the middle of the list goes away, all interfaces following it change their Linux names.
  A bigger problem is the ifnet arrival and departure times. For example, linsysfs has an event handler for ifnet_arrival_event, and of course it wants to resolve the name. This accidentally works, due to a bug in if_attach() where we call if_link_ifnet() before invoking all the event handlers. Once that bug is fixed, linsysfs won't be able to resolve the name the old way. The other side is ifnet_departure_event, where there is no bug: the event handlers are called after if_unlink_ifnet(). This means the old translation won't work for departure event handlers. One example is Netlink. This change gives Netlink a chance to emit a proper Linux interface departure message.
  However, there is another problem in Netlink: the ifnet pointer is lost in the Netlink translation layer. Plug this with a cookie in the Netlink writer structure that can be set by the route layer and used by the Netlink Linux translation layer. This part of the diff seems unrelated, but it is hard to make it a separate change, as the old KPI goes away and to use the new one we need the pointer.
  Differential Revision: https://reviews.freebsd.org/D54077
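The renaming bug described in the first paragraph can be reproduced with a toy model. struct iface and ethno_by_counting() are hypothetical. The sketch contrasts two ways of getting the N in ethN. The old way counts the Ethernet interfaces that precede an entry. The new way reads a number stored once at attach time, which is what the commit adds to struct ifnet.

```c
#include <stddef.h>

/* Hypothetical interface record. */
struct iface {
	int is_ether;
	int present;	/* becomes 0 once the interface has departed */
	int ethno;	/* assigned once at attach time, never recomputed */
};

/*
 * Old approach: derive N in "ethN" by counting live Ethernet interfaces
 * that precede position pos.  Any departure earlier in the list shifts
 * every later name, and an already-unlinked interface cannot be counted.
 */
static int
ethno_by_counting(const struct iface *list, size_t n, size_t pos)
{
	int count = 0;

	for (size_t i = 0; i < pos && i < n; i++)
		if (list[i].present && list[i].is_ether)
			count++;
	return (count);
}
```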
* net: fix LINT-NOIP build
  Author: Gleb Smirnoff | Age: 11 days | Files: 1 | Lines: -5/+3
  Fixes: fd131b47f20dbeb515f5e3e6ea87948f2638eda9
* net: remove dom_ifmtu
  Author: Gleb Smirnoff | Age: 13 days | Files: 3 | Lines: -29/+18
  It is a remnant of a network stack design that was supposed to support multiple network protocols. Today it is clear that we are left with IPv4 and IPv6 only. Only IPv6 may have an MTU different to the interface MTU.
* net: routing table attach never fails
  Author: Gleb Smirnoff | Age: 13 days | Files: 1 | Lines: -5/+1
* pf: make unhandled_af() inline
  Author: Gleb Smirnoff | Age: 13 days | Files: 1 | Lines: -1/+5
  Otherwise you just can't include pfvar.h without compiling pf in.
  Reviewed by: kp
  Differential Revision: https://reviews.freebsd.org/D54064
* bpf: global bpf list doesn't need CK
  Author: Gleb Smirnoff | Age: 14 days | Files: 1 | Lines: -14/+14
  All accesses to this list are done with the global lock held. The CK connotation is just confusing the reader.
  Fixes: 699281b545a8a3fc5109b5f2db62d261b65b588b
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D53869
* bpf: calculate net.bpf.stats buffer size dynamically
  Author: Gleb Smirnoff | Age: 14 days | Files: 1 | Lines: -11/+17
  This removes the global counter, which was updated in a racy manner.
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D53868
* bpf: retire struct bpf_if_ext
  Author: Gleb Smirnoff | Age: 14 days | Files: 2 | Lines: -16/+10
  The struct was used for bpf_if to bif_dlist masking, which is used to optimize the bpf_peers_present() call. The only functional change here is that bif_dlist and bif_next swap their places in the structure. Both belong to the first cache line anyway.
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D53867
* if.h: Fix a couple of typos in comments
  Author: Navdeep Parhar | Age: 2025-11-25 | Files: 1 | Lines: -2/+2
  No functional change.
* loopback: Clear hash unconditionally.
  Author: Andrew Gallatin | Age: 2025-11-24 | Files: 1 | Lines: -2/+0
  Clear the RSS hash on transmit, now that RSS hashing is enabled unconditionally, and the network stack may want to trust that it is getting the correct hash on input.
  Differential Revision: https://reviews.freebsd.org/D53090
  Reviewed by: zlei
  Sponsored by: Netflix
* rss: Enable portions of RSS globally to enable symmetric hashing
  Author: Andrew Gallatin | Age: 2025-11-22 | Files: 2 | Lines: -33/+53
  We use the fact that all NICs that support hashing are using the same hash algorithm and hash key to enable symmetric hashing in TCP, where a software version of the same hash is used to establish hashes on outgoing connections.
  Sponsored by: Netflix
  Reviewed by: adrian, zlei (both early version)
  Differential Revision: https://reviews.freebsd.org/D53089
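As a sketch of the shared hash, the code below uses the Toeplitz hash that RSS-capable NICs commonly implement. The commit itself only says the algorithm and key are the same everywhere, so treating the hash as Toeplitz is an assumption. The key is a parameter here, not FreeBSD's actual RSS key. For a software hash to agree with the NIC's receive-side hash of the same connection, the IPv4 TCP tuple must be fed in the order the NIC sees it on input: source address, destination address, source port, destination port, with the remote peer as the source. rss_hash_for_connection() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toeplitz hash: for every set input bit (MSB first), XOR in the 32-bit
 * window of the key that starts at that bit position.  keylen must be
 * at least 4 bytes; key bytes past keylen are treated as zero.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen, const uint8_t *data,
    size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | key[3];
	size_t next = 4;	/* next key byte to shift into the window */

	for (size_t i = 0; i < datalen; i++) {
		uint8_t nextbyte = next < keylen ? key[next++] : 0;

		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			window = (window << 1) | ((nextbyte >> b) & 1u);
		}
	}
	return (hash);
}

/*
 * Hash an IPv4 TCP connection in receive order (remote peer as source),
 * so an outgoing connection gets the hash its inbound packets will carry.
 * Addresses and ports are in network byte order.
 */
static uint32_t
rss_hash_for_connection(const uint8_t *key, size_t keylen,
    const uint8_t remote_ip[4], const uint8_t local_ip[4],
    const uint8_t remote_port[2], const uint8_t local_port[2])
{
	uint8_t tuple[12];

	memcpy(tuple, remote_ip, 4);
	memcpy(tuple + 4, local_ip, 4);
	memcpy(tuple + 8, remote_port, 2);
	memcpy(tuple + 10, local_port, 2);
	return (toeplitz_hash(key, keylen, tuple, sizeof(tuple)));
}
```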
* bpf: remove DDB code
  Author: Gleb Smirnoff | Age: 2025-11-22 | Files: 1 | Lines: -37/+0
  With modern debugging tools it isn't useful at all and is just a maintenance burden.
* bpf: leave only locked version of bpf_detachd()
  Author: Gleb Smirnoff | Age: 2025-11-21 | Files: 1 | Lines: -17/+10
  The unlocked one is used only once. No functional change.
* bpf: refactor buffer pre-allocation for BIOCSETIF
  Author: Gleb Smirnoff | Age: 2025-11-21 | Files: 1 | Lines: -25/+20
  This basically refactors 4f42daa4a326f to use less indentation and fewer variables. The code is still not race-proof.
* bpf: remove dead code
  Author: Gleb Smirnoff | Age: 2025-11-21 | Files: 2 | Lines: -25/+0
  Should have gone together with 9738277b5c66.
* iflib: fix iflib_simple_transmit() when interface is down
  Author: Andrew Gallatin | Age: 2025-11-20 | Files: 1 | Lines: -3/+7
  Use the same check as iflib_if_transmit() to detect when the interface is down and return the proper error code, and also free the mbuf. This fixes an mbuf leak when a member of a lagg is brought down (and probably many other scenarios).
  Sponsored by: Netflix
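The following simplified model shows the ownership rule behind this fix: once a transmit routine accepts a packet, it must either queue it or free it. struct pkt, struct netif, simple_transmit() and the one-slot hw_queued ring are all hypothetical. Returning ENETDOWN is an assumption about what "the proper error code" is.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for an mbuf and an ifnet. */
struct pkt {
	int len;
};

struct netif {
	int running;	/* like IFF_DRV_RUNNING */
	int link_up;	/* like an active link state */
};

static int pkts_freed;			/* makes leaks observable in tests */
static struct pkt *hw_queued;		/* hypothetical one-slot TX ring */

static void
pkt_free(struct pkt *m)
{
	free(m);
	pkts_freed++;
}

/*
 * The caller hands over ownership of m.  Every path must therefore either
 * queue the packet or free it; returning an error without freeing leaks it.
 */
static int
simple_transmit(struct netif *ifp, struct pkt *m)
{
	if (!ifp->running || !ifp->link_up) {
		pkt_free(m);
		return (ENETDOWN);
	}
	hw_queued = m;		/* ownership passes to the (hypothetical) ring */
	return (0);
}
```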
* if_ovpn: use IFT_TUNNEL
  Author: Kristof Provost | Age: 2025-11-17 | Files: 1 | Lines: -1/+1
  IFT_ENC has special behaviour in pf we don't desire, and this also ensures that for all interface types there is N:1:1 correspondence between if_type:dlt:header len.
  Requested by: glebius
  MFC after: 1 week
* sys/net/sff8436.h: Fix the register address of link length of copper or active cable
  Author: Kirill Kochnev | Age: 2025-11-16 | Files: 1 | Lines: -1/+1
  The register address of link length of copper or active cable is 146 as per the SFF-8436 specification [1].
  [1] 7.6.2 Upper Memory Map Page 00h
  SFF-8436 Specification (pdf): https://members.snia.org/document/dl/25896
  Reviewed by: imp, zlei
  MFC after: 1 week
  Pull Request: https://github.com/freebsd/freebsd-src/pull/1885
  Closes: https://github.com/freebsd/freebsd-src/pull/1885
* Fix typo in recently added 400G media
  Author: Navdeep Parhar | Age: 2025-11-14 | Files: 1 | Lines: -1/+1
  Reported by: glebius
  Fixes: 2d608a4cebbd if_media.h: Add 400GBase-SR8 and 400GBase-CR8
  MFC after: 1 week
  Sponsored by: Chelsio Communications
* if_media.h: Add 400GBase-SR8 and 400GBase-CR8
  Author: Navdeep Parhar | Age: 2025-11-12 | Files: 2 | Lines: -0/+8
  Reviewed by: bz (network)
  MFC after: 1 week
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D53387
* iflib: remove transmit prefetching
  Author: Andrew Gallatin | Age: 2025-11-11 | Files: 1 | Lines: -44/+2
  Remove prefetching from the transmit path of iflib in the interest of increased performance and reduced complexity. Details regarding the performance penalties of prefetching can be found in the differential review.
  Note this prefetching was only done on link speeds of 10Gb/s and above, so the change is a no-op (or perhaps slight performance improvement simply due to the code simplification) for slower interfaces.
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D53674
  Reviewed by: kbowling, markj, mjg
* if_tuntap: defer transient destroy_dev() to a taskqueue
  Author: Kyle Evans | Age: 2025-11-05 | Files: 1 | Lines: -6/+57
  We're in the dtor, so we can't destroy it now without deadlocking after recent changes to make destroy_dev() provide a barrier. However, we know there isn't any other dtor to run, so we can go ahead and clean up our state and just prevent a use-after-free if someone races to open the device while we're trying to destroy it. tunopen() now uses the net epoch to protect against softc release by a concurrent tun_destroy().
  While we're here, allow a destroy operation to proceed if we caught a signal in cv_wait_sig() but tun_busy dropped to 0 while we were waiting to acquire the lock. This was more of an inherent design flaw than a bug in the commit referenced below.
  PR: 290575
  Fixes: 4dbe6628179d ("devfs: make destroy_dev() a release [...]")
  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D53438
* pf: convert DIOCRSETADDRS to netlink
  Author: Kristof Provost | Age: 2025-10-31 | Files: 1 | Lines: -2/+3
  The list of addresses is potentially very large, larger than we can fit in a single netlink request, so we indicate via the PFR_FLAG_START/PFR_FLAG_DONE flags when we start and finish, so the kernel can work out which addresses need to be removed.
  Sponsored by: Rubicon Communications, LLC ("Netgate")
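One way the kernel side can work out which addresses need to be removed from a chunked request is a mark-and-sweep pass. On START, clear a "seen" mark on every entry. For each chunk, mark or add every address it contains. On DONE, delete the entries that are still unmarked. The sketch below assumes that approach. SET_START/SET_DONE, struct table and table_set_chunk() are hypothetical stand-ins, not the real PFR_FLAG_* values or pf table code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags; the real PFR_FLAG_START/PFR_FLAG_DONE values differ. */
#define	SET_START	0x1u
#define	SET_DONE	0x2u

#define	TBL_MAX		16

struct tbl_entry {
	uint32_t addr;
	bool used;
	bool seen;	/* resent during the current transaction */
};

struct table {
	struct tbl_entry e[TBL_MAX];
};

/* Apply one chunk of a "replace the whole table" request. */
static void
table_set_chunk(struct table *t, const uint32_t *addrs, size_t n,
    unsigned flags)
{
	if (flags & SET_START)		/* new transaction: nothing seen yet */
		for (size_t i = 0; i < TBL_MAX; i++)
			t->e[i].seen = false;

	for (size_t k = 0; k < n; k++) {
		size_t i, free_slot = TBL_MAX;

		for (i = 0; i < TBL_MAX; i++) {
			if (t->e[i].used && t->e[i].addr == addrs[k])
				break;
			if (!t->e[i].used && free_slot == TBL_MAX)
				free_slot = i;
		}
		if (i == TBL_MAX) {		/* not present yet: add it */
			if (free_slot == TBL_MAX)
				continue;	/* full; real code would fail */
			i = free_slot;
			t->e[i].used = true;
			t->e[i].addr = addrs[k];
		}
		t->e[i].seen = true;
	}

	if (flags & SET_DONE)		/* sweep: drop anything not resent */
		for (size_t i = 0; i < TBL_MAX; i++)
			if (t->e[i].used && !t->e[i].seen)
				t->e[i].used = false;
}
```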
* pf: Check if source nodes use a valid redirection address
  Author: Kajetan Staszkiewicz | Age: 2025-10-30 | Files: 1 | Lines: -0/+4
  Source nodes redirect (nat-to, rdr-to, route-to) all further connections matching the rule which has created the source node. The source node is valid as long as there are states resulting from the rule or until the source node lifetime expires. When the rule's redirection pool is modified (e.g. table contents are changed), the source node is still valid and it will redirect new connections to an invalid target (e.g. a dead next-hop).
  When performing source tracking, after finding a source node, check if the redirection address still exists in the pool of the rule which has created this node. If not, delete the source node. This will result in finding a new redirection address and creation of a new source node.
  Reviewed by: kp
  Obtained from: OpenBSD
  Sponsored by: InnoGames GmbH
  Differential Revision: https://reviews.freebsd.org/D53231
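Here is a sketch of the added check using hypothetical simplified types. A sticky source node is reused only while its redirection address is still in the rule's pool; otherwise it is discarded and a new address is selected. Selecting by src % n is purely illustrative and is not pf's pool algorithm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified source node and redirection pool. */
struct src_node {
	uint32_t src;
	uint32_t redir;
	bool valid;
};

struct pool {
	const uint32_t *addrs;
	size_t n;		/* must be > 0 */
};

static bool
pool_contains(const struct pool *p, uint32_t a)
{
	for (size_t i = 0; i < p->n; i++)
		if (p->addrs[i] == a)
			return (true);
	return (false);
}

/*
 * Reuse a sticky source node only if its target is still in the rule's
 * pool; otherwise drop it so that a fresh address gets selected.
 */
static uint32_t
pick_redirect(struct src_node *sn, const struct pool *p, uint32_t src)
{
	if (sn->valid && sn->src == src && !pool_contains(p, sn->redir))
		sn->valid = false;		/* stale: target left the pool */
	if (!sn->valid || sn->src != src) {
		sn->src = src;
		sn->redir = p->addrs[src % p->n];	/* illustrative choice */
		sn->valid = true;
	}
	return (sn->redir);
}
```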
* net: Remove useless field annotations
  Author: Mark Johnston | Age: 2025-10-27 | Files: 1 | Lines: -4/+4
  MFC after: 1 week
* altq: Clear stats structures in get_class_stats()
  Author: Mark Johnston | Age: 2025-10-27 | Files: 3 | Lines: -0/+6
  These structures are copied out to userspace, and it's possible to leak uninitialized stack bytes since these routines and their callers weren't careful to clear them first. Add memsets to avoid this.
  Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
  Reviewed by: kp, emaste
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D53342
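The pattern is to clear the whole structure, including compiler-inserted padding, before filling it in. struct class_stats_sketch and fill_class_stats() are hypothetical. The test checks that the padding bytes between the two members end up zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats record; on common ABIs padding follows 'kind'. */
struct class_stats_sketch {
	uint8_t kind;
	uint64_t bytes;
};

/*
 * Clear everything, padding included, before filling in the fields, so a
 * later copyout() of the structure cannot leak stale stack contents.
 */
static void
fill_class_stats(struct class_stats_sketch *sp, uint8_t kind, uint64_t bytes)
{
	memset(sp, 0, sizeof(*sp));
	sp->kind = kind;
	sp->bytes = bytes;
}
```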
* net: Validate interface group names in ioctl handlers
  Author: Mark Johnston | Age: 2025-10-27 | Files: 1 | Lines: -8/+26
  The handlers were not checking that the group names are nul-terminated. Add checks for this.
  Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
  Reviewed by: zlei
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D53344
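A minimal sketch of such a check follows, assuming the names arrive in a fixed-size buffer. GROUPNAMSIZ stands in for the real size constant, and EINVAL is an assumed error code. memchr() bounds the search to the buffer, so an unterminated name is rejected instead of being read past its end by a later strlen() or strcmp().

```c
#include <errno.h>
#include <string.h>

#define	GROUPNAMSIZ	16	/* hypothetical stand-in for the real size */

/*
 * A fixed-size name copied in from userspace is only safe to pass to
 * str*() functions if a NUL byte appears somewhere inside the buffer.
 */
static int
validate_group_name(const char name[GROUPNAMSIZ])
{
	if (memchr(name, '\0', GROUPNAMSIZ) == NULL)
		return (EINVAL);
	return (0);
}
```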
* if_vxlan: fix byteorder of source port
  Author: Seyed Pouria Mousavizadeh Tehrani | Age: 2025-10-21 | Files: 1 | Lines: -2/+2
  Fix the htons byteorder of vxlan packets after `vxlan_pick_source_port` picks a source port during encapsulation.
  Reviewed by: zlei, kp, adrian
  Differential Revision: https://reviews.freebsd.org/D53022
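The general rule is that a port chosen in host byte order goes through htons() exactly once on its way into the header. pick_source_port() below is a hypothetical stand-in for vxlan_pick_source_port(). The 49152-65535 range is an assumption chosen for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical source-port picker: map a flow hash into [min, max].
 * The result is in host byte order.
 */
static uint16_t
pick_source_port(uint32_t hash, uint16_t min, uint16_t max)
{
	return ((uint16_t)(min + hash % ((uint32_t)max - min + 1)));
}

/* Simplified UDP header: ports are stored in network byte order. */
struct udp_hdr_sketch {
	uint16_t uh_sport;
	uint16_t uh_dport;
};

/* Convert exactly once, at the point the value enters the header. */
static void
fill_sport(struct udp_hdr_sketch *uh, uint32_t hash)
{
	uh->uh_sport = htons(pick_source_port(hash, 49152, 65535));
}
```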
* knotes: kqueue: handle copy for trivial filters
  Author: Konstantin Belousov | Age: 2025-10-18 | Files: 2 | Lines: -0/+4
  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D52045
* net: Use proper prototype for SYSINIT functions
  Author: Zhenlei Huang | Age: 2025-10-13 | Files: 3 | Lines: -3/+3
  MFC after: 1 week
* iflib: Implement tx desc reclaim threshold
  Author: Andrew Gallatin | Age: 2025-10-01 | Files: 2 | Lines: -13/+101
  On some iflib drivers, the txd reclaim routine can be fairly expensive at high packet rates. Iflib was designed with the intent of only reclaiming tx descriptors above a configurable threshold, but this logic was left unimplemented.
  This change:
  - implements 2 new knobs, iflib.tx_reclaim_thresh and iflib.tx_reclaim_ticks
  - moves tx reclaim thresh from the if_shared_ctx into the iflib_ctx, as drivers don't need to see it, and it needs to be changed, so it can't be const
  - replicates tx_reclaim_thresh and ticks into the txq to improve cache locality of data accessed in the hot path
  - uses ticks rather than a more expensive timekeeping mechanism, so as to keep things simple and cheap
  This change substantially improves packet rates on bnxt. It has been tested on bnxt and ixl.
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D52561
  Reviewed by: markj (initial version)
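The reclaim decision described above can be sketched as "enough descriptors pending, or enough ticks elapsed". struct txq_sketch and txq_should_reclaim() are hypothetical and only borrow the knob names from the commit. The unsigned subtraction keeps the elapsed-ticks test correct when the tick counter wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state; not iflib's actual txq layout. */
struct txq_sketch {
	uint32_t pending;		/* completed descs not yet reclaimed */
	uint32_t reclaim_thresh;	/* reclaim once this many pend ... */
	uint32_t reclaim_ticks;		/* ... or this many ticks passed */
	uint32_t last_reclaim;		/* tick count at the last reclaim */
};

/* Decide whether the (expensive) reclaim routine should run now. */
static bool
txq_should_reclaim(const struct txq_sketch *q, uint32_t now)
{
	if (q->pending == 0)
		return (false);
	if (q->pending >= q->reclaim_thresh)
		return (true);
	/* Unsigned subtraction stays correct across counter wraparound. */
	return ((now - q->last_reclaim) >= q->reclaim_ticks);
}
```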
* Revert "IfAPI: Added missing accessor for if_home_vnet"
  Author: Kristof Provost | Age: 2025-10-01 | Files: 2 | Lines: -7/+0
  This reverts commit 4e7a375804e5ad4b244ce9a035fa971cbf2f0944.
  We do not want out-of-tree consumers to access the home_vnet variable. As discussed with the author and Gleb Smirnoff.
* IfAPI: Added missing accessor for if_home_vnet
  Author: ItzBlinkzy | Age: 2025-09-29 | Files: 2 | Lines: -0/+7
  Reviewed by: kp
  Signed-off-by: Kevin Irabor <kevin.irabor04@gmail.com>
* iflib: ifdef iflib_simple_transmit and iflib_simple_select_queue on ALTQ
  Author: Mateusz Guzik | Age: 2025-09-29 | Files: 1 | Lines: -1/+4
  Otherwise builds warn about them being unused.
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: Fix rule and state counters
  Author: Kajetan Staszkiewicz | Age: 2025-09-28 | Files: 1 | Lines: -3/+4
  Increasing counters on "match" rules causes the 1st packet making a connection to be double-counted, but only for rule counters, not rules' tables, because those are not increased at all during rule parsing. Remove "match" rule counter handling during rule parsing; do it only in pf_counters_inc().
  NAT can be performed either by "nat" rules in the NAT ruleset or by "match" rules. Rules before the NAT rule, and the NAT rule itself, match on pre-NAT addresses, and later rules match on post-NAT addresses. When increasing counters, go over rules in the same order as a packet would and use source and destination addresses for updating table counters from the appropriate state key, taking into consideration on which rule NAT happens. Use the AF from the state key, so that table counters can be properly updated for af-to rules.
  Synchronize match rule updating behaviour to that of OpenBSD: if rules match, but state is not created, don't update counters.
  Reviewed by: kp
  Sponsored by: InnoGames GmbH
  Differential Revision: https://reviews.freebsd.org/D52447
* pf: print 'once' rule expire time
  Author: Kristof Provost | Age: 2025-09-25 | Files: 1 | Lines: -0/+1
  Obtained from: OpenBSD, sashan <sashan@openbsd.org>, 8cf23eed7f
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: Add pfsync protocol for FreeBSD 15
  Author: Kajetan Staszkiewicz | Age: 2025-09-23 | Files: 2 | Lines: -5/+64
  A new version of pfsync packet is introduced: 1500. This version solves the issues with data alignment introduced in version 1400 and adds syncing of information needed to sync states created by rules with af-to (original interface, af and proto separate for wire and stack keys), of rt_af needed for prefer-ipv6-nexthop, and of tag names.
  Reviewed by: kp
  Sponsored by: InnoGames GmbH
  Differential Revision: https://reviews.freebsd.org/D52176
* pf: Count m_gethdr() failures in PFRES_MEMORY counter
  Author: Kristof Provost | Age: 2025-09-17 | Files: 1 | Lines: -4/+5
  This requires passing the reason pointer down into pf_build_tcp().
  ok bluhm@
  Obtained from: OpenBSD, sf <sf@openbsd.org>, 03c532ca70
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* if_ovpn.c: fix use of uninitialized variable
  Author: Alex Richardson | Age: 2025-09-15 | Files: 1 | Lines: -2/+4
  In case we use OVPN_CIPHER_ALG_NONE, the memcpy will attempt to copy 0 bytes from an uninitialized pointer. While the memcpy() implementation will treat this as a no-op and not actually dereference the undefined variable, it is still undefined behaviour to the compiler and should be fixed.
  Found by building with clang HEAD
  Reviewed by: kp
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D52543
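Here is a sketch of the safe pattern, with hypothetical types: initialize the source pointer, and skip memcpy() entirely when there is nothing to copy. In C up to C17, passing an indeterminate or null pointer to memcpy() is undefined even with a length of 0. That is why the guard is on the length, rather than relying on memcpy() to do nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cipher/key description. */
enum cipher { CIPHER_NONE, CIPHER_AES_GCM };

struct key_sketch {
	enum cipher alg;
	const uint8_t *key;	/* may be unset when alg == CIPHER_NONE */
	size_t keylen;
};

/* Copy key material into dst; returns the number of bytes copied. */
static size_t
copy_key(uint8_t *dst, size_t dstlen, const struct key_sketch *k)
{
	const uint8_t *src = NULL;	/* never left uninitialized */
	size_t len = 0;

	if (k->alg != CIPHER_NONE) {
		src = k->key;
		len = k->keylen < dstlen ? k->keylen : dstlen;
	}
	if (len > 0)			/* no memcpy() at all for "none" */
		memcpy(dst, src, len);
	return (len);
}
```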
* pf: sync_ifp doesn't exist, remove externs
  Author: Kristof Provost | Age: 2025-09-15 | Files: 1 | Lines: -2/+0
  Obtained from: OpenBSD, jsg <jsg@openbsd.org>, 7ac7a88014
  Sponsored by: Rubicon Communications, LLC ("Netgate")