path: root/sys/net
Commit message | Author | Age | Files | Lines
* bpf: don't call bpf_detachd() in bpf_setdlt() | Gleb Smirnoff | 5 days | 1 | -1/+0
  The bpf_attachd() will perform bpf_detachd() itself. Performing it twice will lead to doing CK_LIST_REMOVE twice.

  Reported & tested by: bz
* lagg: Avoid dropping locks when starting the interface | Zhenlei Huang | 7 days | 1 | -17/+19
  The init routine of a lagg(4) interface does not change during the whole lifecycle, so we can call lagg_init() directly instead of through the function pointer. However, that requires dropping and re-acquiring the lock, which unnecessarily exposes a small race window. Refactor lagg_init() into lagg_init_locked() and call the latter to avoid that.

  Meanwhile, delay updating the driver-managed status until after the interface is really ready.

  Reviewed by: markj
  MFC after: 5 days
  Differential Revision: https://reviews.freebsd.org/D55198
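The refactoring above follows the usual kernel convention of a `_locked` variant that asserts the lock is already held, plus an unlocked wrapper that takes it. A minimal user-space sketch, assuming a toy lock with a `held` flag in place of the real lagg(4) softc lock; every `toy_*` name is hypothetical and stands in for lagg_init()/lagg_init_locked():

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock standing in for the real softc lock (hypothetical). */
struct toy_lock {
	bool held;
	int acquisitions;	/* number of times the lock was taken */
};

static void
toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_lagg {
	struct toy_lock lock;
	bool running;
};

/* Caller must hold the lock; this function never drops it. */
static void
toy_lagg_init_locked(struct toy_lagg *sc)
{
	assert(sc->lock.held);
	if (sc->running)
		return;
	/* ... bring up ports here ... */
	sc->running = true;	/* mark ready only after setup is done */
}

/* Unlocked entry point, e.g. for an if_init-style callback. */
static void
toy_lagg_init(struct toy_lagg *sc)
{
	toy_lock_acquire(&sc->lock);
	toy_lagg_init_locked(sc);
	toy_lock_release(&sc->lock);
}

/*
 * A path that already holds the lock calls the _locked variant directly,
 * instead of releasing the lock, calling toy_lagg_init() and re-taking it.
 */
static void
toy_lagg_ioctl_up(struct toy_lagg *sc)
{
	toy_lock_acquire(&sc->lock);
	/* ... other work under the lock ... */
	toy_lagg_init_locked(sc);
	toy_lock_release(&sc->lock);
}
```

The lock is taken exactly once on the ioctl path, so there is no window in which another thread could observe a half-initialized interface.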
* pf: remove unused variable from pf_test_ctx | Kristof Provost | 8 days | 1 | -1/+0
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* net: Remove the IFF_RENAMING flag | Mark Johnston | 8 days | 6 | -21/+0
  This used to be needed when interface renames were broadcast using the ifnet_departure_event eventhandler, but since commit 349fcf079ca3 ("net: add ifnet_rename_event EVENTHANDLER(9) for interface renaming"), it has no purpose. Remove it.

  Reviewed by: pouria, zlei
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D55171
* iflib: Add support for SIOCGIFDOWNREASON ioctl | Chandrakanth Patil | 8 days | 2 | -0/+16
  This change adds native support for the SIOCGIFDOWNREASON ioctl in iflib. When ifconfig issues SIOCGIFDOWNREASON, the request is now routed through a new driver callback (IFDI_GET_DOWNREASON). iflib allocates the ifdownreason structure, calls the driver to fill the down-reason message, and then returns the data back to ifconfig for display.

  Without this change, iflib-based drivers cannot implement link-down reason reporting even if the hardware provides the information.

  No functional change for existing drivers unless they implement the new IFDI_GET_DOWNREASON method. Existing drivers continue to behave as before.

  Reviewed by: gallatin, erj, kgalazka, ssaxena, #iflib
  Differential Revision: https://reviews.freebsd.org/D54045
  MFC After: 1 week
* lagg: Make lagg_link_active() static | Zhenlei Huang | 9 days | 1 | -1/+1
  It is declared static. Make the definition consistent with the declaration.

  It was previously fixed by commit 52e53e2de0ec, but that commit was reverted, leaving it unfixed.

  No functional change intended.

  MFC after: 3 days
* lagg: Remove the member pr_num from struct lagg_proto | Zhenlei Huang | 12 days | 1 | -13/+6
  It is set but never used. Remove it to avoid confusion and save a little space.

  While here, use designated initializers to initialize the LAGG protocol table. That improves readability, and it will be safer to initialize the table if we introduce new protocols in the future.

  No functional change intended.

  Reviewed by: glebius
  MFC after: 5 days
  Differential Revision: https://reviews.freebsd.org/D55124
* lagg: Make the none protocol a first-class citizen | Zhenlei Huang | 12 days | 1 | -9/+33
  All the other protocols have corresponding start and input routines, which are used in the fast path. Currently the none protocol is treated specially: in the fast path it is checked to determine whether a working protocol is configured. This design raises two issues:

  1. In production, other protocols are commonly used, but not the none protocol. Always checking for it in the fast path seems like overkill, and it is unfair to the commonly used protocols.
  2. PR 289017 reveals that there is a small window between checking the protocol and calling lagg_proto_start(). lagg_proto_start() may then see the none protocol and dereference a NULL pointer.

  Fix both by making the none protocol a first-class citizen, so that it has start and input routines just like the other protocols. Then we can stop checking it in the fast path, since lagg_proto_start() and lagg_proto_input() will always work.

  The error ENETDOWN is chosen for the start routine. Obviously no active ports are available, and the packets will go nowhere. It is also a better error than ENXIO, since the interface is indeed configured and has a TX algorithm (the none protocol).

  PR: 289017
  Diagnosed by: Qiu-ji Chen <chenqiuji666@gmail.com>
  Tested by: Gui-Dong Han <hanguidong02@gmail.com>
  Reviewed by: glebius
  MFC after: 5 days
  Differential Revision: https://reviews.freebsd.org/D55123
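The idea of giving "none" real routines, combined with the designated-initializer protocol table from the previous commit, can be sketched as follows. This is a hypothetical model, not the actual struct lagg_proto: the `toy_*` names are invented, and treating a NULL return from the input routine as "packet dropped" is an assumption of the sketch. Because every table slot has non-NULL routines, the fast path can dispatch unconditionally, and a protocol change racing with the dispatch can no longer lead to a NULL call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol identifiers. */
enum toy_proto { TOY_PROTO_NONE, TOY_PROTO_FAILOVER, TOY_PROTO_MAX };

struct toy_pkt {
	int len;
};

struct toy_proto_ops {
	const char *name;
	int (*start)(struct toy_pkt *);			/* transmit path */
	struct toy_pkt *(*input)(struct toy_pkt *);	/* receive path */
};

/* "none": no active ports, so transmit fails with ENETDOWN and
 * received packets are dropped (NULL is assumed to mean "dropped"). */
static int
none_start(struct toy_pkt *p)
{
	(void)p;
	return (ENETDOWN);
}

static struct toy_pkt *
none_input(struct toy_pkt *p)
{
	(void)p;
	return (NULL);
}

static int
failover_start(struct toy_pkt *p)
{
	(void)p;
	return (0);
}

static struct toy_pkt *
failover_input(struct toy_pkt *p)
{
	return (p);
}

/* Designated initializers tie each entry to its enum value. */
static const struct toy_proto_ops toy_protos[TOY_PROTO_MAX] = {
	[TOY_PROTO_NONE] = { .name = "none",
	    .start = none_start, .input = none_input },
	[TOY_PROTO_FAILOVER] = { .name = "failover",
	    .start = failover_start, .input = failover_input },
};

/* Fast path: no special case for "none", every entry has routines. */
static int
toy_proto_start(enum toy_proto proto, struct toy_pkt *p)
{
	return (toy_protos[proto].start(p));
}

static struct toy_pkt *
toy_proto_input(enum toy_proto proto, struct toy_pkt *p)
{
	return (toy_protos[proto].input(p));
}
```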
* bpf: don't clear pointer from descriptor to the tap on descriptor close | Gleb Smirnoff | 14 days | 1 | -1/+1
  During packet processing the descriptor is looked up using epoch(9), and it can be accessed after bpf_detachd(). When a descriptor is closed, the tap point is still alive (it is actually producing packets), and thus the pointer can be legitimately dereferenced.

  This fixes a race on bpf(4) device close that would otherwise result in a panic.

  Differential Revision: https://reviews.freebsd.org/D55064
* pf: fix use of uninitialised variable | Kristof Provost | 2026-02-03 | 1 | -4/+3
  In pf_match_rule() we attempt to append matching rules to the end of 'match_rules'. We want to preserve the order to make the multiple pflog entries easier to understand, so we keep track of the last added rule item in 'rt'. However, that assumed that 'match_rules' was only ever added to in that one call to pf_match_rules(). This isn't always the case, for example if we have match rules in different anchors. In that case we'd end up using the uninitialised 'rt' variable in the SLIST_INSERT_AFTER call.

  Instead, track the match rules and the last matching rule (to enable easy appending) in the struct pf_test_ctx. This also allows us to reduce the number of arguments for some functions, because we passed a ctx to most functions that needed 'match_rules'.

  While here, also make pf_match_rules() static, because it's only ever used in pf.c.

  Add a test case to exercise the relevant code path.

  MFC after: 2 weeks
  Sponsored by: Rubicon Communications, LLC ("Netgate")
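The fix amounts to keeping the list tail next to the list head, so appends stay ordered and O(1) no matter which call (or anchor) adds the next item. A minimal sketch, assuming a hand-rolled singly linked list instead of the SLIST macros pf actually uses; the struct and field names here are hypothetical, not the real members of struct pf_test_ctx:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a matched-rule item and the test context. */
struct toy_rule_item {
	int rule_nr;
	struct toy_rule_item *next;
};

struct toy_test_ctx {
	struct toy_rule_item *match_rules;	/* list head */
	struct toy_rule_item *match_rules_last;	/* tail, NULL when empty */
};

/*
 * Append while preserving match order. Because the tail lives in the
 * context rather than in a local variable, it remains valid across calls,
 * e.g. when match rules come from different anchors.
 */
static void
toy_append_match(struct toy_test_ctx *ctx, struct toy_rule_item *ri)
{
	ri->next = NULL;
	if (ctx->match_rules_last == NULL)
		ctx->match_rules = ri;
	else
		ctx->match_rules_last->next = ri;
	ctx->match_rules_last = ri;
}
```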
* epair: add VLAN_HWTAGGING | Timo Völker | 2026-01-30 | 1 | -12/+16
  Add capability VLAN_HWTAGGING to the epair interface and enable it by default.

  When sending a packet over a VLAN interface that uses an epair interface, the flag M_VLANTAG and the ether_vtag (which contains the VLAN ID and/or PCP) are set in the mbuf to inform the hardware that the VLAN header has to be added. The sending epair end does not need to actually add a VLAN header. It can just pass the mbuf with this setting to the other epair end, which receives the packet. The receiving epair end can just pass the mbuf with this setting to the upper layer. Due to this setting, the upper layer believes that there was a VLAN header that has been removed by the interface.

  If the packet later leaves the host, the outgoing physical interface can add the VLAN header in hardware if it supports VLAN_HWTAGGING. If not, the implementation of Ethernet or bridge adds the VLAN header in software.

  Reviewed by: zlei, tuexen
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D52465
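The out-of-band tag handling described above can be modelled with a two-field packet header. In this sketch, the 802.1Q Tag Control Information layout (PCP in bits 15-13, DEI in bit 12, VID in bits 11-0) is standard. The `toy_pkthdr` struct and the `TOY_M_VLANTAG` flag are hypothetical stand-ins for the mbuf packet header, M_VLANTAG, and ether_vtag, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
static uint16_t
tci_make(uint16_t vid, uint8_t pcp, bool dei)
{
	return ((uint16_t)(((pcp & 0x7u) << 13) | ((dei ? 1u : 0u) << 12) |
	    (vid & 0xfffu)));
}

static uint16_t
tci_vid(uint16_t tci)
{
	return (tci & 0xfffu);
}

static uint8_t
tci_pcp(uint16_t tci)
{
	return ((uint8_t)(tci >> 13));
}

/* Hypothetical, much-reduced packet header mimicking M_VLANTAG/ether_vtag. */
#define	TOY_M_VLANTAG	0x1
struct toy_pkthdr {
	int flags;
	uint16_t ether_vtag;
};

/* VLAN layer over a VLAN_HWTAGGING-capable interface: record the tag out of
 * band instead of inserting a 4-byte 802.1Q header into the frame. */
static void
toy_vlan_output(struct toy_pkthdr *m, uint16_t vid, uint8_t pcp)
{
	m->flags |= TOY_M_VLANTAG;
	m->ether_vtag = tci_make(vid, pcp, false);
}

/* epair-style hand-off: the metadata passes through untouched, so the
 * receiver sees a packet whose tag looks as if hardware had stripped it. */
static void
toy_epair_xfer(const struct toy_pkthdr *tx, struct toy_pkthdr *rx)
{
	*rx = *tx;
}
```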
* net/iflib.c: move out scheduler-depended code into the hook | Konstantin Belousov | 2026-01-29 | 1 | -79/+3
  Add sched_find_l2_neighbor(). This really should not be scheduler-dependent; it does not have anything to do with the scheduler at all. But for now keep the same code structure.

  Reviewed by: olce
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D54831
* netinet6: store ND context directly in struct in6_ifextra | Gleb Smirnoff | 2026-01-23 | 1 | -2/+2
  Stop using struct nd_ifinfo for that, because it is an API struct for SIOCGIFINFO_IN6. The functional changes are isolated to the protocol attach and detach: in6_ifarrival(), nd6_ifattach(), in6_ifdeparture(), nd6_ifdetach(), as well as to nd6_ioctl(), nd6_ra_input(), nd6_slowtimo() and in6_ifmtu(). The dad_failures member was just renamed to match the rest. The M_IP6NDP malloc(9) type declaration moved to files that actually use it.

  The rest of the changes are a mechanical substitution of a double pointer dereference via ND_IFINFO() with a single pointer dereference. This was achieved with a sed(1) script:

      s/ND_IFINFO\(([a-z0-9>_.-]+)\)->(flags|linkmtu|basereachable|reachable|retrans|chlim)/\1->if_inet6->nd_\2/g
      s/nd_chlim/nd_curhoplimit/g

  Reviewed by: tuexen, madpilot
  Differential Revision: https://reviews.freebsd.org/D54725
* iflib: null out freed mbuf in iflib_txsd_free | Andrew Gallatin | 2026-01-19 | 1 | -0/+1
  When adding the IFLIB_GET_MBUF/FLAGS, I neglected to NULL out the mbuf in the descriptor ring. I didn't think this would matter, as I thought this code was only used when the ring was about to be freed. But I was wrong, and leaving a stale mbuf in there can cause panics.

  Reported by: Marek Zarychta (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=292547)
  Fixes: 14d93f612f26
  Sponsored by: Netflix
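The pattern behind this one-line fix is to clear a ring slot as soon as its buffer is freed, so any later sweep over the ring sees an empty slot rather than a stale pointer it might free again. A minimal sketch, assuming plain malloc(3) buffers and a hypothetical `toy_txsd` ring in place of the real iflib software descriptor and mbufs:

```c
#include <assert.h>
#include <stdlib.h>

#define	TOY_RING_SIZE	4

/* Hypothetical software descriptor ring holding buffer pointers. */
struct toy_txsd {
	void *m[TOY_RING_SIZE];
	int frees;	/* counts frees, to show no double free happens */
};

static void
toy_free_slot(struct toy_txsd *sd, int i)
{
	if (sd->m[i] != NULL) {
		free(sd->m[i]);
		sd->frees++;
		sd->m[i] = NULL;	/* the fix: do not leave a stale pointer */
	}
}

/* A later sweep (e.g. ring teardown) skips slots that were already freed. */
static void
toy_free_all(struct toy_txsd *sd)
{
	for (int i = 0; i < TOY_RING_SIZE; i++)
		toy_free_slot(sd, i);
}
```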
* net: on interface detach purge all its routes before detaching protocols | Gleb Smirnoff | 2026-01-17 | 1 | -2/+2
  Otherwise, a forwarding thread may use the interface being detached. This is a regression from 0d469d23715d, which manifests itself as a reliably reproducible panic in in6_selecthlim().

  Note that there are old bug reports about such a panic, and I believe this change will not fix them, as their nature is not due to a screwed-up detach sequence, but due to the lack of proper epoch(9)-based synchronization between detach and forwarding.

  Reviewed by: pouria
  Reported & tested by: jhibbits
  PR: 292162
  Fixes: 0d469d23715d690b863787ebfa51529e1f6a9092
  Differential Revision: https://reviews.freebsd.org/D54721
* if_ovpn: add interface counters | Kristof Provost | 2026-01-15 | 1 | -0/+32
  Count input/output packets and bytes on the interface as well, not just in openvpn-specific counters.

  PR: 292464
  MFC after: 2 weeks
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: configurable action on limiter exceeded | Kristof Provost | 2026-01-14 | 1 | -2/+9
  This change extends pf(4) limiters so the administrator can specify the action the rule executes when the limit is reached. By default, when the limit is reached, the limiter overrides the action specified by the rule to no-match. If the administrator wants to block the packet instead, the rule with the limiter should be changed to:

      pass in from any to any state limiter test (block)

  OK dlg@

  Obtained from: OpenBSD, sashan <sashan@openbsd.org>, 04394254d9
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: convert state limiter interface to netlink | Kristof Provost | 2026-01-14 | 1 | -65/+43
  This is a new feature with new ioctl calls, so we can safely remove them right now.

  Sponsored by: Rubicon Communications, LLC ("Netgate")
* pf: introduce source and state limiters | Kristof Provost | 2026-01-14 | 1 | -3/+411
  both source and state limiters can provide constraints on the number of states that a set of rules can create, and optionally the rate at which they are created. state limiters have a single limit, but source limiters apply limits against a source address (or network). the source address entries are dynamically created and destroyed, and are also limited.

  this started out because i was struggling to understand the source and state tracking options in pf.conf, and looking at the code made it worse. it looked like some functionality was missing, and the code also did some things that surprised me. taking a step back from it, even if it did work, what is described doesn't work well outside very simple environments.

  the functionality i'm talking about is most of the stuff in the Stateful Tracking Options section of pf.conf(5). some of the problems are illustrated by one of the simplest options: the "max number" option that limits the number of states that a rule is allowed to create:

  - wiring limits up to rules is a problem because when you load a new ruleset the limit is reset, allowing more states to be created than you intended.
  - a single "rule" in pf.conf can expand to multiple rules in the kernel thanks to things like macro expansion for multiple ports. "max 1000" on a line in pf.conf could end up being many times that in effect.
  - when a state limit on a rule is reached, the packet is dropped. this makes it difficult to do other things with the packet, such as redirect it to a tarpit or another server that replies with an outage notice or such.

  a state limiter solves these problems. the example from the pf.conf.5 change demonstrates this:

      An example use case for a state limiter is to restrict the number of connections allowed to a service that is accessible via multiple protocols, e.g. a DNS server that can be accessed by both TCP and UDP on port 53, DNS-over-TLS on TCP port 853, and DNS-over-HTTPS on TCP port 443 can be limited to 1000 concurrent connections:

          state limiter "dns-server" id 1 limit 1000

          pass in proto { tcp udp } to port domain state limiter "dns-server"
          pass in proto tcp to port { 853 443 } state limiter "dns-server"

  a single limit across all these protocols can't be implemented with per rule state limits, and any limits that were applied are reset if the ruleset is reloaded.

  the existing source-track implementation appears to be incomplete; i could only see code for "source-track global", but not "source-track rule". source-track global is too heavy and unwieldy a hammer, and source-track rule would suffer the same issues around rule lifetimes and expansions that the "max number" state tracking config above has.

  a slightly expanded example from the pf.conf.5 change for source limiters:

      An example use for a source limiter is the mitigation of denial of service caused by the exhaustion of firewall resources by network or port scans from outside the network. The states created by any one scanner from any one source address can be limited to avoid impacting other sources. Below, up to 10000 IPv4 hosts and IPv6 /64 networks from the external network are each limited to a maximum of 1000 connections, and are rate limited to creating 100 states over a 10 second interval:

          source limiter "internet" id 1 entries 10000 \
              limit 1000 rate 100/10 \
              inet6 mask 64

          block in on egress
          pass in quick on egress source limiter "internet"
          pass in on egress proto tcp probability 20% rdr-to $tarpit

  the extra bit is that if the source limiter doesn't have "space" for the state, the rule doesn't match and you can fall through to tarpitting 20% of the tcp connections for fun.

  i've been using this in anger in production for over 3 years now. sashan@ has been poking me along (slowly) to get it in a good enough shape for the tree for a long time. it's been one of those years.

  bluhm@ says this doesn't break the regress tests.

  ok sashan@

  Obtained from: OpenBSD, dlg <dlg@openbsd.org>, 8463cae72e
  Sponsored by: Rubicon Communications, LLC ("Netgate")
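The source-limiter semantics described above (a bounded table of source entries, a per-source state limit, IPv6 sources grouped by `mask 64`, and "no match" instead of a drop when the limit is hit) can be modelled roughly as follows. This is a heavily simplified, hypothetical sketch, not the pf implementation. It does not model the `rate 100/10` part, entry expiry, IPv4 keys, or releasing states. The tiny `TOY_MAX_ENTRIES` table stands in for `entries 10000`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define	TOY_MAX_ENTRIES	4	/* stands in for "entries 10000" */

struct toy_src_entry {
	uint8_t key[16];	/* masked IPv6 source address */
	unsigned states;	/* states currently charged to this source */
	bool used;
};

struct toy_src_limiter {
	unsigned limit;		/* max states per source ("limit 1000") */
	unsigned prefixlen;	/* "inet6 mask 64" */
	struct toy_src_entry ent[TOY_MAX_ENTRIES];
};

/* Keep the first plen bits of an IPv6 address, clear the rest. */
static void
toy_mask6(const uint8_t in[16], unsigned plen, uint8_t out[16])
{
	for (unsigned i = 0; i < 16; i++) {
		unsigned bits = plen > i * 8 ? plen - i * 8 : 0;
		uint8_t m = bits >= 8 ? 0xff : (uint8_t)(0xff << (8 - bits));
		out[i] = in[i] & m;
	}
}

/*
 * Returns true if a new state may be created for src. false models
 * "the rule doesn't match", so evaluation can fall through to later rules.
 */
static bool
toy_src_limiter_take(struct toy_src_limiter *sl, const uint8_t src[16])
{
	uint8_t key[16];
	struct toy_src_entry *free_ent = NULL;

	if (sl->limit == 0)
		return (false);
	toy_mask6(src, sl->prefixlen, key);
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		struct toy_src_entry *e = &sl->ent[i];

		if (e->used && memcmp(e->key, key, sizeof(key)) == 0) {
			if (e->states >= sl->limit)
				return (false);	/* this source is at its limit */
			e->states++;
			return (true);
		}
		if (!e->used && free_ent == NULL)
			free_ent = e;
	}
	if (free_ent == NULL)
		return (false);	/* table full: the "entries" limit */
	memcpy(free_ent->key, key, sizeof(key));
	free_ent->states = 1;
	free_ent->used = true;
	return (true);
}
```

Two hosts inside the same /64 share one entry, so a scanner cannot escape the per-source limit simply by rotating addresses within its prefix.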
* enc: create an interface at SI_SUB_PROTO_IF stage | Gleb Smirnoff | 2026-01-13 | 1 | -1/+1
  Creation of enc0 before SI_SUB_PROTO_MC mangles the MLD list and also encounters the IGMP mutex before it has been initialized.

  Reported & tested by: mjg

  NB: the enc(4) is not a true interface indeed. In a perfect world the module would not create a cloner, would not enter if_attach(), would not trigger ifnet_arrival_event, nor would it have any protocol attached to it. The enc0 exists for two purposes:
  1) to create a bpf(9) tap;
  2) to allow injecting packets in the middle of ipsec(4) processing, temporarily rewriting m_pkthdr.rcvif to point at enc0.
  While problem 1 is already solved with the recent divorce between bpf(9) and ifnet(9), problem 2 is harder to solve without breaking packet filter rules that use "via enc0".
* iflib: remove convoluted custom zeroing code | Brooks Davis | 2026-01-09 | 1 | -60/+5
  Replace a collection of aliasing violations and ifdefs with memset (which now expands to __builtin_memset and should be quite reliably inlined).

  The old code is hard to maintain, as evidenced by the most recent change to if_pkt_info_t updating the defines, but not the zeroing code.

  Reviewed by: gallatin, erj
  Effort: CHERI upstreaming
  Sponsored by: Innovate UK
  Fixes: 43d7ee540efe ("iflib: support for transmit side nic KTLS offload")
  Differential Revision: https://reviews.freebsd.org/D54605
* iflib: Drop tx lock when freeing mbufs using simple_transmit | Andrew Gallatin | 2026-01-07 | 1 | -35/+147
  Freeing completed transmit mbufs can be time consuming (due to them being cold in cache, and due to ext free routines taking locks), especially when we batch tx completions. If we do this while holding the tx ring mutex, this can cause lock contention on the tx ring mutex when using iflib_simple_transmit.

  To resolve this, this patch opportunistically copies completed mbuf pointers into a new array (ifsd_m_defer) so they can be freed after dropping the transmit mutex.

  The ifsd_m_defer array is opportunistically used, and may be NULL. If it's NULL, then we free mbufs in the old way. The ifsd_m_defer array is atomically nulled when a thread is using it, and atomically restored when the freeing thread is done with it. The use of atomics here avoids acquire/release of the tx lock to restore the array after freeing mbufs.

  Since we're no longer always freeing mbufs inline, peeking into them to see if a transmit used TSO or not will cause a useless cache miss, as nothing else in the mbuf is likely to be accessed soon. To avoid that cache miss, we encode a TSO or not TSO flag in the lower bits of the mbuf pointer stored in the ifsd_m array. Note that the IFLIB_NO_TSO flag exists primarily for sanity/debugging.

  iflib_completed_tx_reclaim() was refactored to break out iflib_txq_can_reclaim() and _iflib_completed_tx_reclaim() so that the tx routine can call iflib_tx_credits_update() just once, rather than twice.

  Note that deferred mbuf freeing is not enabled by default, and can be enabled using the dev.$DEV.$UNIT.iflib.tx_defer_mfree sysctl.

  Differential Revision: https://reviews.freebsd.org/D54356
  Sponsored by: Netflix
  Reviewed by: markj, kbowling, ziaee
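Encoding a flag in the low bits of a stored pointer relies on the pointee's alignment leaving those bits zero. A minimal sketch of the idea, assuming a single low-bit TSO flag. The `toy_*` names and bit values are hypothetical; the commit only names IFLIB_NO_TSO and does not spell out the exact encoding. The point is that `toy_is_tso()` answers the question without touching the (possibly cache-cold) buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag stored in the low bit of an aligned pointer. */
#define	TOY_TAG_TSO	((uintptr_t)0x1)
#define	TOY_TAG_MASK	((uintptr_t)0x1)

struct toy_mbuf {
	void *data;	/* pointer member => alignment of at least 2 on common ABIs */
	int len;
};

_Static_assert(_Alignof(struct toy_mbuf) > TOY_TAG_MASK,
    "low bit of a toy_mbuf pointer must be free for tagging");

static uintptr_t
toy_tag(struct toy_mbuf *m, int tso)
{
	uintptr_t v = (uintptr_t)m;

	assert((v & TOY_TAG_MASK) == 0);
	return (tso ? (v | TOY_TAG_TSO) : v);
}

static struct toy_mbuf *
toy_untag(uintptr_t v)
{
	return ((struct toy_mbuf *)(v & ~TOY_TAG_MASK));
}

/* Reads only the stored word; never dereferences the mbuf. */
static int
toy_is_tso(uintptr_t v)
{
	return ((v & TOY_TAG_TSO) != 0);
}
```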
* bridge: Allow BRDGSIFVLANSET without IFBRF_VLANFILTER | Lexi Winter | 2026-01-03 | 1 | -3/+0
  Currently, we disallow BRDGSIFVLANSET when IFBRF_VLANFILTER is disabled. There's no particular reason to do this, and it causes some undesirable behaviour such as not being able to remove the tagged config on a member after disabling vlanfilter on the bridge.

  Remove the restriction so BRDGSIFVLANSET is always accepted.

  PR: 292019
  MFC after: 1 week
  Reviewed by: zlei, p.mousavizadeh_protonmail.com
  Sponsored by: https://www.patreon.com/bsdivy
  Differential Revision: https://reviews.freebsd.org/D54435
* pf: sprinkle const over pf_addr_cmp() | Kristof Provost | 2026-01-02 | 1 | -1/+1
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* sys/netipsec: ensure sah stability during input callback processing | Konstantin Belousov | 2025-12-22 | 1 | -2/+10
  Citing ae: this fixes some rare panics that are reported in derived projects: `panic: esp_input_cb: Unexpected address family'.

  Reported by: ae
  Tested by: ae, Daniel Dubnikov <ddaniel@nvidia.com>
  Reviewed by: ae, Ariel Ehrenberg <aehrenberg@nvidia.com> (previous version)
  Sponsored by: NVidia networking
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D54325
* if_tuntap: use ifnet_rename_event instead of ifnet_arrival_event | Gleb Smirnoff | 2025-12-22 | 1 | -12/+6
* ng_ether: refactor to use interface EVENTHANDLER(9)s | Gleb Smirnoff | 2025-12-22 | 4 | -43/+0
* net: add ifnet_rename_event EVENTHANDLER(9) for interface renaming | Gleb Smirnoff | 2025-12-22 | 3 | -15/+28
  and don't trigger ifnet_arrival_event and ifnet_departure_event for a rename, as the interface isn't being detached from any protocol.

  The consumers of the arrival/departure events are divided into a few categories:
  - those that indeed need to do the same actions as if the interface was fully detached and attached: routing socket and netlink notifications to userland and the Linux sysfs. All addressed by this commit.
  - those that build their logic based on an interface name, but should actually update their database on rename: packet filters. This commit leaves them with the old behavior (emulate full detach & attach), but this should be improved.
  - those that shouldn't do anything on rename; not touched by this commit.
  - ng_ether and if_tuntap, which are special and will be addressed by separate commits.
* net: on interface detach purge multicast addresses after protocols | Gleb Smirnoff | 2025-12-22 | 1 | -2/+1
  We first want to give all owners of multicast addresses a chance to free them, and only then run through the list of remaining ones. It might be that normally no addresses remain there, but this needs deeper analysis. For now, restore the sequence that was in place before 0d469d23715d to fix a possible use-after-free.

  Fixes: 0d469d23715d690b863787ebfa51529e1f6a9092
* iflib: support for transmit side nic KTLS offload | Andrew Gallatin | 2025-12-21 | 2 | -14/+22
  This change adds support to iflib for drivers that want to do transmit-side NIC KTLS offload. This change does 2 things:

  1) Extends the pkt info to include an optional mbuf pointer. This gives drivers the ability to find the start of a TLS record if they need to re-DMA part of the record to reconstruct TLS state on the NIC. This mbuf pointer is only passed when CSUM_SND_TAG is present on the pkthdr. Note that I don't bother to inspect the send tag on purpose; this will only be present for TLS offloaded or paced connections.

  2) Allows the driver to specify how much ring padding is needed before the ring is considered to be full, using the new isc_tx_pad field in if_softc_ctx. This reuses a field that was marked spare in 2019 via d49e83eac3baf. Iflib initializes this to the previous value of 2 slots and allows the driver to override it. The TXQ_AVAIL() macro has been adjusted to subtract this padding, and uses of the macro have removed +2 from the other side of the comparison. To avoid potential cache misses from looking at the ifc_softc_ctx in TXQ_AVAIL(), the value is mirrored in the txq (in an alignment hole).

  Reviewed by: kbowling, kgalazka, sumit.saxena_broadcom.com, shurd
  Sponsored by: Netflix
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D54274
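The TXQ_AVAIL() change in point 2 moves the reserved padding out of every call site and into the availability figure itself. The arithmetic can be checked with a small model. This is a hypothetical sketch and not the real macro: `toy_txq` with an `in_use` counter stands in for iflib's queue bookkeeping, and the strict `>` comparison is an assumption about how call sites test for room.

```c
#include <assert.h>

/* Hypothetical TX ring bookkeeping; not the real iflib TXQ_AVAIL(). */
struct toy_txq {
	unsigned size;		/* number of descriptors */
	unsigned in_use;	/* descriptors not yet reclaimed */
	unsigned tx_pad;	/* reserved slots, driver-overridable (default 2) */
};

/* Free slots minus the reserved padding, clamped at zero. */
static unsigned
toy_txq_avail(const struct toy_txq *q)
{
	unsigned free_slots = q->size - q->in_use;

	return (free_slots > q->tx_pad ? free_slots - q->tx_pad : 0);
}

/* Old style: padding hard-coded as "+ 2" at each call site. */
static int
toy_can_enqueue_old(const struct toy_txq *q, unsigned nsegs)
{
	return (q->size - q->in_use > nsegs + 2);
}

/* New style: padding already folded into the availability figure. */
static int
toy_can_enqueue_new(const struct toy_txq *q, unsigned nsegs)
{
	return (toy_txq_avail(q) > nsegs);
}
```

With `tx_pad == 2` both forms give the same answer for every segment count, while a driver needing more headroom (for example, to re-DMA part of a TLS record) only has to raise `tx_pad`.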
* bpf: add missing IFT_BRIDGE in the write method | Gleb Smirnoff | 2025-12-19 | 1 | -0/+1
  Fixes: 8774a990ee4094f16d596d4b78e0f3239e5d0c88
* net: attach IPv4 and IPv6 stacks to an interface with EVENTHANDLER(9) | Gleb Smirnoff | 2025-12-18 | 5 | -131/+18
  This change retires two historic relics: the if_afdata[] array and the dom_ifattach/dom_ifdetach methods.

  The if_afdata[] array is a relic of the era when there was an expectation that many transport protocols would coexist with IP, e.g. IPX or NetAtalk. The array hasn't had any members except AF_INET and AF_INET6 for over a decade already. This change removes the array and just leaves two pointer fields: if_inet and if_inet6.

  The dom_ifattach/dom_ifdetach methods predate the EVENTHANDLER(9) framework and were a good enough way to initialize protocol contexts back then. Today there is no good reason to treat IPv4 and IPv6 stacks differently from other protocols/features that attach to and detach from an interface.

  The locking of if_afdata[] is a relic of SMPng times, when the system startup and the interface attach were even more convoluted than before this change, and we also had unloadable protocols that used a field in if_afdata[]. Note that IPv4 and IPv6 are not unloadable.

  Note that this change removes NET_EPOCH_WAIT() from the interface detach sequence. This may surface several new races associated with interface removal. I failed to hit any with consecutive test suite runs, though. The expected general race scenario is that while struct ifnet itself is freed properly with epoch_call(9), some structures hanging off ifnet are freed with direct free(9). The proper fix is either to make if_foo point at some static "dead" structure, providing SMP visibility of this store, or to free those structures with epoch_call(9). All of these cases are planned to be found and resolved during the 16.0-CURRENT lifetime.

  Reviewed by: zlei, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D54089
* net: split ifnet_arrival_event into two events | Gleb Smirnoff | 2025-12-18 | 3 | -11/+21
  Run the original ifnet_arrival_event before linking the interface. Otherwise there is a race window when the interface is already visible, but not all of the protocols have completed their attach.

  Provide a new event handler, ifnet_attached_event, that is executed when the interface is fully visible. Use it in route(4) socket and netlink(4) to announce the new interface to userland.

  Properly document the ifnet events in if_var.h.

  Reviewed by: zlei, melifaro
  Differential Revision: https://reviews.freebsd.org/D54085
* bpf: add a crutch to support if_vmove | Gleb Smirnoff | 2025-12-18 | 3 | -0/+18
  Fixes: 0bf42a0a05b9c802a6d9ca4a6b8696b29a26e08b
* vlan: plug a new panic associated with interface removal | Gleb Smirnoff | 2025-12-17 | 1 | -1/+9
  ac6a7f621668 enabled execution of vlan_clone_dump_nl(), which previously was effectively disabled. The function itself was added back in 089104e0e01f0. This exposed a bug when Netlink dumps info on all interfaces using the dangerous KPI if_foreach_sleep(), which may call its callbacks on completely detached interfaces that are kept alive only by their last reference. ifc_dump_ifp_nl_default() is able to digest such an interface without a panic, but vlan_clone_dump_nl() can't.

  Neither of the above revisions is the actual culprit; rather, it is a design problem of detaching interfaces and if_foreach_sleep(). Plug the problem by removing the pointer to freed memory on detach and adding a NULL check later.

  Reported by: pho
* bpf: virtualize bpf_iflist | Gleb Smirnoff | 2025-12-17 | 1 | -13/+15
  The reason the global list worked before 8774a990ee40 is that bpf_setif() used if_unit(), which is a VNET-aware function, and then went through the global list looking for a bpf_if with a matching pointer.

  PR: 291735
  Fixes: 8774a990ee4094f16d596d4b78e0f3239e5d0c88
* bpf: add BIOCGETIFLIST ioctl that returns all available tap points | Gleb Smirnoff | 2025-12-15 | 2 | -2/+73
  Differential Revision: https://reviews.freebsd.org/D53873
* bpf: modularize ifnet(9) part of bpf | Gleb Smirnoff | 2025-12-15 | 3 | -466/+537
  Imagine that bpf(9) tapping can happen at any point in the network stack, not necessarily at interface transmit or receive. To achieve that we need a thin layer of abstraction defined by struct bif_methods, which defines how the generic bpf layer works with a tap point of this kind. Implement ifnet(9)-specific methods in a separate file, bpf_ifnet.c.

  At this point there is 100% compatibility for all existing interfaces; there is no KPI change yet. The legacy attaching KPI is layered over the new ifnet-agnostic KPI. The new KPI may change, though, as we can implement multiple DLTs per single tap point in a prettier fashion.

  The new abstraction layer allows us to move all the 802.11 radio injection hacks out of bpf.c into ieee80211_radiotap.c, so do that immediately as a good proof of concept.

  Reviewed by: bz
  Differential Revision: https://reviews.freebsd.org/D53872
* lacp: Sort port map by interface index | Andrew Gallatin | 2025-12-15 | 1 | -1/+25
  This makes it easier to reason about system topology, and to potentially map applications to NIC queues by (ab)using the mbuf flowid to select egress NIC and queue in a predictable fashion.

  Differential Revision: https://reviews.freebsd.org/D54053
  Reviewed by: glebius, kbowling
  Sponsored by: Netflix
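Sorting the active-port map by interface index makes the slot a given flow lands in depend only on which interfaces are members, not on the order in which they joined. A minimal sketch using qsort(3): `toy_port` is a hypothetical stand-in for the lacp port, and selecting a slot with `flowid % n` is an assumption made for illustration, not necessarily how lacp(4) maps a flowid to a port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical port descriptor; only the interface index matters here. */
struct toy_port {
	unsigned if_index;
};

/* qsort(3) comparator over an array of struct toy_port pointers. */
static int
toy_port_cmp(const void *a, const void *b)
{
	const struct toy_port *pa = *(const struct toy_port *const *)a;
	const struct toy_port *pb = *(const struct toy_port *const *)b;

	return ((pa->if_index > pb->if_index) - (pa->if_index < pb->if_index));
}

/* Sort the map so that slot i holds the i-th lowest interface index. */
static void
toy_sort_portmap(struct toy_port **map, size_t n)
{
	qsort(map, n, sizeof(map[0]), toy_port_cmp);
}

/* Pick an egress port from the flow id (modulo mapping is an assumption). */
static struct toy_port *
toy_select(struct toy_port **map, size_t n, uint32_t flowid)
{
	return (map[flowid % n]);
}
```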
* if_clone: don't overwrite dump_nl of an attaching cloner with default | Seyed Pouria Mousavizadeh Tehrani | 2025-12-15 | 1 | -3/+4
  Reviewed by: glebius
  Differential Revision: https://reviews.freebsd.org/D54190
* pfsync: Avoid zeroing the state export union | Mark Johnston | 2025-12-14 | 1 | -2/+6
  pfsync_state_export() takes a pointer to a union that is in reality a pointer to one of the three state formats (1301, 1400, 1500), and zeros the union. The three formats do not have the same size, so zeroing is wrong when the format isn't that which has the largest size.

  Refactor a bit so that the zeroing happens at the layer where we know which format we're dealing with.

  Reported by: CHERI
  Reviewed by: kp
  MFC after: 1 week
  Sponsored by: CHERI Research Centre (EPSRC grant UKRI3001)
  Differential Revision: https://reviews.freebsd.org/D54163
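The bug class here is `memset(p, 0, sizeof(union))` applied to an object that is really one of the smaller union members, which writes past the end of that object. A minimal sketch of the refactored shape: the caller that knows the concrete format zeroes exactly that size, and the shared helper no longer zeroes anything. The struct sizes and `toy_*` names are hypothetical; the real pfsync formats are larger and have real fields.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical export formats of different sizes (names are illustrative). */
struct toy_state_1301 { char data[16]; };
struct toy_state_1400 { char data[24]; };
struct toy_state_1500 { char data[32]; };

union toy_state_union {
	struct toy_state_1301 s1301;
	struct toy_state_1400 s1400;
	struct toy_state_1500 s1500;
};

/*
 * Shared helper: fills fields common to all formats but no longer zeroes.
 * memset(sp, 0, sizeof(*sp)) here would clear sizeof(union) == 32 bytes,
 * overrunning a caller that passed a 16-byte toy_state_1301.
 */
static void
toy_state_fill_common(union toy_state_union *sp)
{
	sp->s1301.data[0] = 1;	/* stand-in for a field shared by all formats */
}

/* The caller knows its concrete format, so it zeroes exactly that size. */
static void
toy_export_1301(struct toy_state_1301 *st)
{
	memset(st, 0, sizeof(*st));
	toy_state_fill_common((union toy_state_union *)st);
}
```

The 1400 and 1500 exporters would follow the same shape with their own `sizeof`.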
* altq(4): Fix a typo in a source code comment | Gordon Bergling | 2025-12-13 | 1 | -1/+1
  - s/backet/bucket/

  MFC after: 3 days
* bpf: convert several boolean natured fields of bpf_d to flags | Gleb Smirnoff | 2025-12-13 | 2 | -41/+42
  This shrinks the structure a bit. Should be no functional change.

  Differential Revision: https://reviews.freebsd.org/D53870
* pf: handle TTL expired during nat64 | Kristof Provost | 2025-12-11 | 1 | -1/+0
  If the TTL (or hop limit) expires during nat64 translation we may need to send the error message in the original address family (i.e. pre-translation). We'd usually handle this in pf_route()/pf_route6(), but at that point we have already translated the packet, making it difficult to include it in the generated ICMP message.

  Check for this case in pf_translate_af() and send ICMP errors directly from it.

  PR: 291527
  MFC after: 2 weeks
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Differential Revision: https://reviews.freebsd.org/D54166
* if_ovpn: use epoch to free peers | Kristof Provost | 2025-12-09 | 1 | -2/+12
  Avoid a possible use-after-free in the rx path. ovpn_decrypt_rx_cb() calls ovpn_finish_rx() which releases the lock, but continues to use the peer.

  Ensure that the peer cannot be freed until we're sure all potential users have stopped using it (i.e. have left net_epoch).

  Reported by: Kevin Day <kevin@your.org>
  MFC after: 1 week
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* lltable: use own lock | Gleb Smirnoff | 2025-12-08 | 2 | -27/+32
  Add struct mtx to struct lltable and stop using IF_AFDATA_LOCK, which was created for a completely different purpose.

  No functional change intended.

  Reviewed by: zlei, melifaro
  Differential Revision: https://reviews.freebsd.org/D54086
* linux: store Linux Ethernet interface number in struct ifnet | Gleb Smirnoff | 2025-12-08 | 1 | -0/+1
  The old approach, where we go through the list of interfaces and count them, has bugs. One obvious bug with this dynamic translation is that once an Ethernet interface in the middle of the list goes away, all interfaces following it change their Linux names.

  A bigger problem is the ifnet arrival and departure times. For example, linsysfs has an event handler for ifnet_arrival_event, and of course it wants to resolve the name. This accidentally works, due to a bug in if_attach() where we call if_link_ifnet() before invoking all the event handlers. Once the bug is fixed, linsysfs won't be able to resolve the name the old way. The other side is ifnet_departure_event, where there is no bug: the event handlers are called after if_unlink_ifnet(). This means the old translation won't work for departure event handlers. One example is netlink.

  This change gives Netlink a chance to emit a proper Linux interface departure message. However, there is another problem in Netlink: the ifnet pointer is lost in the Netlink translation layer. Plug this with a cookie in the netlink writer structure that can be set by the route layer and used by the Netlink Linux translation layer. This part of the diff seems unrelated, but it is hard to make it a separate change, as the old KPI goes away and to use the new one we need the pointer.

  Differential Revision: https://reviews.freebsd.org/D54077
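The first bug described above, names shifting when an interface in the middle disappears, shows up clearly in a small model that contrasts counting position at lookup time with a number stored at attach time. This is a hypothetical sketch. `toy_ifnet` and its `linux_num` field stand in for struct ifnet and the new stored number. How the real code allocates or reuses numbers is not described in the commit, so the simple counter below is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface record; linux_num models the new stored field. */
struct toy_ifnet {
	bool is_ether;
	bool present;	/* still linked into the interface list */
	int linux_num;	/* assigned once at attach time */
};

/* Old approach (sketch): ethN is the interface's position among the
 * Ethernet interfaces currently linked into the list. */
static int
toy_linux_num_dynamic(const struct toy_ifnet *ifs, size_t n, size_t which)
{
	int num = 0;

	for (size_t i = 0; i < which && i < n; i++)
		if (ifs[i].present && ifs[i].is_ether)
			num++;
	return (num);
}

/* New approach (sketch): hand out a number at attach and keep it. */
static void
toy_attach(struct toy_ifnet *ifp, int *next_num)
{
	ifp->present = true;
	if (ifp->is_ether)
		ifp->linux_num = (*next_num)++;
}
```

Because the stored number lives in the interface itself, it also remains available to departure handlers that run after the interface has been unlinked, which the position-counting approach cannot support.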
* net: fix LINT-NOIP build | Gleb Smirnoff | 2025-12-06 | 1 | -5/+3
  Fixes: fd131b47f20dbeb515f5e3e6ea87948f2638eda9
* net: remove dom_ifmtu | Gleb Smirnoff | 2025-12-04 | 3 | -29/+18
  It is a remnant of a network stack design that was supposed to support multiple network protocols. Today it is clear that we are left with IPv4 and IPv6 only. Only IPv6 may have an MTU different to the interface MTU.
* net: routing table attach never fails | Gleb Smirnoff | 2025-12-04 | 1 | -5/+1