path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* ip_mroute: Make the routing socket private | Mark Johnston | 6 days | 6 | -25/+28
  I have some patches which make ip_mroute and ip6_mroute multi-FIB-aware. This enables running per-FIB routing daemons, each of which has a separate routing socket.
  Several places in the network stack check whether multicast routing is configured by checking whether the multicast routing socket is non-NULL. This doesn't directly translate in my proposed scheme, as each FIB would have its own socket.
  I'd like to modify the ip(6)_mroute code to store all state, including the socket, in a per-FIB structure. So, take a step towards that and 1) hide the socket, 2) add a boolean flag which indicates whether a multicast router is registered.
  Reviewed by: pouria, zlei, glebius, adrian
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D55236
* tcp: restrict flowtype copying to specific RSS TCP types | Cheng Cui | 9 days | 1 | -2/+2
  Reviewed by: gallatin, tuexen
  Differential Revision: https://reviews.freebsd.org/D55196
* ip_mroute: Try to make function pointer declarations more consistent | Mark Johnston | 9 days | 6 | -13/+22
  The ip_mroute and ip6_mroute modules hook into the network stack via several function pointers. Declarations for these pointers are scattered around several headers. Put them all in the same place, ip(6)_mroute.h.
  No functional change intended.
  Reviewed by: glebius
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D55058
* ip_mroute: Use a local variable to store a VIF pointer | Mark Johnston | 9 days | 1 | -22/+29
  This is cleaner and will make it a bit easier to add some more indirection to the VIF table, specifically, to add per-FIB tables.
  No functional change intended.
  Reviewed by: glebius
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D55057
* sctp: Use __sdt_used for variables only used by SDT probes | John Baldwin | 10 days | 1 | -15/+9
  Previously this used a home-rolled version.
  Reviewed by: tuexen, imp, markj
  Differential Revision: https://reviews.freebsd.org/D55165
* sockets: let protocols be responsible for socket buffer mutexes | Gleb Smirnoff | 2026-02-03 | 1 | -3/+12
  Sockets that implement their own socket buffers (marked with PR_SOCKBUF) are now also responsible for initialization of socket buffer mutexes in pr_attach and for destruction in pr_detach (or pr_close).
  This removes a big bunch of reported LORs, as now WITNESS is able to see that tcp(4) socket buffer mutex and netlink(4) socket buffer mutex are two different things. Distinct names also improve diagnostics for blocked threads.
  This also removes a hack from unix(4), where we used to mtx_destroy(). Also removes an innocent bug from unix(4) where for accept(2)-ed socket soreserve() was called twice. This one was innocent since first call to soreserve() was asking for 0 bytes of space.
  This slightly increased amount of pasted code in TCP's syncache_socket(). The problem is that while for sockets created with socket(2) it is pr_attach responsible for call to soreserve() (including !PR_SOCKBUF protocols), but for the sockets created with accept(2) it was solisten_clone() doing soreserve(), combined with the fact that for accept(2) TCP completely bypasses pr_attach. This all should improve once TCP has its own socket buffers.
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D54984
* ip_mroute: Make privilege checking more consistent | Mark Johnston | 2026-02-02 | 1 | -5/+0
  - The v6 socket option and ioctl handlers had no privilege checks at all. The socket options, I believe, can only be reached via a raw socket, but a jailed root user with a raw socket shouldn't be able to configure multicast routing in a non-VNET jail. The ioctls can only be used to fetch stats.
  - Delete a bogus comment in X_mrt_ioctl(), one can issue multicast routing ioctls against any socket. Note that the call path is soo_ioctl()->rtioctl_fib()->mrt_ioctl().
  I think all of the mroute privilege checks should be done within the ip(6)_mroute code, but let's first make the v4 and v6 modules consistent.
  Reviewed by: glebius
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D54982
* ip_mroute: Convert to using a regular mutex | Mark Johnston | 2026-01-27 | 2 | -22/+23
  The multicast routing code was using spin mutexes for packet counting, but there is no reason to use them instead of regular mutexes, given that none of this code runs in an interrupt context. Convert to using default mutexes.
  Reviewed by: glebius
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D54603
* ip_mroute: EVENTHANDLER_REGISTER does not fail | Mark Johnston | 2026-01-27 | 1 | -6/+0
  No functional change intended.
  MFC after: 1 week
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
* ip6: Remove support for RFC2675 (Jumbo Payload Option) | Tom Jones | 2026-01-27 | 1 | -1/+11
  The Jumbo Payload option was intended to allow the deployment of IPv6 on networks with a link MTU in excess of 65,575 octets. Speaking to one of the authors of RFC2675, the networks which motivated the Jumbo Payload option no longer exist.
  FreeBSD does not currently support any links with this capacity, and discussion when this change was first proposed suggested that the loopback interface had to be patched to test the implementation.
  As there are no known devices that can carry Jumbo Payloads, remove support.
  Reviewed by: glebius, tuexen, kp
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D19960
* netinet6: store ND context directly in struct in6_ifextra | Gleb Smirnoff | 2026-01-23 | 1 | -3/+3
  Stop using struct nd_ifinfo for that, because it is an API struct for SIOCGIFINFO_IN6. The functional changes are isolated to the protocol attach and detach: in6_ifarrival(), nd6_ifattach(), in6_ifdeparture(), nd6_ifdetach(), as well as to the nd6_ioctl(), nd6_ra_input(), nd6_slowtimo() and in6_ifmtu().
  The dad_failures member was just renamed to match the rest. The M_IP6NDP malloc(9) type declaration moved to files that actually use it.
  The rest of the changes are mechanical substitution of double pointer dereference via ND_IFINFO() to a single pointer dereference. This was achieved with a sed(1) script:
    s/ND_IFINFO\(([a-z0-9>_.-]+)\)->(flags|linkmtu|basereachable|reachable|retrans|chlim)/\1->if_inet6->nd_\2/g
    s/nd_chlim/nd_curhoplimit/g
  Reviewed by: tuexen, madpilot
  Differential Revision: https://reviews.freebsd.org/D54725
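The mechanical substitution described in the commit above can be reproduced outside sed(1). The following is a minimal Python sketch, assuming the sed script is run with extended regular expressions. The helper name `convert_nd_ifinfo` and the sample input strings are hypothetical and are not taken from the FreeBSD tree.

```python
import re

# Same pattern as the first sed(1) expression quoted in the commit message.
ND_IFINFO_RE = re.compile(
    r"ND_IFINFO\(([a-z0-9>_.-]+)\)->"
    r"(flags|linkmtu|basereachable|reachable|retrans|chlim)"
)


def convert_nd_ifinfo(line: str) -> str:
    """Rewrite ND_IFINFO(x)->field into x->if_inet6->nd_field.

    Hypothetical helper: it only mirrors the two sed expressions.
    """
    line = ND_IFINFO_RE.sub(r"\1->if_inet6->nd_\2", line)
    # Second sed(1) expression: nd_chlim is renamed to nd_curhoplimit.
    return line.replace("nd_chlim", "nd_curhoplimit")


if __name__ == "__main__":
    # Made-up input lines, only to show the shape of the rewrite.
    for sample in (
        "mtu = ND_IFINFO(ifp)->linkmtu;",
        "hlim = ND_IFINFO(ia->ia_ifp)->chlim;",
    ):
        print(convert_nd_ifinfo(sample))
```

Because the character class only admits lowercase letters, digits and `>_.-`, an argument such as `IFP` is left untouched, which is the same behaviour the quoted sed pattern would have.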
* netinet6: embed the counter(9) arrays in struct in6_ifextra | Gleb Smirnoff | 2026-01-23 | 1 | -2/+1
  Reviewed by: tuexen
  Differential Revision: https://reviews.freebsd.org/D54723
* sctp: support bridge interfaces | Michael Tuexen | 2026-01-20 | 1 | -0/+1
  Reported by: Timo Völker
  Tested by: Timo Völker
  MFC after: 3 days
* ip: improve deferred computation of checksums | Timo Völker | 2026-01-20 | 2 | -9/+29
  This patch adds the same functionality for the IPv4 header checksum as was done earlier for the SCTP/TCP/UDP transport checksum. When the IP implementation sends a packet, it does not compute the corresponding checksum but defers that. It will determine whether the network interface selected for the packet has the requested capability and computes the checksum in software, if the selected network interface does not have the requested capability.
  Do this not only for packets being sent by the local IP stack, but also when forwarding packets. Furthermore, when such packets are delivered to a local IP stack, do not compute or validate the checksum, since such packets have never been on the wire. This allows supporting checksum offloading also in the case of local virtual machines or jails. Support for epair interfaces will be added in a separate commit.
  Reviewed by: pouria, tuexen
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D54455
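The deferral described in the commit above can be illustrated in userland. This is a minimal Python sketch, not kernel code: the capability bit `CAP_IP_CSUM` and the function `finalize_ipv4_header` are hypothetical names, and only the RFC 1071 ones' complement sum is standard.

```python
import struct

# Hypothetical capability bit; it stands in for an interface's IPv4 header
# checksum offload capability and is not a FreeBSD constant.
CAP_IP_CSUM = 0x1


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def finalize_ipv4_header(header: bytes, if_caps: int) -> bytes:
    """Fill in a deferred header checksum unless the interface offloads it."""
    if if_caps & CAP_IP_CSUM:
        return header  # left untouched for the hardware
    # The checksum field occupies bytes 10-11 and must be zero while summing.
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    csum = internet_checksum(zeroed)
    return zeroed[:10] + struct.pack("!H", csum) + zeroed[12:]
```

A receiver validates a header by running the same sum over all of its bytes, checksum included, and expecting zero; the commit notes that this validation is skipped for packets that never left the host.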
* tcp: Unifdef use of rss software hash in syncache | Andrew Gallatin | 2026-01-05 | 1 | -3/+2
  Ever since "d9c55b2e8cd6 rss: Enable portions of RSS globally.." exposed the RSS software hashing functions, it has been possible to use them without "ifdef RSS". Do so now in the syncache so as to get flowids recorded.
  Note that the use of the rss hash functions is conditional on IP versions, so we must ifdef INET to ensure rss_proto_software_hash_v4() is available.
  Fixes 73fe85e486d2
  Sponsored by: Netflix
  Reviewed by: glebius, p.mousavizadeh_protonmail.com, nickbanks_netflix.com, tuexen
  Differential Revision: https://reviews.freebsd.org/D54534
* TCP Stacks, Improve rack to better handle reordering | Randall Stewart | 2026-01-05 | 2 | -8/+87
  With a recent bug in the igb (and a few other) driver LRO mis-queuing, rack did things ok, better than the base stack, due to the reordering protections in rack, but there was still room for improvement. When a series of packets are completely mis-ordered, you can often get the acks shortly after you have entered recovery and retransmitted the first of the packets indicated in the sack stream. Then the cum-ack arrives, basically acking all those packets. If you look at the time from when you sent the packet to when the ack came back, you can quickly determine that the ack was not for what you just transmitted but instead for the original, and you had a completely false recovery entry. Dropping out of that, you can then restore the congestion state and continue on your way. The dup-acks that also arrive help increase your reordering windows, which makes you less likely to repeat the scenario.
  Differential Revision: https://reviews.freebsd.org/D53832
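The timing argument in the commit above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the commit message does not say which threshold rack compares against, so the use of the path's minimum RTT here is an assumption, and all names (`CongestionState`, `ack_is_for_original`, `on_cum_ack`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CongestionState:
    cwnd: int
    ssthresh: int


def ack_is_for_original(rxt_sent_us: int, ack_arrival_us: int,
                        min_rtt_us: int) -> bool:
    """An ACK that arrives sooner than one minimum RTT after the
    retransmission cannot be a response to that retransmission
    (assumed threshold; the real stack may use a different one)."""
    return (ack_arrival_us - rxt_sent_us) < min_rtt_us


def on_cum_ack(current: CongestionState, saved: CongestionState,
               rxt_sent_us: int, ack_arrival_us: int,
               min_rtt_us: int) -> CongestionState:
    """Return the congestion state to continue with after a cumulative ACK."""
    if ack_is_for_original(rxt_sent_us, ack_arrival_us, min_rtt_us):
        # False recovery entry: restore what was saved before recovery.
        return saved
    return current
```

For example, with a 20 ms minimum RTT, a cumulative ACK seen 2 ms after the retransmission is attributed to the original transmission and the saved state is restored.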
* tcp: fix checksum calculation bug | Timo Völker | 2025-12-19 | 1 | -2/+2
  The new function in_delayed_cksum_o() was introduced to compute the checksum in the case the mbuf chain does not start with the IP header. The offset of the IP header is specified by the parameter iph_offset. If iph_offset was positive, the function computed an incorrect checksum.
  Reviewed by: sobomax, tuexen
  Fixes: 5feb38e37847 ("netinet: provide "at offset" variant of the in_delayed_cksum() API")
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D54269
* net: attach IPv4 and IPv6 stacks to an interface with EVENTHANDLER(9) | Gleb Smirnoff | 2025-12-18 | 9 | -60/+75
  This change retires two historic relics: the if_afdata[] array and the dom_ifattach/dom_ifdetach methods.
  The if_afdata[] array is a relic of the era when there was an expectation that many transport protocols would coexist with IP, e.g. IPX or NetAtalk. The array hasn't had any members except AF_INET and AF_INET6 for over a decade already. This change removes the array and just leaves two pointer fields: if_inet and if_inet6.
  The dom_ifattach/dom_ifdetach predates the EVENTHANDLER(9) framework and was a good enough method to initialize protocol contexts back then. Today there is no good reason to treat IPv4 and IPv6 stacks differently to other protocols/features that attach and detach from an interface.
  The locking of if_afdata[] is a relic of SMPng times, when the system startup and the interface attach was even more convoluted than before this change, and we also had unloadable protocols that used a field in if_afdata[]. Note that IPv4 and IPv6 are not unloadable.
  Note that this change removes NET_EPOCH_WAIT() from the interface detach sequence. This may surface several new races associated with interface removal. I failed to hit any with consecutive test suite runs, though. The expected general race scenario is that while struct ifnet is freed with proper epoch_call(9) itself, some structures hanging off ifnet are freed with direct free(9). The proper fix is either to make if_foo point at some static "dead" structure providing SMP visibility of this store, or to free those structures with epoch_call(9). All of these cases are planned to be found and resolved during 16.0-CURRENT lifetime.
  Reviewed by: zlei, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D54089
* tcp: retire net.inet.tcp.nolocaltimewait | Gleb Smirnoff | 2025-12-12 | 1 | -42/+0
  See c3fc0db3bc50df18a724e6e6b12ea4e060fd9255 for details.
* lltable: use own lock | Gleb Smirnoff | 2025-12-08 | 2 | -10/+10
  Add struct mtx to struct lltable and stop using IF_AFDATA_LOCK, that was created for a completely different purpose.
  No functional change intended.
  Reviewed by: zlei, melifaro
  Differential Revision: https://reviews.freebsd.org/D54086
* tcp: fix build with RSS | Gleb Smirnoff | 2025-12-06 | 1 | -0/+2
  PR: 291439
  Fixes: 73fe85e486d297c9c976095854c1c84007e543f0
* tcp: retire do_newsack - always adhere to RFC6675 SACK | Richard Scheffenegger | 2025-12-05 | 4 | -52/+6
  Deprecation notice for net.inet.tcp.newsack is in 15.0. Remove this tunable for HEAD, streamlining the code slightly.
  Reviewed by: tuexen, cc, nickbanks_netflix.com, #transport
  Sponsored by: NetApp, Inc.
  Differential Revision: https://reviews.freebsd.org/D54072
* net: routing table attach never fails | Gleb Smirnoff | 2025-12-04 | 1 | -3/+0
* tcp: don't set flowid in tcp_input() | Gleb Smirnoff | 2025-12-03 | 1 | -31/+0
  With dd0e6bb996dc setting it always on connect(2) and syncache always picking up the flowid from the incoming packet, any ESTABLISHED connection shall have the flowid already set.
  Reviewed by: tuexen, gallatin
  Differential Revision: https://reviews.freebsd.org/D53886
* tcp: store flowid info in syncache | Gleb Smirnoff | 2025-12-03 | 3 | -41/+70
  Now retransmissions by syncache would use the correct flowid, same as synchronous responses.
  Reviewed by: tuexen, gallatin
  Differential Revision: https://reviews.freebsd.org/D51792
* divert: Use CK_SLISTs for the divcb hash table | Mark Johnston | 2025-12-03 | 1 | -8/+9
  The hash table is accessed in ip_divert_packet(), and there the accesses are synchronized only by the net epoch, so plain SLIST is not safe.
  Reviewed by: ae
  MFC after: 1 week
  Sponsored by: OPNsense
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D54011
* netinet: Remove left-over sys/cdefs.h | Warner Losh | 2025-12-03 | 51 | -51/+0
  These were for $FreeBSD$ that was removed a while ago, but these includes didn't get swept up in that. Remove them all now.
  Sponsored by: Netflix
  MFC After: 2 weeks
* tcp: Enable symmetric hashing by setting hash on outgoing conns | Andrew Gallatin | 2025-11-22 | 1 | -0/+10
  Now that we can trust NICs to supply an identical hash result to software, we can set up the inpcb hash on outgoing connections. This gives us symmetric hashing, meaning packets should enter and leave on the same NIC queue.
  Differential Revision: https://reviews.freebsd.org/D53104
  Reviewed by: adrian, cc, kbowling, tuexen, zlei
  Sponsored by: Netflix
* rss: Enable portions of RSS globally to enable symmetric hashing | Andrew Gallatin | 2025-11-22 | 1 | -0/+3
  We use the fact that all NICs that support hashing are using the same hash algorithm and hash key to enable symmetric hashing in TCP, where a software version of the same hash is used to establish hashes on outgoing connections.
  Sponsored by: Netflix
  Reviewed by: adrian, zlei (both early version)
  Differential Revision: https://reviews.freebsd.org/D53089
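The idea of computing the NIC's hash in software for an outgoing connection can be sketched in userland. This Python sketch assumes the shared algorithm is the Toeplitz hash used by RSS and that the input is laid out as source address, destination address, source port, destination port. The key below is an arbitrary placeholder, and `rx_tuple` and `outgoing_flowid` are hypothetical names; whether the kernel builds the tuple in exactly this order is an assumption.

```python
import socket
import struct

# Arbitrary 40-byte key for illustration only; the key actually shared by
# NICs and the kernel is not reproduced here.
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR together the 32-bit key window that starts at
    each bit position where the input has a 1 bit."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rx_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """IPv4/TCP hash input as it would appear in a received packet."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def outgoing_flowid(local_ip: str, local_port: int,
                    remote_ip: str, remote_port: int,
                    key: bytes = EXAMPLE_KEY) -> int:
    """Hash the connection the way the NIC will see its inbound packets,
    i.e. with the remote end as the source (assumed ordering)."""
    return toeplitz_hash(key, rx_tuple(remote_ip, local_ip,
                                       remote_port, local_port))
```

If the value computed at connect time equals what the NIC later computes for received segments of the same connection, transmit and receive can be steered to the same queue, which is the property the two commits above call symmetric hashing.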
* ipfw: add extra parenthesis around ACTION_PTR() macro | Gleb Smirnoff | 2025-11-21 | 1 | -1/+1
  This allows immediately dereferencing an ipfw_insn member.
* ip: use standard C types for ECN helper functions | Seyed Pouria Mousavizadeh Tehrani | 2025-11-21 | 2 | -10/+10
  No functional change intended, suggested by glebius.
  Reviewed by: rscheff, zlei, tuexen
  Differential Revision: https://reviews.freebsd.org/D53739
* TCP Pacing system (HPTS) is missing an API | Randall Stewart | 2025-11-18 | 1 | -0/+11
  Recent changes to HPTS have broken an API that was somehow removed (used by user space programs for time calculations). This commit will add back the inline function that was removed.
  Differential Revision: https://reviews.freebsd.org/D53225
* tcp: improve comments in the syncache code | Michael Tuexen | 2025-11-07 | 1 | -1/+12
  Add a comment explaining why syncache entries are dropped and fix a typo in a comment.
  Reviewed by: rrs, glebius
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53564
* ddb: provide inp_flags2 when printing inpcbs | Michael Tuexen | 2025-11-03 | 2 | -0/+10
  Reviewed by: markj, Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53542
* tcp: drop SYN ACK segment for listening sockets | Michael Tuexen | 2025-11-03 | 3 | -21/+2
  When a SYN ACK is received for a listening socket, just drop it instead of killing the SYN-cache entry and sending a RST. This closes the possibility to kill a TCP connection during its handling in the SYN-cache.
  Reviewed by: Nick Banks, Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53540
* ddb: improve printing of inpcbs | Michael Tuexen | 2025-11-02 | 1 | -4/+6
  - shuffle around the inp_label to give inp_flags more space since it can become long.
  - fix the indentation of in6p_icmp6filt, in6p_cksum, and in6p_hops.
  Reviewed by: Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53541
* ddb: use %b when showing flags for a tcpcb | Michael Tuexen | 2025-11-02 | 2 | -261/+33
  This is much more compact. Thanks to markj@ for suggesting the change.
  Reviewed by: markj, Peter Lei, imp, Nick Banks
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53510
* ddb: use %b when showing flags for an inp | Michael Tuexen | 2025-11-02 | 2 | -156/+18
  This is much more compact. Thanks to markj@ for suggesting the change.
  Reviewed by: markj
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53507
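For readers unfamiliar with the kernel's `%b` conversion used by the two ddb commits above, its description string can be decoded in a few lines. This is a userland Python sketch of the format as documented in printf(9): the first character gives the output base, and each following entry is a 1-based bit number followed by a name that runs up to the next character with a value of 32 or less. The function name `format_b` is hypothetical, and only bases 8 and 16 are handled here.

```python
def format_b(value: int, bits: str) -> str:
    """Render value the way the kernel's %b conversion would (sketch)."""
    base = ord(bits[0])
    if base == 8:
        digits = format(value, "o")
    elif base == 16:
        digits = format(value, "x")
    else:
        raise ValueError("this sketch only handles base 8 and base 16")

    names = []
    i = 1
    while i < len(bits):
        bit = ord(bits[i])  # 1-based bit number
        i += 1
        start = i
        # A name extends until the next character <= 32 (space or control).
        while i < len(bits) and ord(bits[i]) > 32:
            i += 1
        if value & (1 << (bit - 1)):
            names.append(bits[start:i])
    return digits + ("<" + ",".join(names) + ">" if names else "")


if __name__ == "__main__":
    # Mirrors the example in printf(9): base 8, bit 2 and bit 1 named.
    print(format_b(3, "\x08\x02BITTWO\x01BITONE"))
```

A single description string per flags word replaces one hand-written `if (flags & X) printf("X")` block per flag, which would account for the large negative line counts in both commits.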
* ddb: fix compilation | Michael Tuexen | 2025-10-31 | 1 | -2/+2
  Fixes: 9aa5a79e2af9 ("ddb: optionally print inp when printing tcpcb")
  Sponsored by: Netflix, Inc.
* ddb: optionally print inp when printing tcpcb | Michael Tuexen | 2025-10-31 | 3 | -8/+18
  Add /i option to the ddb commands show tcpcb and show all tcpcbs, which enables the printing of the t_inpcb.
  Reviewed by: markj
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53497
* ddb: whitespace change | Michael Tuexen | 2025-10-31 | 1 | -3/+3
  No functional change intended.
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* ddb: improve printing of inp_flags | Michael Tuexen | 2025-10-31 | 1 | -4/+16
  Add four missing flags (INP_BINDANY, INP_INHASHLIST, INP_RESERVED_0, INP_BOUNDFIB) used in inp_flags and remove one flag (INP_ORIGDSTADDR), which is actually a flag used in inp_flags2 and not in inp_flags.
  Reviewed by: markj
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53498
* tcp: remove unused define | Peter Lei | 2025-10-31 | 1 | -1/+0
  Reviewed by: tuexen
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp: improve credential handling in syncache | Michael Tuexen | 2025-10-27 | 1 | -5/+9
  When adding a syncache entry, take a reference count of the credentials while the inp is still locked. Thanks to markj@ for providing a hint regarding the root cause.
  Reported by: David Marker
  Reviewed by: glebius
  Tested by: David Marker
  Fixes: cbc9438f0505 ("tcp: improve ref count handling when processing SYN")
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53380
* ipfw: Retire obsolete compat code | Ed Maste | 2025-10-27 | 1 | -4/+0
  The current IPFW version 3 dates to 2010 (commit cc4d3c30ea28, "Bring in the most recent version of ipfw and dummynet, developed"). The compat code for FreeBSD 8 and earlier has a number of issues and is no longer needed, so remove it.
  Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
  Reviewed by: ae, glebius
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D53343
* udp: honor IPV6_TCLASS cmsg for UDP/IPv4 packets | Michael Tuexen | 2025-10-26 | 1 | -0/+17
  Honor the IPPROTO_IPV6-level cmsg of type IPV6_TCLASS when sending a UDP/IPv4 packet on an AF_INET6 socket.
  Reviewed by: bz
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53347
* udp: honor IPV6_TCLASS socket option for UDP/IPv4 packets | Michael Tuexen | 2025-10-26 | 1 | -0/+12
  Honor the IPPROTO_IPV6-level socket option IPV6_TCLASS when sending a UDP/IPv4 packet on an AF_INET6 socket.
  Reviewed by: bz, glebius
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D53346
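From an application's point of view, the two IPV6_TCLASS commits above concern a dual-stack UDP socket that sends to an IPv4-mapped address. The following is a minimal Python sketch of both ways of supplying the traffic class. The helper names `make_tclass` and `send_v4_over_v6_socket` are hypothetical, the sketch assumes the platform's Python exposes `socket.IPV6_TCLASS`, and it is not taken from any FreeBSD test.

```python
import socket
import struct


def make_tclass(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN field into one traffic class."""
    if not 0 <= dscp < 64 or not 0 <= ecn < 4:
        raise ValueError("dscp must be 0..63 and ecn must be 0..3")
    return (dscp << 2) | ecn


def send_v4_over_v6_socket(payload: bytes, ipv4_addr: str, port: int,
                           tclass: int, use_cmsg: bool = False) -> int:
    """Send one UDP/IPv4 datagram from an AF_INET6 socket with a traffic
    class set either per socket or per message."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        # Allow IPv4-mapped destinations on this IPv6 socket.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        dest = (f"::ffff:{ipv4_addr}", port)
        if use_cmsg:
            # Per-message: ancillary data at the IPPROTO_IPV6 level.
            cmsg = (socket.IPPROTO_IPV6, socket.IPV6_TCLASS,
                    struct.pack("i", tclass))
            return sock.sendmsg([payload], [cmsg], 0, dest)
        # Per-socket: the socket option applies to every later send.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, tclass)
        return sock.sendto(payload, dest)
```

With these changes, a call such as `send_v4_over_v6_socket(b"x", "192.0.2.1", 9, make_tclass(46))` would be expected to produce an IPv4 packet carrying the requested value in its TOS byte; the address and port in that call are documentation placeholders.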
* tcp: save progress timeout cause in connection end status | Peter Lei | 2025-10-24 | 1 | -2/+5
  TCP stats are currently incremented for the persist and progress timeout conditions, but only the persist cause was saved in the connection end info status, which in turn is logged in the blackbox "connection end" event.
  Reviewed by: tuexen
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp rack: cleanup | Peter Lei | 2025-10-24 | 2 | -198/+4
  The TCP_SAD_DETECTION code was removed. Remove the remaining sysctl-variables and counters.
  Reviewed by: tuexen
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp over udp: don't copy more bytes than available | Michael Tuexen | 2025-10-23 | 1 | -1/+1
  When copying the data in the first mbuf to get rid of the UDP header, use the correct length. It was copying too much (8 bytes, the length of the UDP header). This only applies to handling TCP over UDP packets. The support for TCP over UDP is disabled by default.
  Reported by: jtl
  Reviewed by: Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.