path: root/sys/netinet6
Commit message | Author | Age | Files | Lines
* lltable: use own lock | Gleb Smirnoff | 5 days | 3 | -24/+21
| | | | | | | | | Add struct mtx to struct lltable and stop using IF_AFDATA_LOCK, which was created for a completely different purpose. No functional change intended. Reviewed by: zlei, melifaro Differential Revision: https://reviews.freebsd.org/D54086
* netinet6: use IF_ADDR_LOCK instead of IF_AFDATA_LOCK in defrtr_ipv6_only_ifp | Gleb Smirnoff | 8 days | 1 | -6/+9
| | | | | | | It is not clear what exactly this function is locking against. It seems reasonable to just use some generic interface lock. The IF_AFDATA_LOCK goes away soon together with if_afdata[], so put at least something in its place. Note that this code is dead anyway (#ifdef EXPERIMENTAL).
* netinet6: use IF_ADDR_LOCK instead of IF_AFDATA_LOCK | Gleb Smirnoff | 8 days | 1 | -5/+9
| | | | | | | It is not clear what exactly this function is locking against. It seems reasonable to just use some generic interface lock. The IF_AFDATA_LOCK goes away soon together with if_afdata[], so put at least something in its place.
* net: remove dom_ifmtu | Gleb Smirnoff | 9 days | 3 | -7/+3
| | | | | | It is a remnant of a network stack design that was supposed to support multiple network protocols. Today it is clear that we are left with IPv4 and IPv6 only. Only IPv6 may have an MTU different to the interface MTU.
* net: routing table attach never fails | Gleb Smirnoff | 9 days | 1 | -3/+0
|
* netinet: Remove left-over sys/cdefs.h | Warner Losh | 11 days | 25 | -25/+0
| | | | | | | | These were for $FreeBSD$ that was removed a while ago, but these includes didn't get swept up in that. Remove them all now. Sponsored by: Netflix MFC After: 2 weeks
* rss: Enable portions of RSS globally to enable symmetric hashing | Andrew Gallatin | 2025-11-22 | 1 | -0/+3
| | | | | | | | | | | We use the fact that all NICs that support hashing are using the same hash algorithm and hash key to enable symmetric hashing in TCP, where a software version of the same hash is used to establish hashes on outgoing connections. Sponsored by: Netflix Reviewed by: adrian, zlei (both early version) Differential Revision: https://reviews.freebsd.org/D53089
* ip: use standard C types for ECN helper functions | Seyed Pouria Mousavizadeh Tehrani | 2025-11-21 | 1 | -2/+2
| | | | | | | No functional change intended, suggested by glebius. Reviewed by: rscheff, zlei, tuexen Differential Revision: https://reviews.freebsd.org/D53739
* mld6: Properly initialize MLD packet options | Andrey V. Elsukov | 2025-11-02 | 1 | -0/+1
| | | | | | | | | | After commit 530c2c30b0c7 we need to set flags to ensure that hop-by-hop and hop limit options are included. PR: 290407 Reviewed by: zlei, markj MFC after: 3 days Fixes: 530c2c30b0c7 ("ip6_output: Reduce cache misses on pktopts")
* netinet6: Use proper prototype for SYSINIT functions | Zhenlei Huang | 2025-10-13 | 1 | -1/+1
| | | | MFC after: 1 week
* ipv6: don't complain when deleting an address with prefix length of 128 | Andrey V. Elsukov | 2025-10-07 | 1 | -7/+7
| | | | | | | | | | | | Save the prefix length in the unused field in6_ifaddr->ia_plen, then on removal check whether the address has a prefix length of 128; if so, there is no need to complain that no related prefix exists. Reviewed by: kp Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D52952
* sys/netinet6: Use atomic(9) for dad_failures counter | Guido Falsi | 2025-10-03 | 7 | -13/+16
| | | | | | | | | | | | | | | | Replace counter(9) usage with more lightweight atomic(9) in the code handling RFC 7217 SLAAC address generation. Also, use `u_int` types with this. Leaving `dad_failures` local to `in6_get_stableifid()` as a `uint64_t` to avoid changing the generated addresses from previous code; this also gives some headroom for future changes. While here, moved some `#include` lines to adhere to style(9). Reviewed by: glebius, jhibbits, jtl, zlei Approved by: glebius, jtl, zlei Differential Revision: https://reviews.freebsd.org/D52731
* carp6: revise the generation of ND6 NA | Andrey V. Elsukov | 2025-10-03 | 4 | -82/+99
| * use ND_NA_FLAG_ROUTER flag in carp_send_na() when we work as router.
| * use in6addr_any as destination address for nd6_na_output(), then it will use ipv6-all-nodes multicast address.
| * add in6_selectsrc_nbr() function that accepts additional argument ip6_moptions. Use this function from ND6 code to avoid cases when nd6_na_output/nd6_ns_output can not find source address for multicast destinations.
| * add some comments from RFC2461 for better understanding.
| * use tlladdr argument as flags and use ND6_NA_OPT_LLA when we need to add target link-layer address option, and ND6_NA_CARP_MASTER when we know that target address is CARP master. Then we can prepare correct CARP's mac address if target address is CARP master.
| * move blocks of code where multicast options are initialized and use it when destination address is multicast.
|
| Reviewed by: kp
| Obtained from: Yandex LLC
| MFC after: 2 weeks
| Sponsored by: Yandex LLC
| Differential Revision: https://reviews.freebsd.org/D52825
* sys/netinet6: fix memory corruption in in6_ifadd | Mateusz Guzik | 2025-10-02 | 1 | -3/+2
| | | | | | | | | The routine allocates the wrong size and then passes it to in6_get_ifid. At the same time it violates invariants by issuing malloc with M_WAITOK while within net epoch section. Sponsored by: Rubicon Communications, LLC ("Netgate")
* sys/netinet6: Fix ABI breakage introduced with RFC 7217 support | Guido Falsi | 2025-09-22 | 7 | -10/+10
| | | | | | | | | | | | | | | | | commit 31ec8b6407fdd5a87d70265762457c67ce618283 added a `dad_failures` variable to `struct nd_ifinfo`, which broke the networking ABI. This commit fixes it by moving that variable to `struct in6_ifextra`, which is not a public interface, while `struct nd_ifinfo` is back in its original state. Thanks to kib, markj and glebius for their help and suggestions in solving this problem. Reported by: "Herbert J. Skuhra" <herbert@gojira.at> Tested by: "Herbert J. Skuhra" <herbert@gojira.at> Approved by: glebius Fixes: 31ec8b6407fdd5a87d70265762457c67ce618283
* sys/netinet6: Implement RFC 7217 | Guido Falsi | 2025-09-20 | 10 | -94/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement RFC 7217 (A Method for Generating Semantically Opaque Interface Identifiers with IPv6 Stateless Address Autoconfiguration (SLAAC)) in our IPv6 stack. A new ifconfig `stableaddr` flag is added to enable the feature on interfaces, which defaults to on or off for new interfaces based on the sysctl `net.inet6.ip6.use_stableaddr` (off by default, so this commit causes no change in behavior with default settings). The algorithm follows the RFC in its logic, using SHA256-HMAC as the algorithm to derive addresses so as to provide code that can be leveraged by future implementations of RFC 8981, leveraging the `hostuuid` as the secret. The source of the host identifier can be configured using the sysctl `net.inet6.ip6.stableaddr_netifsource`, while the number of retries generating a new address in case of collision can be configured using the `net.inet6.ip6.stableaddr_maxretries` sysctl (default 3). Documentation about all these flags is added to the ifconfig(8) man page. Reviewed by: cognet, glebius, hrs Tested by: zarychtam@plan-b.pwste.edu.pl Approved by: cognet, glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D49681
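The derivation described above can be sketched in userland Python. This is only an illustration of the RFC 7217 idea (a keyed hash over the prefix, an interface identifier, and a DAD retry counter), not the kernel code: the function names, the message layout, the truncation to the first 8 digest bytes, and the example secret are all assumptions made for this sketch.

```python
import hashlib
import hmac
import ipaddress


def stable_iid(secret: bytes, prefix: ipaddress.IPv6Network,
               net_iface: bytes, dad_counter: int) -> int:
    """Return a 64-bit interface identifier in the spirit of RFC 7217."""
    # Message layout is an assumption of this sketch:
    # 8 prefix bytes || interface name || 4-byte DAD counter.
    msg = (prefix.network_address.packed[:8] + net_iface
           + dad_counter.to_bytes(4, "big"))
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    # Keep 64 bits of the 256-bit MAC as the identifier.
    return int.from_bytes(digest[:8], "big")


def stable_address(secret: bytes, prefix: ipaddress.IPv6Network,
                   net_iface: bytes, dad_counter: int = 0) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the derived identifier."""
    iid = stable_iid(secret, prefix, net_iface, dad_counter)
    return ipaddress.IPv6Address(int(prefix.network_address) | iid)


# Hypothetical inputs; the commit uses the hostuuid as the secret.
secret = b"hypothetical-host-uuid"
prefix = ipaddress.IPv6Network("2001:db8:1:2::/64")
first = stable_address(secret, prefix, b"em0")
retry = stable_address(secret, prefix, b"em0", dad_counter=1)
```

The same inputs always yield the same address, while bumping the DAD counter after a collision yields a different one, which is what a retry limit such as `stableaddr_maxretries` would bound.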
* IPv6: fix off-by-one in pltime and vltime expiration checks | Andrey V. Elsukov | 2025-09-16 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | Previously, the macros used '>' instead of '>=' when comparing elapsed time against the preferred and valid lifetimes. This caused any deprecated address to become usable again for one extra second after receiving each Router Advertisement. In that short window, the address could be selected as a source for outgoing connections. Update the checks to use '>=' so that addresses are deprecated or invalid when their lifetime expires. PR: 289177 Reported by: Dmitry Nexus <fbsd.4f6a at nexus tel> Reviewed by: zlei Submitted by: Marek Zarychta MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D52323
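A minimal sketch of the comparison change, in Python rather than the kernel's C macros. The function names are hypothetical, and the scenario assumes a Router Advertisement has just refreshed a deprecated address with a preferred lifetime of 0, so the elapsed time is 0 at that moment.

```python
def is_deprecated_before_fix(elapsed: int, pltime: int) -> bool:
    """Old behaviour: strictly greater-than comparison."""
    return elapsed > pltime


def is_deprecated_after_fix(elapsed: int, pltime: int) -> bool:
    """New behaviour: the address is deprecated once the lifetime is reached."""
    return elapsed >= pltime


# Assumed scenario: lifetime 0, address updated this very second.
just_updated = 0
preferred_lifetime = 0
old_result = is_deprecated_before_fix(just_updated, preferred_lifetime)
new_result = is_deprecated_after_fix(just_updated, preferred_lifetime)
```

With `>` the address counts as preferred until a full second has elapsed, which matches the one-second window described in the commit; with `>=` it is deprecated immediately.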
* ip6: add SO_BINTIME support | Jonathan T. Looney | 2025-09-15 | 1 | -17/+36
| | | | | | | | | | | | | This adds support for obtaining timestamps from IPv6 packets using the SO_BINTIME socket option, bringing it into parity with IPv4 behavior. Enable testing the SO_BINTIME option in the relevant (manual) regression test. PR: 289423 Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D52504
* sys/netinet6: Fix SLAAC for interfaces with no /64 LL address | Reid Linnemann | 2025-09-05 | 3 | -19/+43
| | | | | | | | | | | | | | | | | in6_ifadd() asserts that an interface has an existing LL address with a /64 prefix from which to extract the ifid for SLAAC address selection (even though the comments suggest that an ifid will be generated if one does not exist). This is adequate for most generic cases, however to support PPP links with /128 LL addresses we must be able to fall back on another source for the ifid since we cannot assume the /128 LL has a unique ifid in the lower 64 bits. To do this, the static function get_ifid() in in6_ifattach.c is renamed to non-static in6_get_ifid(), and this is used in lieu of a proper /64 LL address to attempt to obtain a valid ifid. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D51778
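The fallback logic can be sketched as follows. This is a simplified Python illustration, not the kernel implementation: `select_ifid` and its `fallback` callback are hypothetical stand-ins for the decision made in in6_ifadd() and for in6_get_ifid() respectively.

```python
import ipaddress
from typing import Callable, Optional

LOW_64_BITS = (1 << 64) - 1


def select_ifid(ll_addr: Optional[str], ll_prefixlen: int,
                fallback: Callable[[], int]) -> int:
    """Pick a 64-bit interface identifier for SLAAC.

    Use the lower 64 bits of the link-local address only when it has a
    /64 prefix; otherwise ask another source (hypothetical callback).
    """
    if ll_addr is not None and ll_prefixlen == 64:
        return int(ipaddress.IPv6Address(ll_addr)) & LOW_64_BITS
    return fallback()


# Ordinary Ethernet-style case: /64 link-local address.
from_ll = select_ifid("fe80::1234", 64, lambda: 0xABC)
# PPP-style case: /128 link-local, so the fallback source is used.
from_fallback = select_ifid("fe80::1", 128, lambda: 0xABC)
```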
* bridge: Print a warning if member_ifaddrs=1 | Lexi Winter | 2025-09-04 | 1 | -4/+13
| | | | | | | | | | | When adding an interface with an IP address to a bridge, or assigning an IP address to an interface which is in a bridge, and member_ifaddrs=1, print a warning so users are informed this is deprecated. Also add "(deprecated)" to the sysctl description. MFC after: 9 hours Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D52335
* ifnet: Defer detaching address family dependent data | Zhenlei Huang | 2025-09-03 | 1 | -0/+2
| | | | | | | | | | | | | While diagnosing PR 279653 and PR 285129, I observed that a thread may write to freed memory but the system does not crash. This hides the real problem. A clear NULL pointer dereference is much better than writing to freed memory. PR: 279653 PR: 285129 Reviewed by: glebius MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D49444
* bridge: Fix adding gif(4) interface assigned with IP addresses as bridge member | Zhenlei Huang | 2025-09-01 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | | | | | | and fix assigning IP addresses to the gif(4) interface when it is a member of an if_bridge(4) interface. When setting the sysctl net.link.bridge.member_ifaddrs to 0, if_bridge(4) can eliminate the unnecessary walk of the member list to determine whether the inbound unicast packets are for us or not. However, when a gif(4) interface is a member of an if_bridge(4) interface, it acts as the tunnel endpoint to tunnel Ethernet frames over an IP network, aka the EtherIP protocol, so the IP addresses configured on it are independent of the if_bridge(4) interface or other if_bridge(4) members, hence the sysctl net.link.bridge.member_ifaddrs should not have any influence over how gif(4) interfaces are assigned IP addresses. PR: 227450 Reported by: Siva Mahadevan <me@svmhdvn.name> Reviewed by: ivy, #bridge MFC after: 1 week Fixes: 0a1294f6c610 bridge: allow IP addresses on members to be disabled Differential Revision: https://reviews.freebsd.org/D52200
* udp: Fix a typo in a source code comment | Gordon Bergling | 2025-08-17 | 1 | -1/+1
| | | | | | - s/datgram/datagram/ MFC after: 3 days
* IPv6: Ignore PTB packets with an MTU < 1280 | Eric van Gyzen | 2025-08-12 | 2 | -106/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 2460 section 5 paragraph 7 allowed a Packet Too Big message to report a Next-Hop MTU less than 1280 in support of 6-to-4 routers. A node receiving such a message was required to add a Fragment Header to outgoing packets, even though they were not fragmented. Almost 20 years later, RFC 8200 was published. It obsoletes RFC 2460 and removes that paragraph. UNH IOL Intact was updated to test for compliance with the new standard. Remove code supporting that obsolete paragraph. Test cases v6LC_4_1_06a and 06b failed before this change, saying: DUT processed PTB and sent a fragmented echo reply Those two test cases now pass: DUT did not process PTB and sent un-fragmented echo reply All PMTU test cases pass except v6LC_4_1_08. It fails because we ignore the MTU in RAs. Reviewed by: tuexen MFC After: 1 month Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D51835
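The resulting rule can be sketched in a few lines of Python. This is an illustration only: `path_mtu_after_ptb` is a hypothetical helper, and the policy of never raising the path MTU in response to a Packet Too Big message is an assumption of the sketch rather than something stated in the commit.

```python
IPV6_MMTU = 1280  # minimum link MTU that every IPv6 link must support


def path_mtu_after_ptb(current_pmtu: int, reported_mtu: int) -> int:
    """Return the path MTU to use after receiving a Packet Too Big message."""
    if reported_mtu < IPV6_MMTU:
        # RFC 8200 behaviour: ignore the message, keep the current value,
        # and do not start adding Fragment Headers.
        return current_pmtu
    # Assumption of this sketch: a PTB can only lower the path MTU.
    return min(current_pmtu, reported_mtu)


ignored = path_mtu_after_ptb(1500, 1000)   # below 1280, message ignored
lowered = path_mtu_after_ptb(1500, 1400)   # valid report, path MTU lowered
```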
* sctp, tcp, udp: improve deferred computation of checksums | Timo Völker | 2025-08-01 | 4 | -1/+66
| | | | | | | | | | | | | | | | | | | | | | When the SCTP, TCP, or UDP implementation sends a packet, it does not compute the corresponding checksum but defers that. The network layer will determine whether the network interface selected for the packet has the requested capability and compute the checksum in software if the selected network interface doesn't have the requested capability. Do this not only for packets being sent by the local SCTP, TCP, and UDP stack, but also when forwarding packets. Furthermore, when such packets are delivered to a local SCTP, TCP, or UDP stack, do not compute or validate the checksum, since such packets never have been on the wire. This allows supporting checksum offloading also in the case of local virtual machines or jails. Support for epair, vtnet, and tap interfaces will be added in separate commits. Reviewed by: kp, rgrimes, tuexen, manpages MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D51475
* netinet6: Don't return non-IPv6 enabled interfaces from in6_getlinkifnet() | Kristof Provost | 2025-07-29 | 1 | -1/+16
| | | | | | | | | | | | | | | | | There are scenarios where we can end up looking up an interface by its scope and turn up an interface that doesn't have IPv6 enabled on it. If that happens we could end up dereferencing a NULL pointer accessing ifp->if_afdata[AF_INET6]. Check for this. One such scenario is if a firewall rewrites a destination address to a link-local address, with an embedded scope for such an interface. Attach a test case which provokes this. PR: 288263 Reported by: Robert Morris <rtm@lcs.mit.edu> Reviewed by: zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D51500
* udp: Fix an inpcb refcount leak in the tunnel receive path | Mark Johnston | 2025-07-25 | 1 | -3/+8
| | | | | | | | | | | | | | | When the socket has a tunneling function attached, udp_append() drops the inpcb lock before calling it. To keep the inpcb alive, we bump the refcount. After commit 742e7210d00b we only dropped the reference if the tunnel consumed the packet, but it needs to be dropped in either case. if_ovpn is the only driver that can trigger this bug. Fixes: 742e7210d00b ("udp: allow udp_tun_func_t() to indicate it did not eat the packet") Reviewed by: kp MFC after: 2 weeks Sponsored by: Stormshield Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D51505
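The shape of the leak can be modelled in Python. The class and function names below are hypothetical and the locking is omitted; the sketch only shows why the reference must be dropped whether or not the tunnel function consumes the packet.

```python
class Inpcb:
    """Toy stand-in for an inpcb with a reference count."""

    def __init__(self) -> None:
        self.refcount = 1


def deliver_leaky(inp: Inpcb, tunnel, pkt) -> bool:
    """Buggy pattern: the reference is only dropped when the packet is eaten."""
    inp.refcount += 1          # keep the inpcb alive while unlocked
    consumed = tunnel(pkt)
    if consumed:
        inp.refcount -= 1
    return consumed


def deliver_fixed(inp: Inpcb, tunnel, pkt) -> bool:
    """Fixed pattern: the reference is dropped on every path."""
    inp.refcount += 1
    try:
        consumed = tunnel(pkt)
    finally:
        inp.refcount -= 1
    return consumed


def declines(pkt) -> bool:
    """Tunnel function that does not consume the packet."""
    return False


leaky, fixed = Inpcb(), Inpcb()
deliver_leaky(leaky, declines, b"payload")
deliver_fixed(fixed, declines, b"payload")
```

After one declined packet the leaky variant is left with an extra reference, so the object could never be freed; the fixed variant returns to its starting count.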
* if_gif(4): Support the NOCLAMP flag to change MTU handling for IPv6 | Koichiro Iwao | 2025-07-21 | 1 | -4/+14
| | | | | | | | | | | | | | | | | The patch was originally written by hrs [1], and later modified by meta to use named flags instead of generic link-layer flags. [1] https://reviews.freebsd.org/D45854 PR: 280736 Co-authored-by: Hiroki Sato <hrs@FreeBSD.org> Reviewed by: ae, ziaee, zlei, pauamma Reported by: Kazuki Shimizu <kazubu@jtime.net> Approved by: pauamma (manpages) Approved by: ae MFC after: 2 weeks Sponsored by: Cybertrust Japan Differential Revision: https://reviews.freebsd.org/D51297
* mld: allow sysctls to be set per vnet | Kristof Provost | 2025-07-19 | 1 | -13/+16
| | | | | | | | | | | Allow net.inet6.mld.use_allow, net.inet6.mld.v2enable and net.inet6.mld.v1enable to be set per vnet. While here convert them to booleans. Reviewed by: glebius, zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D51409
* netinet6: allow binding a raw socket to an anycast address | Lexi Winter | 2025-07-15 | 1 | -2/+1
| | | | | | | | | | | Raw sockets have a separate check for this in rip6_bind() that was missed in the previous change. This fixes e.g. 'ping -S' using an anycast address. Fixes: ca4b046105f6 ("netinet6: allow binding to anycast addresses") Reviewed by: tuexen, kevans, des (previous version) Approved by: kevans (mentor) Differential Revision: https://reviews.freebsd.org/D50438
* counter(9): rate limit periods may be more than 1 second | Kristof Provost | 2025-06-25 | 1 | -5/+4
| | | | | | | | | | | | Teach counter_rate() to deal with periods of more than 1 second, so we can express 'at most 100 in 10 seconds', which is different from 'at most 10 in 1 second'. While here move the struct counter_rate definition into subr_counter.c so users cannot mess with its internals. Add allocation and free functions. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D50796
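The difference between "at most 100 in 10 seconds" and "at most 10 in 1 second" shows up with bursts. The following Python sketch uses a simple fixed-window limiter with an explicit clock argument; it is an assumption-laden model for illustration (class name, window policy and API are invented here) and not how counter_rate() is implemented.

```python
class WindowRateLimit:
    """Allow at most `limit` events per `period` seconds (fixed window)."""

    def __init__(self, limit: int, period: int) -> None:
        self.limit = limit
        self.period = period
        self.window_start = None
        self.count = 0

    def allow(self, now: int) -> bool:
        # Start a new window when none exists or the current one expired.
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


hundred_per_ten = WindowRateLimit(limit=100, period=10)
ten_per_one = WindowRateLimit(limit=10, period=1)

# A burst of 50 events arriving within the same second.
burst_a = sum(hundred_per_ten.allow(0) for _ in range(50))
burst_b = sum(ten_per_one.allow(0) for _ in range(50))
```

Both limits admit the same long-run average of 10 events per second, but only the 10-second period lets the whole burst through.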
* inet6: RFC 8981 SLAAC Temporary Address Extensions | Marek Zarychta | 2025-06-20 | 4 | -5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deprecate the use of MD5 as the algorithm for generating temporary interface identifiers (IIDs) for IPv6 addresses, improving cryptographic robustness. Introduce per-address randomized IIDs, ensuring that each temporary address uses a distinct interface identifier to enhance privacy and avoid correlation across addresses. Update the IID generation logic to respect the Reserved IPv6 Interface Identifiers list. Enhance sysctl_ip6_temppltime() so that ip6_temp_max_desync_factor is dynamically recalculated whenever ip6_temp_preferred_lifetime is updated via sysctl. This ensures that MAX_DESYNC_FACTOR remains approximately 1/32 of the preferred lifetime plus 10 minutes. DESYNC_FACTOR is also regenerated after each update. Timers related to temporary address regeneration were updated to match the design recommendations in RFC 8981. A new read-only sysctl variable net.inet6.ip6.temp_max_desync_factor is introduced to expose the computed value of MAX_DESYNC_FACTOR to userland for observability and debugging. Input validation to reject temppltime values too small or too large is included. This all brings the temporary address lifetime handling closer to the intended design in RFC 8981 and improves robustness against misconfiguration. PR: 245103 MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D50108
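As a rough worked example of the recalculation, the sentence "approximately 1/32 of the preferred lifetime plus 10 minutes" can be read literally as shown below. The exact kernel formula is not given in the commit message, so both helper functions and the one-day example input are assumptions of this sketch.

```python
import random


def max_desync_factor(temp_preferred_lifetime: int) -> int:
    """Literal reading of the commit text: lifetime / 32 plus 10 minutes."""
    return temp_preferred_lifetime // 32 + 10 * 60


def draw_desync_factor(max_factor: int, rng: random.Random) -> int:
    """Pick a random DESYNC_FACTOR in the range [0, max_factor] seconds."""
    return rng.randint(0, max_factor)


one_day = 24 * 60 * 60                      # example preferred lifetime
limit = max_desync_factor(one_day)          # 86400 // 32 + 600 seconds
factor = draw_desync_factor(limit, random.Random(0))
```

Under this reading a one-day preferred lifetime gives a bound of 3300 seconds (55 minutes), and changing the lifetime through the sysctl would move the bound with it.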
* inet6: RFC 8981 SLAAC Temporary Address Extensions | Fernando Gont | 2025-06-20 | 4 | -133/+52
| | | | | | | | Initial implementation of SLAAC temporary address extensions back when they were still draft-ietf-6man-rfc4941bis. PR: 245103 MFC after: 1 month
* udp: fix local blackholing | Michael Tuexen | 2025-06-13 | 1 | -1/+1
| | | | | | | | | | | | The sysctl-variable net.inet.udp.blackhole_local should affect UDP packets from an IPv6 address of the local host, not of a host on the local area network. Thanks to cc@ for pointing me to the issue. Reviewed by: cc MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D50829
* netinet6: Remove ndpr_raf_ra_derived flag | Hiroki Sato | 2025-06-12 | 4 | -10/+4
| | | | | | | | | | | | | | | | | | | | This flag was introduced in 8036234c72c9361711e867cc1a0c6a7fe0babd84 to prevent the SIOCSPFXFLUSH_IN6 ioctl from removing manually-added entries. However, the flag did not actually work because of an incomplete implementation: prelist_update() did not handle it before calling nd6_prelist_add(). This patch removes the flag because a prefix derived from an RA always has an entry in the ndpr_advrtrs member of struct nd_prefix. Having a separate flag is not a good idea because it can cause a mismatch between the flag and the ndpr_advrtrs entry. Testing with LIST_EMPTY() is simpler for the original goal. This also removes a prefix check in the ICMPV6CTL_ND6_PRLIST sysctl that excluded manually-added entries. This sysctl is designed to list all entries, and there is no relationship to SIOCSPFXFLUSH_IN6. Differential Revision: https://reviews.freebsd.org/D46441
* machine/stdarg.h -> sys/stdarg.h | Brooks Davis | 2025-06-11 | 1 | -2/+1
| | | | | | | | | | | | | Switch to using sys/stdarg.h for va_list type and va_* builtins. Make an attempt to insert the include in a sensible place. Where style(9) was followed this is easy, where it was ignored, aim for the first block of sys/*.h headers and don't get too fussy or try to fix other style bugs. Reviewed by: imp Exp-run by: antoine (PR 286274) Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
* icmp6: fix use-after-reference-release | Kristof Provost | 2025-05-27 | 1 | -4/+3
| | | | | | | | | | | | | | We release the reference to the in6_ifaddr but retain a pointer to it. Copy the address itself, rather than keeping the pointer to fix this. The previous version was actually safe, because ifa_free() uses an epoch callback to free it, so the pointer would have remained valid as long as we are in net_epoch. Change it to copying the address anyway because it is more obviously correct and will remain correct even if ifa_free() changes later. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D50460
* icmp6: zero out pad space | Kristof Provost | 2025-05-23 | 1 | -0/+1
| | | | | | | | | | In icmp6_redirect_output() we potentially add padding, but failed to clear this memory. This triggered a KMSAN panic during the sys/netinet/carp:unicast_v6 test. Reviewed by: zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D50461
* netinet6: Remove a set but not used global variable in6_maxmtu | Zhenlei Huang | 2025-05-21 | 5 | -45/+2
| | | | | | | | | | | | | | and its setter in6_setmaxmtu(). This variable was introduced by the KAME project [1]. It holds the max IPv6 MTU through all the interfaces, but is never used anywhere. [1] 82cd038d51e2 KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet) Reviewed by: glebius MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D49357
* bridge: allow IP addresses on members to be disabled | Lexi Winter | 2025-05-05 | 1 | -0/+8
| add a new sysctl, net.link.bridge.member_ifaddrs, which defaults to 1. if it is set to 1, bridge behaviour is unchanged. if it is set to 0:
|
| - an interface which has AF_INET6 or AF_INET addresses assigned cannot be added to a bridge.
| - an interface in a bridge cannot have an AF_INET6 or AF_INET address assigned to it.
| - the bridge will no longer consider the lladdrs on bridge members to be local addresses, i.e. frames sent to member lladdrs will not be processed by the host.
|
| update bridge.4 to document this behaviour, as well as the existing recommendation that IP addresses should not be configured on bridge members anyway, even if it currently partially works.
|
| in testing, setting this to 0 on a bridge with 50 member interfaces improved throughput by 22% (4.61Gb/s -> 5.67Gb/s) across two member epairs due to eliding the bridge member list walk in GRAB_OUR_PACKETS.
|
| Reviewed by: kp, des
| Approved by: des (mentor)
| Differential Revision: https://reviews.freebsd.org/D49995
* netinet6: allow binding to anycast addresses | Lexi Winter | 2025-04-24 | 1 | -5/+4
| | | | | | | | | | | | | | | | the restriction on sending packets from anycast source addresses was removed in RFC4291, so there's no reason to forbid binding to such addresses. this allows anycast services (e.g., DNS) to actually use anycast addresses, which was previously impossible. RFC4291 also removes the restriction that only routers may configure anycast addresses; this was never enforced in code but was documented in ifconfig.8. update ifconfig.8 to document both changes. PR: 285545 Reviewed by: des, adrian Approved by: des (mentor) Differential Revision: https://reviews.freebsd.org/D49905
* netinet6: Do not forward or send ICMPv6 messages to the unspec address | Mark Johnston | 2025-04-22 | 2 | -1/+8
| | | | | | | | | | | | | | | | | | | | | As in f7174eb2b4c4 ("netinet: Do not forward or ICMP response to INADDR_ANY"), the IPv6 stack should avoid sending packets to the unspecified address. In particular: - Make sure that we do not forward received packets to the unspecified address; the check in ip6_input() catches this in the common case, but after commit 40faf87894ff it's possible for a pfil hook to bypass this check and pass the packet to ip6_forward() using the PACKET_TAG_IPFORWARD tag. - Make sure that we do not reflect packets back to the unspecified address; RFC 4443 section 2.4 states that we must not generate error messages in response to packets from the unspecified address. Reviewed by: zlei, glebius Reported by: Franco Fichtner <franco@opnsense.org> MFC after: 1 month Sponsored by: Klara, Inc. Sponsored by: OPNsense Differential Revision: https://reviews.freebsd.org/D49339
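Both checks reduce to testing for the unspecified address `::`. A small Python sketch using the standard `ipaddress` module illustrates them; the two function names are hypothetical, and real forwarding and error-generation code applies further rules that are left out here.

```python
import ipaddress


def may_forward(dst: str) -> bool:
    """A packet must not be forwarded to the unspecified address."""
    return not ipaddress.IPv6Address(dst).is_unspecified


def may_send_icmp6_error(src: str) -> bool:
    """An error would be reflected to the packet's source, so a source
    of :: must not trigger one (RFC 4443 section 2.4)."""
    return not ipaddress.IPv6Address(src).is_unspecified


forward_ok = may_forward("2001:db8::1")
forward_unspec = may_forward("::")
error_unspec = may_send_icmp6_error("::")
```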
* ip6: leave room for link headers in UDP | Andrew Gallatin | 2025-04-15 | 1 | -3/+7
| | | | | | | | | | | | | | | | | UDP over IPv6 was not leaving space for link headers, resulting in the ethernet header being placed in its own mbuf at the front of the mbuf chain sent down to the NIC driver. This is inefficient, in terms of allocating 2x as many header mbufs as needed, and it's also confusing for drivers which may expect to find ether/ip/l4 headers together in the same mbuf. Reviewed by: glebius, rrs, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D49840 This is a port of e6ccd7093618, which was done by Robert Watson in 2004 for IPv4
* in6_control_ioctl: correctly report errors from SIOCAIFADDR_IN6 | Lexi Winter | 2025-04-07 | 1 | -1/+1
| | | | | | | | we have to use 'goto out' here rather than 'break' because otherwise error is set to 0, which means the error is not propagated back to the caller. Reviewed by: kp
* netinet: Fix getcred sysctl handlers to do nothing if no input is given | Mark Johnston | 2025-03-20 | 2 | -0/+4
| | | | | | | | | | | These routines were all assuming that the sysctl handler has some new value, but this is not the case. SYSCTL_IN() returns 0 in this scenario, so they were all operating on an uninitialized address. This is mostly harmless, but trips KMSAN checks, so let's fix them. Reviewed by: zlei, rrs, glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D49348
* inpcb: in_pcbinshash() can't fail on connect(2) | Gleb Smirnoff | 2025-03-13 | 1 | -1/+2
| | | | CID: 1593687
* ip6_cksum.c: generalize in6_cksum_partial() to allow L2 headers in passed mbuf | Konstantin Belousov | 2025-03-13 | 2 | -8/+19
| | | | | | Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com> Sponsored by: NVidia networking MFC after: 1 week
* inpcb: retire two-level port hash database | Gleb Smirnoff | 2025-03-07 | 1 | -43/+32
| This structure originates from the pre-FreeBSD times when system RAM was measured in single digits of MB and Internet speeds were measured in Kb.
|
| At first level the database hashes the port value only to calculate index into array of pointers to lazily allocated headers that hold lists of inpcbs with the same local port. This design apparently was made to preserve kernel memory.
|
| In the modern kernel size of the first level of the hash is derived from maxsockets, which is derived from maxfiles, which in its turn is derived from amount of physical memory. Then the size of the hash is capped by IPPORT_MAX, because it doesn't make any sense to have a hash table larger than the set of possible values. In practice this cap works even on my laptop. I haven't done precise calculation or experiments, but my guess is that any system with > 8 Gb of RAM will be autotuned to IPPORT_MAX sized hash.
|
| Apparently, this hash is a degenerate one: it never has more than one entry in any slot. You can check this with kgdb:
|
|     set $i = 0
|     while ($i <= tcbinfo->ipi_porthashmask)
|       set $p = tcbinfo->ipi_porthashbase[$i].clh_first
|       set $c = 0
|       while ($p != 0)
|         set $c = $c + 1
|         set $p = $p->phd_hash.cle_next
|       end
|       if ($c > 1)
|         printf "Slot %u count %u", $i, $c
|       end
|       set $i = $i + 1
|     end
|
| Retiring the two level hash we remove a lot of complexity at the cost of only one comparison 'inp->inp_lport != lport' in the lookup cycle, which is going to be always false on most machines anyway. This comparison definitely shall be cheaper than extra pointer traversal.
|
| Another positive change to be singled out is that now we no longer need to allocate memory in non-sleepable context in in_pcbinshash(), so a potential ENOMEM on connect(2) is removed.
|
| Reviewed by: markj
| Differential Revision: https://reviews.freebsd.org/D49151
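The "degenerate hash" observation can be reproduced with a short Python model. It assumes the first-level slot is simply the local port masked with the table's hash mask; the helper name is hypothetical and the model counts distinct port values per slot rather than walking real kernel structures.

```python
from collections import Counter


def longest_port_chain(hashmask: int) -> int:
    """Largest number of distinct ports that share one first-level slot,
    assuming slot = port & hashmask."""
    slots = Counter(port & hashmask for port in range(65536))
    return max(slots.values())


full_table = longest_port_chain(0xFFFF)   # table as large as the port space
small_table = longest_port_chain(0x00FF)  # a 256-slot table for comparison
```

Once the table is as large as the 16-bit port space, every port lands in its own slot, so the second level never holds more than one header and adds only an extra pointer traversal.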
* vm_lowmem: Fix signature mismatches in vm_lowmem callbacks | SHENGYI HONG | 2025-03-05 | 2 | -2/+2
| | | | | | | This is required for kernel CFI. Reviewed by: rrs, jhb, glebius Differential Revision: https://reviews.freebsd.org/D49111
* ipfw: migrate ipfw to 32-bit size rule numbers | Andrey V. Elsukov | 2025-03-03 | 1 | -3/+7
| This changes ABI due to the changed opcodes and includes the following:
|
| * rule numbers and named object indexes converted to 32-bits
| * all hardcoded maximum rule numbers were replaced with the IPFW_DEFAULT_RULE macro
| * now it is possible to grow the maximum number of rules at build time
| * several opcodes converted to ipfw_insn_u32 to keep rulenum: O_CALL, O_SKIPTO
| * call stack modified to keep u32 rulenum. The behaviour of O_CALL opcode was changed to avoid possible packets looping. Now when call stack is overflowed or mbuf tag allocation failed, a packet will be dropped instead of skipping to next rule.
| * 'return' action now has two modes to specify return point: 'next-rulenum' and 'next-rule'
| * new lookup key added for O_IP_DST_LOOKUP opcode 'lookup rulenum'
| * several opcodes converted to keep u32 named object indexes in special structure ipfw_insn_kidx
| * tables related opcodes modified to use two structures: ipfw_insn_kidx and ipfw_insn_table
| * added ability for table value matching for specific value type in 'table(name,valtype=value)' opcode
| * dynamic states and eaction code converted to use u32 rulenum and named objects indexes
| * added insntod() and insntoc() macros to cast to specific ipfw instruction type
| * default sockopt version was changed to IP_FW3_OPVER=1
| * FreeBSD 7-11 rule format support was removed
| * added ability to generate special rtsock messages via log opcode
| * added IP_FW_SKIPTO_CACHE sockopt to enable/disable skipto cache. It helps to reduce overhead when many rules are modified in batch.
| * added ability to keep NAT64LSN states during sets swapping
|
| Obtained from: Yandex LLC
| Relnotes: yes
| Sponsored by: Yandex LLC
| Differential Revision: https://reviews.freebsd.org/D46183