path: root/sys/netinet
Commit message | Author | Age | Files | Lines
...
* udp: slightly refactor udp_append() | Gleb Smirnoff | 2025-09-01 | 1 | -21/+21
  Make it bool. Reword the comment, add note that mbuf is always consumed. In case tunnel consumed the mbuf, don't INP_RUNLOCK(), behave just like all the other normal exits from the function.
  Reviewed by: tuexen, kp, markj
  Differential Revision: https://reviews.freebsd.org/D52171
* udp: don't leak mbuf if tunnel didn't consume and inpcb is gone | Gleb Smirnoff | 2025-09-01 | 1 | -1/+4
  Fixes: e1751ef896119d7372035b1b60f18a6342bd0e3b
  Reviewed by: tuexen, kp, markj
  Differential Revision: https://reviews.freebsd.org/D52170
* bridge: Fix adding gif(4) interface assigned with IP addresses as bridge member | Zhenlei Huang | 2025-09-01 | 1 | -2/+2
  and fix assigning IP addresses to the gif(4) interface when it is a member of an if_bridge(4) interface.
  When the sysctl net.link.bridge.member_ifaddrs is set to 1, if_bridge(4) can eliminate an unnecessary walk of the member list to determine whether inbound unicast packets are for us or not.
  However, when a gif(4) interface is a member of an if_bridge(4) interface, it acts as the tunnel endpoint that tunnels Ethernet frames over an IP network (the EtherIP protocol), so the IP addresses configured on it are independent of the if_bridge(4) interface and of other if_bridge(4) members. Hence the sysctl net.link.bridge.member_ifaddrs should not have any influence over how a gif(4) interface assigns IP addresses.
  PR: 227450
  Reported by: Siva Mahadevan <me@svmhdvn.name>
  Reviewed by: ivy, #bridge
  MFC after: 1 week
  Fixes: 0a1294f6c610 bridge: allow IP addresses on members to be disabled
  Differential Revision: https://reviews.freebsd.org/D52200
* tcp: improve sending of SYN-cookies | Michael Tuexen | 2025-08-30 | 1 | -41/+48
  Ensure that when the sysctl-variable net.inet.tcp.syncookies_only is non zero, SYN-cookies are sent and no SYN-cache entry is added to the SYN-cache. In particular, this behavior should not depend on the value of the sysctl-variable net.inet.tcp.syncookies, which controls whether SYN cookies are used in combination with the SYN-cache to deal with bucket overflows.
  Also ensure that tcps_sc_completed does not include TCP connections established via a SYN-cookie.
  While there, make V_tcp_syncookies and V_tcp_syncookiesonly bool instead of int, since they are used as boolean variables.
  Reviewed by: rscheff, cc, Peter Lei, Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52225
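The rule stated in this message can be sketched as a small decision function. This is a simplified model, not the kernel's code: `syn_decide` and `struct syn_decision` are hypothetical names, and the non-cookies-only branch assumes that a cookie accompanies the SYN-cache entry as a fallback whenever net.inet.tcp.syncookies is set.

```c
#include <stdbool.h>

/* Outcome of handling one incoming SYN in this simplified model. */
struct syn_decision {
	bool add_cache_entry;	/* keep state in the SYN-cache */
	bool send_cookie;	/* encode state in the SYN-ACK */
};

/*
 * Hypothetical helper. The arguments mirror the sysctl-variables
 * net.inet.tcp.syncookies_only and net.inet.tcp.syncookies.
 */
static struct syn_decision
syn_decide(bool syncookies_only, bool syncookies)
{
	struct syn_decision d = { false, false };

	if (syncookies_only) {
		/* Cookies only: the value of syncookies is irrelevant. */
		d.send_cookie = true;
		return (d);
	}
	/* Assumption: the cookie is a fallback next to the cache entry. */
	d.add_cache_entry = true;
	d.send_cookie = syncookies;
	return (d);
}
```

With `syncookies_only` set, the result is the same for both values of `syncookies`, which is the independence the commit asks for.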
* tcp: remove stale comment | Michael Tuexen | 2025-08-28 | 1 | -1/+0
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* netinet: provide "at offset" variant of the in_delayed_cksum() API | Maxim Sobolev | 2025-08-26 | 2 | -3/+11
  The need for such a variant comes from the fact that we need to re-calculate the checksum after ng_nat(4) transformations while getting mbufs from layer 2 (Ethernet) directly.
  Reviewed by: markj, tuexen
  Approved by: tuexen
  Sponsored by: Sippy Software, Inc.
  Differential Revision: https://reviews.freebsd.org/D49677
  MFC After: 2 weeks
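The idea of checksumming "at offset" can be illustrated with the generic Internet checksum from RFC 1071 applied to a buffer that still carries a link-layer header. This is only a sketch of the concept: `cksum_at_offset` is a hypothetical name, it works on a flat byte buffer rather than an mbuf chain, and it does not reproduce what in_delayed_cksum() does for TCP and UDP (pseudo-header handling, storing the result in the packet).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit ones' complement checksum over
 * buf[off .. len-1], treating the bytes as big-endian 16-bit words.
 * Suitable for packet-sized buffers; very large inputs could
 * overflow the 32-bit accumulator.
 */
static uint16_t
cksum_at_offset(const uint8_t *buf, size_t len, size_t off)
{
	uint32_t sum = 0;
	size_t i;

	for (i = off; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

For a frame received from layer 2, a caller would pass the length of the Ethernet header (14 bytes for an untagged frame) as `off` so that the header bytes do not contribute to the sum.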
* tcp: remove now unneeded icmp includes | Gleb Smirnoff | 2025-08-25 | 6 | -12/+0
* mod_cc(4): Fix a typo in a source code comment | Gordon Bergling | 2025-08-25 | 1 | -1/+1
  - s/assigments/assignments/
  MFC after: 3 days
* tcp: improve inflating cwnd in limited transmit | Michael Tuexen | 2025-08-25 | 1 | -5/+3
  Don't subtract tcp_sack_adjust() twice in some cases; subtract it just once in all cases.
  Reviewed by: rscheff
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52140
* tcp: improve the condition for detecting dup ACKs | Michael Tuexen | 2025-08-24 | 1 | -263/+236
  Take the condition of RFC 6675 into account. While there, remove stale comments.
  PR: 282605
  Reviewed by: cc (earlier version)
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51426
* icmp: clear offset and flags when reflecting a packet | Michael Tuexen | 2025-08-18 | 1 | -1/+2
  When reflecting a packet, use an offset of 0 and clear all three bits, in particular the DF bit.
  PR: 288558
  Reviewed by: markj, zlei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51991
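As an illustration, the 16-bit IPv4 field in question holds three flag bits and a 13-bit fragment offset, so clearing all of them leaves the field at zero. The sketch below uses locally defined `SK_`-prefixed constants and a hypothetical `reflect_ip_off` helper operating on a host-byte-order value; it is not the code from the commit.

```c
#include <stdint.h>

/* Layout of the IPv4 flags/fragment-offset field, host byte order. */
#define	SK_IP_RF	0x8000	/* reserved bit */
#define	SK_IP_DF	0x4000	/* don't fragment */
#define	SK_IP_MF	0x2000	/* more fragments */
#define	SK_IP_OFFMASK	0x1fff	/* fragment offset */

/* Hypothetical helper: field value to use in the reflected packet. */
static uint16_t
reflect_ip_off(uint16_t ip_off)
{
	/* Offset 0 and all three flag bits cleared, including DF. */
	return (ip_off &
	    ~(SK_IP_RF | SK_IP_DF | SK_IP_MF | SK_IP_OFFMASK));
}
```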
* udp: Fix a typo in a source code comment | Gordon Bergling | 2025-08-17 | 1 | -1/+1
  - s/datgram/datagram/
  MFC after: 3 days
* tcp: fix sysctl name in the gone_in() printf | Gleb Smirnoff | 2025-08-13 | 1 | -1/+1
  Fixes: c3fc0db3bc50df18a724e6e6b12ea4e060fd9255
* IPv6: Ignore PTB packets with an MTU < 1280 | Eric van Gyzen | 2025-08-12 | 2 | -4/+1
  RFC 2460 section 5 paragraph 7 allowed a Packet Too Big message to report a Next-Hop MTU less than 1280 in support of 6-to-4 routers. A node receiving such a message was required to add a Fragment Header to outgoing packets, even though they were not fragmented.
  Almost 20 years later, RFC 8200 was published. It obsoletes RFC 2460 and removes that paragraph. UNH IOL Intact was updated to test for compliance with the new standard.
  Remove code supporting that obsolete paragraph.
  Test cases v6LC_4_1_06a and 06b failed before this change, saying:
    DUT processed PTB and sent a fragmented echo reply
  Those two test cases now pass:
    DUT did not process PTB and sent un-fragmented echo reply
  All PMTU test cases pass except v6LC_4_1_08. It fails because we ignore the MTU in RAs.
  Reviewed by: tuexen
  MFC After: 1 month
  Sponsored by: Dell Inc.
  Differential Revision: https://reviews.freebsd.org/D51835
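The resulting behavior can be sketched as a path MTU update rule. The names `SK_IPV6_MIN_MTU` and `pmtu_after_ptb` are hypothetical; the value 1280 is the IPv6 minimum link MTU, and the rule that a Packet Too Big message never raises the path MTU is an assumption taken from standard Path MTU Discovery, not from this commit message.

```c
#include <stdint.h>

#define	SK_IPV6_MIN_MTU	1280	/* IPv6 minimum link MTU */

/*
 * Hypothetical helper: path MTU to use after receiving a Packet Too
 * Big message that reports reported_mtu.
 */
static uint32_t
pmtu_after_ptb(uint32_t current_pmtu, uint32_t reported_mtu)
{
	if (reported_mtu < SK_IPV6_MIN_MTU)
		return (current_pmtu);	/* ignore the message */
	if (reported_mtu < current_pmtu)
		return (reported_mtu);	/* shrink the path MTU */
	return (current_pmtu);		/* a PTB never raises it */
}
```

Under this rule a message reporting 1279 leaves a 1500-byte path MTU untouched, and no Fragment Header is added to unfragmented packets.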
* tcp: retire rstreason | Michael Tuexen | 2025-08-12 | 6 | -58/+41
  With the latest changes, this variable and parameter for tcp_dropwithreset() is not needed anymore. Removing it also makes it harder to reintroduce the use of multiple counters for TCP, which might open side channel attacks.
  No functional changes intended.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51872
* tcp: minor cleanup | Michael Tuexen | 2025-08-12 | 1 | -14/+14
  Don't use the rstreason variable as a hint that a second lookup is performed, since the rstreason variable will be removed. Use the INPLOOKUP_WILDCARD flag in the lookupflag variable instead.
  No functional change intended.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51847
* udp: use appropriate error counters | Michael Tuexen | 2025-08-12 | 1 | -1/+5
  Since there are multicast and broadcast specific error counters, use them.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51869
* icmp: remove unused BANDLIM_UNLIMITED | Michael Tuexen | 2025-08-11 | 2 | -2/+1
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51849
* tcp: mitigate a side channel for detection of TCP connections | Michael Tuexen | 2025-08-09 | 1 | -0/+8
  If a blind attacker tries to guess whether a TCP connection exists by sending ACK segments, this might trigger a challenge ACK on an existing TCP connection. To make this hit non-observable for the attacker, also increment the global counter, which would have been incremented had it been a non-hit.
  This issue was reported as issue number 11 in Keyu Man et al.: SCAD: Towards a Universal and Automated Network Side-Channel Vulnerability Detection
  Reviewed by: Nick Banks, Peter Lei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51724
* tcp: rate limit the sending of all RST segments | Michael Tuexen | 2025-08-07 | 3 | -7/+7
  Also rate limit the sending of RST segments in the following cases:
  * when receiving data on a closed socket.
  * when a socket can not be created at the end of the handshake and the sysctl-variable net.inet.tcp.syncache.rst_on_sock_fail is 1.
  * when an ACK segment is received in SYN SENT state and it does not acknowledge the SYN segment.
  After this change, there is no need anymore to provide a rstreason to tcp_dropwithreset(), since it is always BANDLIM_TCP_RST. This will be a follow-up commit, since it will change the code in a couple of places, but will not change the functionality.
  Reviewed by: rrs, Nick Banks, Peter Lei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51815
* tcp: remove assignment without effect | Michael Tuexen | 2025-08-07 | 1 | -1/+0
  rstreason is only relevant in the code paths with the label 'dropwithreset', but not in the one with the label 'drop'.
  No functional change intended.
  Reviewed by: Nick Banks, rrs, Peter Lei, imp
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51814
* inet: fix typo | Warner Losh | 2025-08-08 | 1 | -1/+1
  Note: btw submitted a number of other things in this area that haven't made it into the tree, so I'm making an exception to the no typo rule since it was done in that context.
  Submitted by: btw (Tiwei Bie GSOC 2015 so unsure what to use for author)
  Differential Revision: https://reviews.freebsd.org/D3510
* tcp: ensure SACK rxmit never ends up left of its hole | Richard Scheffenegger | 2025-08-06 | 2 | -3/+3
  When an RTO happens during SACK loss recovery, snd_recover can possibly be pulled left. With Lost Retransmission Detection (LRD) this can lead to the rxmit of a hole ending up pointing to the left of the hole, which is unexpected and leads to complications.
  Reviewed By: tuexen, #transport
  Sponsored by: NetApp, Inc.
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D51725
* tcp sack: improve computation of delivered_data | Michael Tuexen | 2025-08-06 | 1 | -1/+1
  delivered_data is the number of bytes that have newly been delivered to the peer. This includes the number of bytes cumulatively acknowledged and selectively acknowledged.
  Reviewed by: rscheff
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51718
* tcp: improve consistency of KASSERTs in tcp_sack.c | Michael Tuexen | 2025-08-06 | 1 | -13/+17
  When panicking, don't print the condition that was violated, but the condition that holds at the time of the panic.
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51726
* rack, bbr: minor cleanup | Michael Tuexen | 2025-08-06 | 1 | -4/+2
  No functional change intended.
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51734
* ipfw: add numeric initializers to enum ipfw_opcodes | Andrey V. Elsukov | 2025-08-03 | 1 | -110/+110
  This is mostly for better readability when we need to resolve which opcode corresponds to a specific number.
  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D51457
* tcp: Fix wrap around comparison bug | Nick Banks | 2025-08-01 | 1 | -2/+1
  The variables p_curtick and p_lasttick are not in usecs.
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* sctp, tcp, udp: improve deferred computation of checksums | Timo Völker | 2025-08-01 | 4 | -1/+45
  When the SCTP, TCP, or UDP implementation sends a packet, it does not compute the corresponding checksum but defers that. The network layer determines whether the network interface selected for the packet has the requested capability, and computes the checksum in software if the selected network interface doesn't have it.
  Do this not only for packets being sent by the local SCTP, TCP, and UDP stack, but also when forwarding packets. Furthermore, when such packets are delivered to a local SCTP, TCP, or UDP stack, do not compute or validate the checksum, since such packets have never been on the wire. This allows supporting checksum offloading also in the case of local virtual machines or jails. Support for epair, vtnet, and tap interfaces will be added in separate commits.
  Reviewed by: kp, rgrimes, tuexen, manpages
  MFC after: 4 weeks
  Differential Revision: https://reviews.freebsd.org/D51475
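The two decisions described here can be modeled with bit masks. The flag names below are hypothetical stand-ins for the kernel's checksum flags and interface capabilities, and the helpers are a sketch of the decision logic only, not the functions touched by the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "checksum still to be computed" requests. */
#define	SK_CSUM_TCP	0x1u
#define	SK_CSUM_UDP	0x2u
#define	SK_CSUM_SCTP	0x4u

/*
 * Output or forwarding path: requests that the selected interface
 * cannot offload must be satisfied in software before transmission.
 */
static uint32_t
csum_needed_in_software(uint32_t requested, uint32_t if_offload_caps)
{
	return (requested & ~if_offload_caps);
}

/*
 * Local delivery: a packet that still carries a pending request was
 * never on the wire, so there is nothing to validate.
 */
static bool
must_validate_on_input(uint32_t pending_requests)
{
	return (pending_requests == 0);
}
```

For example, a UDP packet routed to an interface that only offloads TCP checksums yields `SK_CSUM_UDP` from the first helper, meaning the UDP checksum is computed in software.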
* tcp hpts: cleanup header file | Nick Banks | 2025-07-31 | 2 | -82/+81
  Cleanup tcp_hpts.h by
  * moving definitions used only in tcp_hpts.c to that file
  * fixing a typo
  * removing the duplicate declaration of tcp_min_hptsi_time
  * rearranging declarations for simpler reading
  Approved by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp: improve variable and constant names | Nick Banks | 2025-07-31 | 4 | -42/+42
  Don't use ticks in variable or constant names when they don't have a relation to ticks. Use slots or usecs.
  No functional change intended.
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp: improve function names | Nick Banks | 2025-07-31 | 6 | -75/+75
  tcp_tv_to_usectick(), tcp_tv_to_mssectick(), and tcp_tv_to_lusectick() are not related to ticks. Therefore remove the trailing 'tick'.
  No functional change intended.
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* sctp: whitespace cleanup | Michael Tuexen | 2025-07-27 | 1 | -1/+0
  MFC after: 1 week
* sendfile: don't hack sb_lowat for sockets that manage the watermark | Gleb Smirnoff | 2025-07-25 | 1 | -1/+1
  In sendfile(2) we carry an old hack (originating from d99b0dd2c5297) to help dumb benchmarks and applications achieve higher performance. We would modify the low watermark on the socket send buffer to avoid the socket being reported as writable too early, which would result in lots of small writes.
  Skip that hack for applications that do setsockopt(SO_SNDLOWAT) or that register the socket in kevent(2) with the NOTE_LOWAT feature. First, we don't want the hack to rewrite the watermark value explicitly specified by the user. Second, in certain cases that can lead to real performance regressions. A kevent(2) with NOTE_LOWAT would report the socket as writable, but then sendfile(2) would write 0 bytes and return EAGAIN.
  The change also disables the hack for unix(4) sockets, leaving only TCP.
  Reviewed by: rrs
  Differential Revision: https://reviews.freebsd.org/D50581
* udp: Fix an inpcb refcount leak in the tunnel receive path | Mark Johnston | 2025-07-25 | 1 | -3/+8
  When the socket has a tunneling function attached, udp_append() drops the inpcb lock before calling it. To keep the inpcb alive, we bump the refcount. After commit 742e7210d00b we only dropped the reference if the tunnel consumed the packet, but it needs to be dropped in either case.
  if_ovpn is the only driver that can trigger this bug.
  Fixes: 742e7210d00b ("udp: allow udp_tun_func_t() to indicate it did not eat the packet")
  Reviewed by: kp
  MFC after: 2 weeks
  Sponsored by: Stormshield
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D51505
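The pattern behind this fix can be shown with a toy reference counter. Everything in the sketch is hypothetical (`struct sk_inpcb`, `sk_deliver_via_tunnel`, the two sample tunnel functions) and locking is reduced to comments; the point is only that the reference taken before the callback is released on both outcomes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a protocol control block. */
struct sk_inpcb {
	int refcount;
};

/* Tunnel callback: returns true if it consumed the packet. */
typedef bool (*sk_tun_func)(void *pkt);

static bool
sk_deliver_via_tunnel(struct sk_inpcb *inp, sk_tun_func tun, void *pkt)
{
	bool consumed;

	inp->refcount++;	/* keep inp alive while unlocked */
	/* (the lock on inp would be dropped here) */
	consumed = tun(pkt);
	/* (the lock on inp would be reacquired here) */
	inp->refcount--;	/* released whether consumed or not */
	return (consumed);
}

/* Two sample callbacks for exercising both outcomes. */
static bool
sk_tun_eats(void *pkt)
{
	(void)pkt;
	return (true);
}

static bool
sk_tun_declines(void *pkt)
{
	(void)pkt;
	return (false);
}
```

The leak described in the message corresponds to performing the decrement only when `consumed` is true.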
* tcp: remove trailing whitespaces | Nick Banks | 2025-07-24 | 13 | -50/+50
  Reviewed by: cc, tuexen, Peter Lei
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51437
* tcp rack,bbr: cleanup | Michael Tuexen | 2025-07-24 | 2 | -13/+2
  Remove code that can't be enabled in FreeBSD anyway.
  Reviewed by: glebius, rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51423
* tcp rack: fix clearing of app limited periods | Peter Lei | 2025-07-21 | 1 | -1/+1
  Use the correct variable in the correct way. The app limited period is cleared when gp_seq is greater than or equal to cleared_app_ack_seq.
  Reviewed by: rrs, tuexen, Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51441
* tcp: use a single counter for limiting the RST rate | Michael Tuexen | 2025-07-21 | 6 | -43/+46
  Using two counters does not provide any benefit, but it provides an externally observable signal whether there is a listening port.
  Reviewed by: Nick Banks, Peter Lei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51440
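A minimal sketch of a single shared budget, assuming a simple fixed one-second window: every code path that wants to send a RST draws from the same counter, so an outside observer cannot tell the paths apart by exhausting one budget and probing the other. `struct sk_ratelimit` and `sk_rst_allowed` are hypothetical, and the kernel's actual rate limiting is implemented differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* One budget shared by all code paths that emit RST segments. */
struct sk_ratelimit {
	int64_t window_start;	/* second the current window began */
	uint32_t count;		/* RSTs sent in this window */
	uint32_t limit;		/* RSTs allowed per window */
};

/* Hypothetical helper: may one more RST be sent at time now_sec? */
static bool
sk_rst_allowed(struct sk_ratelimit *rl, int64_t now_sec)
{
	if (now_sec != rl->window_start) {
		/* A new one-second window starts with a fresh budget. */
		rl->window_start = now_sec;
		rl->count = 0;
	}
	if (rl->count >= rl->limit)
		return (false);
	rl->count++;
	return (true);
}
```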
* tcp rack: remove duplicate header include | Peter Lei | 2025-07-21 | 1 | -1/+0
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp rack: fix sendmap app limited count tracking | Peter Lei | 2025-07-21 | 1 | -3/+6
  rc_app_limited_cnt is an internal counter on the rack structure that tracks the number of sendmap entries that have the RACK_APP_LIMITED flag set. These entries gate goodput measurements. The counter is reported in a number of blackbox logging events.
  When a sendmap entry which has the RACK_APP_LIMITED flag set is cloned, the counter was not being maintained properly.
  While here, clean up the counter check performed when a sendmap entry with the flag set is freed, which previously hid this issue.
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp rack: fix typos and whitespace changes | Peter Lei | 2025-07-21 | 1 | -15/+14
  No functional changes intended.
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp: cleanup | Michael Tuexen | 2025-07-20 | 1 | -3/+5
  Don't use the variable rstreason temporarily with a different semantic.
  No functional change intended.
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51438
* tcp hpts: remove unused line argument from tcp_set_hpts | Nick Banks | 2025-07-20 | 2 | -3/+2
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp hpts: use consistently inline instead of __inline | Nick Banks | 2025-07-20 | 1 | -7/+7
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp: remove duplicate tcp_bblogging_on checks | Nick Banks | 2025-07-20 | 3 | -41/+40
  Reviewed by: tuexen
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
* tcp: fix the test that a duplicate ACK has no data | Michael Tuexen | 2025-07-19 | 1 | -4/+5
  When processing a TCP segment, data is removed from the head or the tail. The test whether a segment has no data on it should depend on the TCP segment before the removal. Without this, received segments might trigger a fast retransmit even when they should not.
  Reported by: syzbot+fc97a2b5a0f7ea161161@syzkaller.appspotmail.com
  Reviewed by: Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51425
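The ordering issue can be sketched with plain integers: the "no data" property is captured from the length the segment arrived with, before duplicate bytes are trimmed. The helper names are hypothetical and only head trimming is modeled.

```c
#include <stdbool.h>

/* Drop bytes at the head that the receiver already has. */
static int
sk_trim_head(int seg_len, int duplicate_bytes)
{
	return (duplicate_bytes >= seg_len ? 0 : seg_len - duplicate_bytes);
}

/*
 * Hypothetical helper: reports whether the segment arrived without
 * data, and stores the length left after trimming in *trimmed_len.
 */
static bool
sk_segment_had_no_data(int seg_len, int duplicate_bytes, int *trimmed_len)
{
	bool no_data = (seg_len == 0);	/* decided before trimming */

	*trimmed_len = sk_trim_head(seg_len, duplicate_bytes);
	return (no_data);
}
```

A retransmitted 100-byte segment whose bytes are all duplicates is trimmed to length 0, yet it did carry data, so in this model it is not treated as a candidate duplicate ACK.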
* sysctl net.inet.tcp.ktcplist: properly fill driver status length field | Konstantin Belousov | 2025-07-10 | 1 | -8/+18
  Also ignore errors from drivers. If the driver's snd_tag status method returned an error, silently ignore the returned string and do not advance the position in the filled buffer.
  Sponsored by: Nvidia networking
* sysctl net.inet.tcp.ktcplist: try to handle EDEADLK | Konstantin Belousov | 2025-07-10 | 1 | -10/+17
  If EDEADLK is returned from the locked handler, restart it. Do it a limited number of times. Catch signals between tries.
  Reviewed by: glebius, markj
  Sponsored by: Nvidia networking
  Differential revision: https://reviews.freebsd.org/D51143
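A bounded restart loop of this kind might look like the following userland sketch. The retry limit `SK_MAX_TRIES`, the handler type, and the sample handler are assumptions made for illustration; the signal check mentioned in the message is kernel-specific and appears only as a comment.

```c
#include <errno.h>

#define	SK_MAX_TRIES	5	/* assumed limit, not from the commit */

typedef int (*sk_handler)(void *arg);

/* Run the handler, restarting it while it reports EDEADLK. */
static int
sk_run_with_retry(sk_handler handler, void *arg)
{
	int error = 0;
	int tries;

	for (tries = 0; tries < SK_MAX_TRIES; tries++) {
		error = handler(arg);
		if (error != EDEADLK)
			break;
		/* (a kernel version would check for signals here) */
	}
	return (error);
}

/* Sample handler: fails with EDEADLK while *arg is positive. */
static int
sk_flaky_handler(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EDEADLK);
	}
	return (0);
}
```

If the handler keeps failing, the caller sees EDEADLK after `SK_MAX_TRIES` attempts instead of looping forever.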
* sysctl net.inet.tcp.ktlslist: allow snd_tag_status_str() to sleep | Konstantin Belousov | 2025-07-10 | 1 | -0/+20
  For this, unlock the inp around the calls, taking a reference on it. If the inp appears to be freed or unlinked after the relock, return EDEADLK.
  Reviewed by: glebius, markj
  Sponsored by: Nvidia networking
  Differential revision: https://reviews.freebsd.org/D51143