path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* ifnet_byindex() actually requires network epoch | Gleb Smirnoff | 42 hours | 3 | -37/+31
  Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c.
  Reviewed by: melifaro
  Differential revision: https://reviews.freebsd.org/D33263
* tcp: rack fails to send out a TLP after a MTU change | Randall Stewart | 44 hours | 1 | -0/+1
  When rack sends out a TLP it sets up various state to make sure it avoids the cwnd (it's been more than 1 RTT since our last send) and it may at times send new data. If an MTU change has occurred and our cwnd has collapsed, we can have a situation where the must_retran flag is set and we obey the cwnd, thus never sending the TLP and then sitting stuck. This one-line fix addresses that problem.
  Reviewed by: Michael Tuexen
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D33231
* in_pcb: delay crfree() down into UMA dtor | Gleb Smirnoff | 3 days | 1 | -15/+27
  inpcb lookups, which check inp_cred, work with pcbs that potentially went through in_pcbfree(). So inp_cred should stay valid until SMR guarantees its invisibility to lookups. While here, put the whole inpcb destruction sequence of in_pcbfree(), inpcb_dtor() and inpcb_fini() sequentially.
  Submitted by: markj
  Differential revision: https://reviews.freebsd.org/D33273
* sctp: unbreak NOINET6 builds. | Michael Tuexen | 4 days | 1 | -1/+9
  PR: 260119
  Reported by: kostikbel
  MFC after: 1 week
* sctp: inherit IP level socket options from listening socket | Michael Tuexen | 5 days | 1 | -0/+9
  Ensure that TTL and TOS values set on a listener get inherited by the accepted sockets.
  PR: 260119
  MFC after: 1 week
* tcp_ccalgounload(): initialize the inpcb iterator when curvnet is set | Gleb Smirnoff | 5 days | 1 | -2/+2
  Pointy hat to: glebius
  Fixes: de2d47842e88
* in_pcb: limit the effect of wraparound in TCP random port allocation check | Peter Lei | 5 days | 1 | -2/+2
  The check to see if TCP port allocation should change from random to sequential port allocation mode may incorrectly cause a false positive due to negative wraparound.
  Example:
    V_ipport_tcpallocs = 2147483585 (0x7fffffc1)
    V_ipport_tcplastcount = 2147483553 (0x7fffffa1)
    V_ipport_randomcps = 100
  The original code would compare (2147483585 <= -2147483643) and thus incorrectly move to sequential allocation mode. Compute the delta first before comparing against the desired limit to limit the wraparound effect (since tcplastcount is always a snapshot of a previous tcpallocs).
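The arithmetic in that example can be reproduced in user space. The following is a minimal sketch, not the kernel code: the function names are hypothetical, and 32-bit two's-complement wraparound is emulated with unsigned arithmetic so that the program does not rely on signed overflow. It assumes, as the commit message implies, that a failed "within limit" check is what switches allocation to sequential mode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed value (emulates wraparound). */
static int32_t as_i32(uint32_t bits)
{
    int32_t v;
    memcpy(&v, &bits, sizeof(v));
    return v;
}

/* lastcount + limit with 32-bit wraparound, as in the pre-fix comparison. */
static int32_t wrapped_sum(int32_t lastcount, int32_t limit)
{
    return as_i32((uint32_t)lastcount + (uint32_t)limit);
}

/* Pre-fix style (hypothetical name): add the limit first, then compare. */
static bool within_limit_sum_first(int32_t allocs, int32_t lastcount,
    int32_t limit)
{
    return allocs <= wrapped_sum(lastcount, limit);
}

/* Post-fix style (hypothetical name): compute the delta, then compare. */
static bool within_limit_delta_first(int32_t allocs, int32_t lastcount,
    int32_t limit)
{
    int32_t delta = as_i32((uint32_t)allocs - (uint32_t)lastcount);
    return delta <= limit;
}
```

With the values from the commit message, the sum wraps to -2147483643 and the first form reports the limit as exceeded, while the second form sees a delta of 32 and stays within the limit of 100.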
* sctp: use the correct traffic class when sending SCTP/IPv6 packets | Michael Tuexen | 5 days | 2 | -22/+24
  When sending packets the stcb was used to access the inp and then access the endpoint specific IPv6 level options. This fails when there exists an inp, but no stcb yet. This is the case for sending an INIT-ACK in response to an INIT when no association already exists. Fix this by just providing the inp instead of the stcb.
  PR: 260120
  MFC after: 1 week
* in_pcb: fix TCP local ephemeral port accounting | Peter Lei | 5 days | 1 | -1/+1
  Fix logic error causing UDP(-Lite) local ephemeral port bindings to count against the TCP allocation counter, potentially causing TCP to go from random to sequential port allocation mode prematurely.
* tcp_drain(): initialize the inpcb iterator when curvnet is set | Gleb Smirnoff | 5 days | 1 | -2/+2
  Reported by: cy
  Pointy hat to: glebius
  Fixes: de2d47842e88
* udp_detach(): fix set but not used warning | Gleb Smirnoff | 5 days | 1 | -2/+0
* udp_multi_input(): the UDP header is only needed for probes | Gleb Smirnoff | 5 days | 1 | -0/+2
  Reported by: kib
  Fixes: de2d47842e88
* Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" | Cy Schubert | 6 days | 22 | -2591/+1353
  This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
* wpa: Import wpa_supplicant/hostapd commit 14ab4a816 | Cy Schubert | 6 days | 22 | -1353/+2591
  This is the November update to vendor/wpa committed upstream 2021-11-26.
  MFC after: 1 month
* ip_input: remove pointless check in INP_RECVIF handling | Gleb Smirnoff | 6 days | 1 | -2/+1
  An mbuf rcvif pointer is supposed to be valid and doesn't need extra checks. The code appeared in d314ad7b73639.
* tcp_hpts: rewrite inpcb synchronization | Gleb Smirnoff | 6 days | 5 | -281/+262
  Just trust the pcb database: if we did in_pcbref(), there is no way an inpcb can go away. And if we never put a dropped inpcb on our queue, and tcp_discardcb() always removes an inpcb to be dropped from the queue, then any inpcb on the queue is valid.
  Now, to solve the LOR between the inpcb lock and the HPTS queue lock, do the following trick. When we are about to process a certain time slot, take the full queue off the head list onto an on-stack list, drop the HPTS lock and work on our queue. This of course opens a race when an inpcb is being removed from the on-stack queue, which was already mentioned in comments. To address this race, introduce a generation count into queues. If we want to remove an inpcb with a generation count mismatch, we can't do that; we can only mark it with the desired new time slot or -1 for remove.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33026
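The generation-count trick described above can be illustrated with a single-threaded toy model. This is only a sketch under stated assumptions: the struct and function names are hypothetical and are not the tcp_hpts data structures, locking is omitted entirely, and the queue is a plain singly linked list. It shows the one idea from the commit message: once the whole queue has been detached for processing, a remover can no longer unlink an entry and instead records a request on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_NONE   (-2)  /* no deferred request recorded (assumed value) */
#define SLOT_REMOVE (-1)  /* deferred request: remove, as in the message  */

struct entry {
    struct entry *next;
    uint32_t gen;        /* queue generation seen at insertion time */
    int deferred_slot;   /* request left for whoever owns the list  */
};

struct queue {
    struct entry *head;
    uint32_t gen;        /* bumped every time the list is detached  */
};

static void queue_insert(struct queue *q, struct entry *e)
{
    e->gen = q->gen;
    e->deferred_slot = SLOT_NONE;
    e->next = q->head;
    q->head = e;
}

/* Detach the whole list for processing outside the (omitted) queue lock. */
static struct entry *queue_take_all(struct queue *q)
{
    struct entry *list = q->head;
    q->head = NULL;
    q->gen++;
    return list;
}

/* Unlink if the entry is still on the queue; otherwise only mark it. */
static bool queue_remove(struct queue *q, struct entry *e)
{
    if (e->gen != q->gen) {
        e->deferred_slot = SLOT_REMOVE;  /* processor must honor this */
        return false;
    }
    for (struct entry **pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return true;
        }
    }
    return false;
}
```

In this model the code walking the detached list would check `deferred_slot` on each entry before acting on it; a value other than `SLOT_NONE` stands in for "the desired new time slot or -1 for remove".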
* tcp_hpts: rename input queue to drop queue and trim dead code | Gleb Smirnoff | 6 days | 8 | -246/+163
  The HPTS input queue is in reality used only for "delayed drops". When a TCP stack decides to drop a connection on the output path it can't do that due to locking protocol between main tcp_output() and stacks. So, rack/bbr utilize HPTS to drop the connection in a different context. In the past the queue could also process input packets in context of HPTS thread, but now no stack uses this, so remove this functionality.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33025
* tcp_hpts: make struct tcp_hpts_entry private to the module. | Gleb Smirnoff | 6 days | 2 | -178/+78
  Also, make some of the functions also private to the module. Remove unused functions discovered after that.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33024
* tcp_hpts: provide tcp_in_hpts(). | Gleb Smirnoff | 6 days | 4 | -40/+47
  It will hide some internal HPTS knowledge from the consumers.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33023
* SMR protection for inpcbs | Gleb Smirnoff | 6 days | 13 | -884/+821
  With the introduction of epoch(9) synchronization to the network stack, the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcbs aren't static in nature; they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector.
  A fairly new feature of uma(9), Safe Memory Reclamation, allows memory to be freed safely in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts a stricter requirement on access to the protected memory, needing a critical(9) section to access it.
  Details:
  - The database is already built on CK lists, thanks to epoch(9).
  - For write access nothing is changed.
  - For a lookup in the database an SMR section is now required. Once the desired inpcb is found we need to transition from the SMR section to the r/w lock on the inpcb itself, with a check that the inpcb isn't yet freed. This requires some complexity, since the SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock().
  - For an inpcb list traversal (a pcblist sysctl, or broadcast notification) a new KPI is also provided that hides internals of the database: inp_next(struct inp_iterator *).
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33022
* inpcb: reduce some aliased functions after removal of PCBGROUP. | Gleb Smirnoff | 6 days | 4 | -43/+11
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33021
* Remove "options PCBGROUP" | Gleb Smirnoff | 6 days | 5 | -946/+4
  With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable.
  This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
  I'm up to resurrecting it back if there is any interest from anybody.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33020
* Allow to compile RSS without PCBGROUP. | Gleb Smirnoff | 6 days | 1 | -4/+0
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33019
* tcp: unloading a module that is set to default should error. | Randall Stewart | 6 days | 1 | -4/+3
  I just discovered that the return of the EBUSY error was incorrectly rigged so that you could unload a CC module that was set to default. It's supposed to be an EBUSY error. Make it so.
  Reviewed by: Michael Tuexen
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D33229
* sctp: improve handling of assoc ids in socket options | Michael Tuexen | 7 days | 1 | -7/+23
  For socket options related to local and remote addresses providing generic association ids does not make sense. Report EINVAL in this case.
  MFC after: 1 week
* sctp: cleanup, no functional change intended. | Michael Tuexen | 7 days | 1 | -29/+30
  MFC after: 1 week
* netinet: Fix a common typo in source code comments | Gordon Bergling | 8 days | 3 | -3/+3
  - s/segement/segment/
  MFC after: 3 days
* inet(3): Fix two typos in sysctl descriptions | Gordon Bergling | 8 days | 1 | -2/+2
  - s/sequental/sequential/
  MFC after: 3 days
* tcp(4): Fix a typo in a sysctl description | Gordon Bergling | 8 days | 1 | -1/+1
  - s/entires/entries/
  MFC after: 3 days
* tcp: Don't try to upgrade a read lock just for logging | Michael Tuexen | 9 days | 1 | -14/+40
  Reviewed by: glebius, lstewart, rrs
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D33098
* sctp: improve consistency, no functional change intended | Michael Tuexen | 12 days | 4 | -18/+18
* sctp: add some asserts, no functional changes intended | Michael Tuexen | 12 days | 1 | -2/+9
  This might help in narrowing down https://syzkaller.appspot.com/bug?id=fbd79abaec55f5aede63937182f4247006ea883b
* netinet: Remove unneeded mb_unmapped_to_ext() calls | Mark Johnston | 14 days | 2 | -21/+6
  in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted.
  Reviewed by: glebius, jhb
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D33097
* netinet: Implement in_cksum_skip() using m_apply() | Mark Johnston | 14 days | 1 | -31/+32
  This allows it to work with unmapped mbufs. In particular, in_cksum_skip() calls no longer need to be preceded by calls to mb_unmapped_to_ext() to avoid a page fault.
  PR: 259645
  Reviewed by: gallatin, glebius, jhb
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D33096
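To illustrate why a per-segment callback is enough to compute an Internet checksum over a chain of buffers, here is a user-space sketch. It is not the kernel's in_cksum_skip(): the names `cksum_state`, `cksum_segment` and `cksum_finish` are hypothetical, and the byte-at-a-time loop trades speed for clarity. The one detail that matters is carrying the odd/even byte position across segment boundaries, so that splitting the data differently does not change the result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Running state shared between calls, one call per contiguous segment. */
struct cksum_state {
    uint64_t sum;  /* wide accumulator, folded to 16 bits at the end */
    bool odd;      /* true if the next byte is the low half of a word */
};

/* Hypothetical per-segment callback, in the spirit of an m_apply() hook. */
static void cksum_segment(struct cksum_state *st, const uint8_t *p,
    size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Even position: high-order byte; odd position: low-order byte. */
        st->sum += st->odd ? (uint64_t)p[i] : (uint64_t)p[i] << 8;
        st->odd = !st->odd;
    }
}

/* Fold the carries back in and return the one's-complement result. */
static uint16_t cksum_finish(const struct cksum_state *st)
{
    uint64_t s = st->sum;

    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return (uint16_t)(~s & 0xffff);
}
```

Feeding the eight bytes 00 01 f2 03 f4 f5 f6 f7 (the worked example from RFC 1071) either as one segment or split 3 + 5 yields 0x220d in both cases.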
* netinet: Deduplicate most in_cksum() implementations | Mark Johnston | 14 days | 1 | -0/+257
  in_cksum() and related routines are implemented separately for each platform, but only i386 and arm have optimized versions. Other platforms' copies of in_cksum.c are identical except for style differences and support for big-endian CPUs.
  Deduplicate the implementations for the rest of the platforms. This will make it easier to implement in_cksum() for unmapped mbufs. On arm and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is not to be compiled.
  No functional change intended.
  Reviewed by: kp, glebius
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D33095
* netinet: Remove in_cksum.c | Mark Johnston | 14 days | 1 | -148/+0
  It does not get compiled into the kernel. No functional change intended.
  Reviewed by: kp, glebius, cy
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D33094
* cc_newreno(4): Fix a typo in a source code comment | Gordon Bergling | 2021-11-19 | 1 | -1/+1
  - s/conditons/conditions/
  MFC after: 3 days
* Add tcp_freecb() - single place to free tcpcb. | Gleb Smirnoff | 2021-11-19 | 4 | -104/+101
  Until this change there were two places where we would free tcpcb - tcp_discardcb() in case if all timers are drained and tcp_timer_discard() otherwise. They were pretty much copy-n-paste, except that in the default case we would run tcp_hc_update(). Merge this into single function tcp_freecb() and move new short version of tcp_timer_discard() to tcp_timer.c and make it static.
  Reviewed by: rrs, hselasky
  Differential revision: https://reviews.freebsd.org/D32965
* tcp_timewait: use on stack struct tcptw as last resort | Gleb Smirnoff | 2021-11-19 | 1 | -4/+5
  In case we failed to uma_zalloc() and also failed to reuse with tcp_tw_2msl_scan(), then just use on stack tcptw. This will allow to run through tcp_twrespond() and standard tcpcb discard routine.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D32965
* tcp: Rack ack war with a mis-behaving firewall or nat with resets. | Randall Stewart | 2021-11-17 | 3 | -15/+62
  Previously we added ack-war prevention for misbehaving firewalls. This is where the f/w or nat messes up its sequence numbers and causes an ack-war. There is yet another type of ack war that we have found in the wild that is similar to this. Basically the f/w or nat gets an ack (keep-alive probe or such) and instead of turning the ack/seq around and adding a TH_RST it does something real stupid and sends a new packet with seq=0. This of course triggers the challenge ack in the reset processing, which then sends a challenge ack (if the seq=0 is within the range of possible sequence numbers allowed by the challenge), and then we rinse-repeat.
  This will add the needed tweaks (similar to the last ack-war prevention, using the same sysctls and counters) to prevent it and allow say 5 per second by default.
  Reviewed by: Michael Tuexen
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D32938
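The "allow say 5 per second" idea can be sketched as a simple fixed-window limiter. This is an illustration only and makes no claim about how rack implements it: the struct, the function name and the millisecond clock are all assumptions, and the real code is driven by sysctls and counters that are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state for one connection. */
struct ack_limiter {
    uint64_t window_start_ms;  /* start of the current 1-second window */
    uint32_t sent_in_window;   /* challenge acks sent in this window   */
    uint32_t max_per_sec;      /* allowed per window, e.g. 5           */
};

/* Return true if another challenge ack may be sent at time now_ms. */
static bool challenge_ack_allowed(struct ack_limiter *l, uint64_t now_ms)
{
    /* Start a fresh window once a full second has elapsed. */
    if (now_ms - l->window_start_ms >= 1000) {
        l->window_start_ms = now_ms;
        l->sent_in_window = 0;
    }
    if (l->sent_in_window >= l->max_per_sec)
        return false;  /* over budget: suppress to break the ack war */
    l->sent_in_window++;
    return true;
}
```

With `max_per_sec` set to 5, the first five calls within a second return true, further calls in that second return false, and the budget is restored once the next window starts.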
* sctp: Remove now-unneeded mb_unmapped_to_ext() calls | Mark Johnston | 2021-11-16 | 2 | -18/+0
  sctp_delayed_checksum() now handles unmapped mbufs, thanks to m_apply().
  No functional change intended.
  Reviewed by: tuexen
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32942
* sctp: Use m_apply() to calculate a checksum for an mbuf chain | Mark Johnston | 2021-11-16 | 1 | -24/+21
  m_apply() works on unmapped mbufs, so this will let us elide mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in the network stack.
  Modify sctp_calculate_cksum() to assume it's passed an mbuf header. This assumption appears to be true in practice, and we need to know the full length of the chain.
  No functional change intended.
  Reviewed by: tuexen, jhb
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32941
* kernel: partially revert e9efb1125a15, default inet mask | Mike Karels | 2021-11-14 | 1 | -4/+13
  When no mask is supplied to the ioctl adding an Internet interface address, revert to using the historical class mask rather than a single default. Similarly for the NFS bootp code.
  MFC after: 3 weeks
  Reviewed by: melifaro glebius
  Differential Revision: https://reviews.freebsd.org/D32951
* tcp: Fix a locking issue related to logging | Michael Tuexen | 2021-11-14 | 1 | -15/+23
  tcp_respond() is sometimes called with only a read lock. The logging however, requires a write lock. So either try to upgrade the lock if needed, or don't log the packet.
  Reported by: syzbot+8151ef969c170f76706b@syzkaller.appspotmail.com
  Reported by: syzbot+eb679adb3304c511c1e4@syzkaller.appspotmail.com
  Reviewed by: markj, rrs
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D32983
* tcp_usr_detach: revert debugging piece from f5cf1e5f5a500. | Gleb Smirnoff | 2021-11-13 | 1 | -20/+3
  The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D32968
* tcp_timers: check for (INP_TIMEWAIT | INP_DROPPED) only once | Gleb Smirnoff | 2021-11-13 | 1 | -37/+4
  All timers keep inpcb locked through their execution. We need to check these flags only once. Checking for INP_TIMEWAIT earlier is also safer, since such inpcbs point into tcptw rather than tcpcb, and any dereferences of inp_ppcb as tcpcb are erroneous.
  Reviewed by: rrs, hselasky
  Differential revision: https://reviews.freebsd.org/D32967
* tcp: Fix a locking issue | Michael Tuexen | 2021-11-12 | 1 | -4/+9
  INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return from the function, so any locks held must be released.
  Reported by: syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com
  Reviewed by: markj
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D32975
* tcp: Ensure that vnets have an initialized V_default_cc_ptr | Mark Johnston | 2021-11-12 | 1 | -0/+17
  This causes new vnets to inherit the cc algorithm from vnet0. This is a temporary patch to fix vnet jail creation.
  With encouragement from: glebius
  Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
  Differential Revision: https://reviews.freebsd.org/D32970
* tcp: better congestion control defaults | Warner Losh | 2021-11-12 | 1 | -0/+7
  Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config files. It's the default congestion control algorithm. Add code to cc.c so that CC_DEFAULT is "newreno" if it's not overridden in the config file.
  Sponsored by: Netflix
  Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
  Reviewed by: manu, hselasky, jhb, glebius, tuexen
  Differential Revision: https://reviews.freebsd.org/D32964
* Add net.inet.ip.source_address_validation | Gleb Smirnoff | 2021-11-12 | 1 | -0/+16
  Drop packets arriving from the network that have our source IP address. If maliciously crafted they can create evil effects like an RST exchange between two of our listening TCP ports. Such packets just can't be legitimate. Enable the tunable by default. Long time due for a modern Internet host.
  Reviewed by: donner, melifaro
  Differential revision: https://reviews.freebsd.org/D32914
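The rule stated in that message can be modelled in a few lines of user-space C. This is a sketch of the idea, not the ip_input() change: the function name, the flat array of local addresses and the `arrived_on_loopback` flag are assumptions made for illustration (the flag reflects the wording "arriving from the network", on the assumption that locally looped traffic is exempt).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: should an inbound IPv4 packet be dropped because
 * its source address is one of our own addresses?
 */
static bool drop_for_own_source(uint32_t src, const uint32_t *local_addrs,
    size_t n_local, bool arrived_on_loopback, bool validation_enabled)
{
    if (!validation_enabled || arrived_on_loopback)
        return false;
    for (size_t i = 0; i < n_local; i++) {
        if (local_addrs[i] == src)
            return true;  /* a packet from the wire claims to be us */
    }
    return false;
}
```

For a host configured with 192.0.2.1 and 192.0.2.2, a packet from the wire with source 192.0.2.1 would be dropped in this model, while one from 198.51.100.7 would pass.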