path: root/sys/netinet/tcp_stacks
Commit message | Author | Age | Files | Lines
* tcp: Missing mfree in rack and bbrRandall Stewart2021-06-142-1/+6
| | | | | | | | | | | | | | | | | Recently (Nov) we added logic that protects against a peer negotiating a timestamp and then not including a timestamp. In the input path this involved a goto to the done_with_input label. I suspect the code was cribbed from one in Rack that has to do with the SYN. This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this missing m_freem() but rack did not). This then caused the missing m_freem to show up in both BBR and Rack. Also, referencing m->m_pkthdr.lro_nsegs later (after processing) is not a good idea, even though it's only for logging. Best to copy that off before any frees can take place. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30727 (cherry picked from commit ba1b3e48f5be320f0590bc357ea53fdc3e4edc65)
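The two points in this message (free the mbuf before jumping to the exit label, and copy the LRO segment count before anything can be freed) can be sketched in user-space C. This is only an illustration under stated assumptions: `struct fake_mbuf`, `fake_m_get()`, `fake_m_freem()`, `fake_input()` and the `fake_mbufs_live` counter are hypothetical stand-ins, not the kernel's mbuf API or the actual rack/bbr input code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a kernel mbuf; only the field used here. */
struct fake_mbuf {
	int lro_nsegs;		/* mirrors m->m_pkthdr.lro_nsegs */
};

/* Number of fake mbufs currently allocated; non-zero at the end means a leak. */
static int fake_mbufs_live;

static struct fake_mbuf *
fake_m_get(int nsegs)
{
	struct fake_mbuf *m = malloc(sizeof(*m));

	if (m != NULL) {
		m->lro_nsegs = nsegs;
		fake_mbufs_live++;
	}
	return (m);
}

static void
fake_m_freem(struct fake_mbuf *m)
{
	if (m == NULL)
		return;
	free(m);
	fake_mbufs_live--;
}

/*
 * Sketch of an input routine. Returns the segment count that would be
 * logged at the end. ts_negotiated/ts_present model the "peer negotiated
 * timestamps but sent none" check described in the commit message.
 */
static int
fake_input(struct fake_mbuf *m, int ts_negotiated, int ts_present)
{
	/* Copy the value needed for logging before any free can happen. */
	int nsegs = m->lro_nsegs;

	if (ts_negotiated && !ts_present) {
		/* The fix: free the mbuf before leaving through the label. */
		fake_m_freem(m);
		goto done_with_input;
	}
	/* Normal processing consumes the mbuf. */
	fake_m_freem(m);
done_with_input:
	/* Only the saved copy is used here; m must not be touched. */
	return (nsegs);
}
```

With this shape, both the early-exit path and the normal path leave `fake_mbufs_live` at zero, and the logging value never depends on freed memory.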
* tcp: Mbuf leak while holding a socket buffer lock.Randall Stewart2021-06-142-52/+64
| | | | | | | | | | | | | | | | | | | | | | | When running at NF the current Rack and BBR changes with the recent commits from Richard that cause the socket buffer lock to be held over the ip_output() call and then finally culminating in a call to tcp_handle_wakeup() we get a lot of leaked mbufs. I don't think that this leak is actually caused by holding the lock or what Richard has done, but is exposing some other bug that has probably been lying dormant for a long time. I will continue to look (using his changes) at what is going on to try to root cause the issue. In the meantime I can't leave the leaks out for everyone else. So this commit will revert all of Richard's changes and move both Rack and BBR back to just doing the old sorwakeup_locked() calls after messing with the so_rcv buffer. We may want to look at adding back in Richard's changes after I have pinpointed the root cause of the mbuf leak and fixed it. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30704 (cherry picked from commit 67e892819b26c198e4232c7586ead7f854f848c5)
* tcp: remove debug output from RACKMichael Tuexen2021-06-131-2/+0
| | | | | | | | | | Reported by: iron.udjin@gmail.com, Marek Zarychta Reviewed by: rrs PR: 256538 Differential Revision: https://reviews.freebsd.org/D30723 Sponsored by: Netflix, Inc. (cherry picked from commit f1536bb53898b12e2d19938f8fe2d04b5e5d12a6)
* tcp: fix compilation of IPv4-only buildsMichael Tuexen2021-06-131-0/+2
| | | | | | | | PR: 256538 Reported by: iron.udjin@gmail.com Sponsored by: Netflix, Inc. (cherry picked from commit 224cf7b35b9bbe8d075f6004249d850c620b7855)
* tcp: fix a RACK socket buffer lock issueMichael Tuexen2021-06-091-1/+2
| | | | | | | | | | | Fix a missing socket buffer unlocking of the socket receive buffer. Reviewed by: gallatin, rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D30402 (cherry picked from commit 9bbd1a8fcb13928cd4b6cfddf0a8359d5dc97451)
* rack: honor prior socket buffer lock when doing the upcallRichard Scheffenegger2021-06-091-2/+2
| | | | | | | | | | | | | | While partially reverting D24237 with D29690, due to introducing some unintended effects for in-kernel TCP consumers, the preexisting lock on the socket send buffer was not considered properly. Found by: markj MFC after: 2 weeks Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D30390 (cherry picked from commit 39756885633fd9d9649b4cb0f0abf594bfeb8dbb)
* [tcp] Keep socket buffer locked until upcallRichard Scheffenegger2021-06-093-22/+19
| | | | | | | | | | | | | | | | r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client components (MP threads) accessing the socket buffer with potentially not correctly updated state. Reported by: rmacklem Reviewed By: tuexen, #transport Tested by: rmacklem, otis MFC after: 2 weeks Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29690 (cherry picked from commit 032bf749fd44ac5ff20aab2c3d8e3c05491778ea)
* tcp: A better fix for the previously attempted fix of the ack-war issue with ↵Randall Stewart2021-06-092-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | tcp. So it turns out that my fix before was not correct. It ended with us failing some of the "improved" SYN tests, since we are not in the correct states. With more digging I have figured out the root of the problem is that when we receive a SYN|FIN the reassembly code made it so we create a segq entry to hold the FIN. In the established state where we were not in order this would be correct i.e. a 0 len with a FIN would need to be accepted. But if you are in a front state we need to strip the FIN so we correctly handle the ACK but ignore the FIN. This gets us into the proper states and avoids the previous ack war. I back out some of the previous changes but then add a new change here in tcp_reass() that fixes the root cause of the issue. We still leave the rack panic fixes in place however. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30627 (cherry picked from commit 4747500deaaa7765ba1c0413197c23ddba4faf49)
* tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.Randall Stewart2021-06-092-2/+12
| | | | | | | | | | | | | | The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr). However there was a missing case, i.e. where we get an out-of-order FIN by itself. In such a case we don't want to leave the FIN bit set, otherwise we will do the wrong thing and ack the FIN incorrectly. Instead we need to go through the tcp_reass() code and that way the FIN will be stripped and all will be well. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30497 (cherry picked from commit 8c69d988a8d32e53310c7b73ec8721b04b7249e6)
* tcp: Add a socket option to rack so we can test various changes to the slop ↵Randall Stewart2021-06-092-13/+33
| | | | | | | | | | | | | | | | | value in timers. Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack timers were longer and links were slower. A 200ms timer slop allows 1 MSS to be sent over a 60kbps link. It's possible that lowering this value to something more in line with today's delayed ack values (40ms) might improve TCP. This bit of code makes it so rack can, via a socket option, adjust the timer slop. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30249 (cherry picked from commit 4f3addd94be5e02e6e425f6119f5409972ab5d14)
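The "1 MSS over a 60kbps link" figure can be checked with a short calculation. The sketch below assumes a segment of about 1500 bytes, which the commit message does not state; `serialization_ms()` is a hypothetical helper, not part of the stack.

```c
#include <stdint.h>

/*
 * Milliseconds needed to put `bytes` on a link running at `bits_per_sec`,
 * rounded down. Hypothetical helper for the arithmetic only.
 */
static uint64_t
serialization_ms(uint64_t bytes, uint64_t bits_per_sec)
{
	return ((bytes * 8 * 1000) / bits_per_sec);
}
```

Under that assumption, 1500 bytes is 12000 bits, and 12000 bits at 60000 bits per second takes 0.2 seconds, so `serialization_ms(1500, 60000)` evaluates to 200, which matches the 200ms slop mentioned above.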
* tcp: Fix bugs related to the PUSH bit and rack and an ack warRandall Stewart2021-06-092-20/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed in the last commit. The problem is the left edge gets transmitted before the adjustments are done to the send_map, this means that right edge bits must be considered to be added only if the entire RSM is being retransmitted. Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself. After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically what happens is we go into the reassembly code and lose the FIN bit. The trick here is we should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything. That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no timers running. This is because the usrclosed function gets called and the FIN's and such have already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2 timer can speed this up in testing. Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30451 (cherry picked from commit 13c0e198ca275447f9a60a03f730c38c98f19009)
* tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu ↵Randall Stewart2021-06-091-2/+6
| | | | | | | | | | | | | | | change for fsb's The push bit itself was also not actually being properly moved to the right edge. The FIN bit was incorrectly on the left edge. We fix these two issues as well as plumb in the mtu_change for alternate stacks. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30413 (cherry picked from commit 631449d5d03506295eaa6947c1b0e8a168a2f6b7)
* tcp: Handle stack switch while processing socket optionsMichael Tuexen2021-06-092-55/+67
| | | | | | | | | | | | | Handle the case where during socket option processing, the user switches a stack such that processing the stack specific socket option does not make sense anymore. Return an error in this case. Reviewed by: markj Reported by: syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com Sponsored by: Netflix, Inc. Differential revision: https://reviews.freebsd.org/D30395 (cherry picked from commit 8923ce630492d21ec57c2637757bcc44da9970f8)
* tcp: Fix sending of TCP segments with IP level optionsMichael Tuexen2021-06-092-4/+4
| | | | | | | | | | | | | | When bringing in TCP over UDP support in https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605, the length of IP level options was considered when locating the transport header. This was incorrect and is fixed by this patch. X-MFC with: https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605 Reviewed by: markj, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D30358 (cherry picked from commit 500eb6dd80404ea512e31a8f795c73cb802c9c64)
* tcp: Incorrect KASSERT causes a panic in rackRandall Stewart2021-06-091-2/+10
| | | | | | | | | | | | | | | | | Syzkaller found an interesting panic in rack. When a SYN and FIN are both sent together a KASSERT gets tripped where it is validating that a mbuf pointer is in the sendmap. But a SYN and FIN often will not have a mbuf pointer. So the fix is twofold: a) make sure that the SYN and FIN split the right way when cloning an RSM (SYN on the left edge and FIN on the right), and b) make sure the KASSERT properly accounts for the case that we have a SYN or FIN so we don't panic. Reviewed by: mtuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D30241 (cherry picked from commit 02cffbc2507e83944b0c29d69d6ddf26c9386d54)
* tcp rack: improve initialisation of retransmit timeoutMichael Tuexen2021-06-091-2/+7
| | | | | | | | | | | When the TCP is in the front states, don't take the slop variable into account. This improves consistency with the base stack. Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D30230 Sponsored by: Netflix, Inc. (cherry picked from commit 251842c63927fc4af63bdc61989bbfbf3823c679)
* tcp: In rack, we must only convert restored rtt when the hostcache does ↵Randall Stewart2021-06-091-3/+6
| | | | | | | | | | | | | | | | | | restore them. Rack now, after the previous commit, is very careful to translate any value in the hostcache for srtt/rttvar into its proper format. However there is a snafu here: the only time that the HC will actually restore the srtt is when tp->srtt is 0. We then need to convert the restored srtt only when it is actually restored. We do this by making sure it was zero before the call to cc_conn_init and it is non-zero afterwards. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30213 (cherry picked from commit 4b86a24a76a4d58c1d870fcb2252b321f61cb3cc)
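The "zero before, non-zero afterwards" check described in this message can be sketched as follows. Everything here is a hypothetical stand-in: `struct fake_tcpcb`, `fake_cc_conn_init()`, `fake_hc_to_stack_units()` and the factor of 1000 are assumptions for illustration, since the log gives neither the real field layout nor the real scaling.

```c
#include <stdint.h>

/* Hypothetical control block holding only the value of interest. */
struct fake_tcpcb {
	uint32_t srtt;
};

/*
 * Stand-in for cc_conn_init(): as the message describes, the host cache
 * value is restored only when the current srtt is 0.
 */
static void
fake_cc_conn_init(struct fake_tcpcb *tp, uint32_t hc_srtt)
{
	if (tp->srtt == 0 && hc_srtt != 0)
		tp->srtt = hc_srtt;
}

/* Hypothetical unit conversion; the real scaling is not given in the log. */
static uint32_t
fake_hc_to_stack_units(uint32_t v)
{
	return (v * 1000);
}

/* Convert only when the call above actually restored a value. */
static void
fake_stack_init(struct fake_tcpcb *tp, uint32_t hc_srtt)
{
	int was_zero = (tp->srtt == 0);

	fake_cc_conn_init(tp, hc_srtt);
	if (was_zero && tp->srtt != 0)
		tp->srtt = fake_hc_to_stack_units(tp->srtt);
}
```

A value that was already present in the stack's own format is left untouched, which is the double-conversion case the commit guards against.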
* tcp: Host cache and rack ending up with incorrect values.Randall Stewart2021-06-091-45/+57
| | | | | | | | | | | | | | | | | | | | | | | The hostcache up to now has been updated in the discard callback but without checking if we are all done (the race where there is more than one call and the counter has not yet reached zero). This means that when the race occurs, we end up calling the hc_update more than once. Also alternate stacks can keep their srtt/rttvar in different formats (for example, rack keeps its values in microseconds). Since we call the hc_update *before* the stack fini(), the values will be in the wrong format. Rack, on the other hand, needs to convert items pulled from the hostcache into its internal format, else it may end up with very incorrect values from the hostcache. In the process let's commonize the update mechanism for srtt/rttvar since we now have more than one place that needs to call it. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30172 (cherry picked from commit 9867224bab3f247ac875d89c2472aa4bc855fe3b)
* Fix a UDP tunneling issue with rack. Basically there are twoRandall Stewart2021-06-081-3/+13
| | | | | | | | | | | | | | | | issues. A) Not enough hdrlen was being calculated when a UDP tunnel is in place, and B) not enough memory is allocated in rack's fsb. We need to overbook the fsb to include a udphdr just in case. Submitted by: Peter Lei Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30157 (cherry picked from commit a16cee0218652230d94a73690201e76baab0bba1)
* This brings into sync FreeBSD with the netflix versions of rack and bbr.Randall Stewart2021-06-086-1854/+7009
| | | | | | | | | | | | | | This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features are now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents coming soon on rack. Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036 (cherry picked from commit 5d8fd932e418f03e98b3469c4088a36f0ef34ffe)
* This pulls over all the changes that are in the netflixRandall Stewart2021-06-072-9/+9
| | | | | | | | | | | tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357 (cherry picked from commit 1a714ff204193b9eb810426048e03f5d76e9730e)
* mendMichael Tuexen2021-06-072-54/+10
* tcp: improve handling of SYN segments in SYN-SENT stateMichael Tuexen2021-06-022-2/+6
| | | | | | | | | | | Ensure that the stack does not generate a DSACK block for user data received on a SYN segment in SYN-SENT state. Reviewed by: rscheff Differential Revision: https://reviews.freebsd.org/D29376 Sponsored by: Netflix, Inc. (cherry picked from commit 40f41ece765dc0b0907ca90796a1af4f4f89b2a0)
* net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macrosHans Petter Selasky2021-05-101-1/+1
| | | | | | | | | | | | | Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header. Use them where appropriate. Reviewed by: ae (previous version), rscheff, tuexen, rgrimes Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056 (cherry picked from commit bb4a7d94b99fbf7f59c876ffff8ded5f6a5b5c3e)
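For orientation, the first 32-bit word of an IPv6 header holds a 4-bit version, an 8-bit traffic class and a 20-bit flow label; the traffic class in turn is a 6-bit DSCP followed by a 2-bit ECN field. The sketch below shows that bit layout on a word already converted to host byte order. The function names are hypothetical and this is not a copy of the macros the commit introduces; in particular, returning the DSCP bits in place (unshifted) is a choice made for this sketch.

```c
#include <stdint.h>

/* Traffic class: bits 20..27 of the first header word (host byte order). */
static inline uint8_t
sketch_ipv6_traffic_class(uint32_t flow_host)
{
	return ((uint8_t)((flow_host >> 20) & 0xff));
}

/* DSCP: upper six bits of the traffic class, left in place (not shifted). */
static inline uint8_t
sketch_ipv6_dscp(uint32_t flow_host)
{
	return ((uint8_t)((flow_host >> 20) & 0xfc));
}

/* ECN: lower two bits of the traffic class. */
static inline uint8_t
sketch_ipv6_ecn(uint32_t flow_host)
{
	return ((uint8_t)((flow_host >> 20) & 0x03));
}
```

For example, a first word of 0x6BA00000 (version 6, traffic class 0xBA, flow label 0) yields a traffic class of 0xBA, DSCP bits 0xB8 and an ECN value of 2.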
* rack: Fix ECN on finalizing session.Richard Scheffenegger2021-04-221-1/+1
| | | | | | | | | | | | | | Maintain code similarity between RACK and base stack for ECN. This may not strictly be necessary, depending when a state transition to FIN_WAIT_1 is done in RACK after a shutdown() or close() syscall. MFC after: 3 days Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29658 (cherry picked from commit 2e97826052d169f6e2e1d2f87b086f56d1cf2b0b)
* rack: unbreak TCP fast open for the client sideMichael Tuexen2021-03-081-1/+2
| | | | | | | | | | Allow sending user data on the SYN segment. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D29082 Sponsored by: Netflix, Inc. (cherry picked from commit 705d06b289e9821439b7b694d766cad75bc064e5)
* RACK: fix an issue triggered by using the CDG CC moduleMichael Tuexen2021-03-041-2/+2
| | | | | | | | Obtained from: rrs@ PR: 238741 Sponsored by: Netflix, Inc. (cherry picked from commit 99adf230061268175a36061130e6adb0882270e8)
* tcp: add sysctl to tolerate TCP segments missing timestampsMichael Tuexen2021-01-142-4/+6
| | | | | | | | | | | | | | | | When timestamp support has been negotiated, TCP segments received without a timestamp should be discarded. However, there are broken TCP implementations (for example, stacks used by Omniswitch 63xx and 64xx models), which send TCP segments without timestamps although they negotiated timestamp support. This patch adds a sysctl variable which tolerates such TCP segments and allows interoperating with broken stacks. Reviewed by: jtl@, rscheff@ Differential Revision: https://reviews.freebsd.org/D28142 Sponsored by: Netflix, Inc. PR: 252449 MFC after: 1 week
* tcp: fix handling of TCP RST segments missing timestampsMichael Tuexen2021-01-142-4/+6
| | | | | | | | | | | A TCP RST segment should be processed even if it is missing TCP timestamps. Reported by: dmgk@, kevans@ Reviewed by: rscheff@, dmgk@ Sponsored by: Netflix, Inc. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28143
* RFC 7323 specifies that:Michael Tuexen2020-11-092-13/+38
| | | | | | | | | | | | | | | | | * TCP segments without timestamps should be dropped when support for the timestamp option has been negotiated. * TCP segments with timestamps should be processed normally if support for the timestamp option has not been negotiated. This patch enforces the above. PR: 250499 Reviewed by: gnn, rrs MFC after: 1 week Sponsored by: Netflix, Inc Differential Revision: https://reviews.freebsd.org/D27148 Notes: svn path=/head/; revision=367530
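Read together with the two later entries above (RST segments are processed even without timestamps, and a sysctl can tolerate missing timestamps), the resulting policy can be sketched as a small decision function. The names `ts_check()`, `TS_PROCESS`, `TS_DROP` and the `tolerate_missing` flag are hypothetical; this is not the stack's actual code, only the rule as the log describes it.

```c
#include <stdbool.h>

enum ts_verdict {
	TS_PROCESS,
	TS_DROP
};

/*
 * negotiated:       timestamp support was negotiated on the connection
 * seg_has_ts:       the received segment carries a timestamp option
 * seg_is_rst:       the received segment has RST set
 * tolerate_missing: models the sysctl that tolerates missing timestamps
 */
static enum ts_verdict
ts_check(bool negotiated, bool seg_has_ts, bool seg_is_rst,
    bool tolerate_missing)
{
	if (negotiated && !seg_has_ts) {
		/* RST segments are still processed; so is everything if tolerated. */
		if (seg_is_rst || tolerate_missing)
			return (TS_PROCESS);
		return (TS_DROP);
	}
	/* Includes the case of a timestamp arriving without negotiation. */
	return (TS_PROCESS);
}
```

The only path that drops a segment is the one where timestamps were negotiated, the segment carries none, it is not a RST, and tolerance is switched off.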
* Prevent premature SACK block transmission during loss recoveryRichard Scheffenegger2020-11-083-16/+25
| | | | | | | | | | | | | | | | Under specific conditions, a window update can be sent with outdated SACK information. Some clients react to this by subsequently delaying loss recovery, making TCP perform very poorly. Reported by: chengc_netapp.com Reviewed by: rrs, jtl MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D24237 Notes: svn path=/head/; revision=367492
* So it turns out that syzkaller hit another crash. It has to do with switchingRandall Stewart2020-09-092-3/+23
| | | | | | | | | | | | | | | stacks with a SENT_FIN outstanding. Both rack and bbr will only send a FIN if all data is ack'd so this must be enforced. Also if the previous stack sent the FIN we need to make sure in rack that when we manufacture the "unknown" sends that we include the proper HAS_FIN bits. Note for BBR we take a simpler approach and just refuse to switch. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D26269 Notes: svn path=/head/; revision=365501
* bbr: remove unused static functionBjoern A. Zeeb2020-09-051-25/+0
| | | | | | | | | | | bbr_log_type_hrdwtso() is a file local static unused function. Remove it to avoid warnings on kernel compiles. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D26331 Notes: svn path=/head/; revision=365350
* net: clean up empty lines in .c and .h filesMateusz Guzik2020-09-015-80/+0
| | | | Notes: svn path=/head/; revision=365071
* RFC 3465 defines a limit L used in TCP slow start for limiting the numberMichael Tuexen2020-08-251-2/+1
| | | | | | | | | | | | | | | | of acked bytes as described in Section 2.2 of that document. This patch ensures that this limit is not also applied in congestion avoidance. Applying this limit also in congestion avoidance can result in using less bandwidth than allowed. Reported by: l.tian.email@gmail.com Reviewed by: rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D26120 Notes: svn path=/head/; revision=364754
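The effect of the change can be sketched as a function that decides how many acked bytes are credited toward window growth. RFC 3465 caps the slow-start increase at L*SMSS per ACK; the sketch applies that cap only in slow start and passes the full count through in congestion avoidance. `abc_credited_bytes()` and its parameters are hypothetical names, not the stack's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes credited for one ACK.
 * limit_l is the RFC 3465 limit L (in segments), smss the sender MSS.
 */
static uint32_t
abc_credited_bytes(bool in_slow_start, uint32_t bytes_acked,
    uint32_t limit_l, uint32_t smss)
{
	uint32_t cap = limit_l * smss;

	/* The limit applies to slow start only. */
	if (in_slow_start && bytes_acked > cap)
		return (cap);
	return (bytes_acked);
}
```

With L = 2 and an SMSS of 1460 bytes, an ACK covering ten segments is credited as 2920 bytes in slow start but as the full 14600 bytes in congestion avoidance.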
* TCP: remove special treatment for hardware (ifnet) TLSAndrew Gallatin2020-08-192-290/+21
| | | | | | | | | | | | | | | | | | | | | | | | | Remove most special treatment for ifnet TLS in the TCP stack, except for code to avoid mixing handshakes and bulk data. This code made heroic efforts to send down entire TLS records to NICs. It was added to improve the PCIe bus efficiency of older TLS offload NICs which did not keep state per-session, and so would need to re-DMA the first part(s) of a TLS record if a TLS record was sent in multiple TCP packets or TSOs. Newer TLS offload NICs do not need this feature. At Netflix, we've run extensive QoE tests which show that this feature reduces client quality metrics, presumably because the effort to send TLS records atomically causes the server to both wait too long to send data (leading to buffers running dry), and to send too much data at once (leading to packet loss). Reviewed by: hselasky, jhb, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26103 Notes: svn path=/head/; revision=364405
* Fix the cleanup handling in a error path for TCP BBR.Michael Tuexen2020-07-011-0/+2
| | | | | | | | | | Reported by: syzbot+df7899c55c4cc52f5447@syzkaller.appspotmail.com Reviewed by: rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D25486 Notes: svn path=/head/; revision=362846
* So in doing final checks on OCA firmware with all the latest tweaks the ↵Randall Stewart2020-06-162-7/+6
| | | | | | | | | | | | | | | | dup-ack checking packet drill script was failing with a number of unexpected acks. So it turns out if you have the default recvwin set up to 1Meg (like OCA's do) and you have no window scaling (like the dupack checking code) then we have another case where we are always trying to update the rwnd and sending an ack when we should not. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25298 Notes: svn path=/head/; revision=362234
* So it turns out rack has a shortcoming in dup-ack counting. It counts the ↵Randall Stewart2020-06-161-3/+5
| | | | | | | | | | | | | | dupacks but then does not properly respond to them. This is because a few bits are missing. BBR actually does properly respond (though it also sends a TLP which is interesting and maybe something to fix). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25294 Notes: svn path=/head/; revision=362225
* So it turns out with the right window scaling you can get the code in all ↵Randall Stewart2020-06-122-12/+24
| | | | | | | | | | | | | | | | | | | | | stacks to always want to do a window update, even when no data can be sent. Now in cases where you are not pacing that's probably ok, you just send an extra window update or two. However with bbr (and rack if it's paced) every time the pacer goes off it's going to send a "window update". Also in testing bbr I have found that if we are not responding to data right away we end up staying in startup but incorrectly holding a pacing gain of 192 (a loss). This is because the idle window code does not restrict itself to only work with PROBE_BW. In all other states you don't want it doing a PROBE_BW state change. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D25247 Notes: svn path=/head/; revision=362113
* An important statistic in determining if a server process (or client) is ↵Randall Stewart2020-06-082-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | being delayed is to know the time to first byte in and time to first byte out. Currently we have no way to know these; all we have is t_starttime. That (t_starttime) tells us what time the 3 way handshake completed. We don't know when the first request came in or how quickly we responded. Nor from a client perspective do we know how long from when we sent out the first byte before the server responded. This small change adds the ability to track the TTFB's. This will show up in BB logging which then can be pulled for later analysis. Note that currently the tracking is via the ticks variable for all three variables. This provides a very rough estimate (with hz=1000 it's 1ms). A follow-on set of work will be to change all three of these values into something with a much finer resolution (either microseconds or nanoseconds), though we may want to make the resolution configurable so that on lower powered machines we could still use the much cheaper ticks variable. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24902 Notes: svn path=/head/; revision=361926
* This fixes a couple of syzkaller crashes. MostRandall Stewart2020-06-033-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | of them have to do with TFO. Even the default stack had one of the issues: 1) We need to make sure for rack that we don't advance snd_nxt beyond iss when we are not doing fast open. We otherwise can get a bunch of SYN's sent out incorrectly with the seq number advancing. 2) When we complete the 3-way handshake we should not ever append to reassembly if the tlen is 0; if TFO is enabled, prior to this fix we could still call the reassembly. Note this affects all three stacks. 3) Rack, like its cousin BBR, should track if a SYN is on a send map entry. 4) Both bbr and rack need to only consider len incremented on a SYN if the starting seq is iss, otherwise we don't increment len which may mean we return without adding a sendmap entry. This work was done in collaboration with Michael Tuexen, thanks for all the testing! Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D25000 Notes: svn path=/head/; revision=361751
* bbr: Use arc4random_uniform from libkern.Emmanuel Vadot2020-05-231-37/+1
| | | | | | | | | This unbreaks the LINT build. Reported by: jenkins, melifaro Notes: svn path=/head/; revision=361422
* With RFC3168 ECN, CWR SHOULD only be sent with new dataRichard Scheffenegger2020-05-211-9/+16
| | | | | | | | | | | | | | | | Overly conservative data receivers may ignore the CWR flag on other packets, and keep ECE latched. This can result in continuous reduction of the congestion window, and very poor performance when ECN is enabled. Reviewed by: rgrimes (mentor), rrs Approved by: rgrimes (mentor), tuexen (mentor) MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23364 Notes: svn path=/head/; revision=361347
* Retain only mutually supported TCP options after simultaneous SYNRichard Scheffenegger2020-05-212-6/+12
| | | | | | | | | | | | | | | | | | | When receiving a parallel SYN in SYN-SENT state, remove all the options that only we supported locally before sending the SYN,ACK. This addresses a consistency issue on parallel opens. Also, on such a parallel open, the stack could be coaxed into running with timestamps enabled, even if administratively disabled. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23371 Notes: svn path=/head/; revision=361346
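Keeping only the mutually supported options amounts to intersecting two sets. The sketch below models each side's options as a bitmask; the `SK_OPT_*` flag names and `mutual_options()` are hypothetical and do not correspond to the stack's real flag definitions.

```c
#include <stdint.h>

/* Hypothetical option flags for this sketch only. */
#define SK_OPT_TIMESTAMPS	0x1u
#define SK_OPT_SACK		0x2u
#define SK_OPT_WSCALE		0x4u

/*
 * Options to keep after a simultaneous SYN: those we offered locally
 * and the peer's SYN also carried.
 */
static uint32_t
mutual_options(uint32_t offered_locally, uint32_t seen_from_peer)
{
	return (offered_locally & seen_from_peer);
}
```

Because the result is an intersection, an option that is administratively disabled locally (and so never offered) cannot be switched on just because the peer's SYN carried it, which is the timestamp case the message mentions.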
* Handle ECN handshake in simultaneous openRichard Scheffenegger2020-05-211-0/+14
| | | | | | | | | | | | | | While testing simultaneous open TCP with ECN, found that negotiation fails to arrive at the expected final state. Reviewed by: tuexen (mentor) Approved by: tuexen (mentor), rgrimes (mentor) MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D23373 Notes: svn path=/head/; revision=361345
* This fixes several syzkaller issues found with theRandall Stewart2020-05-153-4/+32
| | | | | | | | | | | | | | | | | help of Michael Tuexen. There were some accounting errors with TCPFO for bbr and also for both rack and bbr there was a FO case where we should be jumping to the just_return_nolock label to exit instead of returning 0. This of course caused no timer to be running and thus the stuck sessions. Reported by: Michael Tuexen and syzkaller Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D24852 Notes: svn path=/head/; revision=361080
* When in the SYN-SENT state bbr and rack will not properly send an ACK but ↵Randall Stewart2020-05-072-2/+11
| | | | | | | | | | | | | instead start the D-ACK timer. This causes so_reuseport_lb_test to fail since it slows down how quickly the program runs until the timeout occurs and fails the test Sponsored by: Netflix inc. Differential Revision: https://reviews.freebsd.org/D24747 Notes: svn path=/head/; revision=360798
* NF has an internal option that changes the tcp_mcopy_m routine slightly (hasRandall Stewart2020-05-072-6/+0
| | | | | | | | | | | | a few extra arguments). Recently that changed to only have one arg extra so that two ifdefs around the call are no longer needed. Let's take out the extra ifdef and arg. Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D24736 Notes: svn path=/head/; revision=360776
* Add net epoch support back, which was taken out by accident inMichael Tuexen2020-05-041-0/+4
| | | | | | | | | | | https://svnweb.freebsd.org/changeset/base/360639 Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24694 Notes: svn path=/head/; revision=360645