| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Recently (Nov) we added logic that protects against a peer negotiating timestamps and
then not including a timestamp. In the input path this involved a goto to the done_with_input
label. I suspect the code was cribbed from similar code in Rack that has to do with the SYN.
That code had a bug: it should call m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). The crib then caused the missing m_freem() to show
up in both BBR and Rack. Also, referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though it's only for logging. It is best to
copy that value off before any frees can take place.
Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30727
(cherry picked from commit ba1b3e48f5be320f0590bc357ea53fdc3e4edc65)
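The two points in this message (free before jumping to the label, and copy the logging value before any free) can be sketched in userland C. This is an illustration only, not the rack/bbr code: `struct pkt`, `pkt_free()`, `pkts_freed`, and `process_input()` are hypothetical stand-ins for the mbuf, `m_freem()`, and the stack input routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf carrying an LRO segment count. */
struct pkt {
    int lro_nsegs;
};

/* Counts frees so the sketch can be checked; not part of any real API. */
int pkts_freed;

/* Hypothetical stand-in for m_freem(). */
void
pkt_free(struct pkt *p)
{
    free(p);
    pkts_freed++;
}

/*
 * Returns the segment count to log. ts_negotiated and ts_present model
 * the "peer negotiated timestamps but sent none" check.
 */
int
process_input(struct pkt *p, bool ts_negotiated, bool ts_present)
{
    int nsegs;

    assert(p != NULL);
    /* Copy the value needed for logging before any free can happen. */
    nsegs = p->lro_nsegs;

    if (ts_negotiated && !ts_present) {
        /* Drop path: free the packet before jumping to the label. */
        pkt_free(p);
        goto done_with_input;
    }

    /* Normal path: processing consumes (and frees) the packet. */
    pkt_free(p);

done_with_input:
    /* Logging uses the saved copy, never p->lro_nsegs. */
    return (nsegs);
}
```

Either path frees the packet exactly once, and the value returned for logging never touches freed memory.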
| |
When running the current Rack and BBR at NF with the recent
commits from Richard, which cause the socket buffer lock to be held over
the ip_output() call and finally culminate in a call to tcp_handle_wakeup(),
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or by what Richard has done; rather, it exposes some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root-cause the issue.
In the meantime I can't leave the leaks in place for everyone else. So this commit
reverts all of Richard's changes and moves both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.
We may want to look at adding Richard's changes back in after I have pinpointed
the root cause of the mbuf leak and fixed it.
Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30704
(cherry picked from commit 67e892819b26c198e4232c7586ead7f854f848c5)
| |
Recently we had a rewrite of tcp_lro.c that was tested, but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in TCP stacks that do pacing, and it needs to be fixed to use the more precise
time that was there before.
Reviewed by: mtuexen, gallatin, hselasky
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30695
(cherry picked from commit b45daaea95abd8bda52caaacf120f9197caab3e7)
| |
* Completely initialise the CC module specific data
* Use beta_ecn in case of an ECN event whenever ABE is enabled
or it is requested by the stack.
Reviewed by: rscheff, rrs
Sponsored by: Netflix, Inc.
(cherry picked from commit fa3746be4203fc9a3414afb21d964eec8bad74f8)
| |
Reported by: iron.udjin@gmail.com, Marek Zarychta
Reviewed by: rrs
PR: 256538
Differential Revision: https://reviews.freebsd.org/D30723
Sponsored by: Netflix, Inc.
(cherry picked from commit f1536bb53898b12e2d19938f8fe2d04b5e5d12a6)
| |
PR: 256538
Reported by: iron.udjin@gmail.com
Sponsored by: Netflix, Inc.
(cherry picked from commit 224cf7b35b9bbe8d075f6004249d850c620b7855)
| |
from main to stable/13: Add TCP LRO support for VLAN and VxLAN.
Make sure all counters are allocated.
This is a direct commit.
Reported by: Herbert J. Skuhra <herbert@gojira.at>
Sponsored by: Mellanox Technologies // NVIDIA Networking
| |
Fix a missing socket buffer unlocking of the socket receive buffer.
Reviewed by: gallatin, rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D30402
(cherry picked from commit 9bbd1a8fcb13928cd4b6cfddf0a8359d5dc97451)
| |
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.
Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390
(cherry picked from commit 39756885633fd9d9649b4cb0f0abf594bfeb8dbb)
| |
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.
Reported by: rmacklem
Reviewed By: tuexen, #transport
Tested by: rmacklem, otis
MFC after: 2 weeks
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29690
(cherry picked from commit 032bf749fd44ac5ff20aab2c3d8e3c05491778ea)
| |
tcp.
So it turns out that my previous fix was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out that the root of the problem is that when
we receive a SYN|FIN, the reassembly code creates a segq entry
to hold the FIN. In the established state, where we are not in order, this
would be correct, i.e. a zero-length segment with a FIN would need to be accepted. But
if we are in a front state we need to strip the FIN so that we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.
I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place, however.
Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30627
(cherry picked from commit 4747500deaaa7765ba1c0413197c23ddba4faf49)
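The rule described in this message (keep the FIN once established, strip it in a front state) can be sketched as a small flag filter. This is a simplified illustration and not the actual tcp_reass() change: the `SK_` names, the state enum, and `reass_flags()` are hypothetical; only the flag bit values mirror TH_FIN (0x01) and TH_SYN (0x02).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local copies of the TCP header flag bits. */
#define SK_TH_FIN 0x01
#define SK_TH_SYN 0x02

/* Hypothetical state list; everything before SK_ESTABLISHED is a "front state". */
enum sk_state {
    SK_CLOSED,
    SK_LISTEN,
    SK_SYN_SENT,
    SK_SYN_RECEIVED,
    SK_ESTABLISHED
};

/* Returns the flags the reassembly step should act on. */
uint8_t
reass_flags(enum sk_state state, uint8_t flags)
{
    /* In a front state the FIN is dropped so only the SYN/ACK is handled. */
    if (state < SK_ESTABLISHED)
        flags &= (uint8_t)~SK_TH_FIN;
    return (flags);
}
```

With this filter a SYN|FIN received in SYN_SENT is treated as a plain SYN, while a zero-length FIN in the established state is still accepted.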
| |
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reass() code and that way the FIN will be stripped and all will be well.
Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30497
(cherry picked from commit 8c69d988a8d32e53310c7b73ec8721b04b7249e6)
| |
value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
to a time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. It's possible that
lowering this value to something more in line with today's delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.
Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30249
(cherry picked from commit 4f3addd94be5e02e6e425f6119f5409972ab5d14)
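The "1 MSS over a 60kbps link" figure can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration (`bytes_in_slop()` is not a kernel function), and it assumes 60kbps means 60,000 bits per second.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that fit on a link of rate_bps bits/s during slop_ms milliseconds. */
uint64_t
bytes_in_slop(uint64_t rate_bps, uint64_t slop_ms)
{
    /* bits/s * ms / 1000 = bits; / 8 = bytes */
    return (rate_bps * slop_ms / 1000 / 8);
}
```

`bytes_in_slop(60000, 200)` yields 1500 bytes, roughly one Ethernet-sized segment, while a 40ms slop on the same link covers only 300 bytes.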
| |
Michael's testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is that the left edge gets transmitted before the adjustments are made
to the send_map. This means that the right edge bits must be added only if
the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.
Reviewed by: mtuexen,rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30451
(cherry picked from commit 13c0e198ca275447f9a60a03f730c38c98f19009)
| |
change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.
Reviewed by: mtuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30413
(cherry picked from commit 631449d5d03506295eaa6947c1b0e8a168a2f6b7)
| |
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.
Reviewed by: markj
Reported by: syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com
Sponsored by: Netflix, Inc.
Differential revision: https://reviews.freebsd.org/D30395
(cherry picked from commit 8923ce630492d21ec57c2637757bcc44da9970f8)
| |
When bringing in TCP over UDP support in
https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605,
the length of IP level options was considered when locating the
transport header. This was incorrect and is fixed by this patch.
X-MFC with: https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605
Reviewed by: markj, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D30358
(cherry picked from commit 500eb6dd80404ea512e31a8f795c73cb802c9c64)
| |
Syzkaller found an interesting panic in rack. When a SYN and FIN are
both sent together, a KASSERT gets tripped where it is validating that
an mbuf pointer is in the sendmap. But a SYN and FIN often will not
have an mbuf pointer. So the fix is twofold: a) make sure that the
SYN and FIN split the right way when cloning an RSM, SYN on the left
edge and FIN on the right, and b) make sure the KASSERT properly
accounts for the case that we have a SYN or FIN so we don't
panic.
Reviewed by: mtuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D30241
(cherry picked from commit 02cffbc2507e83944b0c29d69d6ddf26c9386d54)
| |
When the TCP is in the front states, don't take the slop variable
into account. This improves consistency with the base stack.
Reviewed by: rrs@
Differential Revision: https://reviews.freebsd.org/D30230
Sponsored by: Netflix, Inc.
(cherry picked from commit 251842c63927fc4af63bdc61989bbfbf3823c679)
| |
restore them.
After the previous commit, Rack is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However,
there is a snafu here: the HC only restores the srtt when
tp->srtt is 0. We therefore need to convert
the srtt only when it is actually restored. We do this by making
sure it was zero before the call to cc_conn_init and is non-zero
afterwards.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30213
(cherry picked from commit 4b86a24a76a4d58c1d870fcb2252b321f61cb3cc)
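The "zero before, non-zero after" test can be sketched as follows. Everything here is hypothetical and simplified: `struct conn`, `conn_init_from_cache()` (standing in for cc_conn_init), `to_internal()`, and the factor of 1000 are illustration only and do not reflect the real hostcache or rack units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connection state holding a smoothed RTT. */
struct conn {
    uint32_t srtt;
};

/* Stand-in for cc_conn_init(): restores srtt from the cache only if unset. */
void
conn_init_from_cache(struct conn *c, uint32_t cached_srtt)
{
    if (c->srtt == 0)
        c->srtt = cached_srtt;
}

/* Hypothetical conversion into the stack's internal format. */
uint32_t
to_internal(uint32_t v)
{
    return (v * 1000);
}

void
stack_init(struct conn *c, uint32_t cached_srtt)
{
    bool was_zero = (c->srtt == 0);

    conn_init_from_cache(c, cached_srtt);
    /* Convert only a value that was actually restored by the call above. */
    if (was_zero && c->srtt != 0)
        c->srtt = to_internal(c->srtt);
}
```

A connection that already has an srtt in internal format is left untouched, which avoids converting the same value twice.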
| |
The hostcache up to now has been updated in the discard callback
but without checking if we are all done (the race where there is
more than one call and the counter has not yet reached zero). This
means that when the race occurs, we end up calling hc_update
more than once. Also alternate stacks can keep their srtt/rttvar
in different formats (for example, rack keeps its values in microseconds).
Since we call hc_update *before* the stack fini(), the
values will be in the wrong format.
Rack, on the other hand, needs to convert items pulled from the
hostcache into its internal format, or else it may end up with
very incorrect values from the hostcache. In the process
let's commonize the update mechanism for srtt/rttvar since we
now have more than one place that needs to call it.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30172
(cherry picked from commit 9867224bab3f247ac875d89c2472aa4bc855fe3b)
| |
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).
Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30163
(cherry picked from commit 5a4333a5378f7afe4f8cab293a987865ae0c32c4)
| |
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
in place.
and
B) Not enough memory is allocated in rack's fsb. We need to
overbook the fsb to include a udphdr just in case.
Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157
(cherry picked from commit a16cee0218652230d94a73690201e76baab0bba1)
| |
This fixes several breakages (panics) that have been reported since the tcp_lro code was
committed. Quite a few new features are
now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
on rack are coming soon.
Sponsored by: Netflix
Reviewed by: rscheff, mtuexen
Differential Revision: https://reviews.freebsd.org/D30036
(cherry picked from commit 5d8fd932e418f03e98b3469c4088a36f0ef34ffe)
| |
Discussed with: rrs@
Differential Revision: https://reviews.freebsd.org/D28357
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit db46c0d0cb3da2813727e56df1f2db292065867a)
| |
tree that fix the ratelimit code. There were several bugs
in tcp_ratelimit itself and we needed further work to support
the multiple tag format coming for the joint TLS and Ratelimit dances.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28357
(cherry picked from commit 1a714ff204193b9eb810426048e03f5d76e9730e)
| |
Thanks to Tolya Korniltsev for reporting the issue for
the userland stack and testing the fix.
(cherry picked from commit eec6aed5b8c848841ae8d25940e0a333e5039ce9)
| |
The functionality to detect a newly created link after processing a
single packet is decoupled from the packet processing. Every new
packet is processed asynchronously and will reset the indicator, hence
the function is unusable. I searched Google for third party code
that uses the function and found none.
That's why the function should be removed: it is unusable and unused.
A much simplified API/ABI will remain in anything below 14.
Discussed with: kp
Reviewed by: manpages (bcr)
Differential Revision: https://reviews.freebsd.org/D30275
(cherry picked from commit bfd41ba1fe1d0e40b6a813aeb0354cac8d884f5b)
| |
Thanks to Taylor Brandstetter for finding the issue and providing
a patch for the userland stack.
(cherry picked from commit 12dda000ed32efa16f59909a6294e4d4b5a771ba)
| |
(cherry picked from commit d1cb8d11b0c09c35b87c144bab7b02b75c5725b6)
| |
(cherry picked from commit b621fbb1bf1b2a1e6ea22e0ad2d7667b1aec9fae)
| |
If the alternate address has to be removed, force the stack to
find a new one, if it is still needed.
(cherry picked from commit 8b3d0f6439fa27f0d37a9a7b9d27bbfdfdf487c4)
| |
This fixes in particular a possible use after free bug reported by
Anatoly Korniltsev and Taylor Brandstetter for the userland stack.
(cherry picked from commit a89481d328fd96ccbfa642e1db6d03825fa1dc6d)
| |
(cherry picked from commit 655c200cc89185c940bc7d5724be09a0f2e1a8a6)
| |
When processing INIT and INIT-ACK information, also during
COOKIE processing, delete the current association, when it
would end up in an inconsistent state.
(cherry picked from commit 5f2e1835054ee84f2e68ebc890d92716a91775b7)
| |
This is needed in case of responding with an ABORT to an INIT-ACK.
(cherry picked from commit e010d20032c8c2a04da103b3402a8d24bd682dd5)
| |
Ignore spp_pathmtu if it is 0, when setting the IPPROTO_SCTP level
socket option SCTP_PEER_ADDR_PARAMS as required by RFC 6458.
(cherry picked from commit eb79855920ffa33d6c096221eac9cc9a6d7a484b)
| |
(cherry picked from commit eecdf5220b1a559e4b58c3c21daf502e3fbfd1cd)
| |
Just skip the chunk, if no other handling is required by the
specification.
(cherry picked from commit 9de7354bb8e0c7821aa90db3486605f933c6796d)
| |
(cherry picked from commit 059ec2225c00cc18ed9745d733cc9aa0dbd9eaa2)
| |
Stop further processing of a packet when detecting that it
contains an INIT chunk, which is too small or is not the only
chunk in the packet. Still allow to finish the processing
of chunks before the INIT chunk.
Thanks to Anatoly Korniltsev and Taylor Brandstetter for reporting
an issue with the userland stack, which made me aware of this
issue.
(cherry picked from commit c70d1ef15db0d994eff4a2c4d9feabdc46bff1c6)
| |
(cherry picked from commit 163153c2a0809d2710e607463dcb24c7f795e156)
| |
Reported by: syzbot+5eb0e009147050056ce9@syzkaller.appspotmail.com
(cherry picked from commit d995cc7e5431873b839269fe22577acfa3b157bd)
| |
Ensure that the stack does not generate a DSACK block for user
data received on a SYN segment in SYN-SENT state.
Reviewed by: rscheff
Differential Revision: https://reviews.freebsd.org/D29376
Sponsored by: Netflix, Inc.
(cherry picked from commit 40f41ece765dc0b0907ca90796a1af4f4f89b2a0)
| |
- Free the input mbuf in a single place instead of in every error path.
- Handle PRUS_NOTREADY consistently.
- Flush the socket's send buffer if an implicit connect fails. At that
point the mbuf has already been enqueued but we don't want to keep it
in the send buffer.
Reviewed by: gallatin, tuexen
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 7d2608a5d24ec3534dad7f24191f12a8181ea206)
| |
Prior to commit f161d294b we only checked the sockaddr length, but now
we verify the address family as well. This breaks at least ttcp. Relax
the check to avoid breaking compatibility too much: permit AF_UNSPEC if
the address is INADDR_ANY.
Fixes: f161d294b
Reported by: Bakul Shah <bakul@iitbombay.org>
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f96603b56f0f74fa52d8f1ef0be869fca7305b99)
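The relaxed check described in this message can be sketched in userland C. This is not the kernel code: `family_ok()` is a hypothetical helper, and only the rule itself (accept AF_INET, or AF_UNSPEC when the address is INADDR_ANY) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: is the address family acceptable for an IPv4 socket? */
bool
family_ok(const struct sockaddr_in *sin)
{
    if (sin->sin_family == AF_INET)
        return (true);
    /* Compatibility: accept AF_UNSPEC only with the wildcard address. */
    return (sin->sin_family == AF_UNSPEC &&
        sin->sin_addr.s_addr == htonl(INADDR_ANY));
}
```

Old programs that zero the whole sockaddr and never set the family keep working, while an AF_UNSPEC address with a specific IP is still rejected.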
| |
This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.
Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.
To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure.
This makes it easy to reuse the parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but
instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed
recursively, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of
big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once
per TCP LRO flush. This gives a minor performance improvement and
keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing
of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and __predict_false() to optimise CPU branch
predictions.
Bump the __FreeBSD_version due to adding new member to the "lro_ctrl" structure.
Tested by: Netflix
Reviewed by: rrs (transport)
Differential Revision: https://reviews.freebsd.org/D29564
Sponsored by: Mellanox Technologies // NVIDIA Networking
(cherry picked from commit 9ca874cf740ee68c5742df8b5f9e20910085c011)
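The idea of updating a checksum by applying deltas, rather than recomputing it over the whole header, can be illustrated with the incremental update from RFC 1624. This is a generic sketch and not the tcp_lro.c implementation; `csum_full()` and `csum_update()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)sum);
}

/* Full Internet checksum over an array of 16-bit words. */
uint16_t
csum_full(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += w[i];
    return ((uint16_t)~csum_fold(sum));
}

/* Incremental update per RFC 1624: HC' = ~(~HC + ~m + m'). */
uint16_t
csum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;   /* remove the old word */
    sum += new_word;              /* add the new word */
    return ((uint16_t)~csum_fold(sum));
}
```

When one 16-bit field changes, for example a length field growing as segments are merged, `csum_update()` gives the same result as running `csum_full()` over the modified words, at constant cost.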
| |
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
(cherry picked from commit c358f1857f0c749ad166fb9e9bef04f4033f3a72)
| |
The field nullAddress in struct libalias is never set and never used.
It exists as a placeholder for an unused argument only.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30253
(cherry picked from commit 189f8eea138a78b09c9f19114b1362b0df1cf87d)
(cherry picked from commit b03a41befeaf17ef25da96fc7bc2dc19c9a6b253)