sys/netinet/tcp_stacks/sack_filter.h



#ifndef __sack_filter_h__
#define __sack_filter_h__
/*-
 * Copyright (c) 2017-9 Netflix, Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/**
 *
 * The sack filter has two functions. First, it tries to reduce the
 * processing of sacks. Consider that you often get something like
 *
 * ack 1 (sack 100:200)
 * ack 1 (sack 100:300)
 * ack 1 (sack 100:400)
 *
 * You really want to process 100:200 first, then on the next sack process
 * only 200:300 (the new data), and finally on the third only 300:400. The
 * filter removes already processed sack information before it reaches your
 * processing routines, so that after the filter completes you only have
 * "new" sacks that you have not processed. This saves computation time,
 * since you do not need to worry about previously processed sack information.
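 *
 * As a minimal sketch of that trimming step (not the kernel's implementation),
 * the following userland code trims an incoming block against one already
 * processed range. seq_t, struct blk, trim_seen() and the local SEQ_ macros
 * are hypothetical stand-ins introduced here, and only overlap at the front
 * of the incoming block is handled.
 *
```c
#include <stdint.h>

typedef uint32_t seq_t;			// hypothetical stand-in for tcp_seq
struct blk { seq_t start, end; };	// hypothetical stand-in for struct sackblk

// Wrap-safe sequence comparisons, defined locally for this sketch.
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)

// Trim "in" against one already processed range "seen".
// Returns 1 if "in" still carries new data, 0 if it is fully covered.
// Assumption: only overlap at the front of "in" is handled, which is
// the growing-block pattern shown above. A real filter must also deal
// with overlap at the back and in the middle of a block.
static int
trim_seen(const struct blk *seen, struct blk *in)
{
	// "in" does not start inside "seen": nothing to trim.
	if (SEQ_LT(in->start, seen->start) || SEQ_GEQ(in->start, seen->end))
		return (1);
	// "in" ends at or before the end of "seen": nothing new.
	if (!SEQ_LT(seen->end, in->end))
		return (0);
	// Keep only the part beyond what was already processed.
	in->start = seen->end;
	return (1);
}
```
 *
 * With seen = 100:200 and in = 100:300, trim_seen() returns 1 and leaves
 * in = 200:300. A repeated 100:200 returns 0.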
 *
 * The second thing the sack filter does is help protect against malicious
 * peers that try to attack the linked lists (or other data structures)
 * used in sack processing. Consider an attacker sending in sacks for
 * every other byte of outstanding data. This could in theory drastically
 * split up any scoreboard you are maintaining and make you search through a
 * very large linked list (or other structure), eating up CPU. If your data
 * structure is fractured far enough, you could potentially be crippled by a
 * malicious peer.
 *
 * The filter defends against this by filtering out sacks that are smaller
 * than an MSS. We do this because generally a packet (aka MSS) should be
 * kept whole. The only place we allow a smaller sack is when it touches the
 * end of our socket buffer. This allows TLP to still work properly and yet
 * protects us from splitting.
 *
 * The filter also only allows a set number of splits (defined by
 * SACK_FILTER_BLOCKS). If more than that many sack locations are being sent,
 * we discard the additional ones until the earlier holes are filled. The
 * current maximum is 15, which we have moved to since we want to be as
 * generous as possible in allowing for loss. Previous testing of the filter
 * found very little benefit in moving from 7 to 15 sack points, but in those
 * tests we would just discard earlier information in the filter. Now that we
 * drop the incoming sack data instead of discarding filter information, we
 * have raised the value to the maximum, i.e. 15.
 *
 * To expand beyond 15 one would have to either widen sf_bits to a uint32_t,
 * which would allow a maximum of 31 splits, or move to a true bitstring.
 * Doing so further increases your exposure to sack attacks: the more splits
 * (filter blocks) that are allowed, the larger your processing arrays and
 * the filter will grow.
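 *
 * The sub-MSS rule described above can be sketched roughly as follows. This
 * is an illustration under stated assumptions, not the code in
 * sack_filter.c: seq_t, struct blk, blk_big_enough() and drop_small_blks()
 * are hypothetical names, and buf_end stands for the sequence number at the
 * end of the socket buffer.
 *
```c
#include <stdint.h>

typedef uint32_t seq_t;			// hypothetical stand-in for tcp_seq
struct blk { seq_t start, end; };	// hypothetical stand-in for struct sackblk

// Returns 1 if the block may be kept: it covers at least one MSS, or
// it is smaller but touches the end of the socket buffer (buf_end).
static int
blk_big_enough(const struct blk *b, uint32_t mss, seq_t buf_end)
{
	uint32_t len = b->end - b->start;	// unsigned math is wrap-safe

	if (len >= mss)
		return (1);
	return (b->end == buf_end);
}

// Compact the array in place, dropping blocks that fail the check.
// Returns the number of blocks kept.
static int
drop_small_blks(struct blk *in, int n, uint32_t mss, seq_t buf_end)
{
	int i, kept = 0;

	for (i = 0; i < n; i++) {
		if (blk_big_enough(&in[i], mss, buf_end))
			in[kept++] = in[i];
	}
	return (kept);
}
```
 *
 * With an MSS of 1460 and buf_end = 5000, the blocks 1000:1001, 2000:3460
 * and 4990:5000 reduce to the last two: the first is a sub-MSS sliver in the
 * middle of the buffer, while the third is small but touches the end.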
 *
 * Note that this protection does not prevent an attacker from asking for a
 * 20 byte MSS. That protection must be provided elsewhere, during the
 * negotiation of the connection. It is currently done by just ignoring sacks
 * from connections with too small an MSS, which prevents sack from working
 * and thus makes the connection less efficient, but protects the system from
 * harm.
 *
 * We may actually want to consider dropping the size of the array back to 7,
 * which would be a more cautious approach that further protects the system.
 *
 * TCP Developer information:
 *
 * Using the sack filter is actually pretty simple. Do the normal sorting
 * and sanity checks of your sacks, then call sack_filter_blks(), passing in
 * the tcpcb, the sack filter you are using (memory you have allocated), the
 * pointer to the sackblk array, how many sorted valid blocks there are, and
 * the new th_ack point. The filter returns the number of blocks left after
 * filtering. It reshapes the blocks based on the previous sacks you have
 * received and processed. If sack_filter_blks() returns 0, no new sack data
 * is present to be processed.
 *
 * Whenever you reach the point of snd_una == snd_max, you should call
 * sack_filter_clear() with the snd_una point. You also need to call it if
 * you invalidate your sack array for any reason (such as RTOs, MTU changes,
 * or anything else that makes you think all data is now unacknowledged).
 * You can also call sack_filter_blks(tp, sf, NULL, 0, th_ack) to advance
 * the cum-ack point, and sack_filter_blks_used(sf) to determine whether the
 * filter holds any blocks. Putting these two together, any time the cum-ack
 * moves forward you probably want to do:
 *
 * if (sack_filter_blks_used(sf))
 *    sack_filter_blks(tp, sf, NULL, 0, th_ack);
 *
 * If you have run the sack filter and something then goes wrong (for example,
 * you cannot allocate space to split your sack array), you can "undo" the
 * data within the sack filter by calling sack_filter_reject(sf, in), passing
 * in the list of blocks to be "removed" from the sack filter. You can see an
 * example of this in bbr.c, though rack.c has never needed it.
 *
 */

#define SACK_FILTER_BLOCKS 15

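struct sack_filter {
	tcp_seq sf_ack;		/* Cumulative ack point the filter is at */
	uint16_t sf_bits;	/* Bitmask of sf_blks entries in use */
	uint8_t sf_cur;		/* Current position in sf_blks */
	uint8_t sf_used;	/* Number of sf_blks entries in use */
	struct sackblk sf_blks[SACK_FILTER_BLOCKS];
};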
#ifdef _KERNEL
void	sack_filter_clear(struct sack_filter *sf, tcp_seq seq);
int	sack_filter_blks(struct tcpcb *tp, struct sack_filter *sf,
	    struct sackblk *in, int numblks, tcp_seq th_ack);
void	sack_filter_reject(struct sack_filter *sf, struct sackblk *in);

static inline uint8_t
sack_filter_blks_used(struct sack_filter *sf)
{
	return (sf->sf_used);
}

#endif
#endif