| Commit message | Author | Age | Files | Lines |
Add struct mtx to struct lltable and stop using IF_AFDATA_LOCK, which
was created for a completely different purpose. No functional change
intended.
Reviewed by: zlei, melifaro
Differential Revision: https://reviews.freebsd.org/D54086
|
It is not clear what exactly this function is locking against. It seems
to just use some generic interface lock. The IF_AFDATA_LOCK goes
away soon together with if_afdata[], so put at least something in its
place. Note that this code is dead anyway (#ifdef EXPERIMENTAL).
|
It is not clear what exactly this function is locking against. It seems
to just use some generic interface lock. The IF_AFDATA_LOCK goes
away soon together with if_afdata[], so put at least something in its
place.
|
It is a remnant of a network stack design that was supposed to support
multiple network protocols. Today it is clear that we are left with IPv4
and IPv6 only. Only IPv6 may have an MTU different to the interface MTU.
|
These were for $FreeBSD$ that was removed a while ago, but these
includes didn't get swept up in that. Remove them all now.
Sponsored by: Netflix
MFC After: 2 weeks
|
We use the fact that all NICs that support hashing are using the
same hash algorithm and hash key to enable symmetric hashing in
TCP, where a software version of the same hash is used to
establish hashes on outgoing connections.
Sponsored by: Netflix
Reviewed by: adrian, zlei (both early version)
Differential Revision: https://reviews.freebsd.org/D53089
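The idea of a symmetric hash can be sketched in userland C. This is only an illustration, not the kernel code: it assumes a Toeplitz hash and a key made of the repeating 16-bit pattern 0x6d5a, neither of which is stated in the commit message, and the names `sym_key_init`, `toeplitz_hash` and `hash_tuple4` are hypothetical. With a key that repeats every 16 bits, swapping source and destination only permutes 16-bit words of the input, so both directions of a connection hash to the same value.

```c
#include <stddef.h>
#include <stdint.h>

#define SYM_KEY_LEN 40	/* illustrative key length, in bytes */

/* Fill the key with the repeating 16-bit pattern 0x6d5a (an assumption). */
static void
sym_key_init(uint8_t key[SYM_KEY_LEN])
{
	for (size_t i = 0; i < SYM_KEY_LEN; i += 2) {
		key[i] = 0x6d;
		key[i + 1] = 0x5a;
	}
}

/* Bit-by-bit Toeplitz hash; the caller must ensure len + 4 <= key length. */
static uint32_t
toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	/* 32-bit window over the key, starting at key bit 0. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window left by one bit, pulling in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return (hash);
}

/* Hash an IPv4 4-tuple given in host byte order. */
static uint32_t
hash_tuple4(uint32_t src, uint32_t dst, uint16_t sport, uint16_t dport)
{
	uint8_t key[SYM_KEY_LEN];
	uint8_t in[12];

	sym_key_init(key);
	in[0] = (uint8_t)(src >> 24);
	in[1] = (uint8_t)(src >> 16);
	in[2] = (uint8_t)(src >> 8);
	in[3] = (uint8_t)src;
	in[4] = (uint8_t)(dst >> 24);
	in[5] = (uint8_t)(dst >> 16);
	in[6] = (uint8_t)(dst >> 8);
	in[7] = (uint8_t)dst;
	in[8] = (uint8_t)(sport >> 8);
	in[9] = (uint8_t)sport;
	in[10] = (uint8_t)(dport >> 8);
	in[11] = (uint8_t)dport;
	return (toeplitz_hash(key, in, sizeof(in)));
}
```

Calling `hash_tuple4()` with the tuple as seen by the sender and again with addresses and ports swapped yields the same value, which is the property a software hash on outgoing connections needs in order to match what the NIC computes on the return path.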
|
No functional change intended, suggested by glebius.
Reviewed by: rscheff, zlei, tuexen
Differential Revision: https://reviews.freebsd.org/D53739
|
After commit 530c2c30b0c7 we need to set flags to ensure that hop-by-hop
and hop limit options are included.
PR: 290407
Reviewed by: zlei, markj
MFC after: 3 days
Fixes: 530c2c30b0c7 ("ip6_output: Reduce cache misses on pktopts")
|
MFC after: 1 week
|
Save the prefix length in the unused field in6_ifaddr->ia_plen, then on removal
check whether the address has a prefix length of 128, and if so, do not
complain that there are no related prefixes.
Reviewed by: kp
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D52952
|
Replace counter(9) usage with more lightweight atomic(9) in the
code handling RFC 7217 SLAAC address generation.
Also, use `u_int` types with this. Leaving `dad_failures` local to
`in6_get_stableifid()` as a `uint64_t` to avoid changing the generated
addresses from previous code; this also gives some headroom for
future changes.
While here, moved some `#include` lines to adhere to style(9).
Reviewed by: glebius, jhibbits, jtl, zlei
Approved by: glebius, jtl, zlei
Differential Revision: https://reviews.freebsd.org/D52731
|
* use ND_NA_FLAG_ROUTER flag in carp_send_na() when we work as a router.
* use in6addr_any as destination address for nd6_na_output(), then it
will use the ipv6-all-nodes multicast address.
* add in6_selectsrc_nbr() function that accepts an additional argument,
ip6_moptions. Use this function from ND6 code to avoid cases when
nd6_na_output/nd6_ns_output cannot find a source address for
multicast destinations.
* add some comments from RFC2461 for better understanding.
* use tlladdr argument as flags and use ND6_NA_OPT_LLA when we need
to add a target link-layer address option, and ND6_NA_CARP_MASTER when
we know that the target address is the CARP master. Then we can prepare
the correct CARP MAC address if the target address is the CARP master.
* move the blocks of code where multicast options are initialized and
use them when the destination address is multicast.
Reviewed by: kp
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D52825
|
The routine allocates the wrong size and then passes it to in6_get_ifid.
At the same time it violates invariants by issuing malloc with M_WAITOK
while within net epoch section.
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
commit 31ec8b6407fdd5a87d70265762457c67ce618283 added a `dad_failures`
variable to `struct nd_ifinfo`, which broke the networking ABI.
This commit fixes it by moving that variable to `struct in6_ifextra`,
which is not a public interface, while `struct nd_ifinfo` is back
in its original state.
Thanks to kib, markj and glebius for their help and suggestions
in solving this problem.
Reported by: "Herbert J. Skuhra" <herbert@gojira.at>
Tested by: "Herbert J. Skuhra" <herbert@gojira.at>
Approved by: glebius
Fixes: 31ec8b6407fdd5a87d70265762457c67ce618283
|
Implement RFC 7217 (A Method for Generating Semantically Opaque
Interface Identifiers with IPv6 Stateless Address Autoconfiguration
(SLAAC)) in our IPv6 stack.
A new ifconfig `stableaddr` flag is added to enable the feature on
interfaces, which defaults to on or off for new interfaces based
on the sysctl `net.inet6.ip6.use_stableaddr` (off by default, so
this commit causes no change in behavior with default settings).
The algorithm follows the RFC in its logic, using SHA256-HMAC as
the algorithm to derive addresses so as to provide code that can
be leveraged by future implementations of RFC 8981, leveraging the
`hostuuid` as the secret.
The source of the host identifier can be configured using the sysctl
`net.inet6.ip6.stableaddr_netifsource`, while the number of retries
generating a new address in case of collision can be configured
using the `net.inet6.ip6.stableaddr_maxretries` sysctl (default 3).
Documentation about all these flags is added to the ifconfig(8) man
page.
Reviewed by: cognet, glebius, hrs
Tested by: zarychtam@plan-b.pwste.edu.pl
Approved by: cognet, glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D49681
|
Previously, the macros used '>' instead of '>=' when comparing elapsed
time against the preferred and valid lifetimes. This caused any deprecated
address to become usable again for one extra second after receiving each
Router Advertisement. In that short window, the address could be
selected as a source for outgoing connections.
Update the checks to use '>=' so that addresses are deprecated or
invalid when their lifetime expires.
PR: 289177
Reported by: Dmitry Nexus <fbsd.4f6a at nexus tel>
Reviewed by: zlei
Submitted by: Marek Zarychta
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D52323
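The off-by-one is easy to see in a small userland sketch. The function names below are hypothetical stand-ins, not the actual kernel macros; they only contrast the two comparisons described in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: a lifetime of N seconds is treated as expired only after N + 1. */
static bool
expired_old(uint64_t elapsed, uint32_t lifetime)
{
	return (elapsed > lifetime);
}

/* Fixed check: the lifetime is expired as soon as N seconds have elapsed. */
static bool
expired_new(uint64_t elapsed, uint32_t lifetime)
{
	return (elapsed >= lifetime);
}
```

With a preferred lifetime of 0 that was refreshed just now (elapsed time 0), `expired_old()` reports the address as still usable while `expired_new()` reports it as deprecated, which matches the one-second window described above.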
|
This adds support for obtaining timestamps from IPv6 packets using the
SO_BINTIME socket option, bringing it in parity with IPv4 behavior.
Enable testing the SO_BINTIME option in the relevant (manual) regression
test.
PR: 289423
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D52504
|
in6_ifadd() asserts that an interface has an existing LL address with a /64
prefix from which to extract the ifid for SLAAC address selection (even though
the comments suggest that an ifid will be generated if one does not exist). This
is adequate for most generic cases, however to support PPP links with /128 LL
addresses we must be able to fall back on another source for the ifid since we
cannot assume the /128 LL has a unique ifid in the lower 64 bits.
To do this, the static function get_ifid() in in6_ifattach.c is renamed to
non-static in6_get_ifid(), and this is used in lieu of a proper /64 LL address
to attempt to obtain a valid ifid.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D51778
|
When adding an interface with an IP address to a bridge, or assigning an
IP address to an interface which is in a bridge, and member_ifaddrs=1,
print a warning so users are informed this is deprecated. Also add
"(deprecated)" to the sysctl description.
MFC after: 9 hours
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D52335
|
While diagnosing PR 279653 and PR 285129, I observed that a thread may
write to freed memory without the system crashing. This hides the
real problem. A clear NULL pointer dereference is much better than writing
to freed memory.
PR: 279653
PR: 285129
Reviewed by: glebius
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D49444
|
and fix assigning IP addresses to the gif(4) interface when it is a
member of a if_bridge(4) interface.
When setting the sysctl net.link.bridge.member_ifaddrs to 0, if_bridge(4)
can eliminate an unnecessary walk of the member list to determine whether
the inbound unicast packets are for us or not.
When a gif(4) interface is a member of an if_bridge(4) interface, it
acts as the tunnel endpoint to tunnel Ethernet frames over an IP network,
aka the EtherIP protocol, so the IP addresses configured on it are
independent of the if_bridge(4) interface or other if_bridge(4) members,
hence the sysctl net.link.bridge.member_ifaddrs should not have any
influence over the gif(4) interface's behavior of assigning IP addresses.
PR: 227450
Reported by: Siva Mahadevan <me@svmhdvn.name>
Reviewed by: ivy, #bridge
MFC after: 1 week
Fixes: 0a1294f6c610 bridge: allow IP addresses on members to be disabled
Differential Revision: https://reviews.freebsd.org/D52200
|
- s/datgram/datagram/
MFC after: 3 days
|
RFC 2460 section 5 paragraph 7 allowed a Packet Too Big message
to report a Next-Hop MTU less than 1280 in support of 6-to-4 routers.
A node receiving such a message was required to add a Fragment
Header to outgoing packets, even though they were not fragmented.
Almost 20 years later, RFC 8200 was published. It obsoletes RFC 2460
and removes that paragraph. UNH IOL Intact was updated to test for
compliance with the new standard.
Remove code supporting that obsolete paragraph.
Test cases v6LC_4_1_06a and 06b failed before this change, saying:
DUT processed PTB and sent a fragmented echo reply
Those two test cases now pass:
DUT did not process PTB and sent un-fragmented echo reply
All PMTU test cases pass except v6LC_4_1_08. It fails because we
ignore the MTU in RAs.
Reviewed by: tuexen
MFC After: 1 month
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D51835
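A rough sketch of the resulting behaviour, under the assumption (suggested by the test output quoted above, but not spelled out in the message) that a Packet Too Big message reporting an MTU below the IPv6 minimum of 1280 is simply not acted upon. The names `SKETCH_IPV6_MIN_MTU` and `path_mtu_after_ptb` are hypothetical and do not come from the kernel sources.

```c
#include <stdint.h>

#define SKETCH_IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU per RFC 8200 */

/* Return the path MTU to use after a Packet Too Big report. */
static uint32_t
path_mtu_after_ptb(uint32_t current_mtu, uint32_t reported_mtu)
{
	/* Reports below the minimum are ignored; no Fragment Header is forced. */
	if (reported_mtu < SKETCH_IPV6_MIN_MTU)
		return (current_mtu);
	/* A report can only lower the path MTU, never raise it. */
	return (reported_mtu < current_mtu ? reported_mtu : current_mtu);
}
```

Under this sketch a report of 1000 leaves a 1500-byte path MTU untouched, while a report of 1400 lowers it.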
|
When the SCTP, TCP, or UDP implementation sends a packet, it does not
compute the corresponding checksum but defers that. The network layer
determines whether the network interface selected for the packet
has the requested capability and computes the checksum in software
if the selected network interface doesn't have the requested
capability.
Do this not only for packets being sent by the local SCTP, TCP,
and UDP stack, but also when forwarding packets. Furthermore, when
such packets are delivered to a local SCTP, TCP, or UDP stack, do not
compute or validate the checksum, since such packets never have been on
the wire.
This makes it possible to support checksum offloading for local
virtual machines and jails as well.
Support for epair, vtnet, and tap interfaces will be added in
separate commits.
Reviewed by: kp, rgrimes, tuexen, manpages
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D51475
|
There are scenarios where we can end up looking up an interface by its scope and
turn up an interface that doesn't have IPv6 enabled on it. If that happens we
could end up dereferencing a NULL pointer accessing ifp->if_afdata[AF_INET6].
Check for this.
One such scenario is if a firewall rewrites a destination address to a
link-local address, with an embedded scope for such an interface. Attach a test
case which provokes this.
PR: 288263
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D51500
|
When the socket has a tunneling function attached, udp_append() drops
the inpcb lock before calling it. To keep the inpcb alive, we bump the
refcount. After commit 742e7210d00b we only dropped the reference if
the tunnel consumed the packet, but it needs to be dropped in either
case. if_ovpn is the only driver that can trigger this bug.
Fixes: 742e7210d00b ("udp: allow udp_tun_func_t() to indicate it did not eat the packet")
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51505
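The reference handling described here can be sketched with a toy refcount. Everything below is a hypothetical userland model (`struct pcb_ref`, `append_via_tunnel` and the two callbacks are invented for illustration); it only shows that the reference taken before the callback must be released whether or not the callback consumed the packet.

```c
#include <stdbool.h>
#include <stddef.h>

struct pcb_ref {
	int refcount;
};

/* A tunnel callback returns true if it consumed the packet. */
typedef bool (*tun_func)(void *pkt);

static bool
tun_consume(void *pkt)
{
	(void)pkt;
	return (true);
}

static bool
tun_decline(void *pkt)
{
	(void)pkt;
	return (false);
}

static bool
append_via_tunnel(struct pcb_ref *inp, tun_func fn, void *pkt)
{
	bool consumed;

	inp->refcount++;	/* keep the pcb alive while its lock is dropped */
	consumed = fn(pkt);
	inp->refcount--;	/* release on both outcomes, not only when consumed */
	return (consumed);
}
```

If the decrement were placed only on the consumed path, every declined packet would leak one reference.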
|
The patch was originally written by hrs [1], and later modified by meta
to use named flags instead of generic link-layer flags.
[1] https://reviews.freebsd.org/D45854
PR: 280736
Co-authored-by: Hiroki Sato <hrs@FreeBSD.org>
Reviewed by: ae, ziaee, zlei, pauamma
Reported by: Kazuki Shimizu <kazubu@jtime.net>
Approved by: pauamma (manpages)
Approved by: ae
MFC after: 2 weeks
Sponsored by: Cybertrust Japan
Differential Revision: https://reviews.freebsd.org/D51297
|
Allow net.inet6.mld.use_allow, net.inet6.mld.v2enable and
net.inet6.mld.v1enable to be set per vnet.
While here convert them to booleans.
Reviewed by: glebius, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D51409
|
Raw sockets have a separate check for this in rip6_bind() that was
missed in the previous change. This fixes e.g. 'ping -S' using an
anycast address.
Fixes: ca4b046105f6 ("netinet6: allow binding to anycast addresses")
Reviewed by: tuexen, kevans, des (previous version)
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D50438
|
Teach counter_rate() to deal with periods of more than 1 second, so we can
express 'at most 100 in 10 seconds', which is different from 'at most 10 in
1 second'.
While here move the struct counter_rate definition into subr_counter.c so users
cannot mess with its internals. Add allocation and free functions.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D50796
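The difference between '100 in 10 seconds' and '10 in 1 second' can be shown with a minimal fixed-window limiter. This is a single-threaded userland sketch, not the counter_rate(9) implementation; `struct rate_window` and the `rate_*` functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct rate_window {
	uint64_t start;		/* second at which the current window began */
	uint32_t period;	/* window length in seconds */
	uint32_t limit;		/* events allowed per window */
	uint32_t count;		/* events seen in the current window */
};

static void
rate_init(struct rate_window *rw, uint32_t limit, uint32_t period)
{
	rw->start = 0;
	rw->period = period;
	rw->limit = limit;
	rw->count = 0;
}

/* Return true if an event at time 'now' (in seconds) is within the limit. */
static bool
rate_allow(struct rate_window *rw, uint64_t now)
{
	if (now - rw->start >= rw->period) {
		/* The previous window is over; start a fresh one. */
		rw->start = now;
		rw->count = 0;
	}
	if (rw->count >= rw->limit)
		return (false);
	rw->count++;
	return (true);
}

/* Offer 'n' events at the same instant and count how many are allowed. */
static uint32_t
rate_burst(struct rate_window *rw, uint64_t now, uint32_t n)
{
	uint32_t allowed = 0;

	for (uint32_t i = 0; i < n; i++) {
		if (rate_allow(rw, now))
			allowed++;
	}
	return (allowed);
}
```

A limiter set up with limit 100 and period 10 lets a burst of 100 events through at once and then stays closed until the window ends, whereas limit 10 with period 1 never passes more than 10 events in any single second.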
|
Deprecate the use of MD5 as the algorithm for generating temporary
interface identifiers (IIDs) for IPv6 addresses, improving cryptographic
robustness.
Introduce per-address randomized IIDs, ensuring that each temporary
address uses a distinct interface identifier to enhance privacy and
avoid correlation across addresses.
Update the IID generation logic to respect the Reserved IPv6 Interface
Identifiers list.
Enhance sysctl_ip6_temppltime() so that ip6_temp_max_desync_factor is
dynamically recalculated whenever ip6_temp_preferred_lifetime is updated
via sysctl. This ensures that MAX_DESYNC_FACTOR remains approximately
1/32 of the preferred lifetime plus 10 minutes. DESYNC_FACTOR is also
regenerated after each update.
Timers related to temporary address regeneration were updated to match
the design recommendations in RFC 8981.
A new read-only sysctl variable net.inet6.ip6.temp_max_desync_factor
is introduced to expose the computed value of MAX_DESYNC_FACTOR to
userland for observability and debugging.
Input validation to reject temppltime values too small or too large is
included.
This all brings the temporary address lifetime handling closer to the
intended design in RFC 8981 and improves robustness against
misconfiguration.
PR: 245103
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D50108
|
Initial implementation of SLAAC temporary address extensions back when
they were still draft-ietf-6man-rfc4941bis.
PR: 245103
MFC after: 1 month
|
The sysctl-variable net.inet.udp.blackhole_local should affect
UDP packets from an IPv6 address of the local host, not of a host on
the local area network.
Thanks to cc@ for pointing me to the issue.
Reviewed by: cc
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50829
|
This flag was introduced in 8036234c72c9361711e867cc1a0c6a7fe0babd84
to prevent the SIOCSPFXFLUSH_IN6 ioctl from removing manually-added
entries. However, this flag did not actually work because an
incomplete implementation meant prelist_update() did not handle it before
calling nd6_prelist_add().
This patch removes the flag because a prefix derived from an RA
always has an entry in the ndpr_advrtrs member of struct
nd_prefix. Having a separate flag is not a good idea because it can
cause a mismatch between the flag and the ndpr_advrtrs entry. Testing
using LIST_EMPTY() is simpler for the original goal.
This also removes a prefix check in the ICMPV6CTL_ND6_PRLIST sysctl
that excluded manually-added entries. This ioctl is designed to list all
entries, and there is no relationship to SIOCSPFXFLUSH_IN6.
Differential Revision: https://reviews.freebsd.org/D46441
|
Switch to using sys/stdarg.h for va_list type and va_* builtins.
Make an attempt to insert the include in a sensible place. Where
style(9) was followed this is easy, where it was ignored, aim for the
first block of sys/*.h headers and don't get too fussy or try to fix
other style bugs.
Reviewed by: imp
Exp-run by: antoine (PR 286274)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
|
We release the reference to the in6_ifaddr but retain a pointer to it.
Copy the address itself, rather than keeping the pointer to fix this.
The previous version was actually safe, because ifa_free() uses an epoch
callback to free it, so the pointer would have remained valid as long as we are
in net_epoch.
Change it to copying the address anyway because it is more obviously correct and
will remain correct even if ifa_free() changes later.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D50460
|
In icmp6_redirect_output() we potentially add padding, but failed to clear this
memory. This triggered a KMSAN panic during the sys/netinet/carp:unicast_v6
test.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D50461
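The class of bug can be sketched as follows. This is a userland illustration only: it assumes the padding rounds an option up to a multiple of 8 octets (the unit used by Neighbor Discovery options), and `copy_padded` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'len' bytes of option data into 'buf' and pad the result with zeros
 * to a multiple of 8 octets. Returns the padded length, or 0 if the padded
 * option does not fit into 'bufsize' bytes.
 */
static size_t
copy_padded(unsigned char *buf, size_t bufsize, const unsigned char *data,
    size_t len)
{
	size_t padded = (len + 7) & ~(size_t)7;

	if (padded > bufsize)
		return (0);
	memcpy(buf, data, len);
	/* Without this memset the padding bytes would carry stale memory. */
	memset(buf + len, 0, padded - len);
	return (padded);
}
```

Leaving out the `memset()` still produces a packet of the right length, which is why such a bug tends to go unnoticed until a sanitizer like KMSAN flags the uninitialized bytes.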
|
and its setter in6_setmaxmtu().
This variable was introduced by the KAME project [1]. It holds the max
IPv6 MTU through all the interfaces, but is never used anywhere.
[1] 82cd038d51e2 KAME netinet6 basic part(no IPsec,no V6 Multicast
Forwarding, no UDP/TCP for IPv6 yet)
Reviewed by: glebius
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D49357
|
add a new sysctl, net.link.bridge.member_ifaddrs, which defaults to 1.
if it is set to 1, bridge behaviour is unchanged.
if it is set to 0:
- an interface which has AF_INET6 or AF_INET addresses assigned cannot
be added to a bridge.
- an interface in a bridge cannot have an AF_INET6 or AF_INET address
assigned to it.
- the bridge will no longer consider the lladdrs on bridge members to be
local addresses, i.e. frames sent to member lladdrs will not be
processed by the host.
update bridge.4 to document this behaviour, as well as the existing
recommendation that IP addresses should not be configured on bridge
members anyway, even if it currently partially works.
in testing, setting this to 0 on a bridge with 50 member interfaces
improved throughput by 22% (4.61Gb/s -> 5.67Gb/s) across two member
epairs due to eliding the bridge member list walk in GRAB_OUR_PACKETS.
Reviewed by: kp, des
Approved by: des (mentor)
Differential Revision: https://reviews.freebsd.org/D49995
|
the restriction on sending packets from anycast source addresses was
removed in RFC4291, so there's no reason to forbid binding to such
addresses. this allows anycast services (e.g., DNS) to actually use
anycast addresses, which was previously impossible.
RFC4291 also removes the restriction that only routers may configure
anycast addresses; this was never enforced in code but was documented in
ifconfig.8. update ifconfig.8 to document both changes.
PR: 285545
Reviewed by: des, adrian
Approved by: des (mentor)
Differential Revision: https://reviews.freebsd.org/D49905
|
As in f7174eb2b4c4 ("netinet: Do not forward or ICMP response to
INADDR_ANY"), the IPv6 stack should avoid sending packets to the
unspecified address. In particular:
- Make sure that we do not forward received packets to the unspecified
address; the check in ip6_input() catches this in the common case, but
after commit 40faf87894ff it's possible for a pfil hook to bypass this
check and pass the packet to ip6_forward() using the
PACKET_TAG_IPFORWARD tag.
- Make sure that we do not reflect packets back to the unspecified
address; RFC 4443 section 2.4 states that we must not generate error
messages in response to packets from the unspecified address.
Reviewed by: zlei, glebius
Reported by: Franco Fichtner <franco@opnsense.org>
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: OPNsense
Differential Revision: https://reviews.freebsd.org/D49339
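Both checks boil down to testing an address against `::`. The userland sketch below uses the standard `IN6_IS_ADDR_UNSPECIFIED()` macro from `<netinet/in.h>`; the two wrapper functions are hypothetical and only name the two cases listed above.

```c
#include <netinet/in.h>
#include <stdbool.h>

/* Forwarding case: never forward a packet whose destination is ::. */
static bool
may_forward_to(const struct in6_addr *dst)
{
	return (!IN6_IS_ADDR_UNSPECIFIED(dst));
}

/* Reflection case: never send an ICMPv6 error back to a source of ::. */
static bool
may_send_error_to(const struct in6_addr *src)
{
	return (!IN6_IS_ADDR_UNSPECIFIED(src));
}
```

The two functions are identical on purpose: the point of the change is that the same test has to be applied on both the forwarding path and the error-reflection path.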
|
UDP over IPv6 was not leaving space for link headers,
resulting in the ethernet header being placed in its own mbuf
at the front of the mbuf chain sent down to the NIC driver.
This is inefficient, in terms of allocating 2x as many
header mbufs as needed, and it's also confusing for drivers
which may expect to find ether/ip/l4 headers together in the same
mbuf.
Reviewed by: glebius, rrs, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D49840
This is a port of e6ccd7093618, which was done by Robert
Watson in 2004 for IPv4
|
we have to use 'goto out' here rather than 'break' because otherwise
error is set to 0, which means the error is not propagated back to the
caller.
Reviewed by: kp
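The pattern can be reduced to a few lines of C. The functions below are hypothetical and do not reproduce the affected code; they assume, as the message implies, that the code following the switch statement sets the error to 0 on the normal path.

```c
#include <errno.h>

/* Buggy shape: 'break' only leaves the switch, so the error is overwritten. */
static int
handle_with_break(int cmd)
{
	int error = 0;

	switch (cmd) {
	case 1:
		error = EINVAL;
		break;
	default:
		break;
	}
	error = 0;	/* normal-path assignment clobbers EINVAL */
	return (error);
}

/* Fixed shape: 'goto out' skips the normal path and keeps the error. */
static int
handle_with_goto(int cmd)
{
	int error = 0;

	switch (cmd) {
	case 1:
		error = EINVAL;
		goto out;
	default:
		break;
	}
	error = 0;
out:
	return (error);
}
```

For the failing command the first variant returns 0 and the second returns `EINVAL`, which is the propagation the fix restores.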
|
These routines were all assuming that the sysctl handler has some new
value, but this is not the case. SYSCTL_IN() returns 0 in this
scenario, so they were all operating on an uninitialized address. This
is mostly harmless, but trips KMSAN checks, so let's fix them.
Reviewed by: zlei, rrs, glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49348
|
CID: 1593687
|
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
|
This structure originates from the pre-FreeBSD times when system RAM was
measured in single digits of MB and Internet speeds were measured in Kb.
At the first level the database hashes the port value only, to calculate an index
into an array of pointers to lazily allocated headers that hold lists of
inpcbs with the same local port. This design was apparently made to
preserve kernel memory.
In the modern kernel the size of the first level of the hash is derived from
maxsockets, which is derived from maxfiles, which in turn is derived
from the amount of physical memory. Then the size of the hash is capped by
IPPORT_MAX, because it doesn't make any sense to have a hash table larger than
the set of possible values. In practice this cap works even on my laptop.
I haven't done precise calculation or experiments, but my guess is that
any system with > 8 Gb of RAM will be autotuned to an IPPORT_MAX sized hash.
Apparently, this hash is a degenerate one: it never has more than one
entry in any slot. You can check this with kgdb:
    set $i = 0
    while ($i <= tcbinfo->ipi_porthashmask)
        set $p = tcbinfo->ipi_porthashbase[$i].clh_first
        set $c = 0
        while ($p != 0)
            set $c = $c + 1
            set $p = $p->phd_hash.cle_next
        end
        if ($c > 1)
            printf "Slot %u count %u\n", $i, $c
        end
        set $i = $i + 1
    end
By retiring the two-level hash we remove a lot of complexity at the cost of
only one comparison 'inp->inp_lport != lport' in the lookup cycle, which
is always going to be false on most machines anyway. This comparison
should definitely be cheaper than an extra pointer traversal.
Another positive change to be singled out is that now we no longer need to
allocate memory in non-sleepable context in in_pcbinshash(), so a
potential ENOMEM on connect(2) is removed.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49151
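A single-level table with the extra port comparison can be sketched like this. It is a userland toy, not the inpcb code: `struct pcb`, `pcb_insert` and `pcb_lookup` are hypothetical, and the table is made deliberately tiny so that different ports collide and the comparison actually does some work.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 8	/* deliberately tiny, power of two */

struct pcb {
	uint16_t lport;
	struct pcb *next;
};

/* Link the pcb directly into its slot; no per-port header is allocated. */
static void
pcb_insert(struct pcb *table[NSLOTS], struct pcb *p)
{
	struct pcb **head = &table[p->lport & (NSLOTS - 1)];

	p->next = *head;
	*head = p;
}

static struct pcb *
pcb_lookup(struct pcb *table[NSLOTS], uint16_t lport)
{
	struct pcb *p;

	for (p = table[lport & (NSLOTS - 1)]; p != NULL; p = p->next) {
		/* The one extra comparison a single-level hash needs. */
		if (p->lport != lport)
			continue;
		return (p);
	}
	return (NULL);
}
```

Because insertion only links an existing structure into a list, it cannot fail for lack of memory, which mirrors the removal of the allocation in in_pcbinshash() mentioned above.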
|
This is required for kernel CFI.
Reviewed by: rrs, jhb, glebius
Differential Revision: https://reviews.freebsd.org/D49111
|
This changes ABI due to the changed opcodes and includes the
following:
* rule numbers and named object indexes converted to 32-bits
* all hardcoded maximum rule numbers were replaced with the
IPFW_DEFAULT_RULE macro
* it is now possible to grow the maximum number of rules at
build time
* several opcodes converted to ipfw_insn_u32 to keep rulenum:
O_CALL, O_SKIPTO
* call stack modified to keep u32 rulenum. The behaviour of
O_CALL opcode was changed to avoid possible packets looping.
Now when call stack is overflowed or mbuf tag allocation
failed, a packet will be dropped instead of skipping to next
rule.
* 'return' action now has two modes to specify the return point:
'next-rulenum' and 'next-rule'
* new lookup key added for O_IP_DST_LOOKUP opcode 'lookup rulenum'
* several opcodes converted to keep u32 named object indexes
in special structure ipfw_insn_kidx
* tables related opcodes modified to use two structures:
ipfw_insn_kidx and ipfw_insn_table
* added ability for table value matching for specific value type
in 'table(name,valtype=value)' opcode
* dynamic states and eaction code converted to use u32 rulenum
and named objects indexes
* added insntod() and insntoc() macros to cast to specific
ipfw instruction type
* default sockopt version was changed to IP_FW3_OPVER=1
* FreeBSD 7-11 rule format support was removed
* added ability to generate special rtsock messages via log opcode
* added IP_FW_SKIPTO_CACHE sockopt to enable/disable skipto cache.
It helps to reduce overhead when many rules are modified in batch.
* added ability to keep NAT64LSN states during sets swapping
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D46183
|