src - FreeBSD source tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	iflib: Fix detach of pseudo interfaces	Mark Johnston	2021-02-19	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 38bfc6dee33b we added an IFDI_DETACH() call to iflib_pseudo_deregister() since it looked like it was missing. One is present in the error-handling path of iflib_pseudo_register(). However, the detach actually comes from the DEVICE_DETACH() method for the above-mentioned device_t, so now we're calling IFDI_DETACH() twice when destroying a pseudo interface. Fix the problem by not calling IFDI_DETACH() from the device detach routine. This way we can ensure that iflib de-initialization always happens in a consistent order. It also ensures that you can't do silly things like "devctl detach <pseudo ifnet>", which would previously detach the driver without tearing down the corresponding ifnet. PR: 253541 Reviewed by: erj MFC after: 1 week Fixes: 38bfc6dee33b Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28774
*	Fix arp/ndp deletion broken by 2fe5a79425c7.	Alexander V. Chernikov	2021-02-19	1	-10/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changes in the 2fe5a79425c7 moved dst sockaddr masking from the routing control plane to the rtsock code. It broke arp/ndp deletion. It turns out, arp/ndp perform RTM_GET request first to get an interface index necessary for the deletion. Then they simply stamp the reply with RTF_LLDATA and set the command to RTM_DELETE. As a result, kernel receives request with non-empty RTA_NETMASK and clears RTA_DST host bits before passing the message to the lla code. De facto, the only needed bits are RTA_DST, RTA_GATEWAY and the subset of rtm_flags. With that in mind, fix the interace by clearing RTA_NETMASK for every messages with RTF_LLDATA. While here, cleanup arp/ndp code a bit. MFC after: 1 day Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D28804
*	iflib: Cast the result of iflib_netmap_txq_init() to void.	John Baldwin	2021-02-19	1	-1/+1
\| \| \| \| \| \| \| \|	This fixes a warning from GCC for kernels without netmap since the return value is never used. Reviewed by: vmaffione, erj Differential Revision: https://reviews.freebsd.org/D28598
*	Fix NOINET6 build broken by 2fe5a79425c7.	Alexander V. Chernikov	2021-02-16	1	-0/+8
\| \| \| \|	Reported by: mjg
*	Fix dst/netmask handling in routing socket code.	Alexander V. Chernikov	2021-02-16	1	-6/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Traditionally routing socket code did almost zero checks on the input message except for the most basic size checks. This resulted in the unclear KPI boundary for the routing system code (`rtrequest` and now `rib_action()`) w.r.t message validness. Multiple potential problems and nuances exists: Host bits in RTAX_DST sockaddr. Existing applications do send prefixes with hostbits uncleared. Even `route(8)` does this, as they hope the kernel would do the job of fixing it. Code inside `rib_action()` needs to handle it on its own (see `rt_maskedcopy()` ugly hack). * There are multiple way of adding the host route: it can be DST without netmask or DST with /32(/128) netmask. Also, RTF_HOST has to be set correspondingly. Currently, these 2 options create 2 DIFFERENT routes in the kernel. * no sockaddr length/content checking for the "secondary" fields exists: nothing stops rtsock application to send sockaddr_in with length of 25 (instead of 16). Kernel will accept it, install to RIB as is and propagate to all rtsock consumers, potentially triggering bugs in their code. Same goes for sin_port, sin_zero, etc. The goal of this change is to make rtsock verify all sockaddr and prefix consistency. Said differently, `rib_action()` or internals should NOT require to change any of the sockaddrs supplied by `rt_addrinfo` structure due to incorrectness. To be more specific, this change implements the following: * sockaddr cleanup/validation check is added immediately after getting sockaddrs from rtm. * Per-family dst/netmask checks clears host bits in dst and zeros all dst/netmask "secondary" fields. * The same netmask checking code converts /32(/128) netmasks to "host" route case (NULL netmask, RTF_HOST), removing the dualism. * Instead of allowing ANY "known" sockaddr families (0<..<AF_MAX), allow only actually supported ones (inet, inet6, link). * Automatically convert `sockaddr_sdl` (AF_LINK) gateways to `sockaddr_sdl_short`. Reported by: Guy Yur <guyyur at gmail.com> Reviewed By: donner Differential Revision: https://reviews.freebsd.org/D28668 MFC after: 3 days
*	Add ifa_try_ref() to simplify ifa handling inside epoch.	Alexander V. Chernikov	2021-02-16	2	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	More and more code migrates from lock-based protection to the NET_EPOCH umbrella. It requires some logic changes, including, notably, refcount handling. When we have an `ifa` pointer and we're running inside epoch we're guaranteed that this pointer will not be freed. However, the following case can still happen: * in thread 1 we drop to 0 refcount for ifa and schedule its deletion. * in thread 2 we use this ifa and reference it * destroy callout kicks in * unhappy user reports bug To address it, new `ifa_try_ref()` function is added, allowing to return failure when we try to reference `ifa` with 0 refcount. Additionally, existing `ifa_ref()` is enforced with `KASSERT` to provide cleaner error in such scenarious. Reviewed By: rstone, donner Differential Revision: https://reviews.freebsd.org/D28639 MFC after: 1 week
*	Use iflib_if_init_locked() during media change instead of iflib_init_locked().	Allan Jude	2021-02-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	iflib_init_locked() assumes that iflib_stop() has been called, however, it is not called for media changes. iflib_if_init_locked() calls stop then init, so fixes the problem. PR: 253473 MFC after: 3 days Reviewed by: markj Sponsored by: Juniper Networks, Inc., Klara, Inc. Differential Revision: https://reviews.freebsd.org/D28667
*	Remove now-unused RTF_RNH_LOCKED route flag.	Alexander V. Chernikov	2021-02-15	2	-3/+1
\| \| \| \|	MFC after: 1 week
*	Fix ifa refcount leak during route addition.	Alexander V. Chernikov	2021-02-13	1	-4/+2
\| \| \| \| \| \|	Reported by: rstone Reviewed by: rstone MFC after: 1 day
*	Fix various NOINET* builds broken by 145bf6c0af48.	Alexander V. Chernikov	2021-02-12	1	-0/+4
\| \| \| \|	Reported by: mjg, bdragon
*	Fix interface route addition with net/bird.	Alexander V. Chernikov	2021-02-12	1	-24/+26
\| \| \| \| \| \| \| \| \| \|	The case of adding interface route by specifying interface address as the gateway was missed during code refactoring. Re-add it back by copying non-AF_LINK gateway data when RTF_GATEWAY is not set. Reviewed by: donner MFC after: 3 days
*	Fix blackhole/reject routes.	Alexander V. Chernikov	2021-02-11	1	-2/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Traditionally *BSD routing stack required to supply some interface data for blackhole/reject routes. This lead to varieties of hacks in routing daemons when inserting such routes. With the recent routeing stack changes, gateway sockaddr without RTF_GATEWAY started to be treated differently, purely as link identifier. This change broke net/bird, which installs blackhole routes with 127.0.0.1 gateway without RTF_GATEWAY flags. Fix this by automatically constructing necessary gateway data at rtsock level if RTF_REJECT/RTF_BLACKHOLE is set. Reported by: Marek Zarychta <zarychtam at plan-b.pwste.edu.pl> Reviewed by: donner MFC after: 1 week
*	Widen ifnet_detach_sxlock coverage	Kristof Provost	2021-02-11	3	-7/+11
\| \| \| \| \| \| \| \| \|	Widen the ifnet_detach_sxlock to cover the entire vnet sysuninit code. This ensures that we can't end up having the vnet_sysuninit free the UDP pcb while the detach code is running and trying to purge the UDP pcb. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28530
*	Revert "SO_RERROR indicates that receive buffer overflows should be handled ↵	Alexander V. Chernikov	2021-02-08	1	-6/+5
\| \| \| \| \| \| \| \|	as errors." Wrong version of the change was pushed inadvertenly. This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
*	Turn off forgotten multipath debug messages	Alexander V. Chernikov	2021-02-08	1	-1/+0
\| \| \| \| \|	Reported by: mike tancsa<mike at sentex.net> MFC after: 3 days
*	SO_RERROR indicates that receive buffer overflows should be handled as errors.	Alexander V. Chernikov	2021-02-08	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.
*	Enable multipath routing by default.	Alexander V. Chernikov	2021-02-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	ROUTE_MPATH was added to the GENERIC kernel in r368648. According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0. Given enough time has passed, this change enables route multipath by default. The goal is to ship FreeBSD 13 with multipath turned on. Reviewed By: donner, olivier MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28423
*	iflib: Free resources in a consistent order during detach	Sai Rajesh Tallamraju	2021-02-01	1	-9/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Memory and PCI resources are freed with no particular order. This could cause use-after-frees when detaching following a failed attach. For instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but iflib_tqg_detach() attempts to access this array. Similarly, adapter queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to access adapter queues to free PCI resources. MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27634
*	bridge: fix STP roles and protos strings	Jonah Caplan	2021-02-01	1	-6/+6
\| \| \| \| \| \| \| \| \|	Add the missing commas that got lost in e5539fb618cc7. PR: 252532 Reviewd by: kp@, donner@, freqlabs@ MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28425
*	Use process fib for inet/inet6 fib_algo sysctls.	Alexander V. Chernikov	2021-01-31	1	-2/+2
\| \| \| \| \| \|	This allows to set/query fib algo for non-default fibs. MFC after: 3 days
*	Fix the design problem with delayed algorithm sync.	Alexander V. Chernikov	2021-01-30	3	-42/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, if the immutable algorithm like bsearch or radix_lockless receives rtable update notification, it schedules algorithm rebuild. This rebuild is executed by the callout after ~50 milliseconds. It is possible that a script adding an interface address and than route with the gateway bound to that address will fail. It can happen due to the fact that fib is not updated by the time the route addition request arrives. Fix this by allowing synchronous algorithm rebuilds based on certain conditions. By default, these conditions assume: 1) less than net.route.algo.fib_sync_limit=100 routes 2) routes without gateway. * Move algo instance build entirely under rib WLOCK. Rib lock is only used for control plane (except radix algo, but there are no rebuilds). * Add rib_walk_ext_locked() function to allow RIB iteration with rib lock already held. * Fix rare potential callout use-after-free for fds by binding fd callout to the relevant rib rmlock. In that case, callout_stop() under rib WLOCK guarantees no callout will be executed afterwards. MFC after: 3 days
*	Add rib_subscribe_locked() and rib_unsubsribe_locked() to support	Alexander V. Chernikov	2021-01-30	2	-1/+36
\| \| \| \| \| \| \| \| \| \|	subscriptions during RIB modifications. Add new subscriptions to the beginning of the lists instead of the end. This fixes the situation when new subscription is created int the callback for the existing subscription, leading to the subscription notification handler pick it. MFC after: 3 days
*	Move business logic from rebuild_fd_callout() into rebuild_fd().	Alexander V. Chernikov	2021-01-30	1	-15/+25
\| \| \| \| \| \| \|	This simplifies code a bit and allows for future non-callout callers to request rebuild. MFC after: 3 days
*	Improve fib_algo debug messages.	Alexander V. Chernikov	2021-01-30	1	-18/+44
\| \| \| \| \| \| \| \| \|	* Move per-prefix debug lines under LOG_DEBUG2 * Create fib instance counter to distingush log messages between instances * Add more messages on rebuild reason. MFC after: 3 days
*	Fix multipath support for rib_lookup_info().	Alexander V. Chernikov	2021-01-29	1	-8/+9
\| \| \| \| \| \|	The initial plan was to remove rib_lookup_info() before FreeBSD 13. As several customers are still remaining, fix rib_lookup_info() for the multipath use case.
*	Fix subinterface vlan creation.	Alexander V. Chernikov	2021-01-29	2	-28/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	D26436 introduced support for stacked vlans that changed the way vlans are configured. In particular, this change broke setups that have same-number vlans as subinterfaces. Vlan support was initially created assuming "vlanX" semantics. In this paradigm, automatic number assignment supported by cloning (ifconfig vlan create) was a natural fit. When "ifaceX.Y" support was added, allowing to have the same vlan number on multiple devices, cloning code became more complex, as the is no unified "vlan" namespace anymore. Such interfaces got the first spare index from "vlan" cloner. This, in turn, led to the following problem: ifconfig ix0.333 create -> index 1 ifconfig ix0.444 create -> index 2 ifconfig vlan2 create -> allocation failure This change fixes such allocations by using cloning indexes only for "vlanX" interfaces. Reviewed by: hselasky MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27505
*	Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.	Gleb Smirnoff	2021-01-29	3	-7/+7
\| \| \| \| \| \| \| \| \|	Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.
*	This pulls over all the changes that are in the netflix	Randall Stewart	2021-01-28	3	-0/+43
\| \| \| \| \| \| \| \| \|	tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357
*	altq: Fix typo in features sysctl description	Kristof Provost	2021-01-27	1	-1/+1
\| \| \| \|	Reported by: Jose Luis Duran
*	altq: Remove unused arguments from altq_attach()	Kristof Provost	2021-01-25	7	-24/+9
\| \| \| \| \| \| \|	Minor cleanup, no functional change. Reviewed by: donner@ Differential Revision: https://reviews.freebsd.org/D28304
*	Add FEATURE sysctls for ALTQ disciplines	Kristof Provost	2021-01-25	1	-0/+32
\| \| \| \| \| \| \| \|	This will allow userspace to more easily figure out if ALTQ is built into the kernel and what disciplines are supported. Reviewed by: donner@ Differential Revision: https://reviews.freebsd.org/D28302
*	iflib: netmap: move per-packet operation out of fragments loop	Vincenzo Maffione	2021-01-24	1	-6/+7
\| \| \| \|	MFC after: 1 week
*	iflib: netmap: add support for NS_MOREFRAG	Vincenzo Maffione	2021-01-24	1	-24/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NS_MOREFRAG flag can be set in a netmap slot to represent a multi-fragment packet. Only the last fragment of a packet does not have the flag set. On TX rings, the flag may be set by the userspace application. The kernel will look at the flag and use it to properly set up the NIC TX descriptors. On RX rings, the kernel may set the flag if the packet received was split across multiple netmap buffers. The userspace application should look at the flag to know when the packet is complete. Submitted by: rajesh1.kumar_amd.com Reviewed by: vmaffione MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27799
*	iflib: Fix a NULL pointer deref	Andrew Gallatin	2021-01-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	rxd_frag_to_sd() have pf_rv parameter as NULL with the current code. This patch fixes the NULL pointer dereference in that case thus avoiding a possible panic. Submitted by: rajesh1.kumar at amd.com Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D28115
*	Fix panic on vnet creation if fib algo has been set to fixed value.	Alexander V. Chernikov	2021-01-17	1	-15/+14
\| \| \| \|	Make fixed algo property per-VNET instead of global.
*	Create new in6_purgeifaddr() which purges bound ifa prefix if	Alexander V. Chernikov	2021-01-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	it gets unused. Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6 ifaddrs. in6_purgeaddr() does not trrigger prefix removal if number of linked ifas goes to 0, as this is a low-level function. As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but keeps corresponding IPv6 prefixes. Fix this by creating higher-level wrapper which handles unused prefix usecase and use it in if_purgeifaddrs(). Differential revision: https://reviews.freebsd.org/D28128
*	Split rtinit() into multiple functions.	Alexander V. Chernikov	2021-01-16	5	-179/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186
*	Remove redundant rtinit() calls from tuntap.	Alexander V. Chernikov	2021-01-13	1	-31/+1
\| \| \| \| \| \| \| \| \| \| \|	Removed code iterates over if_addrhead and tries to remove routes for each ifa. This is exactly the thing that if_purgeaddrs() do, and if_purgeaddr() is already called in the end. Reviewed by: glebius MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D28106
*	pf: quiet -Wredundant-decls for pf_get_ruleset_number	Ryan Libby	2021-01-11	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	In e86bddea9fe62d5093a1942cf21950b3c5ca62e5 sys/netpfil/pf/pf.h grew a declaration of pf_get_ruleset_number. Now delete the old declaration from sys/net/pfvar.h. Reviewed by: kp Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D28081
*	Use static initializers for fib algo to shift initialization	Alexander V. Chernikov	2021-01-11	1	-11/+2
\| \| \| \| \| \| \|	to ealier stage. This allows to register modules loaded at boot time. Reported by: olivier
*	netmap: restore hwofs and support it in iflib	Vincenzo Maffione	2021-01-10	1	-13/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Restore the hwofs functionality temporarily disabled by 7ba6ecf216fb15e8b147db2 to prevent issues with iflib. This patch brings the necessary changes to iflib to enable howfs to allow interface restarts without disrupting netmap applications actively using its rings. After this change, it becomes possible for multiple non-cooperating netmap applications to use non-overlapping subsets of the available netmap rings without clashing with each other. PR: 252453 MFC after: 1 week
*	iflib: fix build failure in case DEV_NETMAP is not defined	Vincenzo Maffione	2021-01-10	1	-0/+2
\| \| \| \| \| \| \|	This addresses the build failure introduced by 3d65fd97e85ab807f3baa62. MFC with: 3d65fd97e85ab807f3baa62
*	iflib: add assert to prevent out-of-bounds array access	Vincenzo Maffione	2021-01-10	1	-5/+4
\| \| \| \| \| \| \| \| \| \|	The iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs for each rxqset, and links each struct to a different free list. As a result, it must be isc_nrxqs >= isc_nfl (plus the completion queue, if present). Add an assertion to make this constraint explicit. MFC after: 2 weeks
*	netmap: iflib: enable/disable krings on any interface reinit	Vincenzo Maffione	2021-01-10	1	-15/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since 1d238b07d5d4d9660ae0e0, krings are disabled before a reinit cycle triggered by iflib_netmap_register. However, this operation is actually necessary also for any interface reinit triggered by other causes (i.e., ifconfig commands). We achieve this goal by moving the krings enable/disable operation inside iflib_stop() and iflib_init_locked(). Once here, this change also removes some redundant operations from iflib_netmap_register(), that are already performed by iflib_stop(). PR: 252453 MFC after: 1 week
*	netmap: iflib: fix asserts in netmap_fl_refill()	Vincenzo Maffione	2021-01-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When netmap_fl_refill() is called at initialization time (e.g., during netmap_iflib_register()), nic_i must be 0, since the free list is reinitialized. At the end of the refill cycle, nic_i must still be zero, because exactly N descriptors (N is the ring size) are refilled. This patch therefore fixes the assertions to check on nic_i rather than on nm_i. The current netmap_reset() may in fact cause nm_i to be != 0 while the device is resetting: this may happen when multiple non-cooperating processes open different subsets of the available netmap rings. PR: 252518 MFC after: 1 week
*	netmap: iflib: stop krings during interface reset	Vincenzo Maffione	2021-01-09	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When different processes open separate subsets of the available rings of a same netmap interface, a device reset may be performed while one of the processes is actively using some rings (e.g., caused by another process executing a nmport_open()). With this patch, such situation will cause the active process to get a POLLERR, so that it can have a chance to detect the situation. We also guarantee that no process is running a txsync or rxsync (ioctl or poll) while an iflib device reset is in progress. PR: 252453 MFC after: 1 week
*	iflib: ensure that tx interrupts enabled and cleanups	Matt Macy	2021-01-07	2	-32/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Doing a 'dd' over iscsi will reliably cause stalls. Tx cleaning _should_ reliably happen as data is sent. However, currently if the transmit queue fills it will wait until the iflib timer (hz/2) runs. This change causes the the tx taskq thread to be run if there are completed descriptors. While here: - make timer interrupt delay a sysctl - simplify txd_db_check handling - comment on INTR types Background on the change: Initially doorbell updates were minimized by only writing to the register on every fourth packet. If txq_drain would return without writing to the doorbell it scheduled a callout on the next tick to do the doorbell write to ensure that the write otherwise happened "soon". At that time a sysctl was added for users to avoid the potential added latency by simply writing to the doorbell register on every packet. This worked perfectly well for e1000 and ixgbe ... and appeared to work well on ixl. However, as it turned out there was a race to this approach that would lockup the ixl MAC. It was possible for a lower producer index to be written after a higher one. On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial response was to add a lock around doorbell writes - fixing the problem but adding an unacceptable amount of lock contention. The next iteration was to use transmit interrupts to drive delayed doorbell writes. If there were no packets in the queue all doorbell writes would be immediate as the queue started to fill up we could delay doorbell writes further and further. At the start of drain if we've cleaned any packets we know we've moved the state machine along and we write the doorbell (an obvious missing optimization was to skip that doorbell write if db_pending is zero). This change required that tx interrupts be scheduled periodically as opposed to just when the hardware txq was full. However, that just leads to our next problem. Initially dedicated msix vectors were used for both tx and rx. However, it was often possible to use up all available vectors before we set up all the queues we wanted. By having rx and tx share a vector for a given queue we could halve the number of vectors used by a given configuration. The problem here is that with this change only e1000 passed the necessary value to have the fast interrupt drive tx when appropriate. Reported by: mav@ Tested by: mav@ Reviewed by: gallatin@ MFC after: 1 month Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D27683
*	Refactor rt_addrmsg() and rt_routemsg().	Alexander V. Chernikov	2021-01-07	6	-22/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011
*	pf: Convert pfi_kkif to use counter_u64	Kristof Provost	2021-01-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve caching behaviour by using counter_u64 rather than variables shared between cores. The result of converting all counters to counter(9) (i.e. this full patch series) is a significant improvement in throughput. As tested by olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see: x FreeBSD 20201223: inet packets-per-second + FreeBSD 20201223 with pf patches: inet packets-per-second +--------------------------------------------------------------------------+ \| + \| \| xx + \| \|xxx +++\| \|\|A\| \| \| \|A\|\| +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 9216962 9526356 9343902 9371057.6 116720.36 + 5 19427190 19698400 19502922 19546509 109084.92 Difference at 95.0% confidence 1.01755e+07 +/- 164756 108.584% +/- 2.9359% (Student's t, pooled s = 112967) Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27763
*	pf: Allocate and free pfi_kkif in separate functions	Kristof Provost	2021-01-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Factor out allocating and freeing pfi_kkif structures. This will be useful when we change the counters to be counter_u64, so we don't have to deal with that complexity in the multiple locations where we allocate pfi_kkif structures. No functional change. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27762