path: root/sys/net/iflib.c
Commit message | Author | Age | Files | Lines
* iflib: netmap: honor netmap_rx_irq return values | Vincenzo Maffione | 2020-06-09 | 1 | -6/+8
  In the receive interrupt routine, always call netmap_rx_irq(). The latter function returns NM_IRQ_PASS if netmap is not active on that specific receive queue, so that the driver can go on with iflib_rxeof(). Note that netmap supports partial opening, where only a subset of the RX or TX rings can be open in netmap mode. Checking the IFCAP_NETMAP flag is not enough to make sure that the queue is indeed in netmap mode.
  Moreover, in case netmap_rx_irq() returns NM_IRQ_RESCHED, it means that netmap expects the driver to call netmap_rx_irq() again as soon as possible. Currently, this may happen when the device is attached to a VALE switch.
  Reviewed by: gallatin
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D25167
  Notes: svn path=/head/; revision=361982
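The dispatch described in this commit can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum below is a stand-in for netmap's return codes (the real definitions live in netmap's kernel headers), and `struct rx_decision` and `rx_task_decide()` are hypothetical names, not iflib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for netmap's IRQ return codes (assumed, for illustration). */
enum nm_irq_result { NM_IRQ_PASS, NM_IRQ_COMPLETED, NM_IRQ_RESCHED };

/* Hypothetical summary of what the RX task should do next. */
struct rx_decision {
	bool run_rxeof;   /* continue with regular iflib_rxeof() processing */
	bool reschedule;  /* netmap wants netmap_rx_irq() called again soon */
};

static struct rx_decision
rx_task_decide(enum nm_irq_result nmirq)
{
	struct rx_decision d = { false, false };

	if (nmirq == NM_IRQ_PASS) {
		/* Queue is not in netmap mode: normal driver path. */
		d.run_rxeof = true;
	} else if (nmirq == NM_IRQ_RESCHED) {
		/* netmap handled the interrupt but has more work pending. */
		d.reschedule = true;
	}
	/* NM_IRQ_COMPLETED: netmap is done, nothing else to do. */
	return (d);
}
```

The point of the sketch is that the decision is made per queue from the return value, rather than from a per-interface flag such as IFCAP_NETMAP.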
* Fix panics when using iflib pseudo device support | Matt Macy | 2020-05-31 | 1 | -1/+3
  Reviewed by: gallatin@, hselasky@
  MFC after: 1 week
  Sponsored by: Netgate, Inc.
  Differential Revision: https://reviews.freebsd.org/D23710
  Notes: svn path=/head/; revision=361665
* Increase the iflib txq callout mutex name length to 32 bytes. | Mark Johnston | 2020-04-30 | 1 | -1/+1
  With a length of 16, the name ("<if name>:TX(<qid>):callout") typically gets truncated.
  PR: 245712
  Reported by: ghuckriede@blackberry.com
  MFC after: 1 week
  Notes: svn path=/head/; revision=360498
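The truncation is easy to reproduce in userland. The sketch below uses the name format quoted in the commit message; the helper `format_callout_name()` and the interface name "ix0" are assumptions for illustration, not iflib code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a TX queue callout mutex name into a buffer of the given size.
 * Returns nonzero when the whole name fit, zero when snprintf() had to
 * truncate it.
 */
static int
format_callout_name(char *buf, size_t len, const char *ifname, int qid)
{
	int n = snprintf(buf, len, "%s:TX(%d):callout", ifname, qid);

	return (n >= 0 && (size_t)n < len);
}
```

Even for a short interface name and queue 0, the formatted name needs 17 characters plus the terminating NUL, so a 16-byte buffer cannot hold it while a 32-byte buffer can.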
* iflib: Stop interface before (un)registering VLAN | Eric Joyner | 2020-04-27 | 1 | -6/+12
  This patch is intended to solve a specific problem that iavf(4) encounters, but what it does can be extended to solve other issues.
  To summarize the iavf(4) issue, if the PF driver configures VLAN anti-spoof, then the VF driver needs to make sure no untagged traffic is sent if a VLAN is configured, and vice-versa. This can be an issue when a VLAN is being registered or unregistered, e.g. when a packet may be on the ring with a VLAN in it, but the VLANs are being unregistered. This can cause that tagged packet to go out and cause an MDD event.
  To fix this, include a new interface-dependent function that drivers can implement named IFDI_NEEDS_RESTART(). Right now, this function is called in iflib_vlan_unregister/register() to determine whether the interface needs to be stopped and started when a VLAN is registered or unregistered. The default return value of IFDI_NEEDS_RESTART() is true, so this fixes the MDD problem that iavf(4) encounters, since the interface rings are flushed during a stop/init.
  A future change to iavf(4) will implement that function just in case the default value changes, and to make it explicit that this interface reset is required when a VLAN is added or removed.
  Reviewed by: gallatin@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D22086
  Notes: svn path=/head/; revision=360398
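The control flow described above can be modelled roughly as follows. Everything here is a hypothetical stand-in: `struct fake_ctx`, the callbacks, and the counters only model the order of operations and are not the real iflib context or the real IFDI_NEEDS_RESTART() method.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an iflib context; not struct iflib_ctx. */
struct fake_ctx {
	bool (*needs_restart)(int event); /* models IFDI_NEEDS_RESTART() */
	int stops;  /* times the interface was stopped */
	int inits;  /* times the interface was re-initialized */
	int vlans;  /* VLANs currently registered */
};

/* Default behaviour described in the commit: always restart. */
static bool
restart_always(int event)
{
	(void)event;
	return (true);
}

/* A driver that opts out of the restart (illustrative only). */
static bool
restart_never(int event)
{
	(void)event;
	return (false);
}

static void
fake_vlan_register(struct fake_ctx *ctx, int event)
{
	bool restart = ctx->needs_restart(event);

	if (restart)
		ctx->stops++;  /* stop first, so the rings are flushed */
	ctx->vlans++;          /* the VLAN registration itself */
	if (restart)
		ctx->inits++;  /* bring the interface back up */
}
```

With the always-restart default, every VLAN change is bracketed by a stop and an init; a driver that returns false skips both.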
* Simplify taskqgroup initialization. | Mark Johnston | 2020-03-30 | 1 | -44/+0
  taskqgroup initialization was broken into two steps:
  1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
  2. initialize taskqueues, start taskqueue threads, enqueue "binder" tasks to bind threads to specific CPUs, at SI_SUB_SMP.
  Step 2 tries to handle the case where tasks have already been attached to a queue, by migrating them to their intended queue. In particular, tasks can't be enqueued before step 2 has completed. This breaks NFS mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP is not defined, since mountroot happens before SI_SUB_SMP in this case.
  Simplify initialization: do all initialization except for CPU binding at SI_SUB_TASKQ. This means that until CPU binding is completed, group tasks may be executed on a CPU other than that to which they were bound, but this should not be a problem for existing users of the taskqgroup KPIs.
  Reported by: sbruno
  Tested by: bdragon, sbruno
  MFC after: 1 month
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D24188
  Notes: svn path=/head/; revision=359436
* iflib: simplify MPASS assertion | Ed Maste | 2020-03-24 | 1 | -7/+1
  Submitted by: andrew
  Notes: svn path=/head/; revision=359274
* iflib: split compound assertion | Ed Maste | 2020-03-24 | 1 | -1/+2
  ThunderX cluster systems are panicking on boot with a failed assertion MPASS(gtask != NULL && gtask->gt_taskqueue != NULL). Split the assertion so that it's clear which part is failing.
  Notes: svn path=/head/; revision=359273
* Remove extraneous code from iflib | Patrick Kelsey | 2020-03-14 | 1 | -3/+0
  ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is just dead code.
  Reviewed by: erj, gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23951
  Notes: svn path=/head/; revision=359002
* Remove refill budget from iflib | Patrick Kelsey | 2020-03-14 | 1 | -4/+4
  Reviewed by: gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23948
  Notes: svn path=/head/; revision=358999
* Allow iflib drivers to specify the buffer size used for each receive queue | Patrick Kelsey | 2020-03-14 | 1 | -5/+28
  Reviewed by: erj, gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23947
  Notes: svn path=/head/; revision=358998
* Remove freelist contiguous-indexes assertion from rxd_frag_to_sd() | Patrick Kelsey | 2020-03-14 | 1 | -2/+0
  The vmx driver is an example of an iflib driver that might report packets using non-contiguous descriptors (with unused descriptors either between received packets or between the fragments of a received packet), so this assertion needs to be removed.
  For such drivers, the freelist producer and consumer indexes don't relate directly to driver ring slots (the driver deals directly with freelist buffer indexes supplied by iflib during refill, and reports them with each fragment during packet reception), but do continue to be used by iflib for accounting, such as determining the number of ring slots that are refillable.
  PR: 243126, 243392, 240628
  Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
  Reviewed by: gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23946
  Notes: svn path=/head/; revision=358997
* Fix iflib zero-length fragment handling | Patrick Kelsey | 2020-03-14 | 1 | -2/+6
  The dmamap for zero-length fragments should not be unloaded, as doing so breaks the cluster-reuse logic in _iflib_fl_refill().
  All zero-length fragments are now handled by the assemble_segments() path so that the cluster-reuse logic there does not have to be replicated in the small-single-fragment-packet path of iflib_rxd_pkt_get().
  Packets consisting entirely of zero-length fragments (which result in a NULL mbuf pointer) are now properly tolerated. This allows drivers (such as the vmx driver) to pass such packets to iflib when a descriptor error occurs during packet reception, the advantage being that the refill of descriptors associated with the error packet is handled via the existing iflib machinery without having to duplicate parts of that machinery in the driver to handle that error case.
  Reviewed by: avg, erj, gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23945
  Notes: svn path=/head/; revision=358996
* Fix iflib freelist state corruption | Patrick Kelsey | 2020-03-14 | 1 | -1/+3
  This fixes a bug in iflib freelist management that breaks the required correspondence between freelist indexes and driver ring slots.
  PR: 243126, 243392, 240628
  Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
  Reviewed by: avg, gallatin
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D23943
  Notes: svn path=/head/; revision=358995
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) | Pawel Biernacki | 2020-02-26 | 1 | -15/+16
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.
  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
* Although most of the NIC drivers are epoch ready, due to peer pressure | Gleb Smirnoff | 2020-02-24 | 1 | -1/+2
  switch over to opt-in instead of opt-out for epoch.
  Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If a driver marks itself with IFF_KNOWSEPOCH, then ether_input() will not enter the epoch when processing its packets.
  For now this creates recursive epoch entry in >90% of network drivers, but it guarantees a safe transition.
  Mark several tested drivers as IFF_KNOWSEPOCH.
  Reviewed by: hselasky, jeff, bz, gallatin
  Differential Revision: https://reviews.freebsd.org/D23674
  Notes: svn path=/head/; revision=358301
* Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process | Hans Petter Selasky | 2020-02-12 | 1 | -4/+4
  incoming packets in taskqueue context.
  This patch extends r357772.
  Tested by: yp@mm.st
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=357800
* Make sure the so-called end of receive interrupts don't starve in iflib. | Hans Petter Selasky | 2020-02-12 | 1 | -17/+42
  When the receive ring cannot be filled with mbufs, due to lack of memory, no more interrupts may be generated to fill the receive ring later on. Make sure to have a watchdog, to try refilling the receive ring from time to time, hopefully when more mbufs are available.
  Differential Revision: https://reviews.freebsd.org/D23315
  MFC after: 1 week
  Reviewed by: gallatin@
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=357799
* Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process | Gleb Smirnoff | 2020-02-11 | 1 | -5/+2
  incoming packets in taskqueue context.
  Reviewed by: hselasky
  Differential Revision: https://reviews.freebsd.org/D23518
  Notes: svn path=/head/; revision=357772
* Enter network epoch in iflib rxeof task. | Gleb Smirnoff | 2020-01-23 | 1 | -0/+8
  In upcoming changes ether_input() is going to be changed not to enter the network epoch. Entering it is going to be the responsibility of the network interrupt context; in the case of iflib, that is its taskqueue.
  Notes: svn path=/head/; revision=357006
* iflib: Prevent watchdog from resetting idle queues | Eric Joyner | 2020-01-02 | 1 | -1/+4
  While changing link state in iflib_link_state_change(), queues are marked as IFLIB_QUEUE_IDLE to disable the watchdog. Currently, the iflib_timer() watchdog does not check the previous queue status before marking a queue as IFLIB_QUEUE_HUNG. This patch adds a check of the queue status before marking it as hung.
  Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  PR: 239240
  Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  Reported by: ultima@
  Reviewed by: gallatin@, erj@
  MFC after: 3 days
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D21712
  Notes: svn path=/head/; revision=356310
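A minimal model of the added check might look like the following. The enum values and `watchdog_next_status()` are hypothetical stand-ins for the IFLIB_QUEUE_* states and for the relevant part of iflib_timer(); the real function also looks at other conditions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for IFLIB_QUEUE_IDLE / _WORKING / _HUNG (assumed names). */
enum q_status { Q_IDLE, Q_WORKING, Q_HUNG };

/*
 * Decide the queue status after one watchdog tick.  A stalled ring only
 * counts as hung if the queue was not deliberately parked as idle, e.g.
 * by a link-down event.
 */
static enum q_status
watchdog_next_status(enum q_status cur, bool ring_stalled)
{
	if (cur != Q_IDLE && ring_stalled)
		return (Q_HUNG);
	return (cur);
}
```

Without the `cur != Q_IDLE` test, a queue idled by a link state change would be flagged as hung on the next tick and trigger a reset.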
* iflib: properly release memory allocated for DMA | Eric Joyner | 2019-11-04 | 1 | -13/+9
  DMA memory allocations using the bus_dma.h interface are not properly released in all cases for both Tx and Rx. This causes ~448 bytes of M_DEVBUF allocations to be leaked.
  First, the DMA maps for Rx are not properly destroyed. A slight attempt is made in iflib_fl_bufs_free to destroy the maps if we're detaching. However, this function may not be reliably called during detach. Indeed, there is a comment "asking" if this should be moved out. Fix this by moving the bus_dmamap_destroy call into iflib_rx_sds_free, where we already sync and unload the DMA.
  Second, the DMA tag associated with the ifr_ifdi descriptor DMA is not released properly anywhere. Add a call to iflib_dma_free in iflib_rx_structures_free.
  Third, use of NULL as a canary value on the map pointer returned by bus_dmamap_create is not valid. On some platforms, notably x86, this value may be NULL. In this case, we fail to properly release the related resources. Remove the NULL checks on map values in both iflib_fl_bufs_free and iflib_txsd_destroy.
  With all of these fixes applied, the leaks to M_DEVBUF are squelched, and iflib drivers now seem to properly cleanup when detaching.
  Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D22203
  Notes: svn path=/head/; revision=354344
* iflib: cleanup memory leaks on driver detach | Eric Joyner | 2019-10-30 | 1 | -1/+14
  From Jake:
  The iflib stack failed to release all of the memory allocated under M_IFLIB during device detach.
  Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx DMA info were not being released.
  Release this memory so that iflib won't leak memory when a device detaches.
  Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we need to call this only after iflib_dma_free in iflib_tx_structures_free.
  Additionally, also ensure that we destroy the callout mutex associated with each Tx queue when we free it.
  Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D22157
  Notes: svn path=/head/; revision=354207
* iflib: call ether_ifdetach and netmap_detach before stop | Eric Joyner | 2019-10-23 | 1 | -8/+24
  From Jake:
  Calling ether_ifdetach after iflib_stop leads to a potential race where a stale ifp pointer can remain in the route entry list for IPv6 traffic. This will potentially cause a page fault or other system instability if the ifp pointer is accessed.
  Move both iflib_netmap_detach and ether_ifdetach to be called prior to iflib_stop. This avoids the race above, and helps ensure that other ifp references are removed before stopping the interface.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@, jhb@
  MFC after: 3 days
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D22071
  Notes: svn path=/head/; revision=353967
* Split out a more generic debugnet(4) from netdump(4) | Conrad Meyer | 2019-10-17 | 1 | -10/+10
  Debugnet is a simplistic and specialized panic- or debug-time reliable datagram transport. It can drive a single connection at a time and is currently unidirectional (debug/panic machine transmit to remote server only).
  It is mostly a verbatim code lift from netdump(4). Netdump(4) remains the only consumer (until the rest of this patch series lands).
  The INET-specific logic has been extracted somewhat more thoroughly than previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as much as possible as is protocol-independent, remains in debugnet.c. The separation is not perfect and future improvement is welcome. Supporting INET6 is a long-term goal.
  Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to 'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the generic module would be more confusing than the refactoring.
  The only functional change here is the mbuf allocation / tracking. Instead of initiating solely on netdump-configured interface(s) at dumpon(8) configuration time, we watch for any debugnet-enabled NIC for link activation and query it for mbuf parameters at that time. If they exceed the existing high-water mark allocation, we re-allocate and track the new high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone. In a future patch in this series, this will allow initiating netdump from panic ddb(4) without pre-panic configuration.
  No other functional change intended.
  Reviewed by: markj (earlier version)
  Some discussion with: emaste, jhb
  Objection from: marius
  Differential Revision: https://reviews.freebsd.org/D21421
  Notes: svn path=/head/; revision=353685
* Add IFLIB_SINGLE_IRQ_RX_ONLY. | Mark Johnston | 2019-09-30 | 1 | -3/+6
  As of r347221 the iflib legacy interrupt mode setup assumes that drivers perform both receive and transmit processing from the interrupt handler. This assumption is invalid in the vmxnet3 driver, so introduce the IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the interrupt handler.
  PR: 239118
  Reported and tested by: Juraj Lutter <otis@sk.freebsd.org>
  Obtained from: marius
  Reviewed by: gallatin
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D21831
  Notes: svn path=/head/; revision=352906
* kTLS support for TLS 1.3 | Andrew Gallatin | 2019-09-27 | 1 | -4/+6
  TLS 1.3 requires a few changes because 1.3 pretends to be 1.2 with a record type of application data. The "real" record type is then included at the end of the user-supplied plaintext data. This required adding a field to the mbuf_ext_pgs struct to save the record type, and passing the real record type to the sw_encrypt() ktls backend functions.
  Reviewed by: jhb, hselasky
  Sponsored by: Netflix
  Differential Revision: D21801
  Notes: svn path=/head/; revision=352814
* iflib: Remove redundant VLAN events deregistration | Eric Joyner | 2019-09-24 | 1 | -6/+3
  From Piotr:
  r351152 introduced the iflib_deregister() function, which calls EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes the duplicate EVENTHANDLER_DEREGISTER() calls placed in iflib_device_deregister(), as this function now calls iflib_deregister(). This avoids deregistering the same event twice.
  This patch also adds a check in iflib_vlan_register() to prevent registering a VLAN while in detach.
  Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>, erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>.
  Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
  Reviewed by: gallatin@, erj@
  MFC after: 3 days
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D21711
  Notes: svn path=/head/; revision=352655
* iflib: add iflib_deregister to help cleanup on exit | Eric Joyner | 2019-08-16 | 1 | -14/+40
  Commit message by Jake:
  The iflib_register function exists to allocate and setup some common structures used by both iflib_device_register and iflib_pseudo_register.
  There is no associated cleanup function used to undo the steps taken in this function.
  Both iflib_device_deregister and iflib_pseudo_deregister have some of the necessary steps scattered in their flow. However, most of the necessary cleanup is not done during the error path of iflib_device_register and iflib_pseudo_register.
  Some examples of missed cleanup include:
  - the ifp pointer is not free'd during error cleanup
  - the STATE and CTX locks are not destroyed during error cleanup
  - the vlan event handlers are not removed during error cleanup
  - media added to the ifmedia structure is not removed
  - the kobject reference is never deleted
  Additionally, when initializing the kobject, the class reference counter is increased even though kobj_init already increases it. This results in the class never being free'd again because the reference count would never hit zero even after all driver instances are unloaded.
  To aid in proper cleanup, implement an iflib_deregister function that goes through the reverse steps taken by iflib_register.
  Call this function during the error cleanup for iflib_device_register and iflib_pseudo_register. Additionally call the function in the iflib_device_deregister and iflib_pseudo_deregister functions near the end of their flow. This helps reduce code duplication and ensures that proper steps are taken to cleanup allocations and references in both the regular and error cleanup flows.
  Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: shurd@, erj@
  MFC after: 3 days
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D21005
  Notes: svn path=/head/; revision=351152
* iflib: Prevent kernel panic caused by loading driver with a specific interrupt configuration | Eric Joyner | 2019-08-01 | 1 | -1/+7
  If a device has only 1 MSI-X interrupt available and does not support either MSI or legacy interrupts, iflib_device_register() will fail, leak memory and MSI resources, and the driver will not load. Worse, if another iflib-using driver tries to unload afterwards, a kernel panic will occur because the previous failed iflib driver load did not properly call "taskqgroup_detach()" during its cleanup.
  This patch is a band-aid for this situation -- don't try allocating MSI or legacy interrupts if a single MSI-X interrupt was allocated, but fail to load instead. As well, during the cleanup, properly call taskqgroup_detach() on the admin task to prevent panics when other iflib drivers unload.
  This whole interrupt allocation process actually needs re-doing to properly support devices with only a single MSI-X interrupt, devices that only support MSI-X, non-PCI devices, and multiple non-MSIX interrupts, as well.
  Signed-off-by: Eric Joyner <erj@freebsd.org>
  Reviewed by: marius@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D20747
  Notes: svn path=/head/; revision=350509
* iflib: remove kobject class reference increment | Eric Joyner | 2019-08-01 | 1 | -1/+0
  Commit message from Jake:
  In iflib_register, the context is initialized as a kobject using the device driver's "driver" kobject class. As part of this, the function mistakenly increments the ref counter.
  The ref counter is incremented twice, once in the code directly, and once again by kobj_class_compile. However, there is no associated decrement in the detach path. Because of this, the ref counter will never go back down to zero, and thus the kobject method table will never be released.
  Remove this unnecessary reference count increment.
  Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: jhb@, erj@
  MFC after: 3 days
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D21125
  Notes: svn path=/head/; revision=350507
* iflib: fix dangling device softc pointer | Eric Joyner | 2019-07-24 | 1 | -0/+1
  Commit text by Jake:
  If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register function will free the ctx pointer. However, it does not reset the device softc pointer to NULL.
  This will result in memory corruption as a future access to the now invalid pointer will corrupt memory that is later allocated on top of the same memory location.
  The iflib_device_deregister function correctly resets the softc pointer by using device_set_softc(). This clears up the invalid dangling pointer and prevents memory corruption that could lead to a panic or undefined behavior if the device's driver failed to attach.
  Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D21003
  Notes: svn path=/head/; revision=350306
* o In iflib_txq_drain(): | Marius Strobl | 2019-06-26 | 1 | -12/+7
    - Remove desc_used, which is only ever written to.
    - Remove a dead store to reclaimed.
    - Don't recycle avail.
    - Sort variables according to style(9).
    These changes will make a subsequent commit easier to read.
  o In iflib_tx_credits_update(), don't bother checking whether the ift_txd_credits_update method pointer is NULL; _iflib_pre_assert() asserts upfront that this method has been assigned and functions like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}() and _task_fn_tx() were already unconditionally relying on the method being callable.
  Notes: svn path=/head/; revision=349414
* V_ip6_forwarding and V_ipforwarding have been defined in ip6_var.h / | Marko Zec | 2019-06-19 | 1 | -2/+2
  ip_var.h since at least 2008, so make use of those definitions here.
  MFC after: 3 days
  Notes: svn path=/head/; revision=349186
* Evaluating htons() at compile time is more efficient than doing ntohs() | Marko Zec | 2019-06-19 | 1 | -7/+5
  at runtime. This change removes a dependency on a barrel shifter pass before branch resolution, while reducing the instruction stream size by 9 bytes on amd64.
  MFC after: 3 days
  Notes: svn path=/head/; revision=349185
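The two comparison styles can be put side by side in userland. This is a sketch under assumptions: the function names are hypothetical, 0x0800 is the IPv4 EtherType, and whether htons() of a constant is actually folded at compile time depends on the compiler and on how the platform defines htons().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define ETYPE_IPV4 0x0800 /* IPv4 EtherType in host byte order */

/* Swap the packet field at runtime, then compare in host order. */
static int
is_ipv4_runtime_swap(uint16_t wire_type)
{
	return (ntohs(wire_type) == ETYPE_IPV4);
}

/* Compare in network order against a constant the compiler may pre-swap. */
static int
is_ipv4_const_swap(uint16_t wire_type)
{
	return (wire_type == htons(ETYPE_IPV4));
}
```

Both functions give the same answer for every input; the second form leaves the value read from the packet untouched, so the only byte swap involved applies to a constant.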
* - Replace unused and only ever written to members of public iflib(9) | Marius Strobl | 2019-06-15 | 1 | -17/+9
    structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES etc. are also only ever used for these write-only members if at all, so both these macros and members can just go). Using these spares may render it possible to merge certain iflib(9) fixes to stable/12. Otherwise, changes extending struct if_irq or struct if_shared_ctx in any way would break KBI as instances of these are allocated by the driver front-ends (by contrast, struct if_pkt_info as well as struct if_softc_ctx instances are provided by iflib(9) and, thus, may grow at least at the end without breaking KBI).
  - Make the pvi_name in struct pci_vendor_info const char * as device identifiers in hardware lookup tables aren't to be expected to ever change at runtime.
  - Similarly, make the pci_vendor_info_t of struct if_shared_ctx which is used to point to the struct pci_vendor_info arrays provided by the driver front-ends const.
  - Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only consuming the latter macro.
  - Make the name argument of iflib_io_tqg_attach(9) const, matching the taskqgroup_attach_cpu(9) this function wraps as well as e. g. iflib_config_gtask_init(9).
  - Remove the orphaned iflib_qset_lock_get() prototype.
  - Remove some extraneous empty lines.
  Notes: svn path=/head/; revision=349055
* iflib: provide probe wrapper for vendor drivers | Eric Joyner | 2019-05-29 | 1 | -0/+12
  From Jake:
  Vendor drivers that exist out-of-tree generally should return BUS_PROBE_VENDOR from their device probe functions. This helps ensure that a vendor replacement driver will supersede the in-kernel driver for a given device.
  Currently, if a vendor wants to implement a driver based on iflib, it will always report BUS_PROBE_DEFAULT.
  Add a wrapper function, iflib_device_probe_vendor() which can be used in place of iflib_device_probe(). This function will just return BUS_PROBE_VENDOR whenever iflib_device_probe() would return BUS_PROBE_DEFAULT.
  While vendor drivers can already implement such a wrapper themselves, providing it in the iflib.h header makes it easier for the vendor driver to do the right thing.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@, marius@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D20221
  Notes: svn path=/head/; revision=348372
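The wrapper's behaviour amounts to remapping one return value. The sketch below is a userland model under assumptions: the PROBE_* constants are illustrative stand-ins for BUS_PROBE_DEFAULT and BUS_PROBE_VENDOR (their numeric values here are assumed), and `probe_as_vendor()` takes the base probe result as an argument instead of calling iflib_device_probe() on a device.

```c
#include <assert.h>

/* Illustrative stand-ins for BUS_PROBE_* priorities (values assumed). */
#define PROBE_VENDOR  (-10)
#define PROBE_DEFAULT (-20)

/*
 * Model of a vendor probe wrapper: promote a default-priority match to
 * vendor priority and pass every other result (including errors) through.
 */
static int
probe_as_vendor(int base_result)
{
	if (base_result == PROBE_DEFAULT)
		return (PROBE_VENDOR);
	return (base_result);
}
```

Passing other values through unchanged matters: a positive error code from the base probe must still be reported as a failure to match.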
* iflib: use default ntxd and nrxd when user value is not power of 2 | Eric Joyner | 2019-05-10 | 1 | -48/+32
  From Jake:
  A user may set a sysctl to override the default number of Tx or Rx descriptors. However, certain calculations in the iflib core expect the number of descriptors to be a power of 2.
  Update _iflib_assert to verify that all of the shared context parameters for the number of descriptors are powers of 2.
  Modify iflib_reset_qvalues to check that the provided isc_nrxd value is a power of 2. If it's not, print a warning message and then use the default value.
  An alternative might be to try rounding the number down instead. However, this creates problems in case the rounded down value is below the minimum value that the driver would support.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: marius@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19880
  Notes: svn path=/head/; revision=347418
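The validation described above can be sketched with the usual bit trick for powers of two. The helper names `is_pow2()` and `pick_ndesc()` are hypothetical, and the sketch leaves out the driver minimum and maximum limits that the real code also has to respect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A power of two has exactly one bit set. */
static bool
is_pow2(uint32_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/*
 * Choose the descriptor count: keep the user's request when it is a
 * power of two, otherwise warn and fall back to the driver default.
 * A request of 0 is treated as "not set".
 */
static uint32_t
pick_ndesc(uint32_t requested, uint32_t dflt)
{
	if (requested == 0)
		return (dflt);
	if (!is_pow2(requested)) {
		fprintf(stderr,
		    "descriptor count %u is not a power of 2, using %u\n",
		    (unsigned)requested, (unsigned)dflt);
		return (dflt);
	}
	return (requested);
}
```

Falling back to the default, rather than rounding down, sidesteps the case mentioned in the commit where the rounded value would drop below what the driver supports.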
* Allow to build without INET and INET6 again after r347221. | Marius Strobl | 2019-05-08 | 1 | -0/+2
  Submitted by: cam
  Notes: svn path=/head/; revision=347245
* o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and | Marius Strobl | 2019-05-07 | 1 | -62/+134
    MSI. Unlike with iflib_fast_intr_ctx(), the former will also enqueue _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP TX throughput of EM-class devices on par with the MSI-X case and, thus, close to wirespeed/pre-iflib(4) times again. [1] Note that independently of the interrupt type, the UDP performance with these MACs still is abysmal and nowhere near to where it was before the conversion of em(4) to iflib(4).
  o In iflib_init_locked(), announce which free list failed to set up.
  o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt as the latter is valid with MSI-X only.
  o Instead of adding the missing - and apparently convoluted enough that a DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and make the checks fail gracefully rather than panic. This avoids invoking the checks at runtime over and over again in iflib_fast_intr_rxtx() and _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes these functions more readable.
  o In iflib_rx_structures_setup(), only initialize LRO resources if device and driver have LRO capability in order to not waste memory. Also, free the LRO resources again if setting them up fails for one of the queues. However, don't bother invoking iflib_rx_sds_free() in that case because iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case of failure via iflib_rx_structures_free(), but there definitely is some asymmetry left to be fixed, though).
  o Similarly, free LRO resources again in iflib_rx_structures_free().
  o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully instead of panicking (but only in case of INVARIANTS). This is a follow-up to r344132, as such driver bugs shouldn't be fatal.
  o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic() gracefully, too.
  o Bring yet more sanity to iflib_msix_init():
    - If the device doesn't provide enough MSI-X vectors or not all vectors can be allocated so the expected number of queues in addition to admin interrupts can't be supported, try MSI next (and then INTx) as proper MSI-X vector distribution can't be assured in such cases. In essence, this change brings r254008 forward to iflib(4). Also, this is the fix alluded to in the commit message of r343934.
    - If the MSI-X allocation has failed, don't prematurely announce MSI is going to be used as the latter in fact may not be available either.
    - When falling back to MSI, only release the MSI-X table resource again if it was allocated in iflib_msix_init(), i. e. isn't supplied by the driver, in the first place.
  o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.
  PR: 235031 (likely) [1]
  Reviewed by: shurd
  Differential Revision: https://reviews.freebsd.org/D20175
  Notes: svn path=/head/; revision=347221
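The MSI-X fallback order described for iflib_msix_init() can be summarized as a small selection function. This is a simplified model with hypothetical names (`enum intr_mode`, `pick_intr_mode()`); the real function also deals with resource allocation and release, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three interrupt delivery modes. */
enum intr_mode { INTR_MSIX, INTR_MSI, INTR_LEGACY };

/*
 * Pick an interrupt mode: use MSI-X only when every needed vector
 * (queues plus admin) was obtained, otherwise try MSI, then INTx.
 */
static enum intr_mode
pick_intr_mode(int msix_allocated, int msix_needed, bool msi_available)
{
	if (msix_needed > 0 && msix_allocated >= msix_needed)
		return (INTR_MSIX);
	if (msi_available)
		return (INTR_MSI);
	return (INTR_LEGACY);
}
```

A partial MSI-X allocation is deliberately not used, since a proper distribution of vectors across queues could not be assured in that case.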
* - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx.  Marius Strobl  2019-05-06  1  -84/+66
  - Remove the only ever written to ift_db_mtx_name member of struct iflib_txq.
  - Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen and ifr_lro_enabled members of struct iflib_rxq.
  - Consistently spell DMA, RX and TX uppercase in comments, messages etc. instead of mixing with some lowercase variants.
  - Consistently use if_t instead of a mix of if_t and struct ifnet pointers.
  - Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and iflib_fl_setup() in line with reality.
  - Judging by problem reports, people are wondering what on earth messages like: "TX(0) desc avail = 1024, pidx = 0" are trying to indicate. Thus, extend this string to be more like that of non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due to which the interface will be reset.
  - Take advantage of the M_HAS_VLANTAG macro.
  - Use false/true rather than FALSE/TRUE for variables of type bool.
  - Use FALLTHROUGH as advocated by style(9).
  Notes: svn path=/head/; revision=347211
* Allow iflib drivers to pass a pointer to their own ifmedia structure.  Matt Macy  2019-05-03  1  -8/+14
  Tested by: emaste@
  Differential Revision: https://reviews.freebsd.org/D19946
  Notes: svn path=/head/; revision=347057
* iflib: remove assertion that isc_capabilities is nonzero  Ed Maste  2019-05-02  1  -2/+0
  It's atypical, but not invalid, for a driver to pass no capabilities.
  Submitted by: Gerald Aryeetey <aryeeteygerald_rogers.com>
  Reviewed by: shurd
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D20142
  Notes: svn path=/head/; revision=347031
* iflib: Better control over queue core assignment  Stephen Hurd  2019-04-25  1  -3/+99
  By default, cores are now assigned to queues in a sequential manner rather than all NICs starting at the first core. On a four-core system with two NICs each using two queue pairs, the nic:queue -> core mapping has changed from this:
    0:0 -> 0, 0:1 -> 1
    1:0 -> 0, 1:1 -> 1
  To this:
    0:0 -> 0, 0:1 -> 1
    1:0 -> 2, 1:1 -> 3
  Additionally, a device can now be configured to use separate cores for TX and RX queues.
  Two new tunables have been added, dev.X.Y.iflib.separate_txrx and dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part of the auto-assigned sequence.
  Reviewed by: marius
  MFC after: 2 weeks
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D20029
  Notes: svn path=/head/; revision=346708
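The sequential nic:queue -> core mapping from the commit above can be reproduced with a small user-space model. This is a sketch under assumptions, not the iflib implementation: `struct nic_model`, `assign_core_offsets()` and `core_for_queue()` are hypothetical names, and wrapping around at the core count is an assumption the commit message does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-NIC state for the model. */
struct nic_model {
	int nqueues;		/* queue pairs on this NIC */
	int core_offset;	/* first core assigned to this NIC */
};

/*
 * Hand out core offsets in attach order, so that each NIC starts
 * where the previous one stopped instead of at core 0.
 * Assumption: offsets wrap around once all cores are used.
 */
static void
assign_core_offsets(struct nic_model *nics, int nnics, int ncores)
{
	int next = 0;

	assert(nics != NULL && ncores > 0);
	for (int i = 0; i < nnics; i++) {
		nics[i].core_offset = next % ncores;
		next += nics[i].nqueues;
	}
}

/* Core serving queue qid of the given NIC. */
static int
core_for_queue(const struct nic_model *nic, int qid, int ncores)
{
	return ((nic->core_offset + qid) % ncores);
}
```

For two NICs with two queue pairs each on four cores, this model yields 0:0 -> 0, 0:1 -> 1, 1:0 -> 2, 1:1 -> 3, matching the new mapping quoted in the commit message.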
* iflib: Add pfil hooks  Andrew Gallatin  2019-04-24  1  -38/+135
  As with mlx5en, the idea is to drop unwanted traffic as early in receive as possible, before mbufs are allocated and anything is passed up the stack. This can save considerable CPU time when a machine is under a flooding style DOS attack.
  The major change here is to remove the unneeded abstraction where callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and are responsible for NULL'ing that mbuf themselves. Now this happens directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us to use the decision (and potentially mbuf) returned by the pfil hooks. The driver can now recycle mbufs to avoid re-allocation when packets are dropped.
  Reviewed by: marius (shurd and erj also provided feedback)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D19645
  Notes: svn path=/head/; revision=346632
* iflib: Use new ether_gen_addr, restricting addresses to that subset  Kyle Evans  2019-04-17  1  -41/+6
  Differential Revision: https://reviews.freebsd.org/D19587
  Notes: svn path=/head/; revision=346326
* iflib: return ENETDOWN when the network device is down  Eric Joyner  2019-03-28  1  -1/+1
  From Jake:
  iflib_if_transmit returns ENOBUFS when the device is down, or when the link isn't active.
  This was changed in r308792 from return (0), so that the function correctly reports an error that it was unable to transmit.
  However, using ENOBUFS can cause some network applications to produce the following or similar errors:
    "ping: sendto: No buffer space available"
  This is a bit confusing as the real cause of the issue is that the network device is down.
  Replace the ENOBUFS return with ENETDOWN to indicate more clearly that the send failed because the network device is offline.
  This will cause the error message to be reported as
    "ping: sendto: Network is down"
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: shurd@, sbruno@, bz@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19652
  Notes: svn path=/head/; revision=345658
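The effect of the errno change above can be shown with a minimal user-space model of the transmit check. The names `struct if_model` and `model_transmit()` are hypothetical and only mirror the condition described in the commit message (device down or link inactive); they are not the iflib_if_transmit() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state a transmit routine checks. */
struct if_model {
	bool running;		/* interface is up and running */
	bool link_active;	/* link is established */
};

/*
 * Return 0 if a send could proceed. When the device is down or the
 * link is inactive, report ENETDOWN so that callers see "Network is
 * down" instead of the misleading ENOBUFS text.
 */
static int
model_transmit(const struct if_model *ifp)
{
	assert(ifp != NULL);
	if (!ifp->running || !ifp->link_active)
		return (ENETDOWN);	/* this case used to return ENOBUFS */
	return (0);
}
```

An application that passes the returned value to strerror(3) or perror(3) then reports the real cause of the failure, which is the user-visible difference the commit is after.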
* iflib: hold the CTX lock in iflib_pseudo_register  Eric Joyner  2019-03-28  1  -2/+5
  From Jake:
  The iflib_device_register function takes the CTX lock before calling IFDI_ATTACH_PRE, and releases it upon finishing the registration.
  Mirror this process in iflib_pseudo_register, so that we always hold the CTX lock during the attach process when registering a pseudo interface or a regular interface.
  This was caught by code inspection while attempting to analyze where the CTX lock was held.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: shurd@, erj@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19604
  Notes: svn path=/head/; revision=345657
* iflib: mark isc_driver_version as constant  Eric Joyner  2019-03-19  1  -2/+2
  From Jake:
  The iflib core never modifies the isc_driver_version string. Allow drivers to safely assign pointers to constant buffers by marking this parameter const.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: erj@, gallatin@, jhb@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19577
  Notes: svn path=/head/; revision=345312
* iflib: expose the Rx mbuf buffer size to drivers  Eric Joyner  2019-03-19  1  -9/+31
  From Jake:
  iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based on the isc_max_frame_size value that drivers set up. This calculation is repeated by drivers when programming their hardware with the size of each Rx buffer.
  This can lead to a mismatch where the iflib mbuf size is different from the expected size of the buffer as programmed by the hardware.
  This can lead to unexpected results. If iflib ever wants to support mbuf sizes larger than one page, every driver must be updated to account for the new possible buffer sizes.
  Fix this by calculating the mbuf size prior to calling IFDI_INIT, and adding the iflib_get_rx_mbuf_sz function which will expose this value to drivers, so that they do not repeat the same calculation.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: shurd@, erj@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19489
  Notes: svn path=/head/; revision=345305
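The idea of computing the Rx buffer size once and exposing it through an accessor can be sketched in user space as follows. Everything here is an assumption made for illustration: `struct ctx_model`, `model_calc_rx_mbuf_sz()` and `model_get_rx_mbuf_sz()` are hypothetical names, the two size constants are typical values rather than ones taken from the commit, and the "standard cluster if the frame fits, otherwise a page-sized buffer" rule is an assumed selection policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: a 2 KB standard cluster and a 4 KB page-sized one. */
#define MODEL_MCLBYTES		2048u
#define MODEL_MJUMPAGESIZE	4096u

/* Hypothetical framework context shared with the driver. */
struct ctx_model {
	uint32_t max_frame_size;	/* supplied by the driver */
	uint32_t rx_mbuf_sz;		/* computed once by the framework */
};

/* Compute the Rx buffer size once, before the driver's init runs. */
static void
model_calc_rx_mbuf_sz(struct ctx_model *ctx)
{
	assert(ctx != NULL);
	if (ctx->max_frame_size <= MODEL_MCLBYTES)
		ctx->rx_mbuf_sz = MODEL_MCLBYTES;
	else
		ctx->rx_mbuf_sz = MODEL_MJUMPAGESIZE;
}

/* Accessor the driver uses instead of repeating the calculation. */
static uint32_t
model_get_rx_mbuf_sz(const struct ctx_model *ctx)
{
	assert(ctx != NULL);
	return (ctx->rx_mbuf_sz);
}
```

Because the driver only reads the value through the accessor, a later change to the selection policy would not require touching every driver, which is the maintenance argument made in the commit message.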
* iflib: prevent possible infinite loop in iflib_encap  Eric Joyner  2019-03-19  1  -2/+7
  From Jake:
  iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an m_collapse and an m_defrag are attempted to shrink the mbuf cluster to fit within the DMA segment limitations.
  However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns EFBIG on the now defragmented mbuf, we will continuously re-call bus_dmamap_load_mbuf_sg over and over.
  This happens because m_head isn't NULL, and remap is >1, so we don't try to m_collapse or m_defrag again. The only way we exit the loop is if m_head is NULL. However, m_head can't be modified by the call to bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer.
  I believe this will be an incredibly rare occurrence, because it is unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second defragment with an EFBIG error. However, it still seems like a possibility that we should account for.
  Fix the exit check to ensure that if remap is >1, we will also exit, even if m_head is not NULL.
  Submitted by: Jacob Keller <jacob.e.keller@intel.com>
  Reviewed by: shurd@, gallatin@
  MFC after: 1 week
  Sponsored by: Intel Corporation
  Differential Revision: https://reviews.freebsd.org/D19468
  Notes: svn path=/head/; revision=345303
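The loop-exit problem described above can be modelled in user space. This is a simplified sketch, not the iflib_encap() code: `model_encap()` and the two loader functions are hypothetical, the collapse and defragment steps are reduced to a counter, and only the control flow (retry on EFBIG, give up once remap exceeds 1) follows the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical loader that never fits, like a persistent EFBIG. */
static int
model_load_always_efbig(void)
{
	return (EFBIG);
}

/* Hypothetical loader that succeeds on the first attempt. */
static int
model_load_ok(void)
{
	return (0);
}

/*
 * Retry the load after each EFBIG. The first retry stands for the
 * collapse attempt, the second for the defragment attempt. Once both
 * have been tried (remap > 1), return EFBIG instead of looping again.
 */
static int
model_encap(int (*load)(void), int *attempts)
{
	int remap = 0;
	int err;

	assert(load != NULL && attempts != NULL);
	for (;;) {
		(*attempts)++;
		err = load();
		if (err != EFBIG)
			return (err);
		if (remap > 1)
			return (EFBIG);	/* exit check added by the fix */
		remap++;	/* 1: collapse tried, 2: defrag tried */
	}
}
```

Without the `remap > 1` check, the always-failing loader would keep this loop spinning forever, which is the situation the commit guards against; with it, the model gives up after three load attempts.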