aboutsummaryrefslogtreecommitdiff
path: root/sys/dev/gve
Commit message (Collapse)AuthorAgeFilesLines
* gve: Relax a static assertionMark Johnston2025-06-141-1/+1
| | | | | | It's okay if MCLBYTES is larger than the default receive buffer size. Fixes: 71702df61262 ("gve: Add support for 4k RX Buffers when using DQO queue formats")
* gve: Add support for 4k RX Buffers when using DQO queue formatsVee Agarwal2025-06-136-17/+76
| | | | | | | | | | | | | This change adds support for using 4K RX Buffers when using DQO queue formats when a boot-time tunable flag is set to true by the user. When this flag is enabled, the driver will use 4K RX Buffer size either when HW LRO is enabled or mtu > 2048. Signed-off-by: Vee Agarwal <veethebee@google.com> Reviewed by: markj, ziaee MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50786
* gve: Fix timestamp invalidation for DQO queue formatsJasper Tran O'Leary2025-06-051-4/+3
| | | | | | | | | | | | | | | | We need to invalidate timestamps when a TX queue is cleared so that the TX timeout detection callout does not mistakenly fire for cleared packets. When using DQO queue formats, timestamps are set on the pending packet array whose length is not the same as the length of the descriptor ring itself. This commit fixes logic which invalidated the wrong number of pending packets. Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Fixes: 3d2957336c7d ("gve: Add callout to detect and handle TX timeouts") Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D50688
* gve: Add callout to detect and handle TX timeoutsJasper Tran O'Leary2025-05-206-4/+234
| | | | | | | | | | | | | | | | | | | | | | A TX timeout occurs when the driver allocates resources on a TX queue for a packet to be sent, prompts the hardware to send the packet, but does not receive a completion for the packet within a given timeout period. An accumulation of TX timeouts can cause one or more queues to run out of space and cause the entire driver to become stuck. This commit adds a lockless timer service that runs periodically and checks queues for timed out packets. In the event we detect a timeout, we prompt the completion phase taskqueue to process completions. Upon the next inspection of the queue we still detect timed out packets, if the last "kick" occurred within a fixed cooldown window, we opt to reset the driver, even if the prior kick successfully freed timed out packets. Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: markj, ziaee MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50385
* gve: Use load-acquire to fetch generation bitsJasper Tran O'Leary2025-05-204-20/+52
| | | | | | | | | | | | | | | | | | | When running the driver using the DQO queue format, we must load the generation bit and check it before possibly reading the rest of the descriptor's fields. Previously, we guarded against reordering of reads using an explicit thread fence. This commit changes the thread fence to a load with acquire semantics. Because the tx and rx generation fields are in a bitfield, we cannot explicitly address them in an atomic load. Instead we load the respective containing bytes in the descriptor and mask them appropriately. Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50384
* gve: Add feature to change TX/RX ring sizeVee Agarwal2025-04-045-6/+210
| | | | | | | | | | | | | | | | | This change introduces new sysctl handlers that allow the user to change RX/TX ring sizes. As before, the default ring sizes will come from the device (usually 1024). We also get the max/min limits from the device. In the case min values are not provided we have statically defined constants for the min values. Additionally, if the modify ring option is not enabled on the device, changing ring sizes via sysctl will not be possible. When changing ring sizes, the interface turns down momentarily while allocating/freeing resources as necessary. Signed-off-by: Vee Agarwal <veethebee@google.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D49428
* gve: Add feature to adjust RX/TX queue countsVee Agarwal2025-04-046-39/+189
| | | | | | | | | | | | | This change introduces new sysctl handlers that allow the user to change RX/TX queue counts. As before, the default queue counts will be the max value the device can support. When chaning queue counts, the interface turns down momentarily while allocating/freeing resources as necessary. Signed-off-by: Vee Agarwal <veethebee@google.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D49427
* gve: Allocate qpl per ring at ring allocation timeVee Agarwal2025-04-047-132/+114
| | | | | | | | | | | | | | Every tx and rx ring has its own queue-page-list (QPL) that serves as the bounce buffer. Previously we were allocating QPLs for all queues before the queues themselves were allocated and later associating a QPL with a queue. This is avoidable complexity: it is much more natural for each queue to allocate and free its own QPL. Signed-off-by: Vee Agarwal <veethebee@google.com> Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D49426
* gve: Fix qpl_buf_head being initialized improperlyJasper Tran O'Leary2025-02-141-9/+16
| | | | | | | | | | | | | | | | | | | | Currently, for DQO QPL our MPASS assertion on qpl_buf_head for available pending_pkts (i.e. not holding a packet) fails due to incorrect initialization. The MPASS fails on the first run of packets through the ring when INVARIANTS is on, and when INVARIANTS is off, things work without a bug. The MPASS guards against improper reaping of "pending_pkt" objects, and thus was failing for the first run through the ring. By correctly initializing the objects in this patch we make the MPASS not fail on the first run too. Signed-off-by: Vee Agarwal <veethebee@google.com> Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: delphij, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D48968
* gve: Do minor cleanup and bump versionJasper Tran O'Leary2025-02-148-22/+20
| | | | | | | | | | | | | | | | | | | | This commit fixes several minor issues: - Removes an unnecessary function pointer parameter on gve_start_tx_ring - Adds a presubmit check against style(9) - Replaces mb() and rmb() macros with native atomic_thread_fence_seq_cst() and atomic_thread_fence_acq() respectively - Fixes various typos throughout - Increments the version number to 1.3.2 Co-authored-by: Vee Agarwal <veethebee@google.com> Signed-off-by: Vee Agarwal <veethebee@google.com> Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: delphij, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D48969
* gve: Fix memory leak during resetJasper Tran O'Leary2025-02-141-8/+41
| | | | | | | | | | | | | | Before this change, during reset we were allocating new memory for priv->ptype_lut_dqo, irq_db_array and the counter_array over the old memory. This change ensures we do not allocate new memory during reset and avoid memory leaks. Signed-off-by: Vee Agarwal <veethebee@google.com> Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: delphij, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D48970
* gve: Disallow MTUs within a problematic rangeJasper Tran O'Leary2025-02-141-0/+15
| | | | | | | | | | | | | If hardware LRO is enabled with GVE, then setting the driver's MTU to a range of values around 8000 will cause dropped packets and drastically degraded performance. While this issue is being investigated, we need to prohibit the driver's MTU being set to a value within this range. Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com> Reviewed by: delphij, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D48971
* gve: Fix TX livelockShailend Chand2024-11-065-26/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this change the transmit taskqueue would enqueue itself when it cannot find space on the NIC ring with the hope that eventually space would be made. This results in the following livelock that only occurs after passing ~200Gbps of TCP traffic for many hours: 100% CPU ┌───────────┐wait on ┌──────────┐ ┌───────────┐ │user thread│ cpu │gve xmit │wait on │gve cleanup│ │with mbuf ├────────►│taskqueue ├────────►│taskqueue │ │uma lock │ │ │ NIC ring│ │ └───────────┘ └──────────┘ space └─────┬─────┘ ▲ │ │ wait on mbuf uma lock │ └───────────────────────────────────────────┘ Further details about the livelock are available on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560. After this change, the transmit taskqueue no longer spins till there is room on the NIC ring. It instead stops itself and lets the completion-processing taskqueue wake it up. Since I'm touching the trasnmit taskqueue I've also corrected the name of a counter and also fixed a bug where EINVAL mbufs were not being freed and were instead living forever on the bufring. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: markj MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D47138
* gve: Add DQO QPL supportShailend Chand2024-11-0611-146/+981
| | | | | | | | | | | | | | | | | | | | | | DQO is the descriptor format for our next generation virtual NIC. It is necessary to make full use of the hardware bandwidth on many newer GCP VM shapes. This patch extends the previously introduced DQO descriptor format with a "QPL" mode. QPL stands for Queue Page List and refers to the fact that the hardware cannot access arbitrary regions of the host memory and instead expects a fixed bounce buffer comprising of a list of pages. The QPL aspects are similar to the already existing GQI queue queue format: in that the mbufs being input in the Rx path have external storage in the form of vm pages attached to them; and in the Tx path we always copy the mbuf payload into QPL pages. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: markj MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D46691
* gve: Add DQO RDA supportShailend Chand2024-11-0612-138/+2410
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DQO is the descriptor format for our next generation virtual NIC. It is necessary to make full use of the hardware bandwidth on many newer GCP VM shapes. One major change with DQO from its predecessor GQI is that it uses dual descriptor rings for both TX and RX queues. The TX path uses a descriptor ring to send descriptors to HW, and receives packet completion events on a TX completion ring. The RX path posts buffers to HW using an RX descriptor ring and receives incoming packets on an RX completion ring. In GQI-QPL, the hardware could not access arbitrary regions of guest memory, which is why there was a pre-negotitated bounce buffer (QPL: Queue Page List). DQO-RDA has no such limitation. "RDA" is in contrast to QPL and stands for "Raw DMA Addressing" which just means that HW does not need a fixed bounce buffer and can DMA arbitrary regions of guest memory. A subsequent patch will introduce the DQO-QPL datapath that uses the same descriptor format as in this patch, but will have a fixed bounce buffer. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: markj MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D46690
* Check for errors when detaching children first, not lastJohn Baldwin2024-11-051-1/+6
| | | | | | | | | | | | These detach routines in these drivers all ended with 'return (bus_generic_detach())' meaning that if any child device failed to detach, the parent driver was left in a mostly destroyed state, but still marked attached. Instead, bus drivers should detach child drivers first and return errors before destroying driver state in the parent. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D47387
* net: Remove unneeded NULL check for the allocated ifnetZhenlei Huang2024-06-281-11/+2
| | | | | | | | | | | Change 4787572d0580 made if_alloc_domain() never fail, then also do the wrappers if_alloc(), if_alloc_dev(), and if_gethandle(). No functional change intended. Reviewed by: kp, imp, glebius, stevek MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D45740
* gve: Make gve_free_qpls idempotentShailend Chand2024-06-181-0/+1
| | | | | | | | This fixes a panic caused by double free. PR: kern/279410 MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D45489
* gve: Make LRO work for jumbo packetsShailend Chand2023-09-073-5/+7
| | | | | | | | | | | | | Each Rx descriptor points to a packet buffer of size 2K, which means that MTUs greater than 2K see multi-descriptor packets. The TCP-hood of such packets was being incorrectly determined by looking for a flag on the last descriptor instead of the first descriptor. Also fixed and progressed the version number. Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D41754
* gve: Simplify tx loop over buffer ringShailend Chand2023-08-121-3/+2
| | | | | | Reviewed by: markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D41281
* gve: Fix Tx tcpdump panicShailend Chand2023-07-271-3/+3
| | | | | | | | | Ringing the doorbell before making the BPF call can result in the mbuf being freed before the BPF call. Reviewed-by: markj MFC-after: 3 days Differential Revision: https://reviews.freebsd.org/D41189
* gve: Unobfuscate code by using nitems directly for loop.Xin LI2023-06-071-4/+3
| | | | | | | | | While there, also make MODULE_PNP_INFO to reflect that the device description is provided. Reported-by: jrtc27 Reviewed-by: jrtc27, imp Differential Revision: https://reviews.freebsd.org/D40430
* gve: Add PNP info to PCI attachment of gve(4) driver.Xin LI2023-06-061-4/+24
| | | | | Reviewed-by: imp Differential Revision: https://reviews.freebsd.org/D40429
* gve: Fix build on i386 and enable LINT builds.Xin LI2023-06-041-3/+3
| | | | | Reviewed-by: imp Differential Revision: https://reviews.freebsd.org/D40419
* Add gve, the driver for Google Virtual NIC (gVNIC)Shailend Chand2023-06-0212-0/+5248
gVNIC is a virtual network interface designed specifically for Google Compute Engine (GCE). It is required to support per-VM Tier_1 networking performance, and for using certain VM shapes on GCE. The NIC supports TSO, Rx and Tx checksum offloads, and RSS. It does not currently do hardware LRO, and thus the software-LRO in the host is used instead. It also supports jumbo frames. For each queue, the driver negotiates a set of pages with the NIC to serve as a fixed bounce buffer, this precludes the use of iflib. Reviewed-by: markj MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D39873