| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
It's okay if MCLBYTES is larger than the default receive buffer size.
Fixes: 71702df61262 ("gve: Add support for 4k RX Buffers when using DQO queue formats")
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds support for using 4K RX Buffers when using DQO queue
formats when a boot-time tunable flag is set to true by the user.
When this flag is enabled, the driver will use 4K RX Buffer size either
when HW LRO is enabled or mtu > 2048.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Reviewed by: markj, ziaee
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50786
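
The selection rule described above can be sketched as a small predicate. This is only an illustration: the helper name, its parameters, and the constants are hypothetical and not taken from the driver, and the 2048-byte default is assumed from the 2K buffer size mentioned elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE_DEFAULT 2048u /* assumed default receive buffer size */
#define RX_BUF_SIZE_4K      4096u

/* Hypothetical helper: pick the RX buffer size for a queue. */
static uint32_t
pick_rx_buf_size(bool tunable_4k, bool is_dqo, bool hw_lro, uint32_t mtu)
{
	assert(mtu != 0);
	/* 4K buffers need the boot-time flag, DQO, and either LRO or a large MTU. */
	if (tunable_4k && is_dqo && (hw_lro || mtu > 2048))
		return (RX_BUF_SIZE_4K);
	return (RX_BUF_SIZE_DEFAULT);
}
```

With the flag off, the sketch always returns the default size regardless of MTU or LRO.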
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to invalidate timestamps when a TX queue is cleared so that the
TX timeout detection callout does not mistakenly fire for cleared
packets. When using DQO queue formats, timestamps are set on the pending
packet array whose length is not the same as the length of the
descriptor ring itself. This commit fixes logic which invalidated the
wrong number of pending packets.
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Fixes: 3d2957336c7d ("gve: Add callout to detect and handle TX timeouts")
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D50688
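
A minimal sketch of the corrected invalidation loop, assuming a ring structure that tracks the descriptor count and the pending packet count separately. All struct, field, and constant names here are hypothetical stand-ins, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_INVALID (-1) /* hypothetical sentinel: no packet in flight */

struct pending_pkt {
	int64_t enqueue_time_sec;
};

struct tx_ring {
	uint32_t desc_cnt;         /* length of the descriptor ring */
	uint32_t num_pending_pkts; /* length of the pending packet array */
	struct pending_pkt *pending_pkts;
};

static void
invalidate_timestamps(struct tx_ring *tx)
{
	assert(tx->pending_pkts != NULL);
	/* Bound the loop by the array's own length, not by desc_cnt. */
	for (uint32_t i = 0; i < tx->num_pending_pkts; i++)
		tx->pending_pkts[i].enqueue_time_sec = TS_INVALID;
}
```

Looping to `desc_cnt` instead would either skip entries or run past the end of the array whenever the two lengths differ.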
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A TX timeout occurs when the driver allocates resources on a TX queue
for a packet to be sent, prompts the hardware to send the packet, but
does not receive a completion for the packet within a given timeout
period. An accumulation of TX timeouts can cause one or more queues to
run out of space and cause the entire driver to become stuck.
This commit adds a lockless timer service that runs periodically and
checks queues for timed out packets. In the event we detect a timeout,
we prompt the completion phase taskqueue to process completions. If, upon
the next inspection of the queue, we still detect timed out packets and
the last "kick" occurred within a fixed cooldown window, we opt to
reset the driver, even if the prior kick successfully freed timed out
packets.
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: markj, ziaee
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50385
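
The kick-then-reset policy described above can be sketched as a per-queue decision function. This is an illustration only: the enum, struct, and function names are hypothetical, time is passed in as plain seconds, and the real driver's bookkeeping is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tx_action { TX_NONE, TX_KICK, TX_RESET };

struct txq_state {
	int64_t last_kick_sec; /* time of the last kick, or -1 if never kicked */
};

/* Called by the periodic timer for each queue. */
static enum tx_action
check_tx_queue(struct txq_state *q, bool has_timed_out_pkt, int64_t now_sec,
    int64_t cooldown_sec)
{
	assert(cooldown_sec > 0);
	if (!has_timed_out_pkt)
		return (TX_NONE);
	/* A timeout seen again within the cooldown window escalates to a reset. */
	if (q->last_kick_sec >= 0 && now_sec - q->last_kick_sec < cooldown_sec)
		return (TX_RESET);
	/* Otherwise kick the completion taskqueue and remember when. */
	q->last_kick_sec = now_sec;
	return (TX_KICK);
}
```

A timeout seen after the cooldown has elapsed results in another kick rather than a reset.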
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running the driver using the DQO queue format, we must load the
generation bit and check it before possibly reading the rest of the
descriptor's fields.
Previously, we guarded against reordering of reads using an explicit
thread fence. This commit changes the thread fence to a load with
acquire semantics. Because the tx and rx generation fields are in a
bitfield, we cannot explicitly address them in an atomic load. Instead
we load the respective containing bytes in the descriptor and mask them
appropriately.
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50384
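
A userland sketch of the pattern, using C11 `<stdatomic.h>` rather than the kernel's own atomic primitives. The descriptor layout, the mask value, and the meaning of `expected_gen` are assumptions made for illustration; the real descriptors and the polarity of the generation check may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GEN_MASK 0x80u /* hypothetical position of the generation bit */

struct compl_desc {
	_Atomic uint8_t gen_byte; /* byte that contains the generation bit */
	uint8_t pad;
	uint16_t pkt_len;
};

/* Returns true and fills *len_out only if the descriptor is ready. */
static bool
desc_ready(struct compl_desc *d, bool expected_gen, uint16_t *len_out)
{
	uint8_t byte;

	assert(len_out != NULL);
	/* The acquire load orders the later field reads after the generation check. */
	byte = atomic_load_explicit(&d->gen_byte, memory_order_acquire);
	if (((byte & GEN_MASK) != 0) != expected_gen)
		return (false);
	*len_out = d->pkt_len;
	return (true);
}
```

Loading the whole containing byte and masking it sidesteps the fact that a bitfield member cannot be the operand of an atomic load.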
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces new sysctl handlers that allow the user to change
RX/TX ring sizes. As before, the default ring sizes will come from the
device (usually 1024). We also get the max/min limits from the device.
In the case min values are not provided we have statically defined
constants for the min values. Additionally, if the modify ring option is
not enabled on the device, changing ring sizes via sysctl will not be
possible. When changing ring sizes, the interface turns down
momentarily while allocating/freeing resources as necessary.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49428
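
The validation a ring-size sysctl handler might perform can be sketched as follows. The function name, the fallback constant, and the choice of error codes are all hypothetical; only the rules themselves (device-provided limits, a static fallback minimum, and the modify ring option gating the change) come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FALLBACK_MIN_RING_SIZE 256u /* hypothetical static minimum */

static int
check_ring_size(uint32_t req, uint32_t dev_min, uint32_t dev_max,
    bool modify_ring_enabled)
{
	uint32_t min;

	assert(dev_max > 0);
	/* Without the modify ring option, ring sizes cannot be changed. */
	if (!modify_ring_enabled)
		return (EOPNOTSUPP);
	/* Fall back to a static minimum when the device reports none. */
	min = (dev_min != 0) ? dev_min : FALLBACK_MIN_RING_SIZE;
	if (req < min || req > dev_max)
		return (EINVAL);
	return (0);
}
```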
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This change introduces new sysctl handlers that allow the user to change
RX/TX queue counts. As before, the default queue counts will be the max
value the device can support. When changing queue counts, the interface turns
down momentarily while allocating/freeing resources as necessary.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49427
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every tx and rx ring has its own queue-page-list (QPL) that serves as
the bounce buffer. Previously we were allocating QPLs for all queues
before the queues themselves were allocated and later associating a QPL
with a queue. This is avoidable complexity: it is much more natural for
each queue to allocate and free its own QPL.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49426
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, for DQO QPL our MPASS assertion on qpl_buf_head for available
pending_pkts (i.e. not holding a packet) fails due to incorrect
initialization. The MPASS fails on the first run of packets through the
ring when INVARIANTS is on, and when INVARIANTS is off, things work
without a bug.
The MPASS guards against improper reaping of "pending_pkt" objects,
and thus was failing for the first run through the ring. By correctly
initializing the objects in this patch we make the MPASS not fail on the
first run too.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: delphij, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D48968
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes several minor issues:
- Removes an unnecessary function pointer parameter on gve_start_tx_ring
- Adds a presubmit check against style(9)
- Replaces mb() and rmb() macros with native
atomic_thread_fence_seq_cst() and atomic_thread_fence_acq()
respectively
- Fixes various typos throughout
- Increments the version number to 1.3.2
Co-authored-by: Vee Agarwal <veethebee@google.com>
Signed-off-by: Vee Agarwal <veethebee@google.com>
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: delphij, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D48969
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change, during reset we were allocating new memory for
priv->ptype_lut_dqo, irq_db_array and the counter_array over the old
memory. This change ensures we do not allocate new memory during reset
and avoid memory leaks.
Signed-off-by: Vee Agarwal <veethebee@google.com>
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: delphij, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D48970
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If hardware LRO is enabled with GVE, then setting the driver's MTU to a
range of values around 8000 will cause dropped packets and drastically
degraded performance. While this issue is being investigated, we need
to prohibit the driver's MTU being set to a value within this range.
Signed-off-by: Jasper Tran O'Leary <jtranoleary@google.com>
Reviewed by: delphij, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D48971
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change the transmit taskqueue would enqueue itself when it
cannot find space on the NIC ring with the hope that eventually space
would be made. This results in the following livelock that only occurs
after passing ~200Gbps of TCP traffic for many hours:
                        100% CPU
┌───────────┐wait on  ┌──────────┐         ┌───────────┐
│user thread│  cpu    │gve xmit  │wait on  │gve cleanup│
│with mbuf  ├────────►│taskqueue ├────────►│taskqueue  │
│uma lock   │         │          │ NIC ring│           │
└───────────┘         └──────────┘  space  └─────┬─────┘
     ▲                                           │
     │      wait on mbuf uma lock                │
     └───────────────────────────────────────────┘
Further details about the livelock are available on
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560.
After this change, the transmit taskqueue no longer spins till there is
room on the NIC ring. It instead stops itself and lets the
completion-processing taskqueue wake it up.
Since I'm touching the transmit taskqueue I've also corrected the name
of a counter and also fixed a bug where EINVAL mbufs were not being
freed and were instead living forever on the bufring.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47138
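
The stop-and-wake scheme can be sketched with a per-queue "stopped" flag. This is a simplified, single-threaded illustration with hypothetical names: it does not model the taskqueues themselves, and it ignores the race between the transmit side finding the ring full and the completion side freeing space, which a real implementation has to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct txq {
	uint32_t free_slots; /* space left on the NIC ring */
	atomic_bool stopped; /* set when the transmit task parked itself */
};

/* Transmit side: returns false and parks the queue when the ring is full. */
static bool
xmit_one(struct txq *q)
{
	if (q->free_slots == 0) {
		atomic_store(&q->stopped, true);
		return (false); /* do not re-enqueue; wait for a wakeup */
	}
	q->free_slots--;
	return (true);
}

/* Completion side: returns true when the transmit task should be rescheduled. */
static bool
complete(struct txq *q, uint32_t n)
{
	assert(n > 0);
	q->free_slots += n;
	return (atomic_exchange(&q->stopped, false));
}
```

Because the transmit side returns instead of re-enqueueing itself, it no longer competes for the CPU with the completion side that frees ring space.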
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DQO is the descriptor format for our next generation virtual NIC.
It is necessary to make full use of the hardware bandwidth on many
newer GCP VM shapes.
This patch extends the previously introduced DQO descriptor format
with a "QPL" mode. QPL stands for Queue Page List and refers to
the fact that the hardware cannot access arbitrary regions of the
host memory and instead expects a fixed bounce buffer consisting
of a list of pages.
The QPL aspects are similar to the already existing GQI queue
format, in that the mbufs being input in the Rx path have
external storage in the form of vm pages attached to them, and
in the Tx path we always copy the mbuf payload into QPL pages.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46691
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DQO is the descriptor format for our next generation virtual NIC.
It is necessary to make full use of the hardware bandwidth on many
newer GCP VM shapes.
One major change with DQO from its predecessor GQI is that it uses
dual descriptor rings for both TX and RX queues.
The TX path uses a descriptor ring to send descriptors to HW, and
receives packet completion events on a TX completion ring.
The RX path posts buffers to HW using an RX descriptor ring and
receives incoming packets on an RX completion ring.
In GQI-QPL, the hardware could not access arbitrary regions of
guest memory, which is why there was a pre-negotiated bounce buffer
(QPL: Queue Page List). DQO-RDA has no such limitation.
"RDA" is in contrast to QPL and stands for "Raw DMA Addressing" which
just means that HW does not need a fixed bounce buffer and can DMA
arbitrary regions of guest memory.
A subsequent patch will introduce the DQO-QPL datapath that uses the
same descriptor format as in this patch, but will have a fixed
bounce buffer.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46690
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The detach routines in these drivers all ended with 'return
(bus_generic_detach())' meaning that if any child device failed to
detach, the parent driver was left in a mostly destroyed state, but
still marked attached. Instead, bus drivers should detach child
drivers first and return errors before destroying driver state in the
parent.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47387
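
The ordering change can be sketched with a mock in place of the bus code. Here `mock_detach_children()` is a hypothetical stand-in for the step that bus_generic_detach() performs in the kernel, and the softc fields exist only to make the effect observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct softc {
	bool child_busy;     /* mock: a child refuses to detach */
	bool resources_held; /* mock: parent driver state */
};

/* Mock stand-in for detaching child devices. */
static int
mock_detach_children(struct softc *sc)
{
	return (sc->child_busy ? EBUSY : 0);
}

static int
parent_detach(struct softc *sc)
{
	int error;

	assert(sc != NULL);
	/* Detach children first and bail out before touching parent state. */
	error = mock_detach_children(sc);
	if (error != 0)
		return (error);

	sc->resources_held = false; /* only now tear down the parent */
	return (0);
}
```

If a child fails to detach, the parent returns the error with its own state still intact.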
|
| |
|
|
|
|
|
|
|
|
|
| |
Change 4787572d0580 made if_alloc_domain() never fail, so the wrappers
if_alloc(), if_alloc_dev(), and if_gethandle() never fail either.
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
|
| |
|
|
|
|
|
|
| |
This fixes a panic caused by double free.
PR: kern/279410
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D45489
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Each Rx descriptor points to a packet buffer of size 2K, which means
that MTUs greater than 2K see multi-descriptor packets. The TCP-hood of
such packets was being incorrectly determined by looking for a flag on
the last descriptor instead of the first descriptor.
Also fixed and bumped the version number.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41754
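
A minimal sketch of the fix, assuming a simplified descriptor with a TCP flag and a length. The struct and function names are hypothetical; the point is only that the flag is taken from the first descriptor of a multi-descriptor packet while the lengths are summed across all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rx_desc {
	bool is_tcp;  /* hypothetical flag, read from the first descriptor only */
	uint32_t len; /* bytes in this descriptor's buffer (at most 2K) */
};

struct pkt_info {
	bool is_tcp;
	uint32_t len;
};

static struct pkt_info
assemble_pkt(const struct rx_desc *descs, size_t n)
{
	struct pkt_info p = { false, 0 };

	assert(descs != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		/* Take the TCP flag from the first descriptor, not the last. */
		if (i == 0)
			p.is_tcp = descs[i].is_tcp;
		p.len += descs[i].len;
	}
	return (p);
}
```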
|
| |
|
|
|
|
| |
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D41281
|
| |
|
|
|
|
|
|
|
| |
Ringing the doorbell before making the BPF call can result in the
mbuf being freed before the BPF call.
Reviewed-by: markj
MFC-after: 3 days
Differential Revision: https://reviews.freebsd.org/D41189
|
| |
|
|
|
|
|
|
|
| |
While there, also make MODULE_PNP_INFO reflect that the device
description is provided.
Reported-by: jrtc27
Reviewed-by: jrtc27, imp
Differential Revision: https://reviews.freebsd.org/D40430
|
| |
|
|
|
| |
Reviewed-by: imp
Differential Revision: https://reviews.freebsd.org/D40429
|
| |
|
|
|
| |
Reviewed-by: imp
Differential Revision: https://reviews.freebsd.org/D40419
|
|
|
gVNIC is a virtual network interface designed specifically for
Google Compute Engine (GCE). It is required to support per-VM Tier_1
networking performance, and for using certain VM shapes on GCE.
The NIC supports TSO, Rx and Tx checksum offloads, and RSS.
It does not currently do hardware LRO, and thus the software-LRO
in the host is used instead. It also supports jumbo frames.
For each queue, the driver negotiates a set of pages with the NIC to
serve as a fixed bounce buffer; this precludes the use of iflib.
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39873
|