
Commit 1738cd3

Netanel Belgazal authored and davem330 committed
net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
This is a driver for the ENA family of networking devices.

Signed-off-by: Netanel Belgazal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 4330ea7 commit 1738cd3

20 files changed: +10858 -0 lines changed

Documentation/networking/00-INDEX

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,8 @@ dns_resolver.txt
7474
- The DNS resolver module allows kernel services to make DNS queries.
7575
driver.txt
7676
- Softnet driver issues.
77+
ena.txt
78+
- info on Amazon's Elastic Network Adapter (ENA)
7779
e100.txt
7880
- info on Intel's EtherExpress PRO/100 line of 10/100 boards
7981
e1000.txt

Documentation/networking/ena.txt

Lines changed: 305 additions & 0 deletions
@@ -0,0 +1,305 @@
1+
Linux kernel driver for Elastic Network Adapter (ENA) family:
2+
=============================================================
3+
4+
Overview:
5+
=========
6+
ENA is a networking interface designed to make good use of modern CPU
7+
features and system architectures.
8+
9+
The ENA device exposes a lightweight management interface with a
10+
minimal set of memory mapped registers and extendable command set
11+
through an Admin Queue.
12+
13+
The driver supports a range of ENA devices, is link-speed independent
14+
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
15+
a negotiated and extendable feature set.
16+
17+
Some ENA devices support SR-IOV. This driver is used for both the
18+
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
19+
20+
ENA devices enable high speed and low overhead network traffic
21+
processing by providing multiple Tx/Rx queue pairs (the maximum number
22+
is advertised by the device via the Admin Queue), a dedicated MSI-X
23+
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
24+
and CPU cacheline optimized data placement.
25+
26+
The ENA driver supports industry standard TCP/IP offload features such
27+
as checksum offload and TCP transmit segmentation offload (TSO).
28+
Receive-side scaling (RSS) is supported for multi-core scaling.
29+
30+
The ENA driver and its corresponding devices implement health
31+
monitoring mechanisms such as watchdog, enabling the device and driver
32+
to recover in a manner transparent to the application, as well as
33+
debug logs.
34+
35+
Some of the ENA devices support a working mode called Low-latency
36+
Queue (LLQ), which saves several more microseconds.
37+
38+
Supported PCI vendor ID/device IDs:
39+
===================================
40+
1d0f:0ec2 - ENA PF
41+
1d0f:1ec2 - ENA PF with LLQ support
42+
1d0f:ec20 - ENA VF
43+
1d0f:ec21 - ENA VF with LLQ support
44+
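The ID list above maps naturally onto a lookup table. The following is a minimal userspace sketch in C, not the driver's actual table (the real driver would use the kernel's struct pci_device_id); the names ena_id_entry, ena_ids and ena_describe are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENA_VENDOR_ID_AMAZON 0x1d0f /* vendor ID from the list above */

/* Hypothetical table entry; not the kernel's struct pci_device_id. */
struct ena_id_entry {
	uint16_t device;
	const char *name;
	int llq; /* non-zero if the ID is listed with LLQ support */
};

static const struct ena_id_entry ena_ids[] = {
	{ 0x0ec2, "ENA PF", 0 },
	{ 0x1ec2, "ENA PF with LLQ support", 1 },
	{ 0xec20, "ENA VF", 0 },
	{ 0xec21, "ENA VF with LLQ support", 1 },
};

/* Return the matching entry, or NULL if this is not a listed ENA device. */
static const struct ena_id_entry *ena_describe(uint16_t vendor, uint16_t device)
{
	size_t i;

	if (vendor != ENA_VENDOR_ID_AMAZON)
		return NULL;
	for (i = 0; i < sizeof(ena_ids) / sizeof(ena_ids[0]); i++)
		if (ena_ids[i].device == device)
			return &ena_ids[i];
	return NULL;
}
```

For example, ena_describe(0x1d0f, 0xec21) returns the "ENA VF with LLQ support" entry, while any other vendor ID yields NULL.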
45+
ENA Source Code Directory Structure:
46+
====================================
47+
ena_com.[ch] - Management communication layer. This layer is
48+
responsible for handling all the management
49+
(admin) communication between the device and the
50+
driver.
51+
ena_eth_com.[ch] - Tx/Rx data path.
52+
ena_admin_defs.h - Definition of ENA management interface.
53+
ena_eth_io_defs.h - Definition of ENA data path interface.
54+
ena_common_defs.h - Common definitions for ena_com layer.
55+
ena_regs_defs.h - Definition of ENA PCI memory-mapped (MMIO) registers.
56+
ena_netdev.[ch] - Main Linux kernel driver.
57+
ena_sysfs.[ch] - Sysfs files.
58+
ena_ethtool.c - ethtool callbacks.
59+
ena_pci_id_tbl.h - Supported device IDs.
60+
61+
Management Interface:
62+
=====================
63+
ENA management interface is exposed by means of:
64+
- PCIe Configuration Space
65+
- Device Registers
66+
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
67+
- Asynchronous Event Notification Queue (AENQ)
68+
69+
ENA device MMIO Registers are accessed only during driver
70+
initialization and are not involved in further normal device
71+
operation.
72+
73+
AQ is used for submitting management commands, and the
74+
results/responses are reported asynchronously through ACQ.
75+
76+
ENA introduces a very small set of management commands with room for
77+
vendor-specific extensions. Most of the management operations are
78+
framed in a generic Get/Set feature command.
79+
80+
The following admin queue commands are supported:
81+
- Create I/O submission queue
82+
- Create I/O completion queue
83+
- Destroy I/O submission queue
84+
- Destroy I/O completion queue
85+
- Get feature
86+
- Set feature
87+
- Configure AENQ
88+
- Get statistics
89+
90+
Refer to ena_admin_defs.h for the list of supported Get/Set Feature
91+
properties.
92+
93+
The Asynchronous Event Notification Queue (AENQ) is a uni-directional
94+
queue used by the ENA device to send to the driver events that cannot
95+
be reported using ACQ. AENQ events are subdivided into groups. Each
96+
group may have multiple syndromes, as shown below:
97+
98+
The events are:
99+
Group                   Syndrome
100+
Link state change       - X -
101+
Fatal error             - X -
102+
Notification            Suspend traffic
103+
Notification            Resume traffic
104+
Keep-Alive              - X -
105+
106+
ACQ and AENQ share the same MSI-X vector.
107+
108+
Keep-Alive is a special mechanism that allows monitoring of the
109+
device's health. The driver maintains a watchdog (WD) handler which,
110+
if fired, logs the current state and statistics then resets and
111+
restarts the ENA device and driver. A Keep-Alive event is delivered by
112+
the device every second. The driver re-arms the WD upon reception of a
113+
Keep-Alive event. A missed Keep-Alive event causes the WD handler to
114+
fire.
115+
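The Keep-Alive mechanism described above can be sketched as a timestamp comparison. This is a simplified userspace model in C, not the driver's code: the structure, the function names and the millisecond timeout parameter are all assumptions, and the text above does not state how long the driver waits before treating a Keep-Alive as missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; all times are in milliseconds. */
struct ena_wd {
	uint64_t last_keep_alive_ms; /* when the last Keep-Alive arrived */
	uint64_t timeout_ms;         /* assumed grace period before firing */
};

static void ena_wd_init(struct ena_wd *wd, uint64_t now_ms, uint64_t timeout_ms)
{
	assert(timeout_ms > 0);
	wd->last_keep_alive_ms = now_ms;
	wd->timeout_ms = timeout_ms;
}

/* Re-arm the watchdog; called when a Keep-Alive event is received. */
static void ena_wd_keep_alive(struct ena_wd *wd, uint64_t now_ms)
{
	wd->last_keep_alive_ms = now_ms;
}

/* Polled periodically; true means the WD handler should fire. */
static bool ena_wd_expired(const struct ena_wd *wd, uint64_t now_ms)
{
	return now_ms - wd->last_keep_alive_ms > wd->timeout_ms;
}
```

With the device sending an event every second, a timeout of a few seconds (an arbitrary choice in this sketch) would tolerate a delayed event while still detecting a stalled device.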
116+
Data Path Interface:
117+
====================
118+
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
119+
SQ correspondingly). Each SQ has a completion queue (CQ) associated
120+
with it.
121+
122+
The SQs and CQs are implemented as descriptor rings in contiguous
123+
physical memory.
124+
125+
The ENA driver supports two Queue Operation modes for Tx SQs:
126+
- Regular mode
127+
* In this mode the Tx SQs reside in the host's memory. The ENA
128+
device fetches the ENA Tx descriptors and packet data from host
129+
memory.
130+
- Low Latency Queue (LLQ) mode or "push-mode".
131+
* In this mode the driver pushes the transmit descriptors and the
132+
first 128 bytes of the packet directly to the ENA device memory
133+
space. The rest of the packet payload is fetched by the
134+
device. For this operation mode, the driver uses a dedicated PCI
135+
device memory BAR, which is mapped with write-combine capability.
136+
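The split between pushed and fetched bytes in the two Tx modes can be illustrated with a small helper. This is a sketch under stated assumptions, not driver code: the names are hypothetical, and it assumes that a packet shorter than 128 bytes is pushed in full, which the text above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define ENA_LLQ_PUSH_BYTES 128 /* "first 128 bytes" from the text above */

struct ena_tx_split {
	size_t pushed;  /* bytes written to device memory by the driver */
	size_t fetched; /* bytes the device fetches from host memory */
};

/* Compute how a packet of pkt_len bytes is divided in each Tx mode. */
static struct ena_tx_split ena_llq_split(size_t pkt_len, int llq_enabled)
{
	struct ena_tx_split s;

	if (!llq_enabled)
		s.pushed = 0;                  /* regular mode: device fetches all */
	else if (pkt_len < ENA_LLQ_PUSH_BYTES)
		s.pushed = pkt_len;            /* assumption: short packet fully pushed */
	else
		s.pushed = ENA_LLQ_PUSH_BYTES; /* LLQ mode: header bytes are pushed */
	s.fetched = pkt_len - s.pushed;
	assert(s.pushed + s.fetched == pkt_len);
	return s;
}
```

For a 1500-byte packet in LLQ mode this yields 128 pushed bytes and 1372 fetched bytes; in regular mode all 1500 bytes are fetched by the device.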
137+
The Rx SQs support only the regular mode.
138+
139+
Note: Not all ENA devices support LLQ, and this feature is negotiated
140+
with the device upon initialization. If the ENA device does not
141+
support LLQ mode, the driver falls back to the regular mode.
142+
143+
The driver supports multi-queue for both Tx and Rx. This has various
144+
benefits:
145+
- Reduced CPU/thread/process contention on a given Ethernet interface.
146+
- Cache miss rate on completion is reduced, particularly for data
147+
cache lines that hold the sk_buff structures.
148+
- Increased process-level parallelism when handling received packets.
149+
- Increased data cache hit rate, by steering kernel processing of
150+
packets to the CPU, where the application thread consuming the
151+
packet is running.
152+
- In-hardware interrupt redirection.
153+
154+
Interrupt Modes:
155+
================
156+
The driver assigns a single MSI-X vector per queue pair (for both Tx
157+
and Rx directions). The driver assigns an additional dedicated MSI-X vector
158+
for management (for ACQ and AENQ).
159+
160+
Management interrupt registration is performed when the Linux kernel
161+
probes the adapter, and it is de-registered when the adapter is
162+
removed. I/O queue interrupt registration is performed when the Linux
163+
interface of the adapter is opened, and it is de-registered when the
164+
interface is closed.
165+
166+
The management interrupt is named:
167+
ena-mgmnt@pci:<PCI domain:bus:slot.function>
168+
and for each queue pair, an interrupt is named:
169+
<interface name>-Tx-Rx-<queue index>
170+
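The two naming patterns above can be reproduced with snprintf(). The helpers below are a userspace sketch; their names and the example PCI address and interface name are made up for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the management interrupt name from a PCI address string. */
static int ena_mgmnt_irq_name(char *buf, size_t len, const char *pci_addr)
{
	return snprintf(buf, len, "ena-mgmnt@pci:%s", pci_addr);
}

/* Format the interrupt name of one Tx/Rx queue pair. */
static int ena_io_irq_name(char *buf, size_t len, const char *ifname,
			   unsigned int qid)
{
	return snprintf(buf, len, "%s-Tx-Rx-%u", ifname, qid);
}
```

Given the hypothetical address "0000:00:05.0" and interface "eth0", these produce "ena-mgmnt@pci:0000:00:05.0" and, for queue 3, "eth0-Tx-Rx-3".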
171+
The ENA device operates in auto-mask and auto-clear interrupt
172+
modes. That is, once MSI-X is delivered to the host, its Cause bit is
173+
automatically cleared and the interrupt is masked. The interrupt is
174+
unmasked by the driver after NAPI processing is complete.
175+
176+
Interrupt Moderation:
177+
=====================
178+
The ENA driver and device can operate in conventional or adaptive interrupt
179+
moderation mode.
180+
181+
In conventional mode the driver instructs the device to postpone interrupt
182+
posting according to a static interrupt delay value. The interrupt delay
183+
value can be configured through ethtool(8). The following ethtool
184+
parameters are supported by the driver: tx-usecs, rx-usecs.
185+
186+
In adaptive interrupt moderation mode the interrupt delay value is
187+
updated by the driver dynamically and adjusted every NAPI cycle
188+
according to the traffic nature.
189+
190+
By default the ENA driver applies adaptive coalescing on Rx traffic and
191+
conventional coalescing on Tx traffic.
192+
193+
Adaptive coalescing can be switched on/off through the ethtool(8)
194+
adaptive-rx on|off parameter.
195+
196+
The driver chooses the interrupt delay value according to the number of
197+
bytes and packets received between interrupt unmasking and interrupt
198+
posting. The driver uses an interrupt delay table that subdivides the
199+
range of received bytes/packets into 5 levels and assigns an interrupt
200+
delay value to each level.
201+
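A five-level delay table of the kind described above might look like the sketch below. Every number in it is illustrative and should not be read as the driver's defaults, and the structure and function names are hypothetical. The selection rule (first level whose packet and byte bounds both cover the observed traffic) is likewise an assumption about how such a table could be consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENA_INTR_LEVELS 5 /* "5 levels" from the text above */

/* One row of a hypothetical interrupt delay table. */
struct ena_intr_level {
	uint32_t max_pkts;  /* upper bound on packets for this level */
	uint32_t max_bytes; /* upper bound on bytes for this level */
	uint32_t delay_us;  /* interrupt delay applied at this level */
};

/* Illustrative values only; they are not the driver's defaults. */
static const struct ena_intr_level ena_intr_table[ENA_INTR_LEVELS] = {
	{   4,    1024,   0 },
	{  16,    8192,  16 },
	{  64,   65536,  32 },
	{ 256,  262144,  64 },
	{ UINT32_MAX, UINT32_MAX, 128 },
};

/* Pick the delay of the first level whose bounds cover the traffic seen. */
static uint32_t ena_pick_delay_us(uint32_t pkts, uint32_t bytes)
{
	size_t i;

	for (i = 0; i < ENA_INTR_LEVELS; i++)
		if (pkts <= ena_intr_table[i].max_pkts &&
		    bytes <= ena_intr_table[i].max_bytes)
			return ena_intr_table[i].delay_us;
	assert(0 && "last level must cover every input");
	return ena_intr_table[ENA_INTR_LEVELS - 1].delay_us;
}
```

Light traffic lands in the first level and gets no delay, which favours latency; heavy traffic falls through to the later levels and gets a longer delay, which favours throughput.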
202+
The user can enable/disable adaptive moderation, modify the interrupt
203+
delay table and restore its default values through sysfs.
204+
205+
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
206+
and can be configured by the ETHTOOL_STUNABLE command of the
207+
SIOCETHTOOL ioctl.
208+
209+
SKB:
210+
The driver allocates an SKB for each frame received during Rx handling in
211+
NAPI context. The allocation method depends on the size of the packet.
212+
If the frame length is larger than rx_copybreak, napi_get_frags()
213+
is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer
214+
content is copied (by CPU) to the SKB, and the buffer is recycled.
215+
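The rx_copybreak decision above reduces to a single comparison. The sketch below models only the choice of allocation path in userspace C; the enum and function names are hypothetical, and the threshold used in any call is an arbitrary example rather than the value of ENA_DEFAULT_RX_COPYBREAK.

```c
#include <assert.h>
#include <stddef.h>

enum ena_rx_alloc {
	ENA_RX_COPY,  /* netdev_alloc_skb_ip_align(), copy, recycle the buffer */
	ENA_RX_FRAGS, /* napi_get_frags(), hand the buffer to the SKB */
};

/* Frames strictly larger than rx_copybreak take the frags path. */
static enum ena_rx_alloc ena_rx_alloc_method(size_t frame_len,
					     size_t rx_copybreak)
{
	return frame_len > rx_copybreak ? ENA_RX_FRAGS : ENA_RX_COPY;
}
```

Copying small frames costs a few CPU cycles but lets the driver reuse the Rx buffer immediately, while large frames avoid the copy at the price of giving the buffer away.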
216+
Statistics:
217+
===========
218+
The user can obtain ENA device and driver statistics using ethtool.
219+
The driver can collect regular or extended statistics (including
220+
per-queue stats) from the device.
221+
222+
In addition the driver logs the stats to syslog upon device reset.
223+
224+
MTU:
225+
====
226+
The driver supports an arbitrarily large MTU with a maximum that is
227+
negotiated with the device. The driver configures MTU using the
228+
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
229+
via ip(8) or legacy tools such as ifconfig(8).
230+
231+
Stateless Offloads:
232+
===================
233+
The ENA driver supports:
234+
- TSO over IPv4/IPv6
235+
- TSO with ECN
236+
- IPv4 header checksum offload
237+
- TCP/UDP over IPv4/IPv6 checksum offloads
238+
239+
RSS:
240+
====
241+
- The ENA device supports RSS that allows flexible Rx traffic
242+
steering.
243+
- Toeplitz and CRC32 hash functions are supported.
244+
- Different combinations of L2/L3/L4 fields can be configured as
245+
inputs for hash functions.
246+
- The driver configures RSS settings using the AQ SetFeature command
247+
(ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
248+
ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
249+
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
250+
function delivered in the Rx CQ descriptor is set in the received
251+
SKB.
252+
- The user can provide a hash key, hash function, and configure the
253+
indirection table through ethtool(8).
254+
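The role of the indirection table can be shown with a short model: the flow hash selects a table entry, and the entry names the Rx queue. This is a userspace sketch and not the driver's implementation; the table size of 128, the round-robin default fill and all identifiers are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENA_RSS_TBL_SIZE 128 /* hypothetical table size */

struct ena_rss {
	uint16_t tbl[ENA_RSS_TBL_SIZE]; /* indirection table: entry -> Rx queue */
};

/* Spread table entries round-robin across the available Rx queues. */
static void ena_rss_fill_default(struct ena_rss *rss, uint16_t num_queues)
{
	size_t i;

	assert(num_queues > 0);
	for (i = 0; i < ENA_RSS_TBL_SIZE; i++)
		rss->tbl[i] = (uint16_t)(i % num_queues);
}

/* Map a 32-bit flow hash (e.g. Toeplitz output) to an Rx queue index. */
static uint16_t ena_rss_queue(const struct ena_rss *rss, uint32_t hash)
{
	return rss->tbl[hash % ENA_RSS_TBL_SIZE];
}
```

Rewriting a single table entry redirects every flow that hashes to it, which is what makes the steering flexible without touching the hash function or key.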
255+
DATA PATH:
256+
==========
257+
Tx:
258+
---
259+
ena_start_xmit() is called by the stack. This function does the following:
260+
- Maps data buffers (skb->data and frags).
261+
- Populates ena_buf for the push buffer (if the driver and device are
262+
in push mode.)
263+
- Prepares ENA bufs for the remaining frags.
264+
- Allocates a new request ID from the empty req_id ring. The request
265+
ID is the index of the packet in the Tx info. This is used for
266+
out-of-order TX completions.
267+
- Adds the packet to the proper place in the Tx ring.
268+
- Calls ena_com_prepare_tx(), an ENA communication layer function that converts
269+
the ena_bufs to ENA descriptors (and adds meta ENA descriptors as
270+
needed.)
271+
* This function also copies the ENA descriptors and the push buffer
272+
to the Device memory space (if in push mode.)
273+
- Writes doorbell to the ENA device.
274+
- When the ENA device finishes sending the packet, a completion
275+
interrupt is raised.
276+
- The interrupt handler schedules NAPI.
277+
- The ena_clean_tx_irq() function is called. This function handles the
278+
completion descriptors generated by the ENA, with a single
279+
completion descriptor per completed packet.
280+
* req_id is retrieved from the completion descriptor. The tx_info of
281+
the packet is retrieved via the req_id. The data buffers are
282+
unmapped and req_id is returned to the empty req_id ring.
283+
* The function stops when the completion descriptors are completed or
284+
the budget is reached.
285+
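The request ID bookkeeping in the Tx steps above can be modelled as a circular buffer of free IDs: allocation reads from one end, completion writes to the other, so IDs may come back in any order. The following is a simplified userspace sketch with hypothetical names and a deliberately tiny ring; it is not the driver's data structure.

```c
#include <assert.h>
#include <stdint.h>

#define ENA_TX_RING_SIZE 4 /* tiny ring for illustration */

struct ena_req_id_ring {
	uint16_t free_ids[ENA_TX_RING_SIZE]; /* circular buffer of free req_ids */
	uint16_t next_to_use;   /* slot the next allocation reads from */
	uint16_t next_to_clean; /* slot the next release writes to */
	uint16_t in_flight;     /* IDs currently owned by the device */
};

static void ena_req_id_init(struct ena_req_id_ring *r)
{
	uint16_t i;

	for (i = 0; i < ENA_TX_RING_SIZE; i++)
		r->free_ids[i] = i;
	r->next_to_use = 0;
	r->next_to_clean = 0;
	r->in_flight = 0;
}

/* Take a free req_id for a new Tx packet; returns -1 if none is free. */
static int ena_req_id_get(struct ena_req_id_ring *r)
{
	int id;

	if (r->in_flight == ENA_TX_RING_SIZE)
		return -1;
	id = r->free_ids[r->next_to_use];
	r->next_to_use = (r->next_to_use + 1) % ENA_TX_RING_SIZE;
	r->in_flight++;
	return id;
}

/* Return a completed req_id; completions may arrive in any order. */
static void ena_req_id_put(struct ena_req_id_ring *r, uint16_t id)
{
	assert(r->in_flight > 0 && id < ENA_TX_RING_SIZE);
	r->free_ids[r->next_to_clean] = id;
	r->next_to_clean = (r->next_to_clean + 1) % ENA_TX_RING_SIZE;
	r->in_flight--;
}
```

Because the ID rather than the ring position identifies the packet, a completion for the third packet can be processed before the first without losing track of either packet's buffers.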
286+
Rx:
287+
---
288+
- When a packet is received from the ENA device, an interrupt is raised.
289+
- The interrupt handler schedules NAPI.
290+
- The ena_clean_rx_irq() function is called. This function calls
291+
ena_com_rx_pkt(), an ENA communication layer function, which returns the
292+
number of descriptors used for a new unhandled packet, and zero if
293+
no new packet is found.
294+
- Then it calls the ena_rx_skb() function.
295+
- ena_rx_skb() checks packet length:
296+
* If the packet is small (len <= rx_copybreak), the driver allocates
297+
a SKB for the new packet, and copies the packet payload into the
298+
SKB data buffer.
299+
- In this way the original data buffer is not passed to the stack
300+
and is reused for future Rx packets.
301+
* Otherwise the function unmaps the Rx buffer, then allocates the
302+
new SKB structure and hooks the Rx buffer to the SKB frags.
303+
- The new SKB is updated with the necessary information (protocol,
304+
checksum hw verify result, etc.), and then passed to the network
305+
stack, using the NAPI interface function napi_gro_receive().

MAINTAINERS

Lines changed: 9 additions & 0 deletions
@@ -636,6 +636,15 @@ F: drivers/tty/serial/altera_jtaguart.c
636636
F: include/linux/altera_uart.h
637637
F: include/linux/altera_jtaguart.h
638638

639+
AMAZON ETHERNET DRIVERS
640+
M: Netanel Belgazal <[email protected]>
641+
R: Saeed Bishara <[email protected]>
642+
R: Zorik Machulsky <[email protected]>
643+
644+
S: Supported
645+
F: Documentation/networking/ena.txt
646+
F: drivers/net/ethernet/amazon/
647+
639648
AMD CRYPTOGRAPHIC COPROCESSOR (CCP) DRIVER
640649
M: Tom Lendacky <[email protected]>
641650
M: Gary Hook <[email protected]>

drivers/net/ethernet/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ source "drivers/net/ethernet/agere/Kconfig"
2424
source "drivers/net/ethernet/allwinner/Kconfig"
2525
source "drivers/net/ethernet/alteon/Kconfig"
2626
source "drivers/net/ethernet/altera/Kconfig"
27+
source "drivers/net/ethernet/amazon/Kconfig"
2728
source "drivers/net/ethernet/amd/Kconfig"
2829
source "drivers/net/ethernet/apm/Kconfig"
2930
source "drivers/net/ethernet/apple/Kconfig"

drivers/net/ethernet/Makefile

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ obj-$(CONFIG_NET_VENDOR_AGERE) += agere/
1010
obj-$(CONFIG_NET_VENDOR_ALLWINNER) += allwinner/
1111
obj-$(CONFIG_NET_VENDOR_ALTEON) += alteon/
1212
obj-$(CONFIG_ALTERA_TSE) += altera/
13+
obj-$(CONFIG_NET_VENDOR_AMAZON) += amazon/
1314
obj-$(CONFIG_NET_VENDOR_AMD) += amd/
1415
obj-$(CONFIG_NET_XGENE) += apm/
1516
obj-$(CONFIG_NET_VENDOR_APPLE) += apple/

drivers/net/ethernet/amazon/Kconfig

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
#
2+
# Amazon network device configuration
3+
#
4+
5+
config NET_VENDOR_AMAZON
6+
bool "Amazon Devices"
7+
default y
8+
---help---
9+
If you have a network (Ethernet) device belonging to this class, say Y.
10+
11+
Note that the answer to this question doesn't directly affect the
12+
kernel: saying N will just cause the configurator to skip all
13+
the questions about Amazon devices. If you say Y, you will be asked
14+
for your specific device in the following questions.
15+
16+
if NET_VENDOR_AMAZON
17+
18+
config ENA_ETHERNET
19+
tristate "Elastic Network Adapter (ENA) support"
20+
depends on (PCI_MSI && X86)
21+
---help---
22+
This driver supports Elastic Network Adapter (ENA).
23+
24+
To compile this driver as a module, choose M here.
25+
The module will be called ena.
26+
27+
endif #NET_VENDOR_AMAZON

drivers/net/ethernet/amazon/Makefile

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
#
2+
# Makefile for the Amazon network device drivers.
3+
#
4+
5+
obj-$(CONFIG_ENA_ETHERNET) += ena/
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
#
2+
# Makefile for the Elastic Network Adapter (ENA) device drivers.
3+
#
4+
5+
obj-$(CONFIG_ENA_ETHERNET) += ena.o
6+
7+
ena-y := ena_netdev.o ena_com.o ena_eth_com.o ena_ethtool.o
