Commit 6af49c7

koen0607 authored and NipaLocal committed
sched: Add dualpi2 qdisc
DualPI2 provides L4S-type low latency and low loss to traffic that uses a
scalable congestion controller (e.g. TCP-Prague, DCTCP) without degrading
the performance of 'classic' traffic (e.g. Reno, Cubic). It is intended to
be the reference implementation of the IETF's DualQ Coupled AQM.

The qdisc provides two queues, called low latency and classic. It
classifies packets based on the ECN field in the IP header. By default it
directs non-ECN and ECT(0) packets into the classic queue and ECT(1) and CE
packets into the low latency queue, as per the IETF spec.

Each queue runs its own AQM:
* The classic AQM is called PI2, which is similar to the PIE AQM but more
  responsive and simpler. Classic traffic requires a decent target queue
  (default 15ms for Internet deployment) to fully utilize the link and to
  avoid high drop rates.
* The low latency AQM is, by default, a very shallow ECN marking threshold
  (1ms) similar to that used for DCTCP.

The DualQ isolates the low queuing delay of the low latency queue from the
larger delay of the classic queue. However, from a bandwidth perspective,
flows in either queue share out the link capacity as if there were just a
single queue. This bandwidth pooling effect is achieved by coupling together
the drop and ECN-marking probabilities of the two AQMs.

The PI2 AQM has two main parameters in addition to its target delay. The
integral gain factor alpha is used to slowly correct any persistent standing
queue error from the target delay, while the proportional gain factor beta
is used to quickly compensate for queue changes (growth or shrinkage).
Alpha and beta can either be given directly as parameters, or tc can
calculate them from the alternative typical and maximum RTT parameters.

Internally, the output of a linear Proportional Integral (PI) controller is
used for both queues. This output is squared to calculate the drop or
ECN-marking probability of the classic queue. This counterbalances the
square-root rate equation of Reno/Cubic, which is the trick that balances
flow rates across the queues. For the ECN-marking probability of the low
latency queue, the output of the base AQM is multiplied by a coupling
factor. This determines the balance between the flow rates in each queue.
The default setting makes the flow rates roughly equal, which should be
generally applicable.

If the DUALPI2 AQM detects overload (due to excessive non-responsive traffic
in either queue), it switches to signaling congestion solely using drop,
irrespective of the ECN field. Alternatively, it can be configured to limit
the drop probability and let the queue grow and eventually overflow (like
tail-drop).

GSO splitting in DUALPI2 is configurable from userspace; the default
behavior is to split GSO packets. When running DUALPI2 at unshaped 10gigE
with a 4-download-stream test, splitting GSO packets apart halves the
latency with no loss in throughput:

Summary of tcp_4down run 'no_split_gso':
                         avg       median      # data pts
 Ping (ms) ICMP   :      0.53      0.30 ms          350
 TCP download avg :   2326.86       N/A Mbits/s     350
 TCP download sum :   9307.42       N/A Mbits/s     350
 TCP download::1  :   2672.99   2568.73 Mbits/s     350
 TCP download::2  :   2586.96   2570.51 Mbits/s     350
 TCP download::3  :   1786.26   1798.82 Mbits/s     350
 TCP download::4  :   2261.21   2309.49 Mbits/s     350

Summary of tcp_4down run 'split_gso':
                         avg       median      # data pts
 Ping (ms) ICMP   :      0.22      0.23 ms          350
 TCP download avg :   2335.02       N/A Mbits/s     350
 TCP download sum :   9340.09       N/A Mbits/s     350
 TCP download::1  :   2335.30   2334.22 Mbits/s     350
 TCP download::2  :   2334.72   2334.20 Mbits/s     350
 TCP download::3  :   2335.28   2334.58 Mbits/s     350
 TCP download::4  :   2334.79   2334.39 Mbits/s     350

A similar result is observed when running DUALPI2 at unshaped 1gigE with a
1-download-stream test:

Summary of tcp_1down run 'no_split_gso':
                         avg       median      # data pts
 Ping (ms) ICMP   :      1.13      1.25 ms          350
 TCP download     :    941.41    941.46 Mbits/s     350

Summary of tcp_1down run 'split_gso':
                         avg       median      # data pts
 Ping (ms) ICMP   :      0.51      0.55 ms          350
 TCP download     :    941.41    941.45 Mbits/s     350

Additional details can be found in RFC 9332:
https://datatracker.ietf.org/doc/html/rfc9332

Signed-off-by: Koen De Schepper <[email protected]>
Co-developed-by: Olga Albisser <[email protected]>
Signed-off-by: Olga Albisser <[email protected]>
Co-developed-by: Olivier Tilmans <[email protected]>
Signed-off-by: Olivier Tilmans <[email protected]>
Co-developed-by: Henrik Steen <[email protected]>
Signed-off-by: Henrik Steen <[email protected]>
Signed-off-by: Bob Briscoe <[email protected]>
Signed-off-by: Ilpo Järvinen <[email protected]>
Co-developed-by: Chia-Yu Chang <[email protected]>
Signed-off-by: Chia-Yu Chang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
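
The coupling described in the commit message can be sketched in a few lines of C. This is a minimal floating-point sketch under stated assumptions, not the code in sch_dualpi2.c: the struct and function names are hypothetical, a real qdisc would use scaled integer arithmetic, and the update rule is the generic PI form the message describes (an integral term on the error from the target delay, a proportional term on the change in delay since the last update).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller state; names are illustrative only. */
struct pi2_state {
	double alpha;      /* integral gain, in Hz */
	double beta;       /* proportional gain, in Hz */
	double target;     /* classic target queue delay, in seconds */
	double tupdate;    /* probability update interval, in seconds */
	double prev_delay; /* queue delay seen at the previous update */
	double p_base;     /* linear PI output, kept within [0, 1] */
};

static double clamp01(double v)
{
	return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v);
}

/* One periodic PI update, given the currently measured queue delay. */
static void pi2_update(struct pi2_state *s, double delay)
{
	double delta;

	assert(s != NULL);
	/* alpha slowly corrects the standing error from the target */
	delta = s->alpha * s->tupdate * (delay - s->target);
	/* beta reacts quickly to queue growth or shrinkage */
	delta += s->beta * s->tupdate * (delay - s->prev_delay);

	s->p_base = clamp01(s->p_base + delta);
	s->prev_delay = delay;
}

/* Classic queue: squaring counterbalances the square-root rate equation. */
static double pi2_classic_prob(const struct pi2_state *s)
{
	return s->p_base * s->p_base;
}

/* Low latency queue: linear output multiplied by the coupling factor k. */
static double pi2_l4s_prob(const struct pi2_state *s, double k)
{
	assert(k >= 0.0);
	return clamp01(k * s->p_base);
}
```

With a coupling factor of 2, a base probability of 0.1 yields a 1% drop or mark probability for classic traffic and a 20% marking probability for low latency traffic, which illustrates how one controller output drives both queues.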
1 parent a2aae8c commit 6af49c7

8 files changed: +1423 -0 lines changed

Documentation/netlink/specs/tc.yaml

Lines changed: 140 additions & 0 deletions
@@ -816,6 +816,58 @@ definitions:
       -
         name: drop-overmemory
         type: u32
+  -
+    name: tc-dualpi2-xstats
+    type: struct
+    members:
+      -
+        name: prob
+        type: u32
+        doc: Current probability
+      -
+        name: delay_c
+        type: u32
+        doc: Current C-queue delay in microseconds
+      -
+        name: delay_l
+        type: u32
+        doc: Current L-queue delay in microseconds
+      -
+        name: pkts_in_c
+        type: u32
+        doc: Number of packets enqueued in the C-queue
+      -
+        name: pkts_in_l
+        type: u32
+        doc: Number of packets enqueued in the L-queue
+      -
+        name: maxq
+        type: u32
+        doc: Maximum number of packets seen by the DualPI2
+      -
+        name: ecn_mark
+        type: u32
+        doc: All packets marked with ecn
+      -
+        name: step_mark
+        type: u32
+        doc: Only packets marked with ecn due to L-queue step AQM
+      -
+        name: credit
+        type: s32
+        doc: Current credit value for WRR
+      -
+        name: memory_used
+        type: u32
+        doc: Memory used in bytes by the DualPI2
+      -
+        name: max_memory_used
+        type: u32
+        doc: Maximum memory used in bytes by the DualPI2
+      -
+        name: memory_limit
+        type: u32
+        doc: Memory limit in bytes
   -
     name: tc-fq-pie-xstats
     type: struct
@@ -2299,6 +2351,88 @@ attribute-sets:
       -
         name: quantum
         type: u32
+  -
+    name: tc-dualpi2-attrs
+    attributes:
+      -
+        name: limit
+        type: u32
+        doc: Limit of total number of packets in queue
+      -
+        name: memlimit
+        type: u32
+        doc: Memory limit of total number of packets in queue
+      -
+        name: target
+        type: u32
+        doc: Classic target delay in microseconds
+      -
+        name: tupdate
+        type: u32
+        doc: Drop probability update interval time in microseconds
+      -
+        name: alpha
+        type: u32
+        doc: Integral gain factor in Hz for PI controller
+      -
+        name: beta
+        type: u32
+        doc: Proportional gain factor in Hz for PI controller
+      -
+        name: step_thresh
+        type: u32
+        doc: L4S step marking threshold in microseconds or in packet (see step_packets)
+      -
+        name: step_packets
+        type: flags
+        doc: L4S Step marking threshold unit
+        entries:
+          - microseconds
+          - packets
+      -
+        name: coupling_factor
+        type: u8
+        doc: Probability coupling factor between Classic and L4S (2 is recommended)
+      -
+        name: drop_overload
+        type: flags
+        doc: Control the overload strategy (drop to preserve latency or let the queue overflow)
+        entries:
+          - drop_on_overload
+          - overflow
+      -
+        name: drop_early
+        type: flags
+        doc: Decide where the Classic packets are PI-based dropped or marked
+        entries:
+          - drop_enqueue
+          - drop_dequeue
+      -
+        name: classic_protection
+        type: u8
+        doc: Classic WRR weight in percentage (from 0 to 100)
+      -
+        name: ecn_mask
+        type: flags
+        doc: Configure the L-queue ECN classifier
+        entries:
+          - l4s_ect
+          - any_ect
+      -
+        name: gso_split
+        type: flags
+        doc: Split aggregated skb or not
+        entries:
+          - split_gso
+          - no_split_gso
+      -
+        name: max_rtt
+        type: u32
+        doc: The maximum expected RTT of the traffic that is controlled by DualPI2 in usec
+      -
+        name: typical_rtt
+        type: u32
+        doc: The typical base RTT of the traffic that is controlled by DualPI2 in usec
   -
     name: tc-ematch-attrs
     attributes:
@@ -3679,6 +3813,9 @@ sub-messages:
       -
         value: drr
         attribute-set: tc-drr-attrs
+      -
+        value: dualpi2
+        attribute-set: tc-dualpi2-attrs
       -
         value: etf
         attribute-set: tc-etf-attrs
@@ -3846,6 +3983,9 @@ sub-messages:
       -
         value: codel
         fixed-header: tc-codel-xstats
+      -
+        value: dualpi2
+        fixed-header: tc-dualpi2-xstats
       -
         value: fq
         fixed-header: tc-fq-qd-stats
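
The ecn_mask attribute above chooses which ECN codepoints are steered into the L-queue. A minimal sketch of such a mask-based classifier follows. The codepoint values are the standard two-bit ECN encoding from RFC 3168; the function name, the mask macro names, and the mapping of l4s_ect and any_ect onto the mask values 0x1 and 0x3 are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-bit ECN field values as defined in RFC 3168. */
enum ecn_codepoint {
	ECN_NOT_ECT = 0x0,
	ECN_ECT_1   = 0x1,
	ECN_ECT_0   = 0x2,
	ECN_CE      = 0x3,
};

/* Hypothetical mask values for the two ecn_mask settings. */
#define MASK_L4S_ECT 0x1 /* assumed default: ECT(1) and CE go to the L-queue */
#define MASK_ANY_ECT 0x3 /* assumed: any ECN-capable packet goes to the L-queue */

/* CE shares the ECT(1) bit, so the default mask covers both codepoints. */
static_assert((ECN_CE & MASK_L4S_ECT) != 0, "CE must match the L4S mask");

/* Return true when a packet with this TOS/traffic-class byte is L-queue traffic. */
static bool goes_to_l_queue(uint8_t tos, uint8_t ecn_mask)
{
	uint8_t ecn = tos & 0x3; /* the ECN field is the low two bits */

	return (ecn & ecn_mask) != 0;
}
```

With the assumed default mask, Not-ECT and ECT(0) fall through to the classic queue, which matches the default behavior stated in the commit message.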

include/linux/netdevice.h

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@
 #include <asm/byteorder.h>
 #include <asm/local.h>
 
+#include <linux/netdev_features.h>
 #include <linux/percpu.h>
 #include <linux/rculist.h>
 #include <linux/workqueue.h>

include/uapi/linux/pkt_sched.h

Lines changed: 38 additions & 0 deletions
@@ -1210,4 +1210,42 @@ enum {
 
 #define TCA_ETS_MAX (__TCA_ETS_MAX - 1)
 
+/* DUALPI2 */
+enum {
+	TCA_DUALPI2_UNSPEC,
+	TCA_DUALPI2_LIMIT, /* Packets */
+	TCA_DUALPI2_MEMORY_LIMIT, /* Bytes */
+	TCA_DUALPI2_TARGET, /* us */
+	TCA_DUALPI2_TUPDATE, /* us */
+	TCA_DUALPI2_ALPHA, /* Hz scaled up by 256 */
+	TCA_DUALPI2_BETA, /* Hz scaled up by 256 */
+	TCA_DUALPI2_STEP_THRESH, /* Packets or us */
+	TCA_DUALPI2_STEP_PACKETS, /* Whether STEP_THRESH is in packets */
+	TCA_DUALPI2_COUPLING, /* Coupling factor between queues */
+	TCA_DUALPI2_DROP_OVERLOAD, /* Whether to drop on overload */
+	TCA_DUALPI2_DROP_EARLY, /* Whether to drop on enqueue */
+	TCA_DUALPI2_C_PROTECTION, /* Percentage */
+	TCA_DUALPI2_ECN_MASK, /* L4S queue classification mask */
+	TCA_DUALPI2_SPLIT_GSO, /* Split GSO packets at enqueue */
+	TCA_DUALPI2_PAD,
+	__TCA_DUALPI2_MAX
+};
+
+#define TCA_DUALPI2_MAX (__TCA_DUALPI2_MAX - 1)
+
+struct tc_dualpi2_xstats {
+	__u32 prob; /* current probability */
+	__u32 delay_c; /* current delay in C queue */
+	__u32 delay_l; /* current delay in L queue */
+	__s32 credit; /* current c_protection credit */
+	__u32 packets_in_c; /* number of packets enqueued in C queue */
+	__u32 packets_in_l; /* number of packets enqueued in L queue */
+	__u32 maxq; /* maximum queue size */
+	__u32 ecn_mark; /* packets marked with ecn */
+	__u32 step_marks; /* ECN marks due to the step AQM */
+	__u32 memory_used; /* Memory used of both queues */
+	__u32 max_memory_used; /* Maximum used memory */
+	__u32 memory_limit; /* Memory limit of both queues */
+};
+
 #endif
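
The comments on TCA_DUALPI2_ALPHA and TCA_DUALPI2_BETA say the gains travel over netlink as Hz scaled up by 256. The sketch below shows one way a userspace tool might convert between a gain in Hz and that fixed-point attribute value. The helper names and the round-to-nearest choice are assumptions for illustration; the patch shown here does not say how tc rounds.

```c
#include <assert.h>
#include <stdint.h>

/* Scale factor taken from the "Hz scaled up by 256" comments in the enum. */
#define DUALPI2_GAIN_SCALE 256u /* hypothetical macro name */

/* Convert a gain in Hz to the u32 attribute value, rounding to nearest. */
static uint32_t gain_hz_to_attr(double hz)
{
	assert(hz >= 0.0);
	assert(hz <= (double)UINT32_MAX / DUALPI2_GAIN_SCALE);

	return (uint32_t)(hz * DUALPI2_GAIN_SCALE + 0.5);
}

/* Convert a received attribute value back to Hz. */
static double gain_attr_to_hz(uint32_t attr)
{
	return (double)attr / DUALPI2_GAIN_SCALE;
}
```

The 8 fractional bits give a resolution of 1/256 Hz, so a small gain such as 0.16 Hz is carried as 41 and reads back as roughly 0.1602 Hz.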

net/sched/Kconfig

Lines changed: 12 additions & 0 deletions
@@ -403,6 +403,18 @@ config NET_SCH_ETS
 
 	  If unsure, say N.
 
+config NET_SCH_DUALPI2
+	tristate "Dual Queue PI Square (DUALPI2) scheduler"
+	help
+	  Say Y here if you want to use the Dual Queue Proportional Integral
+	  Controller Improved with a Square scheduling algorithm.
+	  For more information, please see https://tools.ietf.org/html/rfc9332
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called sch_dualpi2.
+
+	  If unsure, say N.
+
 menuconfig NET_SCH_DEFAULT
 	bool "Allow override default queue discipline"
 	help

net/sched/Makefile

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ obj-$(CONFIG_NET_SCH_FQ_PIE) += sch_fq_pie.o
 obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o
 obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o
 obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o
+obj-$(CONFIG_NET_SCH_DUALPI2) += sch_dualpi2.o
 
 obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
