
Commit 73d4d64

Merge branch 'net-openvswitch-add-sample-multicasting'
Adrian Moreno says:

====================
net: openvswitch: Add sample multicasting.

** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads, only to be forwarded to some third-party application that will somehow process the sample and provide observability on the datapath.

A particularly interesting use case is controller-driven per-flow IPFIX sampling, where the OpenFlow controller can add metadata to samples (via two 32-bit integers) and this metadata is then available to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic shares netlink sockets and handler thread time with upcalls, apart from being a performance bottleneck in the sample extraction itself, can severely compromise the datapath, making this solution unfit for highly loaded production systems.

Users are left with few options other than guessing what sampling rate will be OK for their traffic pattern and system load, and dealing with the lost accuracy.

Looking at available infrastructure, an obvious candidate would be psample. However, its current state does not help with the use case at stake because sampled packets do not carry user-defined metadata.

** Proposal **
This series is an attempt to fix this situation by extending the existing psample infrastructure to carry a variable-length, user-defined cookie.

The main existing user of psample is tc's act_sample. It is also extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_ACTION_ATTR_PSAMPLE) is created. It accepts a group and an optional cookie and uses psample to multicast the packet and the metadata.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 parents 064fbc4 + 30d772a commit 73d4d64
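The controller-driven use case above relies on metadata carried as two 32-bit integers, which fits easily within the cookie limit the diff sets (OVS_PSAMPLE_COOKIE_MAX_SIZE, 16 bytes). The sketch below shows one possible encoding. The `domain_id`/`point_id` names, the fixed 8-byte layout and the big-endian byte order are all assumptions for illustration; this series does not define a cookie format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 32-bit IDs, big-endian, 8 bytes in total. */
#define EXAMPLE_COOKIE_LEN 8

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Pack both IDs into 'cookie' and return the number of bytes written. */
size_t pack_cookie(uint8_t *cookie, uint32_t domain_id, uint32_t point_id)
{
	assert(cookie != NULL);
	put_be32(cookie, domain_id);
	put_be32(cookie + 4, point_id);
	return EXAMPLE_COOKIE_LEN;
}

/* Collector side: recover the IDs; returns 0 on success, -1 on bad input. */
int unpack_cookie(const uint8_t *cookie, size_t len, uint32_t *domain_id,
		  uint32_t *point_id)
{
	if (cookie == NULL || len != EXAMPLE_COOKIE_LEN)
		return -1;
	*domain_id = get_be32(cookie);
	*point_id = get_be32(cookie + 4);
	return 0;
}
```

Fixing the byte order keeps the cookie meaningful to a collector on a host with different endianness; the kernel itself only copies the cookie as opaque bytes.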


13 files changed: 566 additions, 16 deletions


Documentation/netlink/specs/ovs_flow.yaml

Lines changed: 17 additions & 0 deletions
@@ -727,6 +727,12 @@ attribute-sets:
         name: dec-ttl
         type: nest
         nested-attributes: dec-ttl-attrs
+      -
+        name: psample
+        type: nest
+        nested-attributes: psample-attrs
+        doc: |
+          Sends a packet sample to psample for external observation.
   -
     name: tunnel-key-attrs
     enum-name: ovs-tunnel-key-attr
@@ -938,6 +944,17 @@ attribute-sets:
       -
         name: gbp
         type: u32
+  -
+    name: psample-attrs
+    enum-name: ovs-psample-attr
+    name-prefix: ovs-psample-attr-
+    attributes:
+      -
+        name: group
+        type: u32
+      -
+        name: cookie
+        type: binary

 operations:
   name-prefix: ovs-flow-cmd-

include/net/psample.h

Lines changed: 4 additions & 1 deletion
@@ -24,7 +24,10 @@ struct psample_metadata {
 	u8 out_tc_valid:1,
 	   out_tc_occ_valid:1,
 	   latency_valid:1,
-	   unused:5;
+	   rate_as_probability:1,
+	   unused:4;
+	const u8 *user_cookie;
+	u32 user_cookie_len;
 };

 struct psample_group *psample_group_get(struct net *net, u32 group_num);

include/uapi/linux/openvswitch.h

Lines changed: 30 additions & 1 deletion
@@ -649,7 +649,8 @@ enum ovs_flow_attr {
  * Actions are passed as nested attributes.
  *
  * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * basis. Nested actions will be able to access the probability value of the
+ * parent @OVS_ACTION_ATTR_SAMPLE.
  */
 enum ovs_sample_attr {
 	OVS_SAMPLE_ATTR_UNSPEC,
@@ -914,6 +915,31 @@ struct check_pkt_len_arg {
 };
 #endif

+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
+/**
+ * enum ovs_psample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE
+ * action.
+ *
+ * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
+ * sample.
+ * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that
+ * contains user-defined metadata. The maximum length is
+ * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes.
+ *
+ * Sends the packet to the psample multicast group with the specified group and
+ * cookie. It is possible to combine this action with the
+ * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample.
+ */
+enum ovs_psample_attr {
+	OVS_PSAMPLE_ATTR_GROUP = 1,	/* u32 number. */
+	OVS_PSAMPLE_ATTR_COOKIE,	/* Optional, user specified cookie. */
+
+	/* private: */
+	__OVS_PSAMPLE_ATTR_MAX
+};
+
+#define OVS_PSAMPLE_ATTR_MAX (__OVS_PSAMPLE_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -966,6 +992,8 @@ struct check_pkt_len_arg {
  * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS
  * argument.
  * @OVS_ACTION_ATTR_DROP: Explicit drop action.
+ * @OVS_ACTION_ATTR_PSAMPLE: Send a sample of the packet to external observers
+ * via psample.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -1004,6 +1032,7 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_ADD_MPLS,     /* struct ovs_action_add_mpls. */
 	OVS_ACTION_ATTR_DEC_TTL,      /* Nested OVS_DEC_TTL_ATTR_*. */
 	OVS_ACTION_ATTR_DROP,         /* u32 error code. */
+	OVS_ACTION_ATTR_PSAMPLE,      /* Nested OVS_PSAMPLE_ATTR_*. */

 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
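To use the new uapi, userspace emits OVS_ACTION_ATTR_PSAMPLE as a nested attribute holding a u32 OVS_PSAMPLE_ATTR_GROUP and, optionally, an OVS_PSAMPLE_ATTR_COOKIE of at most OVS_PSAMPLE_COOKIE_MAX_SIZE bytes. The hand-rolled encoder below is a sketch under several assumptions. It mirrors the 4-byte `struct nlattr` header locally instead of including the uapi headers. It takes 25 as the value of OVS_ACTION_ATTR_PSAMPLE, because the updated BUILD_BUG_ON in flow_netlink.c checks `OVS_ACTION_ATTR_MAX == 25` and PSAMPLE is the last entry before `__OVS_ACTION_ATTR_MAX`. It also sets NLA_F_NESTED on the outer header, since `nla_parse_nested()`, which `validate_psample()` calls, expects that flag in recent kernels. `encode_psample_action()` and `put_attr()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi struct nlattr layout (16-bit length, 16-bit type). */
struct nla_hdr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(size_t)(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	NLA_ALIGN(sizeof(struct nla_hdr))
#define NLA_F_NESTED	(1 << 15)

/* Values derived from the enums in this commit (see lead-in). */
#define OVS_ACTION_ATTR_PSAMPLE		25
#define OVS_PSAMPLE_ATTR_GROUP		1
#define OVS_PSAMPLE_ATTR_COOKIE		2
#define OVS_PSAMPLE_COOKIE_MAX_SIZE	16

/* Append one attribute at 'off'; returns the new offset or 0 on overflow. */
static size_t put_attr(uint8_t *buf, size_t cap, size_t off, uint16_t type,
		       const void *data, size_t len)
{
	size_t total = NLA_ALIGN(NLA_HDRLEN + len);
	struct nla_hdr hdr = {
		.nla_len = (uint16_t)(NLA_HDRLEN + len),
		.nla_type = type,
	};

	if (off + total > cap)
		return 0;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + NLA_HDRLEN, data, len);
	/* Zero the alignment padding after the payload. */
	memset(buf + off + NLA_HDRLEN + len, 0, total - NLA_HDRLEN - len);
	return off + total;
}

/* Encode OVS_ACTION_ATTR_PSAMPLE into 'buf'; returns bytes used or 0. */
size_t encode_psample_action(uint8_t *buf, size_t cap, uint32_t group,
			     const uint8_t *cookie, size_t cookie_len)
{
	struct nla_hdr outer;
	size_t off = NLA_HDRLEN;	/* reserve room for the outer header */

	assert(buf != NULL);
	if (cap < NLA_HDRLEN || cookie_len > OVS_PSAMPLE_COOKIE_MAX_SIZE)
		return 0;

	/* Netlink attributes use host byte order, so 'group' is copied as-is. */
	off = put_attr(buf, cap, off, OVS_PSAMPLE_ATTR_GROUP,
		       &group, sizeof(group));
	if (off && cookie && cookie_len)
		off = put_attr(buf, cap, off, OVS_PSAMPLE_ATTR_COOKIE,
			       cookie, cookie_len);
	if (!off)
		return 0;

	outer.nla_len = (uint16_t)off;
	outer.nla_type = OVS_ACTION_ATTR_PSAMPLE | NLA_F_NESTED;
	memcpy(buf, &outer, sizeof(outer));
	return off;
}
```

A group-only action takes 12 bytes (outer header plus one u32 attribute); a 5-byte cookie adds a 9-byte attribute padded to 12.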

include/uapi/linux/psample.h

Lines changed: 10 additions & 1 deletion
@@ -8,7 +8,11 @@ enum {
 	PSAMPLE_ATTR_ORIGSIZE,
 	PSAMPLE_ATTR_SAMPLE_GROUP,
 	PSAMPLE_ATTR_GROUP_SEQ,
-	PSAMPLE_ATTR_SAMPLE_RATE,
+	PSAMPLE_ATTR_SAMPLE_RATE,	/* u32, ratio between observed and
+					 * sampled packets or scaled probability
+					 * if PSAMPLE_ATTR_SAMPLE_PROBABILITY
+					 * is set.
+					 */
 	PSAMPLE_ATTR_DATA,
 	PSAMPLE_ATTR_GROUP_REFCOUNT,
 	PSAMPLE_ATTR_TUNNEL,
@@ -19,6 +23,11 @@ enum {
 	PSAMPLE_ATTR_LATENCY,		/* u64, nanoseconds */
 	PSAMPLE_ATTR_TIMESTAMP,		/* u64, nanoseconds */
 	PSAMPLE_ATTR_PROTO,		/* u16 */
+	PSAMPLE_ATTR_USER_COOKIE,	/* binary, user provided data */
+	PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in
+					 * PSAMPLE_ATTR_SAMPLE_RATE as a
+					 * probability scaled 0 - U32_MAX.
+					 */

 	__PSAMPLE_ATTR_MAX
 };
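With these attributes, a psample listener can no longer read PSAMPLE_ATTR_SAMPLE_RATE blindly. When the PSAMPLE_ATTR_SAMPLE_PROBABILITY flag is present (as OVS sets it), the rate is a probability scaled to 0 - U32_MAX; otherwise it keeps its old meaning as a 1-in-N ratio (as tc's act_sample reports it). The hypothetical collector-side helpers below convert either form to a plain probability and apply the usual inverse-probability scaling to estimate the total packet count. Treating a ratio of 0 as "unknown" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert PSAMPLE_ATTR_SAMPLE_RATE into a per-packet sampling probability.
 * 'rate_is_probability' reflects whether PSAMPLE_ATTR_SAMPLE_PROBABILITY
 * was present in the same message.
 */
double sample_probability(uint32_t rate, bool rate_is_probability)
{
	if (rate_is_probability)
		return (double)rate / (double)UINT32_MAX;
	/* 1-in-N ratio; a ratio of 0 is treated as unknown here. */
	return rate ? 1.0 / (double)rate : 0.0;
}

/* Scale an observed sample count back to an estimate of total packets. */
double estimate_total(uint64_t samples, double probability)
{
	assert(probability >= 0.0 && probability <= 1.0);
	return probability > 0.0 ? (double)samples / probability : 0.0;
}
```

For example, 100 samples received with a 1-in-4 ratio suggest roughly 400 packets seen by the datapath.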

net/openvswitch/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ config OPENVSWITCH
 		   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 				     (!NF_NAT || NF_NAT) && \
 				     (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
+	depends on PSAMPLE || !PSAMPLE
 	select LIBCRC32C
 	select MPLS
 	select NET_MPLS_GSO
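`depends on PSAMPLE || !PSAMPLE` is a common Kconfig idiom for an optional dependency. It never forbids OPENVSWITCH, but when PSAMPLE is built as a module it limits OPENVSWITCH to `m`, so built-in openvswitch code cannot reference symbols that live in a module. Kconfig evaluates tristate expressions with n = 0, m = 1, y = 2, `!x` as 2 - x and `||` as the maximum. The small model below replays that arithmetic for this single clause only; it ignores the other `depends on` lines, and `ovs_upper_bound()` is a hypothetical name.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Tristate negation: !n = y, !m = m, !y = n. */
static enum tristate tri_not(enum tristate x)
{
	return (enum tristate)(TRI_Y - x);
}

/* Tristate OR is the maximum of its operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Highest value "PSAMPLE || !PSAMPLE" allows OPENVSWITCH to take. */
enum tristate ovs_upper_bound(enum tristate psample)
{
	assert(psample <= TRI_Y);
	return tri_or(psample, tri_not(psample));
}
```

With PSAMPLE=n or y the clause evaluates to y and imposes no limit; only PSAMPLE=m caps OPENVSWITCH at m.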

net/openvswitch/actions.c

Lines changed: 64 additions & 2 deletions
@@ -24,6 +24,11 @@
 #include <net/checksum.h>
 #include <net/dsfield.h>
 #include <net/mpls.h>
+
+#if IS_ENABLED(CONFIG_PSAMPLE)
+#include <net/psample.h>
+#endif
+
 #include <net/sctp/checksum.h>

 #include "datapath.h"
@@ -1043,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 	struct nlattr *sample_arg;
 	int rem = nla_len(attr);
 	const struct sample_arg *arg;
+	u32 init_probability;
 	bool clone_flow_key;
+	int err;

 	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
 	sample_arg = nla_data(attr);
 	arg = nla_data(sample_arg);
 	actions = nla_next(sample_arg, &rem);
+	init_probability = OVS_CB(skb)->probability;

 	if ((arg->probability != U32_MAX) &&
 	    (!arg->probability || get_random_u32() > arg->probability)) {
@@ -1057,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 		return 0;
 	}

+	OVS_CB(skb)->probability = arg->probability;
+
 	clone_flow_key = !arg->exec;
-	return clone_execute(dp, skb, key, 0, actions, rem, last,
-			     clone_flow_key);
+	err = clone_execute(dp, skb, key, 0, actions, rem, last,
+			    clone_flow_key);
+
+	if (!last)
+		OVS_CB(skb)->probability = init_probability;
+
+	return err;
 }

 /* When 'last' is true, clone() should always consume the 'skb'.
@@ -1299,6 +1314,44 @@ static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
 	return 0;
 }

+#if IS_ENABLED(CONFIG_PSAMPLE)
+static void execute_psample(struct datapath *dp, struct sk_buff *skb,
+			    const struct nlattr *attr)
+{
+	struct psample_group psample_group = {};
+	struct psample_metadata md = {};
+	const struct nlattr *a;
+	u32 rate;
+	int rem;
+
+	nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) {
+		switch (nla_type(a)) {
+		case OVS_PSAMPLE_ATTR_GROUP:
+			psample_group.group_num = nla_get_u32(a);
+			break;
+
+		case OVS_PSAMPLE_ATTR_COOKIE:
+			md.user_cookie = nla_data(a);
+			md.user_cookie_len = nla_len(a);
+			break;
+		}
+	}
+
+	psample_group.net = ovs_dp_get_net(dp);
+	md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex;
+	md.trunc_size = skb->len - OVS_CB(skb)->cutlen;
+	md.rate_as_probability = 1;
+
+	rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX;
+
+	psample_sample_packet(&psample_group, skb, rate, &md);
+}
+#else
+static void execute_psample(struct datapath *dp, struct sk_buff *skb,
+			    const struct nlattr *attr)
+{}
+#endif
+
 /* Execute a list of actions against 'skb'. */
 static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			      struct sw_flow_key *key,
@@ -1502,6 +1555,15 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			ovs_kfree_skb_reason(skb, reason);
 			return 0;
 		}
+
+		case OVS_ACTION_ATTR_PSAMPLE:
+			execute_psample(dp, skb, a);
+			OVS_CB(skb)->cutlen = 0;
+			if (nla_is_last(a, rem)) {
+				consume_skb(skb);
+				return 0;
+			}
+			break;
 		}

 		if (unlikely(err)) {
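The new `probability` field in `OVS_CB(skb)` travels with the packet. `sample()` stores the enclosing sample action's probability before running its nested actions and restores the previous value afterwards (unless the skb was consumed as `last`), and `execute_psample()` reports that value with `rate_as_probability` set, substituting U32_MAX when no sample action applied. The userspace model below replays that bookkeeping with a plain struct in place of the skb control block. `struct pkt_cb`, `run_sample()`, `record_rate()` and the callback type are hypothetical names, and the random draw is passed in so the result is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the probability field added to struct ovs_skb_cb. */
struct pkt_cb {
	uint32_t probability;	/* 0: no sample action applied; UINT32_MAX: 100% */
};

typedef void (*nested_fn)(struct pkt_cb *cb, void *ctx);

/* The rate execute_psample() would pass to psample_sample_packet(). */
uint32_t psample_rate(const struct pkt_cb *cb)
{
	return cb->probability ? cb->probability : UINT32_MAX;
}

/*
 * Model of sample(): 'draw' replaces get_random_u32(). Runs 'nested' with
 * the sample's probability visible in 'cb', then restores the caller's
 * value. Returns true if the nested actions ran.
 */
bool run_sample(struct pkt_cb *cb, uint32_t probability, uint32_t draw,
		nested_fn nested, void *ctx)
{
	uint32_t init_probability;

	assert(cb != NULL && nested != NULL);
	init_probability = cb->probability;
	if (probability != UINT32_MAX && (!probability || draw > probability))
		return false;			/* packet not sampled */

	cb->probability = probability;
	nested(cb, ctx);
	/* The kernel skips this restore when the skb was consumed ('last'). */
	cb->probability = init_probability;
	return true;
}

/* Example nested action: record the rate a psample action would report. */
void record_rate(struct pkt_cb *cb, void *ctx)
{
	*(uint32_t *)ctx = psample_rate(cb);
}
```

Because each `sample()` overwrites the field with its own probability, a psample action nested in several sample actions reports only the innermost probability, not their product.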

net/openvswitch/datapath.h

Lines changed: 3 additions & 0 deletions
@@ -115,12 +115,15 @@ struct datapath {
  * fragmented.
  * @acts_origlen: The netlink size of the flow actions applied to this skb.
  * @cutlen: The number of bytes from the packet end to be removed.
+ * @probability: The sampling probability that was applied to this skb; 0 means
+ * no sampling has occurred; U32_MAX means 100% probability.
  */
 struct ovs_skb_cb {
 	struct vport		*input_vport;
 	u16			mru;
 	u16			acts_origlen;
 	u32			cutlen;
+	u32			probability;
 };
 #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)


net/openvswitch/flow_netlink.c

Lines changed: 31 additions & 1 deletion
@@ -64,6 +64,7 @@ static bool actions_may_change_flow(const struct nlattr *actions)
 		case OVS_ACTION_ATTR_TRUNC:
 		case OVS_ACTION_ATTR_USERSPACE:
 		case OVS_ACTION_ATTR_DROP:
+		case OVS_ACTION_ATTR_PSAMPLE:
 			break;

 		case OVS_ACTION_ATTR_CT:
@@ -2409,7 +2410,7 @@ static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len)
 	/* Whenever new actions are added, the need to update this
 	 * function should be considered.
 	 */
-	BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 24);
+	BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 25);

 	if (!actions)
 		return;
@@ -3157,6 +3158,28 @@ static int validate_and_copy_check_pkt_len(struct net *net,
 	return 0;
 }

+static int validate_psample(const struct nlattr *attr)
+{
+	static const struct nla_policy policy[OVS_PSAMPLE_ATTR_MAX + 1] = {
+		[OVS_PSAMPLE_ATTR_GROUP] = { .type = NLA_U32 },
+		[OVS_PSAMPLE_ATTR_COOKIE] = {
+			.type = NLA_BINARY,
+			.len = OVS_PSAMPLE_COOKIE_MAX_SIZE,
+		},
+	};
+	struct nlattr *a[OVS_PSAMPLE_ATTR_MAX + 1];
+	int err;
+
+	if (!IS_ENABLED(CONFIG_PSAMPLE))
+		return -EOPNOTSUPP;
+
+	err = nla_parse_nested(a, OVS_PSAMPLE_ATTR_MAX, attr, policy, NULL);
+	if (err)
+		return err;
+
+	return a[OVS_PSAMPLE_ATTR_GROUP] ? 0 : -EINVAL;
+}
+
 static int copy_action(const struct nlattr *from,
 		       struct sw_flow_actions **sfa, bool log)
 {
@@ -3212,6 +3235,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_ADD_MPLS] = sizeof(struct ovs_action_add_mpls),
 			[OVS_ACTION_ATTR_DEC_TTL] = (u32)-1,
 			[OVS_ACTION_ATTR_DROP] = sizeof(u32),
+			[OVS_ACTION_ATTR_PSAMPLE] = (u32)-1,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -3490,6 +3514,12 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				return -EINVAL;
 			break;

+		case OVS_ACTION_ATTR_PSAMPLE:
+			err = validate_psample(a);
+			if (err)
+				return err;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
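`validate_psample()` delegates the checks to a netlink policy: the group must be a u32, the cookie is binary and at most OVS_PSAMPLE_COOKIE_MAX_SIZE bytes, and the action is rejected with -EINVAL if the group is missing (or -EOPNOTSUPP when psample is not built). The userspace approximation below walks the nested payload by hand to show which inputs those rules accept. It is a simplification, not the kernel's parser: it approximates strict parsing by rejecting type 0, unknown types, wrong u32 sizes and trailing bytes, it reports every failure as -EINVAL (the kernel may use a different errno for some cases), and `check_psample_payload()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLA_HDRLEN		4	/* sizeof(struct nlattr), already aligned */
#define NLA_ALIGN(len)		(((len) + 3) & ~(size_t)3)
#define NLA_TYPE_MASK		0x3fff	/* drops NLA_F_NESTED/NLA_F_NET_BYTEORDER */

#define OVS_PSAMPLE_ATTR_GROUP		1
#define OVS_PSAMPLE_ATTR_COOKIE		2
#define OVS_PSAMPLE_COOKIE_MAX_SIZE	16

/* 'p'/'len' describe the payload of an OVS_ACTION_ATTR_PSAMPLE attribute. */
int check_psample_payload(const uint8_t *p, size_t len)
{
	int have_group = 0;

	assert(p != NULL || len == 0);
	while (len > 0) {
		uint16_t alen, atype;
		size_t plen, step;

		if (len < NLA_HDRLEN)
			return -EINVAL;			/* trailing bytes */
		memcpy(&alen, p, sizeof(alen));
		memcpy(&atype, p + 2, sizeof(atype));
		atype &= NLA_TYPE_MASK;
		if (alen < NLA_HDRLEN || alen > len)
			return -EINVAL;			/* bad attribute length */
		plen = alen - NLA_HDRLEN;

		switch (atype) {
		case OVS_PSAMPLE_ATTR_GROUP:		/* NLA_U32 */
			if (plen != sizeof(uint32_t))
				return -EINVAL;
			have_group = 1;
			break;
		case OVS_PSAMPLE_ATTR_COOKIE:		/* NLA_BINARY, .len = 16 */
			if (plen > OVS_PSAMPLE_COOKIE_MAX_SIZE)
				return -EINVAL;
			break;
		default:				/* 0 or > OVS_PSAMPLE_ATTR_MAX */
			return -EINVAL;
		}

		step = NLA_ALIGN((size_t)alen);
		if (step > len)
			step = len;			/* last attribute, unpadded */
		p += step;
		len -= step;
	}
	return have_group ? 0 : -EINVAL;		/* the group is mandatory */
}
```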

net/openvswitch/vport.c

Lines changed: 1 addition & 0 deletions
@@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 	OVS_CB(skb)->input_vport = vport;
 	OVS_CB(skb)->mru = 0;
 	OVS_CB(skb)->cutlen = 0;
+	OVS_CB(skb)->probability = 0;
 	if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
 		u32 mark;


net/psample/psample.c

Lines changed: 15 additions & 1 deletion
@@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 	void *data;
 	int ret;

+	if (!genl_has_listeners(&psample_nl_family, group->net,
+				PSAMPLE_NL_MCGRP_SAMPLE))
+		return;
+
 	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
@@ -386,7 +390,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 		   nla_total_size(sizeof(u32)) +	/* group_num */
 		   nla_total_size(sizeof(u32)) +	/* seq */
 		   nla_total_size_64bit(sizeof(u64)) +	/* timestamp */
-		   nla_total_size(sizeof(u16));		/* protocol */
+		   nla_total_size(sizeof(u16)) +	/* protocol */
+		   (md->user_cookie_len ?
+		    nla_total_size(md->user_cookie_len) : 0); /* user cookie */

 #ifdef CONFIG_INET
 	tun_info = skb_tunnel_info(skb);
@@ -486,6 +492,14 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 	}
 #endif

+	if (md->user_cookie && md->user_cookie_len &&
+	    nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
+		    md->user_cookie))
+		goto error;
+
+	if (md->rate_as_probability)
+		nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY);
+
 	genlmsg_end(nl_skb, data);
 	genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
 				PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
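psample sizes the notification before allocating it, which is why the second hunk adds the cookie's `nla_total_size()` to `meta_len`. That size is the payload plus the 4-byte attribute header, rounded up to a multiple of 4. The helper below is a userspace re-implementation of that arithmetic for illustration (not the kernel's header); `total_size()` and `cookie_meta_len()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(size_t)(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	4	/* sizeof(struct nlattr), already aligned */

/* Same arithmetic as the kernel's nla_total_size(): header + payload, padded. */
static size_t total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Bytes reserved in meta_len for an optional user cookie, as in the hunk. */
size_t cookie_meta_len(size_t user_cookie_len)
{
	assert(user_cookie_len <= 0xffff - NLA_HDRLEN);	/* nla_len is 16-bit */
	return user_cookie_len ? total_size(user_cookie_len) : 0;
}
```

An 8-byte cookie therefore costs 12 bytes, and the 16-byte maximum allowed by OVS costs 20.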

net/sched/act_sample.c

Lines changed: 12 additions & 0 deletions
@@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 {
 	struct tcf_sample *s = to_sample(a);
 	struct psample_group *psample_group;
+	u8 cookie_data[TC_COOKIE_MAX_SIZE];
 	struct psample_metadata md = {};
+	struct tc_cookie *user_cookie;
 	int retval;

 	tcf_lastuse_update(&s->tcf_tm);
@@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 		if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
 			skb_push(skb, skb->mac_len);

+		rcu_read_lock();
+		user_cookie = rcu_dereference(a->user_cookie);
+		if (user_cookie) {
+			memcpy(cookie_data, user_cookie->data,
+			       user_cookie->len);
+			md.user_cookie = cookie_data;
+			md.user_cookie_len = user_cookie->len;
+		}
+		rcu_read_unlock();
+
 		md.trunc_size = s->truncate ? s->trunc_size : skb->len;
 		psample_sample_packet(psample_group, skb, s->rate, &md);
