Commit c6b5fb8

qmonnet authored and borkmann committed
bpf: add documentation for eBPF helpers (42-50)
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Kaixu:
- bpf_perf_event_read()

Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()

Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()

Helper from Thomas:
- bpf_skb_change_head()

Helper from Gianluca:
- bpf_probe_read_str()

Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()

v4:
- bpf_perf_event_read(): State that bpf_perf_event_read_value() should
  be preferred over this helper.
- bpf_skb_change_head(): Clarify comment about invalidated verifier
  checks.
- bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
  checks.
- bpf_probe_write_user(): Add that dst must be a valid user space
  address.
- bpf_get_socket_cookie(): Improve description by making clearer that
  the cookie belongs to the socket, and state that it remains stable for
  the life of the socket.

v3:
- bpf_perf_event_read(): Fix time of selection for perf event type in
  description. Remove occurrences of "cores" to avoid confusion with
  "CPU".

Cc: Martin KaFai Lau <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Thomas Graf <[email protected]>
Cc: Gianluca Borello <[email protected]>
Cc: Chenbo Feng <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
[for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
Signed-off-by: Daniel Borkmann <[email protected]>
1 parent fa15601 commit c6b5fb8

1 file changed: +172 -0 lines changed

include/uapi/linux/bpf.h

Lines changed: 172 additions & 0 deletions
@@ -810,6 +810,35 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
+ * 	Description
+ * 		Read the value of a perf event counter. This helper relies on a
+ * 		*map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of
+ * 		the perf event counter is selected when *map* is updated with
+ * 		perf event file descriptors. The *map* is an array whose size
+ * 		is the number of available CPUs, and each cell contains a value
+ * 		relative to one CPU. The value to retrieve is indicated by
+ * 		*flags*, that contains the index of the CPU to look up, masked
+ * 		with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * 		**BPF_F_CURRENT_CPU** to indicate that the value for the
+ * 		current CPU should be retrieved.
+ *
+ * 		Note that before Linux 4.13, only hardware perf events can be
+ * 		retrieved.
+ *
+ * 		Also, be aware that the newer helper
+ * 		**bpf_perf_event_read_value**\ () is recommended over
+ * 		**bpf_perf_event_read**\ () in general. The latter has some ABI
+ * 		quirks where error and counter value are used as a return code
+ * 		(which is wrong to do since ranges may overlap). This issue is
+ * 		fixed with **bpf_perf_event_read_value**\ (), which at the same
+ * 		time provides more features over the **bpf_perf_event_read**\
+ * 		() interface. Please refer to the description of
+ * 		**bpf_perf_event_read_value**\ () for details.
+ * 	Return
+ * 		The value of the perf event counter read from the map, or a
+ * 		negative error code in case of failure.
+ *
  * int bpf_redirect(u32 ifindex, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*.
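
Editor's sketch for the bpf_perf_event_read() description in the hunk above. It is a plain userspace C model, not kernel or eBPF code, showing how *flags* selects a map index and why returning both errors and counter values through one u64 is ambiguous. The two macro values match include/uapi/linux/bpf.h; the function names, the current_cpu parameter, and the 4095 error-range bound used by the heuristic are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/*
 * Hypothetical model: resolve the map index selected by flags.
 * current_cpu stands in for the CPU the eBPF program runs on.
 */
static uint32_t resolve_index(uint64_t flags, uint32_t current_cpu)
{
	uint64_t index = flags & BPF_F_INDEX_MASK;

	if (index == BPF_F_CURRENT_CPU)
		return current_cpu;
	return (uint32_t)index;
}

/* The helper returns a u64, so a negative error becomes a huge value. */
static uint64_t as_helper_return(int64_t value)
{
	return (uint64_t)value;
}

/*
 * Heuristic a caller might apply (assumed error range: -4095..-1).
 * It cannot tell an error from a counter that happens to be that large.
 */
static int looks_like_error(uint64_t ret)
{
	int64_t s = (int64_t)ret;

	return s < 0 && s >= -4095;
}

/* Usage example: a real error and a large counter look identical. */
void example_usage(void)
{
	assert(resolve_index(BPF_F_CURRENT_CPU, 7) == 7);
	assert(looks_like_error(as_helper_return(-EINVAL)));
	assert(looks_like_error(UINT64_MAX - 21)); /* a counter value */
}
```

The last assertion is the overlap the description warns about, which is the stated reason for preferring bpf_perf_event_read_value().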
@@ -1071,6 +1100,17 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index)
+ * 	Description
+ * 		Check whether *skb* is a descendant of the cgroup2 held by
+ * 		*map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * 	Return
+ * 		The return value depends on the result of the test, and can be:
+ *
+ * 		* 0, if the *skb* failed the cgroup2 descendant test.
+ * 		* 1, if the *skb* succeeded the cgroup2 descendant test.
+ * 		* A negative error code, if an error occurred.
+ *
  * u32 bpf_get_hash_recalc(struct sk_buff *skb)
  * 	Description
  * 		Retrieve the hash of the packet, *skb*\ **->hash**. If it is
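
Editor's sketch for the three-way return value of bpf_skb_under_cgroup() documented in the hunk above. It is a userspace C model of a descendant test over a parent-linked hierarchy, not the kernel implementation. The struct, the function names, the choice of error codes, and the rule that a cgroup counts as its own descendant are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup2 node; only the parent link matters. */
struct model_cgroup {
	const struct model_cgroup *parent;
};

/* Walk up from cg; return 1 if ancestor is met on the way, else 0. */
static int is_descendant(const struct model_cgroup *cg,
			 const struct model_cgroup *ancestor)
{
	for (; cg; cg = cg->parent)
		if (cg == ancestor)
			return 1;
	return 0;
}

/*
 * Model of the documented result: 0 (test failed), 1 (test succeeded),
 * or a negative error. slots/nr_slots stand in for the cgroup array map.
 * The specific error codes are assumptions of this sketch.
 */
static int model_under_cgroup(const struct model_cgroup *cg,
			      const struct model_cgroup *const *slots,
			      size_t nr_slots, size_t index)
{
	assert(slots != NULL);

	if (index >= nr_slots)
		return -E2BIG;   /* index outside the map */
	if (!slots[index])
		return -EAGAIN;  /* no cgroup stored at index */
	return is_descendant(cg, slots[index]);
}
```

A caller has to distinguish all three outcomes, so testing the result only for truthiness would treat an error as a successful match.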
@@ -1091,6 +1131,37 @@ union bpf_attr {
  * 	Return
  * 		A pointer to the current task struct.
  *
+ * int bpf_probe_write_user(void *dst, const void *src, u32 len)
+ * 	Description
+ * 		Attempt in a safe way to write *len* bytes from the buffer
+ * 		*src* to *dst* in memory. It only works for threads that are in
+ * 		user context, and *dst* must be a valid user space address.
+ *
+ * 		This helper should not be used to implement any kind of
+ * 		security mechanism because of TOC-TOU attacks, but rather to
+ * 		debug, divert, and manipulate execution of semi-cooperative
+ * 		processes.
+ *
+ * 		Keep in mind that this feature is meant for experiments, and it
+ * 		has a risk of crashing the system and running programs.
+ * 		Therefore, when an eBPF program using this helper is attached,
+ * 		a warning including PID and process name is printed to kernel
+ * 		logs.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
+ * 	Description
+ * 		Check whether the probe is being run in the context of a given
+ * 		subset of the cgroup2 hierarchy. The cgroup2 to test is held by
+ * 		*map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * 	Return
+ * 		The return value depends on the result of the test, and can be:
+ *
+ * 		* 1, if the current task belongs to the cgroup2.
+ * 		* 0, if the current task does not belong to the cgroup2.
+ * 		* A negative error code, if an error occurred.
+ *
  * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
  * 	Description
  * 		Resize (trim or grow) the packet associated to *skb* to the
@@ -1182,6 +1253,107 @@ union bpf_attr {
  * 	Return
  * 		The id of current NUMA node.
  *
+ * int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
+ * 	Description
+ * 		Grows headroom of packet associated to *skb* and adjusts the
+ * 		offset of the MAC header accordingly, adding *len* bytes of
+ * 		space. It automatically extends and reallocates memory as
+ * 		required.
+ *
+ * 		This helper can be used on a layer 3 *skb* to push a MAC header
+ * 		for redirection into a layer 2 device.
+ *
+ * 		All values for *flags* are reserved for future usage, and must
+ * 		be left at zero.
+ *
+ * 		A call to this helper is susceptible to change the underlying
+ * 		packet buffer. Therefore, at load time, all checks on pointers
+ * 		previously done by the verifier are invalidated and must be
+ * 		performed again, if the helper is used in combination with
+ * 		direct packet access.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
+ * 	Description
+ * 		Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
+ * 		it is possible to use a negative value for *delta*. This helper
+ * 		can be used to prepare the packet for pushing or popping
+ * 		headers.
+ *
+ * 		A call to this helper is susceptible to change the underlying
+ * 		packet buffer. Therefore, at load time, all checks on pointers
+ * 		previously done by the verifier are invalidated and must be
+ * 		performed again, if the helper is used in combination with
+ * 		direct packet access.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+ * 	Description
+ * 		Copy a NUL terminated string from an unsafe address
+ * 		*unsafe_ptr* to *dst*. The *size* should include the
+ * 		terminating NUL byte. In case the string length is smaller than
+ * 		*size*, the target is not padded with further NUL bytes. If the
+ * 		string length is larger than *size*, just *size*-1 bytes are
+ * 		copied and the last byte is set to NUL.
+ *
+ * 		On success, the length of the copied string is returned. This
+ * 		makes this helper useful in tracing programs for reading
+ * 		strings, and more importantly to get its length at runtime. See
+ * 		the following snippet:
+ *
+ * 		::
+ *
+ * 			SEC("kprobe/sys_open")
+ * 			void bpf_sys_open(struct pt_regs *ctx)
+ * 			{
+ * 				char buf[PATHLEN]; // PATHLEN is defined to 256
+ * 				int res = bpf_probe_read_str(buf, sizeof(buf),
+ * 							     ctx->di);
+ *
+ * 				// Consume buf, for example push it to
+ * 				// userspace via bpf_perf_event_output(); we
+ * 				// can use res (the string length) as event
+ * 				// size, after checking its boundaries.
+ * 			}
+ *
+ * 		In comparison, using **bpf_probe_read**\ () helper here instead
+ * 		to read the string would require to estimate the length at
+ * 		compile time, and would often result in copying more memory
+ * 		than necessary.
+ *
+ * 		Another useful use case is when parsing individual process
+ * 		arguments or individual environment variables navigating
+ * 		*current*\ **->mm->arg_start** and *current*\
+ * 		**->mm->env_start**: using this helper and the return value,
+ * 		one can quickly iterate at the right offset of the memory area.
+ * 	Return
+ * 		On success, the strictly positive length of the string,
+ * 		including the trailing NUL character. On error, a negative
+ * 		value.
+ *
+ * u64 bpf_get_socket_cookie(struct sk_buff *skb)
+ * 	Description
+ * 		If the **struct sk_buff** pointed by *skb* has a known socket,
+ * 		retrieve the cookie (generated by the kernel) of this socket.
+ * 		If no cookie has been set yet, generate a new cookie. Once
+ * 		generated, the socket cookie remains stable for the life of the
+ * 		socket. This helper can be useful for monitoring per socket
+ * 		networking traffic statistics as it provides a unique socket
+ * 		identifier per namespace.
+ * 	Return
+ * 		An 8-byte long non-decreasing number on success, or 0 if the
+ * 		socket field is missing inside *skb*.
+ *
+ * u32 bpf_get_socket_uid(struct sk_buff *skb)
+ * 	Return
+ * 		The owner UID of the socket associated to *skb*. If the socket
+ * 		is **NULL**, or if it is not a full socket (i.e. if it is a
+ * 		time-wait or a request socket instead), **overflowuid** value
+ * 		is returned (note that **overflowuid** might also be the actual
+ * 		UID value for the socket).
+ *
  * u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
  * 	Description
  * 		Set the full hash for *skb* (set the field *skb*\ **->hash**)
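
Editor's sketch for the bpf_probe_read_str() description in the hunk above. It is a userspace C model of the documented copy semantics (truncation to *size*-1 bytes, no padding, returned length including the NUL byte) and of iterating over a NUL-separated argument area using the return value. It reads ordinary memory, so it cannot fail the way the real helper can. The function names are hypothetical, and the loop assumes that the area ends with a NUL byte and that every entry fits in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the documented semantics: copy at most size-1
 * bytes, always terminate dst, return the copied length including NUL.
 */
static int model_read_str(char *dst, int size, const char *src)
{
	int i;

	assert(size > 0); /* the sketch does not model error returns */

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return i + 1;
}

/*
 * Walk a NUL-separated area (as in an argument or environment block)
 * by advancing with the returned length; returns the number of entries.
 * Assumes area[area_len - 1] is NUL and each entry is under 64 bytes.
 */
static int count_entries(const char *area, int area_len)
{
	char buf[64];
	int off = 0;
	int n = 0;

	while (off < area_len) {
		off += model_read_str(buf, (int)sizeof(buf), area + off);
		n++;
	}
	return n;
}
```

For example, over the 11-byte area "ls\0-l\0/tmp\0" the walk advances by 3, 3, and 5 bytes and counts three entries. An entry longer than the buffer would be truncated and the next read would start mid-string, so a real program has to handle that case.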
