Skip to content

Commit 0e227d6

Browse files
author
Yonghong Song
committed
[RFC][BPF] Do atomic_fetch_*() pattern matching with memory ordering
For atomic fetch_and_*() operations, do pattern matching with memory ordering seq_cst and monotonic (relaxed). For fetch_and_*() operations with seq_cst ordering, atomic_fetch_*() instructions are generated. For monotonic ordering, locked insns are generated. The main motivation is to resolve the kernel issue [1]. The following are memory ordering which could be supported: seq_cst, acq_rel, release, acquire, relaxed This patch only supports seq_cst and relaxed. We could pattern match other memory ordering as well. Current gcc style __sync_fetch_and_*() operations are all seq_cst. To use relaxed memory ordering, the _Atomic type is needed. The following is an example: ``` $ cat test.c \#include <stdatomic.h> void f1(_Atomic int *i) { (void)__c11_atomic_fetch_and(i, 10, memory_order_relaxed); } void f2(_Atomic int *i) { (void)__c11_atomic_fetch_and(i, 10, memory_order_seq_cst); } $ cat run.sh clang -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test.c -o test.o && llvm-objdump -d test.o $ ./run.sh test.o: file format elf64-bpf Disassembly of section .text: 0000000000000000 <f1>: 0: b4 02 00 00 0a 00 00 00 w2 = 0xa 1: c3 21 00 00 50 00 00 00 lock *(u32 *)(r1 + 0x0) &= w2 2: 95 00 00 00 00 00 00 00 exit 0000000000000018 <f2>: 3: b4 02 00 00 0a 00 00 00 w2 = 0xa 4: c3 21 00 00 51 00 00 00 w2 = atomic_fetch_and((u32 *)(r1 + 0x0), w2) 5: 95 00 00 00 00 00 00 00 exit ``` Here is another example with global _Atomic variable: ``` $ cat test1.c \#include <stdatomic.h> _Atomic int i; void f1(void) { (void)__c11_atomic_fetch_and(&i, 10, memory_order_relaxed); } void f2(void) { (void)__c11_atomic_fetch_and(&i, 10, memory_order_seq_cst); } $ cat run.sh clang -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test1.c -o test1.o && llvm-objdump -d test1.o $ ./run.sh test1.o: file format elf64-bpf Disassembly of section .text: 0000000000000000 <f1>: 0: b4 01 00 00 0a 00 00 00 w1 = 0xa 1: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll 3: c3 12 00 00 50 00 00 00 lock *(u32 *)(r2 + 0x0) &= w1 4: 95 00 00 00 00 00 00 00 exit 0000000000000028 <f2>: 5: b4 01 00 00 0a 00 00 00 w1 = 0xa 6: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll 8: c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1) 9: 95 00 00 00 00 00 00 00 exit ``` Note that in the above compilations, '-g' is not used. The reason is due to the following IR related to _Atomic type: ``` $clang -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -g -S -emit-llvm test1.c ``` The related debug info for test1.c: ``` !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression()) !1 = distinct !DIGlobalVariable(name: "i", scope: !2, file: !3, line: 3, type: !16, isLocal: false, isDefinition: true) ... !16 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !17) !17 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) ... If compiling test.c, the related debug info: ``` ... !19 = distinct !DISubprogram(name: "f1", scope: !1, file: !1, line: 3, type: !20, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !25) !20 = !DISubroutineType(types: !21) !21 = !{null, !22} !22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !23, size: 64) !23 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !24) !24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) ... !26 = !DILocalVariable(name: "i", arg: 1, scope: !19, file: !1, line: 3, type: !22) ``` All the above suggests _Atomic behaves like a modifier (e.g. const, restrict, volatile). This seems true based on doc [1]. Without proper handling DW_TAG_atomic_type, BTF generation will be incorrect since the current implementation assumes no existence of DW_TAG_atomic_type. So we have two choices here: (1). llvm bpf backend processes DW_TAG_atomic_type but ignores it in BTF encoding. (2). Add another type, e.g., BTF_KIND_ATOMIC to BTF. BTF_KIND_ATOMIC behaves as a modifier like const/volatile/restrict. For choice (1), the following is a hack which can make '-g' work for test1.c: ``` diff --git a/llvm/lib/Target/BPF/BTFDebug.cpp b/llvm/lib/Target/BPF/BTFDebug.cpp index 4d847ab..fd61bb811111 100644 --- a/llvm/lib/Target/BPF/BTFDebug.cpp +++ b/llvm/lib/Target/BPF/BTFDebug.cpp @@ -1444,8 +1444,14 @@ void BTFDebug::processGlobals(bool ProcessingMapDef) { DIGlobal = GVE->getVariable(); if (SecName.starts_with(".maps")) visitMapDefType(DIGlobal->getType(), GVTypeId); - else - visitTypeEntry(DIGlobal->getType(), GVTypeId, false, false); + else { + const DIType *Ty = DIGlobal->getType(); + auto *DTy = dyn_cast<DIDerivedType>(Ty); + if (DTy && DTy->getTag() == dwarf::DW_TAG_atomic_type) + visitTypeEntry(DTy->getBaseType(), GVTypeId, false, false); + else + visitTypeEntry(Ty, GVTypeId, false, false); + } break; } ``` You can see that basicaly dwarf::DW_TAG_atomic_type is skipped during BTF generation. Other changes are needed to avoid other usages of dwarf::DW_TAG_atomic_type. But I prefer adding BTF_KIND_ATOMIC if we indeed intends to use _Atomic in bpf programs. This probably is the only way if the _Atomic type is used e.g. at global variable where corresponding types in skeleton needs also be _Atomic. Please let me know your opinion. [1] https://lore.kernel.org/bpf/[email protected]/ [2] https://dwarfstd.org/issues/131112.1.html
1 parent ef1ef03 commit 0e227d6

File tree

1 file changed

+34
-10
lines changed

1 file changed

+34
-10
lines changed

llvm/lib/Target/BPF/BPFInstrInfo.td

Lines changed: 34 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -864,26 +864,50 @@ class XFALU32<BPFWidthModifer SizeOp, BPFArithOp Opc, string OpcodeStr,
864864

865865
let Constraints = "$dst = $val" in {
866866
let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
867-
def XFADDW32 : XFALU32<BPF_W, BPF_ADD, "u32", "add", atomic_load_add_i32>;
868-
def XFANDW32 : XFALU32<BPF_W, BPF_AND, "u32", "and", atomic_load_and_i32>;
869-
def XFORW32 : XFALU32<BPF_W, BPF_OR, "u32", "or", atomic_load_or_i32>;
870-
def XFXORW32 : XFALU32<BPF_W, BPF_XOR, "u32", "xor", atomic_load_xor_i32>;
867+
def XFADDW32 : XFALU32<BPF_W, BPF_ADD, "u32", "add", atomic_load_add_i32_seq_cst>;
868+
def XFANDW32 : XFALU32<BPF_W, BPF_AND, "u32", "and", atomic_load_and_i32_seq_cst>;
869+
def XFORW32 : XFALU32<BPF_W, BPF_OR, "u32", "or", atomic_load_or_i32_seq_cst>;
870+
def XFXORW32 : XFALU32<BPF_W, BPF_XOR, "u32", "xor", atomic_load_xor_i32_seq_cst>;
871871
}
872872

873873
let Predicates = [BPFHasALU32] in {
874-
def XFADDD : XFALU64<BPF_DW, BPF_ADD, "u64", "add", atomic_load_add_i64>;
874+
def XFADDD : XFALU64<BPF_DW, BPF_ADD, "u64", "add", atomic_load_add_i64_seq_cst>;
875875
}
876-
def XFANDD : XFALU64<BPF_DW, BPF_AND, "u64", "and", atomic_load_and_i64>;
877-
def XFORD : XFALU64<BPF_DW, BPF_OR, "u64", "or", atomic_load_or_i64>;
878-
def XFXORD : XFALU64<BPF_DW, BPF_XOR, "u64", "xor", atomic_load_xor_i64>;
876+
def XFANDD : XFALU64<BPF_DW, BPF_AND, "u64", "and", atomic_load_and_i64_seq_cst>;
877+
def XFORD : XFALU64<BPF_DW, BPF_OR, "u64", "or", atomic_load_or_i64_seq_cst>;
878+
def XFXORD : XFALU64<BPF_DW, BPF_XOR, "u64", "xor", atomic_load_xor_i64_seq_cst>;
879+
}
880+
881+
let Predicates = [BPFHasALU32] in {
882+
def : Pat<(atomic_load_add_i32_monotonic ADDRri:$addr, GPR32:$val),
883+
(XADDW32 ADDRri:$addr, GPR32:$val)>;
884+
def : Pat<(atomic_load_add_i64_monotonic ADDRri:$addr, GPR:$val),
885+
(XADDD ADDRri:$addr, GPR:$val)>;
879886
}
880887

881888
// atomic_load_sub can be represented as a neg followed
882889
// by an atomic_load_add.
883-
def : Pat<(atomic_load_sub_i32 ADDRri:$addr, GPR32:$val),
890+
def : Pat<(atomic_load_sub_i32_seq_cst ADDRri:$addr, GPR32:$val),
884891
(XFADDW32 ADDRri:$addr, (NEG_32 GPR32:$val))>;
885-
def : Pat<(atomic_load_sub_i64 ADDRri:$addr, GPR:$val),
892+
def : Pat<(atomic_load_sub_i64_seq_cst ADDRri:$addr, GPR:$val),
886893
(XFADDD ADDRri:$addr, (NEG_64 GPR:$val))>;
894+
def : Pat<(atomic_load_sub_i32_monotonic ADDRri:$addr, GPR32:$val),
895+
(XADDW32 ADDRri:$addr, (NEG_32 GPR32:$val))>;
896+
def : Pat<(atomic_load_sub_i64_monotonic ADDRri:$addr, GPR:$val),
897+
(XADDD ADDRri:$addr, (NEG_64 GPR:$val))>;
898+
899+
def : Pat<(atomic_load_and_i32_monotonic ADDRri:$addr, GPR32:$val),
900+
(XANDW32 ADDRri:$addr, GPR32:$val)>;
901+
def : Pat<(atomic_load_and_i64_monotonic ADDRri:$addr, GPR:$val),
902+
(XANDD ADDRri:$addr, GPR:$val)>;
903+
def : Pat<(atomic_load_or_i32_monotonic ADDRri:$addr, GPR32:$val),
904+
(XORW32 ADDRri:$addr, GPR32:$val)>;
905+
def : Pat<(atomic_load_or_i64_monotonic ADDRri:$addr, GPR:$val),
906+
(XORD ADDRri:$addr, GPR:$val)>;
907+
def : Pat<(atomic_load_xor_i32_monotonic ADDRri:$addr, GPR32:$val),
908+
(XXORW32 ADDRri:$addr, GPR32:$val)>;
909+
def : Pat<(atomic_load_xor_i64_monotonic ADDRri:$addr, GPR:$val),
910+
(XXORD ADDRri:$addr, GPR:$val)>;
887911

888912
// Atomic Exchange
889913
class XCHG<BPFWidthModifer SizeOp, string OpcodeStr, PatFrag OpNode>

0 commit comments

Comments
 (0)