[BPF] fix sub-register handling for bpf_fastcall #110618


Merged
merged 4 commits into from
Oct 1, 2024

Conversation


@eddyz87 eddyz87 commented Oct 1, 2024

bpf_fastcall induced spill/fill pairs should be generated for sub-registers as well as for whole registers. At the moment this is not the case, e.g.:

$ cat t.c
extern int foo(void) __attribute__((bpf_fastcall));

int bar(int a) {
  foo();
  return a;
}

$ clang --target=bpf -mcpu=v3 -O2 -S t.c -o -
...
call foo
w0 = w1
exit

Modify BPFMIPeephole.cpp:collectBPFFastCalls() to check sub-register liveness and thus produce correct code for the example above:

*(u64 *)(r10 - 8) = r1
call foo
r1 = *(u64 *)(r10 - 8)
w0 = w1
exit

@eddyz87 eddyz87 requested review from yonghong-song and 4ast October 1, 2024 01:55

github-actions bot commented Oct 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.


eddyz87 commented Oct 1, 2024

This issue was causing a Linux kernel BPF selftest failure:

arena_htab/arena_htab_asm execution log
# ./test_progs -t arena_htab/arena_htab_asm
Validating htab_update_elem() func#1...
93: R1=scalar() R2=scalar() R3=scalar() R10=fp0
; __weak int htab_update_elem(htab_t *htab __arg_arena, int key, int value) @ bpf_arena_htab.h:69
93: (bc) w7 = w2                      ; R2=scalar() R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
; cast_kern(htab); @ bpf_arena_htab.h:74
94: (bf) r1 = addr_space_cast(r1, 0, 1)       ; R1_w=arena
; htab_bucket_t *b = htab->buckets; @ bpf_arena_htab.h:21
95: (79) r8 = *(u64 *)(r1 +0)         ; R1_w=arena R8_w=scalar()
; cast_kern(b); @ bpf_arena_htab.h:23
96: (bf) r8 = addr_space_cast(r8, 0, 1)       ; R8_w=arena
; return &b[hash & (htab->n_buckets - 1)]; @ bpf_arena_htab.h:24
97: (61) r1 = *(u32 *)(r1 +8)         ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
98: (04) w1 += -1                     ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
99: (5c) w1 &= w7                     ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
100: (67) r1 <<= 3                    ; R1_w=scalar(smin=0,smax=umax=0x7fffffff8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x7fffffff8))
101: (0f) r8 += r1                    ; R1_w=scalar(smin=0,smax=umax=0x7fffffff8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x7fffffff8)) R8_w=arena
102: (b7) r6 = 0                      ; R6_w=0
; list_for_each_entry(l, head, hash_node) @ bpf_arena_htab.h:44
103: (79) r1 = *(u64 *)(r8 +0)        ; R1_w=scalar() R8_w=arena
104: (15) if r1 == 0x0 goto pc+16     ; R1_w=scalar(umin=1)
105: (bf) r1 = addr_space_cast(r1, 0, 1)      ; R1_w=arena
106: (05) goto pc+11
118: (b7) r6 = 0                      ; R6=0
119: (e5) may_goto pc+1
; __u32 cpu = bpf_get_smp_processor_id(); @ bpf_arena_alloc.h:23
121: (85) call bpf_get_smp_processor_id#8     ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7))
; void __arena *page = page_frag_cur_page[cpu]; @ bpf_arena_alloc.h:24
122: (bc) w1 = w0                     ; R0_w=scalar(id=6,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7)) R1_w=scalar(id=6,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7))
; int __arena *cur_offset = &page_frag_cur_offset[cpu]; @ bpf_arena_alloc.h:25
123: (bf) r2 = r1                     ; R1_w=scalar(id=6,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7)) R2_w=scalar(id=6,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7))
124: (67) r2 <<= 2                    ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=28,var_off=(0x0; 0x1c))
125: (18) r4 = 0xffffc9000018de98     ; R4_w=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18e98)
127: (0f) r4 += r2                    ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=28,var_off=(0x0; 0x1c)) R4_w=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18e98,smin=smin32=0,smax=umax=smax32=umax32=28,var_off=(0x0; 0x1c))
; void __arena *page = page_frag_cur_page[cpu]; @ bpf_arena_alloc.h:24
128: (67) r1 <<= 3                    ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=56,var_off=(0x0; 0x38))
129: (18) r9 = 0xffffc9000018da98     ; R9_w=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18a98)
131: (0f) r9 += r1                    ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=56,var_off=(0x0; 0x38)) R9_w=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18a98,smin=smin32=0,smax=umax=smax32=umax32=56,var_off=(0x0; 0x38))
132: (79) r0 = *(u64 *)(r9 +0)        ; R0=scalar() R9=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18a98,smin=smin32=0,smax=umax=smax32=umax32=56,var_off=(0x0; 0x38))
; if (!page) { @ bpf_arena_alloc.h:31
133: (55) if r0 != 0x0 goto pc+20     ; R0=0
134: (7b) *(u64 *)(r10 -24) = r4      ; R4=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18e98,smin=smin32=0,smax=umax=smax32=umax32=28,var_off=(0x0; 0x1c)) R10=fp0 fp-24_w=map_value(map=arena_ht.bss,ks=4,vs=102552,off=0x18e98,smin=smin32=0,smax=umax=smax32=umax32=28,var_off=(0x0; 0x1c))
135: (63) *(u32 *)(r10 -16) = r3
R3 !read_ok
processed 198 insns (limit 1000000) max_states_per_insn 1 total_states 20 peak_states 20 mark_read 3
-- END PROG LOAD LOG --
...
#4/2     arena_htab/arena_htab_asm:FAIL
#4       arena_htab:FAIL

Here register r3 is not spilled before the call to bpf_get_smp_processor_id at insn 121, even though it is still read at insn 135 (via the 32-bit store `*(u32 *)(r10 -16) = r3`); only the sub-register is live, which the old liveness check missed.

@eddyz87 eddyz87 force-pushed the bpf-fastcall-subreg-fix branch from 8a11bee to 407143c Compare October 1, 2024 02:04
@eddyz87 eddyz87 marked this pull request as ready for review October 1, 2024 02:04
As pointed out by @4ast, there is no need to make two calls to
`IsLiveBeforeInsn`; LivePhysRegs::contains() is documented as follows:

  /// ... This also
  /// works if only the super register of \p Reg has been defined, because
  /// addReg() always adds all sub-registers to the set as well ...
No changes in behaviour; this just makes the code more terse.

@4ast 4ast left a comment


lgtm


eddyz87 commented Oct 1, 2024

lgtm

Thank you, I'll wait for comments from Yonghong and land after that.


@yonghong-song yonghong-song left a comment


LGTM. Thanks!

@eddyz87 eddyz87 merged commit 2fca0ef into llvm:main Oct 1, 2024
8 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024