[CodeGen][NFC] Fix quadratic c-t for large jump tables #144108


aengelke
Contributor

@aengelke aengelke commented Jun 13, 2025

Deleting a basic block removes all references to it from the jump tables,
which is O(n) in the number of jump-table entries. When freeing a
MachineFunction, all basic blocks are deleted before the jump tables, so
tearing down a function with n blocks behind a large jump table takes
O(n^2) time. Fix this by deallocating the jump tables first.
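
To make the ordering argument concrete, here is a toy model of the teardown in Python. It is only a sketch: `teardown_cost` and its list-based jump table are hypothetical stand-ins, not the LLVM data structures. It assumes a single jump table with one entry per block and counts how many entries are examined while the blocks are deleted.

```python
def teardown_cost(n, free_table_first):
    """Count jump-table entries examined while deleting n blocks (toy model)."""
    blocks = [f"h{i}" for i in range(n)]
    jump_table = list(blocks)  # assumption: one table entry per block
    visited = 0

    if free_table_first:
        # Patched order: the table is gone before any block is deleted,
        # so block deletion has nothing left to scan.
        jump_table = None

    while blocks:
        dead = blocks.pop()
        if jump_table is not None:
            # Removing references to the dead block is a linear scan.
            visited += len(jump_table)
            jump_table = [target for target in jump_table if target != dead]
    return visited


for n in (500, 1000):
    print(n, teardown_cost(n, False), teardown_cost(n, True))
```

In this model, deleting the blocks first examines n(n+1)/2 entries, while freeing the table first examines none, which is the asymptotic difference the patch targets.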

Test case generator:

import sys

n = int(sys.argv[1])
print("define void @f(i64 %c, ptr %p) {")
print("  switch i64 %c, label %d [")
for i in range(n):
    print(f"    i64 {i}, label %h{i}")
print(f"  ]")
for i in range(n):
    print(f'h{i}:')
    print(f'  store i64 {i*i}, ptr %p')
    print(f'  ret void')
print('d:')
print('  ret void')
print('}')
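
For inspecting the shape of the generated IR at small sizes, the same generator can be wrapped in a function that returns the text. This is just a convenience sketch; the name `switch_ir` is made up here and is not part of the patch.

```python
def switch_ir(n):
    """Return LLVM IR text for a function containing an n-case switch."""
    lines = [
        "define void @f(i64 %c, ptr %p) {",
        "  switch i64 %c, label %d [",
    ]
    # One switch case per handler block.
    lines += [f"    i64 {i}, label %h{i}" for i in range(n)]
    lines.append("  ]")
    # Each handler stores a distinct value so the blocks stay distinct.
    for i in range(n):
        lines += [f"h{i}:", f"  store i64 {i*i}, ptr %p", "  ret void"]
    lines += ["d:", "  ret void", "}"]
    return "\n".join(lines) + "\n"


print(switch_ir(2), end="")
```

The output has 4n + 6 lines: two header lines, n cases, the closing bracket, three lines per handler block, and three for the default block and closing brace. The benchmarks read `.bc` files, so the text was presumably assembled to bitcode first (for example with `llvm-as`); that step is not shown in the description.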

Improvement at 5000 entries:

Benchmark 1: ./llc.pre -filetype=obj -O0 <switch5k.bc
  Time (mean ± σ):      49.7 ms ±   1.0 ms
  Range (min … max):    48.0 ms …  52.1 ms    57 runs

Benchmark 2: ./llc.post -filetype=obj -O0 <switch5k.bc
  Time (mean ± σ):      39.4 ms ±   0.8 ms
  Range (min … max):    37.1 ms …  41.1 ms    72 runs

Summary
  ./llc.post -filetype=obj -O0 <switch5k.bc ran
    1.26 ± 0.04 times faster than ./llc.pre -filetype=obj -O0 <switch5k.bc

Improvement at 20000 entries:

Benchmark 1: ./llc.pre -filetype=obj -O0 <switch20k.bc
  Time (mean ± σ):     281.7 ms ±   1.0 ms
  Range (min … max):   280.2 ms … 283.0 ms    10 runs

Benchmark 2: ./llc.post -filetype=obj -O0 <switch20k.bc
  Time (mean ± σ):     123.9 ms ±   1.5 ms
  Range (min … max):   121.4 ms … 129.2 ms    23 runs

Summary
  ./llc.post -filetype=obj -O0 <switch20k.bc ran
    2.27 ± 0.03 times faster than ./llc.pre -filetype=obj -O0 <switch20k.bc
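
As a rough sanity check on the quadratic claim, the mean times reported above can be compared directly. This sketch uses only the means from the two benchmark runs and ignores the reported variance, so treat the result as indicative rather than exact.

```python
# Mean wall-clock times in ms, copied from the benchmark output above.
pre_ms = {5000: 49.7, 20000: 281.7}
post_ms = {5000: 39.4, 20000: 123.9}

for n in (5000, 20000):
    speedup = pre_ms[n] / post_ms[n]
    saved = pre_ms[n] - post_ms[n]
    print(f"n={n}: {speedup:.2f}x faster, {saved:.1f} ms saved")

# If the removed work is quadratic, 4x more entries should save about 16x more time.
growth = (pre_ms[20000] - post_ms[20000]) / (pre_ms[5000] - post_ms[5000])
print(f"saved time grew {growth:.1f}x for a 4x larger switch")
```

The saved time grows by roughly 15x when the switch grows 4x, close to the 16x that a purely quadratic term would predict.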

abidh and others added 2 commits June 13, 2025 16:10
Created using spr 1.3.5-bogner

[skip ci]
Created using spr 1.3.5-bogner
@aengelke aengelke requested review from aeubanks and arsenm June 13, 2025 16:11
Contributor

@aeubanks aeubanks left a comment

makes sense. can you put some compile time numbers in the description before and after this patch?

kparzysz and others added 2 commits June 18, 2025 16:56
Created using spr 1.3.5-bogner

[skip ci]
Created using spr 1.3.5-bogner
@aengelke aengelke changed the base branch from users/aengelke/spr/main.codegennfc-fix-quadratic-c-t-for-large-jump-tables to main June 18, 2025 16:56
@aengelke aengelke merged commit 2a8c65e into main Jun 18, 2025
6 of 8 checks passed
@aengelke aengelke deleted the users/aengelke/spr/codegennfc-fix-quadratic-c-t-for-large-jump-tables branch June 18, 2025 16:56
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 18, 2025
Pull Request: llvm/llvm-project#144108