
Commit c9b6e01

[AMDGPU] Graph-based Module Splitting Rewrite (#104763)
Major rewrite of the AMDGPUSplitModule pass in order to better support it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and messy to be of any real use. It was also a real pain to use and made the code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support.
  - The graph represents the module accurately, with bidirectional, typed edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a set of nodes as a BitVector. This makes comparing 2 sets of nodes to find common dependencies a trivial task. Merging two clusters of nodes together is also trivial.
- No more defaulting to "P0" for external calls.
  - Roots that can reach non-copyable dependencies (such as external calls) are now grouped together in a single "cluster" that can go into any partition.
- No more defaulting to "P0" for indirect calls.
- New representation for module splitting proposals that can be graded and compared.
- Graph-search algorithm that can explore multiple branches/assignments for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth. That greedy approach uses almost identical heuristics to the previous version of the pass.

All of this gives us a lot of room to experiment with new heuristics or even entirely different splitting strategies if we need to. For instance, the graph representation has room for abstract nodes, e.g. if we need to represent some global variables or external constraints. We could also introduce more edge types to model other types of relations between nodes, etc.

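To illustrate why numbering nodes from 0 makes set operations cheap, here is a minimal sketch of a bit-per-node set. It is not the code from the pass: `NodeSet` and its methods are hypothetical stand-ins for a BitVector-style container, written with only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a BitVector: bit I is set when node ID I is in
// the set. The class name and its methods are assumptions for illustration.
class NodeSet {
public:
  explicit NodeSet(std::size_t Size)
      : NumNodes(Size), Words((Size + 63) / 64, std::uint64_t{0}) {}

  void set(std::size_t ID) {
    assert(ID < NumNodes && "node ID out of range");
    Words[ID / 64] |= std::uint64_t{1} << (ID % 64);
  }

  bool test(std::size_t ID) const {
    assert(ID < NumNodes && "node ID out of range");
    return ((Words[ID / 64] >> (ID % 64)) & 1) != 0;
  }

  // Merging two clusters is a word-wise OR.
  NodeSet &operator|=(const NodeSet &Other) {
    assert(NumNodes == Other.NumNodes && "sets must cover the same graph");
    for (std::size_t I = 0; I < Words.size(); ++I)
      Words[I] |= Other.Words[I];
    return *this;
  }

  // Number of nodes present in both sets, i.e. the common dependencies.
  std::size_t countCommon(const NodeSet &Other) const {
    assert(NumNodes == Other.NumNodes && "sets must cover the same graph");
    std::size_t N = 0;
    for (std::size_t I = 0; I < Words.size(); ++I)
      N += popcount(Words[I] & Other.Words[I]);
    return N;
  }

  // Number of nodes in this set.
  std::size_t count() const {
    std::size_t N = 0;
    for (std::uint64_t W : Words)
      N += popcount(W);
    return N;
  }

private:
  // Portable bit count (clears the lowest set bit on each iteration).
  static std::size_t popcount(std::uint64_t W) {
    std::size_t N = 0;
    for (; W != 0; W &= W - 1)
      ++N;
    return N;
  }

  std::size_t NumNodes;
  std::vector<std::uint64_t> Words;
};
```

With this layout, both finding shared dependencies and merging clusters cost one pass over a handful of machine words, independent of how the graph edges are stored.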
I also designed the graph representation & the splitting strategies to be as fast as possible, and it seems to have paid off. Some quick tests showed that we spend pretty much all of our time in the CloneModule function, with the actual splitting logic being <1% of the runtime.
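The depth-bounded search mentioned in the highlights can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the pass: `Cluster`, `Proposal`, `search`, the per-cluster `Preferred` partition, and the "heaviest partition" grade are all hypothetical. It assumes each branching step has exactly two options, which is what makes a max depth of 8 yield at most 2^8 = 256 proposals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cluster: a cost plus the partition some similarity heuristic
// would prefer. Both fields are assumptions made for this sketch.
struct Cluster {
  std::uint64_t Cost;
  std::size_t Preferred;
};

// Hypothetical proposal: the accumulated cost of each partition.
struct Proposal {
  std::vector<std::uint64_t> PartitionCost;

  // Grade used to compare proposals: cost of the heaviest partition.
  std::uint64_t bottleneck() const {
    return PartitionCost.empty()
               ? 0
               : *std::max_element(PartitionCost.begin(), PartitionCost.end());
  }
};

struct SearchResult {
  Proposal Best;
  std::size_t NumProposals = 0;
};

// Upper bound on proposals when every branching step has two options.
inline std::size_t maxProposals(unsigned MaxDepth) {
  return std::size_t{1} << MaxDepth;
}

inline void explore(const std::vector<Cluster> &Clusters, std::size_t Idx,
                    unsigned Depth, unsigned MaxDepth, Proposal Current,
                    SearchResult &Out) {
  if (Idx == Clusters.size()) {
    // Every cluster is assigned: grade the finished proposal.
    ++Out.NumProposals;
    if (Out.NumProposals == 1 || Current.bottleneck() < Out.Best.bottleneck())
      Out.Best = std::move(Current);
    return;
  }
  const Cluster &C = Clusters[Idx];
  assert(C.Preferred < Current.PartitionCost.size());
  const std::size_t LeastLoaded = static_cast<std::size_t>(
      std::min_element(Current.PartitionCost.begin(),
                       Current.PartitionCost.end()) -
      Current.PartitionCost.begin());
  if (Depth >= MaxDepth || LeastLoaded == C.Preferred) {
    // Greedy step: the candidates agree, or the depth budget is spent.
    Current.PartitionCost[LeastLoaded] += C.Cost;
    explore(Clusters, Idx + 1, Depth, MaxDepth, std::move(Current), Out);
    return;
  }
  // The two candidates differ: explore both assignments.
  Proposal Alt = Current;
  Alt.PartitionCost[C.Preferred] += C.Cost;
  Current.PartitionCost[LeastLoaded] += C.Cost;
  explore(Clusters, Idx + 1, Depth + 1, MaxDepth, std::move(Current), Out);
  explore(Clusters, Idx + 1, Depth + 1, MaxDepth, std::move(Alt), Out);
}

inline SearchResult search(const std::vector<Cluster> &Clusters,
                           std::size_t NumPartitions, unsigned MaxDepth) {
  assert(NumPartitions > 0 && "need at least one partition");
  SearchResult Out;
  Proposal Start;
  Start.PartitionCost.assign(NumPartitions, std::uint64_t{0});
  explore(Clusters, 0, 0, MaxDepth, std::move(Start), Out);
  return Out;
}
```

Calling `search(Clusters, NumPartitions, 8)` explores at most `maxProposals(8)` complete assignments, while `search(Clusters, NumPartitions, 0)` degenerates into the purely greedy strategy and produces a single proposal.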
1 parent ae34257 commit c9b6e01

17 files changed: +1624 -738 lines changed

llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp

Lines changed: 1272 additions & 530 deletions
Large diffs are not rendered by default.

llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll

Lines changed: 15 additions & 21 deletions
@@ -1,30 +1,24 @@
-; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=0
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
+; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels:
 ; - A does a direct call to HelperA
 ; - B is storing @HelperA
 ; - C does a direct call to HelperA
 ;
-; The helper functions will get externalized, which will force A and C into P0 as
-; external functions cannot be duplicated.
-
-; CHECK0: define hidden void @HelperA()
-; CHECK0: define amdgpu_kernel void @A()
-; CHECK0: declare amdgpu_kernel void @B(ptr)
-; CHECK0: define amdgpu_kernel void @C()
-
-; CHECK1: declare hidden void @HelperA()
-; CHECK1: declare amdgpu_kernel void @A()
-; CHECK1: declare amdgpu_kernel void @B(ptr)
-; CHECK1: declare amdgpu_kernel void @C()
-
-; CHECK2: declare hidden void @HelperA()
-; CHECK2: declare amdgpu_kernel void @A()
-; CHECK2: define amdgpu_kernel void @B(ptr %dst)
-; CHECK2: declare amdgpu_kernel void @C()
+; The helper functions will get externalized, so C/A will end up
+; in the same partition.
+
+; P0 is empty.
+; CHECK0: declare
+
+; CHECK1: define amdgpu_kernel void @B(ptr %dst)
+
+; CHECK2: define hidden void @HelperA()
+; CHECK2: define amdgpu_kernel void @A()
+; CHECK2: define amdgpu_kernel void @C()
 
 define internal void @HelperA() {
   ret void

llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=0
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
 

llvm/test/tools/llvm-split/AMDGPU/debug-name-hiding.ll

Lines changed: 0 additions & 20 deletions
This file was deleted.

llvm/test/tools/llvm-split/AMDGPU/debug-non-kernel-root.ll

Lines changed: 0 additions & 36 deletions
This file was deleted.
Lines changed: 3 additions & 6 deletions
@@ -1,15 +1,12 @@
 ; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
+; RUN: not llvm-dis -o - %t1
 
-; Check that all declarations are put into each partition.
+; Empty module without any defs should result in a single output module that is
+; an exact copy of the input.
 
 ; CHECK0: declare void @A
 ; CHECK0: declare void @B
 
-; CHECK1: declare void @A
-; CHECK1: declare void @B
-
 declare void @A()
-
 declare void @B()

llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll

Lines changed: 7 additions & 11 deletions
@@ -1,6 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
 
 ; 3 kernels:
 ; - A calls nothing
@@ -13,16 +13,12 @@
 ; Additionally, @PerryThePlatypus gets externalized as
 ; the alias counts as taking its address.
 
-; CHECK0-NOT: define
-; CHECK0: @Perry = internal alias ptr (), ptr @PerryThePlatypus
-; CHECK0: define hidden void @PerryThePlatypus()
-; CHECK0: define amdgpu_kernel void @B
-; CHECK0: define amdgpu_kernel void @C
-; CHECK0-NOT: define
+; CHECK0: define amdgpu_kernel void @A
 
-; CHECK1-NOT: define
-; CHECK1: define amdgpu_kernel void @A
-; CHECK1-NOT: define
+; CHECK1: @Perry = internal alias ptr (), ptr @PerryThePlatypus
+; CHECK1: define hidden void @PerryThePlatypus()
+; CHECK1: define amdgpu_kernel void @B
+; CHECK1: define amdgpu_kernel void @C
 
 @Perry = internal alias ptr(), ptr @PerryThePlatypus
 

llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll

Lines changed: 3 additions & 9 deletions
@@ -1,27 +1,21 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels with each their own dependencies should go into 3
 ; distinct partitions. The most expensive kernel should be
 ; seen first and go into the last partition.
 
-; CHECK0-NOT: define
 ; CHECK0: define amdgpu_kernel void @C
 ; CHECK0: define internal void @HelperC
 ; CHECK0-NOT: define
 
-; CHECK1-NOT: define
 ; CHECK1: define amdgpu_kernel void @A
 ; CHECK1: define internal void @HelperA
-; CHECK1-NOT: define
 
-; CHECK2-NOT: define
 ; CHECK2: define amdgpu_kernel void @B
 ; CHECK2: define internal void @HelperB
-; CHECK2-NOT: define
-
 
 define amdgpu_kernel void @A() {
   call void @HelperA()

llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll

Lines changed: 12 additions & 21 deletions
@@ -1,29 +1,20 @@
 ; RUN: llvm-split -o %t %s -j 4 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
-; RUN: llvm-dis -o - %t3 | FileCheck --check-prefix=CHECK3 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t3 | FileCheck --check-prefix=CHECK3 --implicit-check-not=define %s
 
-; Both overridable helper should go in P0.
+; CHECK0: define internal void @PrivateHelper1()
+; CHECK0: define amdgpu_kernel void @D
 
-; CHECK0-NOT: define
-; CHECK0: define available_externally void @OverridableHelper0()
-; CHECK0: define internal void @OverridableHelper1()
-; CHECK0: define amdgpu_kernel void @A
-; CHECK0: define amdgpu_kernel void @B
-; CHECK0-NOT: define
+; CHECK1: define internal void @PrivateHelper0()
+; CHECK1: define amdgpu_kernel void @C
 
-; CHECK1-NOT: define
+; CHECK2: define internal void @OverridableHelper1()
+; CHECK2: define amdgpu_kernel void @B
 
-; CHECK2-NOT: define
-; CHECK2: define internal void @PrivateHelper1()
-; CHECK2: define amdgpu_kernel void @D
-; CHECK2-NOT: define
-
-; CHECK3-NOT: define
-; CHECK3: define internal void @PrivateHelper0()
-; CHECK3: define amdgpu_kernel void @C
-; CHECK3-NOT: define
+; CHECK3: define available_externally void @OverridableHelper0()
+; CHECK3: define amdgpu_kernel void @A
 
 define available_externally void @OverridableHelper0() {
   ret void

llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll

Lines changed: 12 additions & 18 deletions
@@ -1,7 +1,7 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; We have 4 kernels:
 ; - Each kernel has an internal helper
@@ -15,25 +15,19 @@
 ; indirect call. HelperC/D should also end up in P0 as they
 ; are dependencies of HelperB.
 
-; CHECK0-NOT: define
-; CHECK0: define hidden void @HelperA
-; CHECK0: define hidden void @HelperB
-; CHECK0: define hidden void @CallCandidate
-; CHECK0: define internal void @HelperC
 ; CHECK0: define internal void @HelperD
-; CHECK0: define amdgpu_kernel void @A
-; CHECK0: define amdgpu_kernel void @B
-; CHECK0-NOT: define
+; CHECK0: define amdgpu_kernel void @D
 
-; CHECK1-NOT: define
-; CHECK1: define internal void @HelperD
-; CHECK1: define amdgpu_kernel void @D
-; CHECK1-NOT: define
+; CHECK1: define internal void @HelperC
+; CHECK1: define amdgpu_kernel void @C
 
-; CHECK2-NOT: define
+; CHECK2: define hidden void @HelperA
+; CHECK2: define hidden void @HelperB
+; CHECK2: define hidden void @CallCandidate
 ; CHECK2: define internal void @HelperC
-; CHECK2: define amdgpu_kernel void @C
-; CHECK2-NOT: define
+; CHECK2: define internal void @HelperD
+; CHECK2: define amdgpu_kernel void @A
+; CHECK2: define amdgpu_kernel void @B
 
 @addrthief = global [3 x ptr] [ptr @HelperA, ptr @HelperB, ptr @CallCandidate]
 

llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll

Lines changed: 11 additions & 17 deletions
@@ -1,21 +1,15 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
-
-; CHECK0-NOT: define
-; CHECK0: define void @ExternalHelper
-; CHECK0: define amdgpu_kernel void @A
-; CHECK0: define amdgpu_kernel void @B
-; CHECK0-NOT: define
-
-; CHECK1-NOT: define
-; CHECK1: define amdgpu_kernel void @D
-; CHECK1-NOT: define
-
-; CHECK2-NOT: define
-; CHECK2: define amdgpu_kernel void @C
-; CHECK2-NOT: define
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
+
+; CHECK0: define amdgpu_kernel void @D
+
+; CHECK1: define amdgpu_kernel void @C
+
+; CHECK2: define void @ExternalHelper
+; CHECK2: define amdgpu_kernel void @A
+; CHECK2: define amdgpu_kernel void @B
 
 define void @ExternalHelper() {
   ret void

llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll

Lines changed: 3 additions & 9 deletions
@@ -1,26 +1,20 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-globals
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels use private/internal global variables.
 ; The GVs should be copied in each partition as needed.
 
-; CHECK0-NOT: define
 ; CHECK0: @bar = internal constant ptr
 ; CHECK0: define amdgpu_kernel void @C
-; CHECK0-NOT: define
 
-; CHECK1-NOT: define
 ; CHECK1: @foo = private constant ptr
 ; CHECK1: define amdgpu_kernel void @A
-; CHECK1-NOT: define
 
-; CHECK2-NOT: define
 ; CHECK2: @foo = private constant ptr
 ; CHECK2: @bar = internal constant ptr
 ; CHECK2: define amdgpu_kernel void @B
-; CHECK2-NOT: define
 
 @foo = private constant ptr poison
 @bar = internal constant ptr poison

llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll

Lines changed: 3 additions & 9 deletions
@@ -1,28 +1,22 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa
-; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s
-; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels use private/internal global variables.
 ; The GVs should be copied in each partition as needed.
 
-; CHECK0-NOT: define
 ; CHECK0: @foo = hidden constant ptr poison
 ; CHECK0: @bar = hidden constant ptr poison
 ; CHECK0: define amdgpu_kernel void @C
-; CHECK0-NOT: define
 
-; CHECK1-NOT: define
 ; CHECK1: @foo = external hidden constant ptr{{$}}
 ; CHECK1: @bar = external hidden constant ptr{{$}}
 ; CHECK1: define amdgpu_kernel void @A
-; CHECK1-NOT: define
 
-; CHECK2-NOT: define
 ; CHECK2: @foo = external hidden constant ptr{{$}}
 ; CHECK2: @bar = external hidden constant ptr{{$}}
 ; CHECK2: define amdgpu_kernel void @B
-; CHECK2-NOT: define
 
 @foo = private constant ptr poison
 @bar = internal constant ptr poison
