Commit 18f8106

[KernelInfo] Implement new LLVM IR pass for GPU code analysis (llvm#102944)
This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.

By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.
1 parent 15412d7 commit 18f8106

34 files changed, +2289 -13 lines changed

llvm/docs/KernelInfo.rst

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+==========
+KernelInfo
+==========
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This LLVM IR pass reports various statistics for codes compiled for GPUs. The
+goal of these statistics is to help identify bad code patterns and ways to
+mitigate them. The pass operates at the LLVM IR level so that it can, in
+theory, support any LLVM-based compiler for programming languages supporting
+GPUs.
+
+By default, the pass runs at the end of LTO, and options like
+``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
+command lines appear in the next section.
+
+Remarks include summary statistics (e.g., total size of static allocas) and
+individual occurrences (e.g., source location of each alloca). Examples of the
+output appear in tests in `llvm/test/Analysis/KernelInfo`.
+
+Example Command Lines
+=====================
+
+To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:
+
+.. code-block:: shell
+
+  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+      -Rpass=kernel-info
+
+To analyze specified LLVM IR, perhaps previously generated by something like
+``clang -save-temps -g -fopenmp --offload-arch=native test.c``:
+
+.. code-block:: shell
+
+  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+      -pass-remarks=kernel-info -passes=kernel-info
+
+When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
+runs at the end of LTO by default. ``-no-kernel-info-end-lto`` disables that
+behavior so you can position ``kernel-info`` explicitly:
+
+.. code-block:: shell
+
+  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+      -Rpass=kernel-info \
+      -Xoffload-linker --lto-newpm-passes='lto<O2>'
+
+  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
+      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'
+
+  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+      -pass-remarks=kernel-info \
+      -passes='lto<O2>'
+
+  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+      -pass-remarks=kernel-info -no-kernel-info-end-lto \
+      -passes='module(kernel-info),lto<O2>'

llvm/docs/Passes.rst

Lines changed: 11 additions & 0 deletions
@@ -5,6 +5,11 @@ LLVM's Analysis and Transform Passes
 .. contents::
     :local:
 
+.. toctree::
+   :hidden:
+
+   KernelInfo
+
 Introduction
 ============
 .. warning:: This document is not updated frequently, and the list of passes
@@ -148,6 +153,12 @@ This pass collects the count of all instructions and reports them.
 Bookkeeping for "interesting" users of expressions computed from induction
 variables.
 
+``kernel-info``: GPU Kernel Info
+--------------------------------
+
+Reports various statistics for codes compiled for GPUs. This pass is
+:doc:`documented separately<KernelInfo>`.
+
 ``lazy-value-info``: Lazy Value Information Analysis
 ----------------------------------------------------
 

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+//=- KernelInfo.h - Kernel Analysis -------------------------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the KernelInfoPrinter class used to emit remarks about
+// function properties from a GPU kernel.
+//
+// See llvm/docs/KernelInfo.rst.
+// ===---------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_KERNELINFO_H
+#define LLVM_ANALYSIS_KERNELINFO_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class TargetMachine;
+
+class KernelInfoPrinter : public PassInfoMixin<KernelInfoPrinter> {
+  TargetMachine *TM;
+
+public:
+  explicit KernelInfoPrinter(TargetMachine *TM) : TM(TM) {}
+
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+
+  static bool isRequired() { return true; }
+};
+} // namespace llvm
+#endif // LLVM_ANALYSIS_KERNELINFO_H

llvm/include/llvm/Analysis/TargetTransformInfo.h

Lines changed: 14 additions & 0 deletions
@@ -1891,6 +1891,11 @@ class TargetTransformInfo {
 
   /// @}
 
+  /// Collect kernel launch bounds for \p F into \p LB.
+  void collectKernelLaunchBounds(
+      const Function &F,
+      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const;
+
 private:
   /// The abstract base class used to type erase specific TTI
   /// implementations.
@@ -2329,6 +2334,9 @@ class TargetTransformInfo::Concept {
   virtual unsigned getMaxNumArgs() const = 0;
   virtual unsigned getNumBytesToPadGlobalArray(unsigned Size,
                                                Type *ArrayType) const = 0;
+  virtual void collectKernelLaunchBounds(
+      const Function &F,
+      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const = 0;
 };
 
 template <typename T>
@@ -3174,6 +3182,12 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
                                        Type *ArrayType) const override {
     return Impl.getNumBytesToPadGlobalArray(Size, ArrayType);
   }
+
+  void collectKernelLaunchBounds(
+      const Function &F,
+      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const override {
+    Impl.collectKernelLaunchBounds(F, LB);
+  }
 };
 
 template <typename T>

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Lines changed: 4 additions & 0 deletions
@@ -1049,6 +1049,10 @@ class TargetTransformInfoImplBase {
     return 0;
   }
 
+  void collectKernelLaunchBounds(
+      const Function &F,
+      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const {}
+
 protected:
   // Obtain the minimum required size to hold the value (without the sign)
   // In case of a vector it returns the min required size for one element.
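
The hunks above thread `collectKernelLaunchBounds` through three layers: the public `TargetTransformInfo` method, the type-erased `Concept`/`Model` pair, and a no-op default in `TargetTransformInfoImplBase`. The following is a minimal standalone sketch of that forwarding pattern, not the LLVM code itself. `MockFunction`, `LaunchBounds`, `BaseImpl`, `MockGpuImpl`, `MockTTI`, and the `"MaxThreads"` bound name are all hypothetical stand-ins chosen for illustration; the real interface takes `const Function &` and `SmallVectorImpl<std::pair<StringRef, int64_t>> &`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for llvm::Function and the SmallVectorImpl of pairs.
struct MockFunction {
  int64_t MaxThreads;
};
using LaunchBounds = std::vector<std::pair<std::string, int64_t>>;

// Plays the role of TargetTransformInfoImplBase: the default reports nothing.
struct BaseImpl {
  void collectKernelLaunchBounds(const MockFunction &, LaunchBounds &) const {}
};

// Hypothetical target implementation that reports a single bound.
struct MockGpuImpl : BaseImpl {
  void collectKernelLaunchBounds(const MockFunction &F,
                                 LaunchBounds &LB) const {
    LB.emplace_back("MaxThreads", F.MaxThreads);
  }
};

// Plays the role of TargetTransformInfo: callers see one concrete class,
// while the target-specific implementation is type erased behind Concept.
class MockTTI {
  struct Concept {
    virtual ~Concept() = default;
    virtual void collectKernelLaunchBounds(const MockFunction &F,
                                           LaunchBounds &LB) const = 0;
  };

  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    void collectKernelLaunchBounds(const MockFunction &F,
                                   LaunchBounds &LB) const override {
      Impl.collectKernelLaunchBounds(F, LB); // forward to the wrapped target
    }
  };

  std::unique_ptr<Concept> Ptr;

public:
  template <typename T>
  explicit MockTTI(T Impl) : Ptr(new Model<T>(std::move(Impl))) {}

  void collectKernelLaunchBounds(const MockFunction &F,
                                 LaunchBounds &LB) const {
    Ptr->collectKernelLaunchBounds(F, LB);
  }
};

// Usage: the default implementation leaves LB empty; the mock target adds one
// entry.
inline void demoLaunchBounds() {
  MockFunction F{256};
  LaunchBounds LB;
  MockTTI(BaseImpl{}).collectKernelLaunchBounds(F, LB);
  assert(LB.empty());
  MockTTI(MockGpuImpl{}).collectKernelLaunchBounds(F, LB);
  assert(LB.size() == 1 && LB[0].second == 256);
}
```

In this sketch, as in the patch, a target that does not override the method inherits the empty default, so a caller can invoke it unconditionally and simply receives no entries for such targets.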

llvm/include/llvm/IR/Function.h

Lines changed: 12 additions & 0 deletions
@@ -284,6 +284,18 @@ class LLVM_ABI Function : public GlobalObject, public ilist_node<Function> {
     setValueSubclassData((getSubclassDataFromValue() & 0xc00f) | (ID << 4));
   }
 
+  /// Does it have a kernel calling convention?
+  bool hasKernelCallingConv() const {
+    switch (getCallingConv()) {
+    default:
+      return false;
+    case CallingConv::PTX_Kernel:
+    case CallingConv::AMDGPU_KERNEL:
+    case CallingConv::SPIR_KERNEL:
+      return true;
+    }
+  }
+
   enum ProfileCountType { PCT_Real, PCT_Synthetic };
 
   /// Class to represent profile counts.
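
To illustrate the classification that `hasKernelCallingConv()` performs, here is a small standalone sketch that does not depend on LLVM headers. `MockCallingConv` is a hypothetical stand-in for `llvm::CallingConv`, modeling only the three kernel conventions named in the hunk plus two non-kernel ones; `countKernels` is an illustrative helper that does not appear in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::CallingConv (not the real enumeration).
enum class MockCallingConv { C, Fast, PTX_Kernel, AMDGPU_KERNEL, SPIR_KERNEL };

// Same shape as the predicate in the hunk: only the three GPU kernel
// conventions return true; everything else falls through to the default.
inline bool hasKernelCallingConv(MockCallingConv CC) {
  switch (CC) {
  default:
    return false;
  case MockCallingConv::PTX_Kernel:
  case MockCallingConv::AMDGPU_KERNEL:
  case MockCallingConv::SPIR_KERNEL:
    return true;
  }
}

// Illustrative helper: count how many entries would be treated as kernels.
inline int countKernels(const std::vector<MockCallingConv> &Convs) {
  int N = 0;
  for (MockCallingConv CC : Convs)
    if (hasKernelCallingConv(CC))
      ++N;
  return N;
}

// Usage: two of the three conventions below are kernel conventions.
inline void demoKernelCheck() {
  assert(countKernels({MockCallingConv::C, MockCallingConv::PTX_Kernel,
                       MockCallingConv::SPIR_KERNEL}) == 2);
}
```

Placing `default` first, as the patch does, means any calling convention not listed is treated as a non-kernel unless it is added to the `case` list explicitly.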

llvm/include/llvm/Target/TargetMachine.h

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,7 @@
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/CodeGen.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/PGOOptions.h"
 #include "llvm/Target/CGPassBuilderOption.h"
@@ -28,6 +29,8 @@
 #include <string>
 #include <utility>
 
+extern llvm::cl::opt<bool> NoKernelInfoEndLTO;
+
 namespace llvm {
 
 class AAManager;

llvm/lib/Analysis/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ add_llvm_component_library(LLVMAnalysis
   InstructionPrecedenceTracking.cpp
   InstructionSimplify.cpp
   InteractiveModelRunner.cpp
+  KernelInfo.cpp
   LastRunTrackingAnalysis.cpp
   LazyBranchProbabilityInfo.cpp
   LazyBlockFrequencyInfo.cpp
