[KernelInfo] Implement new LLVM IR pass for GPU code analysis #102944

Merged: 57 commits, Jan 29, 2025

Commits
5a671f6
[KernelInfo] Implement new LLVM IR pass for GPU code analysis
jdenny-ornl Aug 12, 2024
a7656de
Move docs to KernelInfo.rst
jdenny-ornl Aug 12, 2024
d92856e
Move conditional outside registration call
jdenny-ornl Aug 12, 2024
6ac3f41
Use llvm::SmallString
jdenny-ornl Aug 12, 2024
6367ad7
Use TTI.getFlatAddressSpace for addrspace(0)
jdenny-ornl Aug 12, 2024
78446bb
Avoid repetition between amdgpu and nvptx tests
jdenny-ornl Aug 12, 2024
fede524
Use named values in tests
jdenny-ornl Aug 12, 2024
4c30b8a
Say flat address space instead of addrspace(0)
jdenny-ornl Aug 13, 2024
33f0d4d
Cache the flat address space
jdenny-ornl Aug 13, 2024
a2a512c
Link KernelInfo.rst from Passes.rst
jdenny-ornl Aug 13, 2024
de04ac4
Don't filter out cpus
jdenny-ornl Aug 13, 2024
ec5d2bd
Include less in header
jdenny-ornl Aug 16, 2024
c06b905
Removed unused comparison operators
jdenny-ornl Aug 16, 2024
d83d22a
Remove redundant null check
jdenny-ornl Aug 16, 2024
1649cf8
Move KernelInfo to KernelInfo.cpp, remove KernelInfoAnalysis
jdenny-ornl Aug 16, 2024
1a3c0ae
Use printAsOperand not getName to identify instruction
jdenny-ornl Aug 16, 2024
ea89a81
Use printAsOperand to report indirect callee
jdenny-ornl Aug 16, 2024
8da602b
Report inline assembly calls
jdenny-ornl Aug 16, 2024
45114fd
Use llvm::SmallString
jdenny-ornl Aug 16, 2024
eea139c
Use llvm::SmallString
jdenny-ornl Aug 16, 2024
8bf6e4e
getKernelInfo -> emitKernelInfo because return is unused
jdenny-ornl Aug 16, 2024
d2ee05d
Merge branch 'main' into kernel-info-pr
jdenny-ornl Aug 21, 2024
9b865f4
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 5, 2024
39979f7
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 12, 2024
62d494d
Clean up launch bounds
jdenny-ornl Sep 13, 2024
e4d3fca
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 16, 2024
94d90d1
Adjust forEachLaunchBound param
jdenny-ornl Sep 16, 2024
762a217
Reuse Function::getFnAttributeAsParsedInteger
jdenny-ornl Sep 16, 2024
df66a3d
Move forEachLaunchBound to TargetTransformInfo
jdenny-ornl Sep 16, 2024
5488764
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 26, 2024
3f63d53
forEachLaunchBound -> collectLaunchBounds
jdenny-ornl Sep 26, 2024
3b6ce07
Merge branch 'main' into kernel-info-pr
jdenny-ornl Sep 28, 2024
feeaa37
Remove redundant private
jdenny-ornl Sep 28, 2024
b9b95a2
Merge branch 'main' into kernel-info-pr
jdenny-ornl Oct 11, 2024
116f1c9
Remove todos, as requested
jdenny-ornl Oct 11, 2024
2094465
Combine registerFullLinkTimeOptimizationLastEPCallback calls
jdenny-ornl Oct 11, 2024
39bce7c
collectLaunchBounds -> collectKernelLaunchBounds
jdenny-ornl Oct 11, 2024
14345cf
Spell kernel-info properties like their IR attributes
jdenny-ornl Oct 11, 2024
ad393d2
Replace -kernel-info-end-lto with -no-kernel-info-end-lto
jdenny-ornl Oct 11, 2024
d3beccf
Apply clang-format
jdenny-ornl Oct 11, 2024
5a4b873
Avoid auto, as requested
jdenny-ornl Oct 14, 2024
571181b
For function name, use debug info or keep @
jdenny-ornl Oct 14, 2024
a5ce547
Use anonymous namespace
jdenny-ornl Oct 16, 2024
4d60911
Remove currently unused capabilities, as requested
jdenny-ornl Oct 16, 2024
0c30e7c
Rename test files without LLVM IR to .test
jdenny-ornl Oct 16, 2024
f5a6fbd
Regenerate OpenMP tests from current clang
jdenny-ornl Oct 17, 2024
baad223
Include LLVM value name in alloca report
jdenny-ornl Oct 17, 2024
86f9683
Merge branch 'main' into kernel-info-pr
jdenny-ornl Nov 27, 2024
c9aebce
Update expected amdgpu-max-num-workgroups default values
jdenny-ornl Nov 27, 2024
8982f8f
Merge branch 'main' into kernel-info-pr
jdenny-ornl Dec 27, 2024
151bfb3
Regenerate OpenMP tests from current clang
jdenny-ornl Dec 27, 2024
ff33eb3
Merge branch 'main' into kernel-info-pr
jdenny-ornl Jan 6, 2025
bb9d5c2
Relocate and use llvm::omp::getDeviceKernels
jdenny-ornl Jan 6, 2025
0a347cf
Extend test to cover dyn and non-entry allocas
jdenny-ornl Jan 7, 2025
2d321ce
Revert "Relocate and use llvm::omp::getDeviceKernels"
jdenny-ornl Jan 27, 2025
b9447c0
Merge branch 'main' into kernel-info-pr
jdenny-ornl Jan 27, 2025
1f1ca6c
Relocate and use OpenMPOpt.cpp's isKernelCC
jdenny-ornl Jan 28, 2025
63 changes: 63 additions & 0 deletions llvm/docs/KernelInfo.rst
@@ -0,0 +1,63 @@
==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for codes compiled for GPUs. The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them. The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca). Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default. ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'
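
The introduction in KernelInfo.rst pairs summary statistics (such as the total size of static allocas) with per-occurrence remarks. The stand-alone sketch below is only a rough model of that split, not the pass's code: `AllocaRecord`, `AllocaSummary`, and `summarize` are hypothetical names introduced for illustration, and the sketch assumes each alloca is either static with a known byte size or dynamic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one alloca; not a type from the pass.
struct AllocaRecord {
  std::string Name;
  bool IsStatic;          // true when the size is known at compile time
  std::int64_t SizeBytes; // meaningful only when IsStatic is true
};

// Hypothetical summary holding the kind of totals the docs mention.
struct AllocaSummary {
  std::int64_t Count = 0;
  std::int64_t StaticSizeSum = 0;
  std::int64_t DynamicCount = 0;
};

// Walk every occurrence once and fold it into the summary.
inline AllocaSummary summarize(const std::vector<AllocaRecord> &Allocas) {
  AllocaSummary S;
  for (const AllocaRecord &A : Allocas) {
    ++S.Count;
    if (A.IsStatic)
      S.StaticSizeSum += A.SizeBytes;
    else
      ++S.DynamicCount;
  }
  return S;
}

// Small usage check: one static alloca and one dynamic alloca.
inline void checkSummarize() {
  std::vector<AllocaRecord> Allocas;
  Allocas.push_back({"buf", true, 64});
  Allocas.push_back({"vla", false, 0});
  AllocaSummary S = summarize(Allocas);
  assert(S.Count == 2);
  assert(S.StaticSizeSum == 64);
  assert(S.DynamicCount == 1);
}
```

In the real pass each occurrence would also carry a source location for its own remark; the model omits that to stay self-contained.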
11 changes: 11 additions & 0 deletions llvm/docs/Passes.rst
@@ -5,6 +5,11 @@ LLVM's Analysis and Transform Passes
.. contents::
   :local:

.. toctree::
   :hidden:

   KernelInfo

Introduction
============
.. warning:: This document is not updated frequently, and the list of passes
@@ -148,6 +153,12 @@ This pass collects the count of all instructions and reports them.
Bookkeeping for "interesting" users of expressions computed from induction
variables.

``kernel-info``: GPU Kernel Info
--------------------------------

Reports various statistics for codes compiled for GPUs. This pass is
:doc:`documented separately<KernelInfo>`.

``lazy-value-info``: Lazy Value Information Analysis
----------------------------------------------------

35 changes: 35 additions & 0 deletions llvm/include/llvm/Analysis/KernelInfo.h
@@ -0,0 +1,35 @@
//=- KernelInfo.h - Kernel Analysis -------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the KernelInfoPrinter class used to emit remarks about
// function properties from a GPU kernel.
//
// See llvm/docs/KernelInfo.rst.
// ===---------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_KERNELINFO_H
#define LLVM_ANALYSIS_KERNELINFO_H

#include "llvm/IR/PassManager.h"

namespace llvm {

class TargetMachine;

class KernelInfoPrinter : public PassInfoMixin<KernelInfoPrinter> {
  TargetMachine *TM;

public:
  explicit KernelInfoPrinter(TargetMachine *TM) : TM(TM) {}

  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

  static bool isRequired() { return true; }
};
} // namespace llvm
#endif // LLVM_ANALYSIS_KERNELINFO_H
14 changes: 14 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1886,6 +1886,11 @@ class TargetTransformInfo {

  /// @}

  /// Collect kernel launch bounds for \p F into \p LB.
  void collectKernelLaunchBounds(
      const Function &F,
      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const;

private:
  /// The abstract base class used to type erase specific TTI
  /// implementations.
@@ -2324,6 +2329,9 @@ class TargetTransformInfo::Concept {
  virtual unsigned getMaxNumArgs() const = 0;
  virtual unsigned getNumBytesToPadGlobalArray(unsigned Size,
                                               Type *ArrayType) const = 0;
  virtual void collectKernelLaunchBounds(
      const Function &F,
      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const = 0;
};

template <typename T>
@@ -3169,6 +3177,12 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
                                       Type *ArrayType) const override {
    return Impl.getNumBytesToPadGlobalArray(Size, ArrayType);
  }

  void collectKernelLaunchBounds(
      const Function &F,
      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const override {
    Impl.collectKernelLaunchBounds(F, LB);
  }
};

template <typename T>
4 changes: 4 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1049,6 +1049,10 @@ class TargetTransformInfoImplBase {
    return 0;
  }

  void collectKernelLaunchBounds(
      const Function &F,
      SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const {}

protected:
  // Obtain the minimum required size to hold the value (without the sign)
  // In case of a vector it returns the min required size for one element.
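
The default `collectKernelLaunchBounds` added to `TargetTransformInfoImplBase` above has an empty body, so a target that does not override it reports no launch bounds, while an overriding target can append (name, value) pairs to the out-parameter. The target overrides are not among the hunks shown here, so the following is only a stand-alone model of that pattern. `BaseImpl`, `ExampleGpuImpl`, `collect`, the attribute map, and the attribute name `example-max-threads` are all hypothetical. The sketch uses plain virtual dispatch for brevity where the diff forwards through `Concept` and `Model`, and `std::vector` and `std::string` stand in for `SmallVectorImpl` and `StringRef`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: just a bag of string attributes.
struct Function {
  std::map<std::string, std::string> Attrs;
};

using LaunchBounds = std::vector<std::pair<std::string, std::int64_t>>;

// Models the default in the diff: the base implementation appends nothing.
struct BaseImpl {
  virtual ~BaseImpl() = default;
  virtual void collectKernelLaunchBounds(const Function &,
                                         LaunchBounds &) const {}
};

// Hypothetical target: reports one bound when the attribute is present.
// std::stoll throws if the attribute value is not an integer.
struct ExampleGpuImpl : BaseImpl {
  void collectKernelLaunchBounds(const Function &F,
                                 LaunchBounds &LB) const override {
    auto It = F.Attrs.find("example-max-threads");
    if (It == F.Attrs.end())
      return;
    LB.emplace_back(It->first, std::stoll(It->second));
  }
};

// Convenience wrapper: gather the bounds a given implementation reports.
inline LaunchBounds collect(const BaseImpl &TTI, const Function &F) {
  LaunchBounds LB;
  TTI.collectKernelLaunchBounds(F, LB);
  return LB;
}

// Small usage check: the base reports nothing, the target reports one pair.
inline void checkCollect() {
  Function F;
  F.Attrs["example-max-threads"] = "256";
  assert(collect(BaseImpl{}, F).empty());
  assert(collect(ExampleGpuImpl{}, F).size() == 1);
}
```

Passing the result vector by reference, as the diff does, lets a caller reuse one buffer across functions and lets each target append as many bounds as it knows about.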
12 changes: 12 additions & 0 deletions llvm/include/llvm/IR/Function.h
@@ -284,6 +284,18 @@ class LLVM_ABI Function : public GlobalObject, public ilist_node<Function> {
    setValueSubclassData((getSubclassDataFromValue() & 0xc00f) | (ID << 4));
  }

  /// Does it have a kernel calling convention?
  bool hasKernelCallingConv() const {
    switch (getCallingConv()) {
    default:
      return false;
    case CallingConv::PTX_Kernel:
    case CallingConv::AMDGPU_KERNEL:
    case CallingConv::SPIR_KERNEL:
      return true;
    }
  }

  enum ProfileCountType { PCT_Real, PCT_Synthetic };

  /// Class to represent profile counts.
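
The `hasKernelCallingConv` helper added above is a plain switch over the function's calling convention. The stand-alone sketch below models the same shape outside LLVM so the classification can be compiled and checked without LLVM headers. The `CallingConv` enum and `Function` struct here are hypothetical stand-ins, not LLVM's definitions. In particular, the `C` and `Fast` enumerators are included only as non-kernel examples, and the enumerator values are arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::CallingConv; values are arbitrary.
enum class CallingConv { C, Fast, PTX_Kernel, AMDGPU_KERNEL, SPIR_KERNEL };

// Hypothetical stand-in for llvm::Function that only tracks a calling
// convention.
struct Function {
  CallingConv CC;

  constexpr CallingConv getCallingConv() const { return CC; }

  // Same structure as the switch in the diff: only the three GPU kernel
  // conventions count as kernels; everything else hits the default.
  constexpr bool hasKernelCallingConv() const {
    switch (getCallingConv()) {
    default:
      return false;
    case CallingConv::PTX_Kernel:
    case CallingConv::AMDGPU_KERNEL:
    case CallingConv::SPIR_KERNEL:
      return true;
    }
  }
};

// Small usage check: a PTX kernel is a kernel, an ordinary C function is not.
inline void checkKernelCallingConv() {
  assert(Function{CallingConv::PTX_Kernel}.hasKernelCallingConv());
  assert(!Function{CallingConv::C}.hasKernelCallingConv());
}
```

Putting `default` first and returning `false` there means any calling convention added later is treated as a non-kernel until it is listed explicitly.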
3 changes: 3 additions & 0 deletions llvm/include/llvm/Target/TargetMachine.h
@@ -19,6 +19,7 @@
#include "llvm/MC/MCStreamer.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/PGOOptions.h"
#include "llvm/Target/CGPassBuilderOption.h"
@@ -28,6 +29,8 @@
#include <string>
#include <utility>

extern llvm::cl::opt<bool> NoKernelInfoEndLTO;

namespace llvm {

class AAManager;
1 change: 1 addition & 0 deletions llvm/lib/Analysis/CMakeLists.txt
@@ -79,6 +79,7 @@ add_llvm_component_library(LLVMAnalysis
  InstructionPrecedenceTracking.cpp
  InstructionSimplify.cpp
  InteractiveModelRunner.cpp
  KernelInfo.cpp
  LastRunTrackingAnalysis.cpp
  LazyBranchProbabilityInfo.cpp
  LazyBlockFrequencyInfo.cpp