[KernelInfo] Implement new LLVM IR pass for GPU code analysis #102944

Merged on Jan 29, 2025 (57 commits). The diff below shows the changes from 7 commits.

Commits:
5a671f6 [KernelInfo] Implement new LLVM IR pass for GPU code analysis (jdenny-ornl, Aug 12, 2024)
a7656de Move docs to KernelInfo.rst (jdenny-ornl, Aug 12, 2024)
d92856e Move conditional outside registration call (jdenny-ornl, Aug 12, 2024)
6ac3f41 Use llvm::SmallString (jdenny-ornl, Aug 12, 2024)
6367ad7 Use TTI.getFlatAddressSpace for addrspace(0) (jdenny-ornl, Aug 12, 2024)
78446bb Avoid repetition between amdgpu and nvptx tests (jdenny-ornl, Aug 12, 2024)
fede524 Use named values in tests (jdenny-ornl, Aug 12, 2024)
4c30b8a Say flat address space instead of addrspace(0) (jdenny-ornl, Aug 13, 2024)
33f0d4d Cache the flat address space (jdenny-ornl, Aug 13, 2024)
a2a512c Link KernelInfo.rst from Passes.rst (jdenny-ornl, Aug 13, 2024)
de04ac4 Don't filter out cpus (jdenny-ornl, Aug 13, 2024)
ec5d2bd Include less in header (jdenny-ornl, Aug 16, 2024)
c06b905 Removed unused comparison operators (jdenny-ornl, Aug 16, 2024)
d83d22a Remove redundant null check (jdenny-ornl, Aug 16, 2024)
1649cf8 Move KernelInfo to KernelInfo.cpp, remove KernelInfoAnalysis (jdenny-ornl, Aug 16, 2024)
1a3c0ae Use printAsOperand not getName to identify instruction (jdenny-ornl, Aug 16, 2024)
ea89a81 Use printAsOperand to report indirect callee (jdenny-ornl, Aug 16, 2024)
8da602b Report inline assembly calls (jdenny-ornl, Aug 16, 2024)
45114fd Use llvm::SmallString (jdenny-ornl, Aug 16, 2024)
eea139c Use llvm::SmallString (jdenny-ornl, Aug 16, 2024)
8bf6e4e getKernelInfo -> emitKernelInfo because return is unused (jdenny-ornl, Aug 16, 2024)
d2ee05d Merge branch 'main' into kernel-info-pr (jdenny-ornl, Aug 21, 2024)
9b865f4 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Sep 5, 2024)
39979f7 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Sep 12, 2024)
62d494d Clean up launch bounds (jdenny-ornl, Sep 13, 2024)
e4d3fca Merge branch 'main' into kernel-info-pr (jdenny-ornl, Sep 16, 2024)
94d90d1 Adjust forEachLaunchBound param (jdenny-ornl, Sep 16, 2024)
762a217 Reuse Function::getFnAttributeAsParsedInteger (jdenny-ornl, Sep 16, 2024)
df66a3d Move forEachLaunchBound to TargetTransformInfo (jdenny-ornl, Sep 16, 2024)
5488764 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Sep 26, 2024)
3f63d53 forEachLaunchBound -> collectLaunchBounds (jdenny-ornl, Sep 26, 2024)
3b6ce07 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Sep 28, 2024)
feeaa37 Remove redundant private (jdenny-ornl, Sep 28, 2024)
b9b95a2 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Oct 11, 2024)
116f1c9 Remove todos, as requested (jdenny-ornl, Oct 11, 2024)
2094465 Combine registerFullLinkTimeOptimizationLastEPCallback calls (jdenny-ornl, Oct 11, 2024)
39bce7c collectLaunchBounds -> collectKernelLaunchBounds (jdenny-ornl, Oct 11, 2024)
14345cf Spell kernel-info properties like their IR attributes (jdenny-ornl, Oct 11, 2024)
ad393d2 Replace -kernel-info-end-lto with -no-kernel-info-end-lto (jdenny-ornl, Oct 11, 2024)
d3beccf Apply clang-format (jdenny-ornl, Oct 11, 2024)
5a4b873 Avoid auto, as requested (jdenny-ornl, Oct 14, 2024)
571181b For function name, use debug info or keep @ (jdenny-ornl, Oct 14, 2024)
a5ce547 Use anonymous namespace (jdenny-ornl, Oct 16, 2024)
4d60911 Remove currently unused capabilities, as requested (jdenny-ornl, Oct 16, 2024)
0c30e7c Rename test files without LLVM IR to .test (jdenny-ornl, Oct 16, 2024)
f5a6fbd Regenerate OpenMP tests from current clang (jdenny-ornl, Oct 17, 2024)
baad223 Include LLVM value name in alloca report (jdenny-ornl, Oct 17, 2024)
86f9683 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Nov 27, 2024)
c9aebce Update expected amdgpu-max-num-workgroups default values (jdenny-ornl, Nov 27, 2024)
8982f8f Merge branch 'main' into kernel-info-pr (jdenny-ornl, Dec 27, 2024)
151bfb3 Regenerate OpenMP tests from current clang (jdenny-ornl, Dec 27, 2024)
ff33eb3 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Jan 6, 2025)
bb9d5c2 Relocate and use llvm::omp::getDeviceKernels (jdenny-ornl, Jan 6, 2025)
0a347cf Extend test to cover dyn and non-entry allocas (jdenny-ornl, Jan 7, 2025)
2d321ce Revert "Relocate and use llvm::omp::getDeviceKernels" (jdenny-ornl, Jan 27, 2025)
b9447c0 Merge branch 'main' into kernel-info-pr (jdenny-ornl, Jan 27, 2025)
1f1ca6c Relocate and use OpenMPOpt.cpp's isKernelCC (jdenny-ornl, Jan 28, 2025)
61 changes: 61 additions & 0 deletions llvm/docs/KernelInfo.rst
@@ -0,0 +1,61 @@
==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for codes compiled for GPUs. The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them. The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass is disabled. For convenience, the command-line option
``-kernel-info-end-lto`` inserts it at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca). Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -kernel-info-end-lto

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

kernel-info can also be inserted into a specified LLVM pass pipeline using
``-kernel-info-end-lto``, or it can be positioned explicitly in that pipeline:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>,module(kernel-info)'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -kernel-info-end-lto -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes='lto<O2>,module(kernel-info)'
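The introduction in KernelInfo.rst distinguishes summary statistics (such as the total size of static allocas) from individual occurrences (such as the source location of each alloca). The C++ sketch below models that split for the three alloca counters documented in KernelInfo.h (`Allocas`, `AllocasDyn`, and `AllocasStaticSizeSum`). It is only an illustration under stated assumptions, not the pass's implementation: `MockAlloca`, `AllocaSummary`, and `summarizeAllocas` are hypothetical names, a plain string stands in for an optimization remark, and no LLVM APIs are used.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an alloca instruction: its IR name, its source
// line, and its allocation size in bytes when known at compile time.
struct MockAlloca {
  std::string Name;
  unsigned Line;
  std::optional<int64_t> StaticSizeInBytes; // std::nullopt: size is dynamic
};

// Mirrors the three alloca counters declared in KernelInfo.h.
struct AllocaSummary {
  int64_t Allocas = 0;
  int64_t AllocasDyn = 0;
  int64_t AllocasStaticSizeSum = 0;
};

// Appends one message per alloca (an individual occurrence) to Remarks and
// returns the summary statistics over all allocas.
AllocaSummary summarizeAllocas(const std::vector<MockAlloca> &Allocas,
                               std::vector<std::string> &Remarks) {
  AllocaSummary S;
  for (const MockAlloca &A : Allocas) {
    ++S.Allocas;
    std::string Msg = "line " + std::to_string(A.Line) + ": alloca " + A.Name;
    if (A.StaticSizeInBytes) {
      assert(*A.StaticSizeInBytes >= 0 && "sizes must be non-negative");
      // Only sizes known at compile time contribute to the sum.
      S.AllocasStaticSizeSum += *A.StaticSizeInBytes;
      Msg += " with static size " + std::to_string(*A.StaticSizeInBytes);
    } else {
      // Dynamic sizes are counted but cannot be summed.
      ++S.AllocasDyn;
      Msg += " with dynamic size";
    }
    Remarks.push_back(Msg);
  }
  return S;
}
```

For example, three allocas of which one has a dynamic size yield `Allocas == 3` and `AllocasDyn == 1`, with `AllocasStaticSizeSum` covering only the other two. Keeping the dynamic count separate from the sum is what lets a reader tell "small static total" apart from "total unknown".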
123 changes: 123 additions & 0 deletions llvm/include/llvm/Analysis/KernelInfo.h
@@ -0,0 +1,123 @@
//=- KernelInfo.h - Kernel Analysis -------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the KernelInfo, KernelInfoAnalysis, and KernelInfoPrinter
// classes used to extract function properties from a GPU kernel.
//
// See llvm/docs/KernelInfo.rst.
// ===---------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_KERNELINFO_H
#define LLVM_ANALYSIS_KERNELINFO_H

#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/TargetTransformInfo.h"

namespace llvm {
class DominatorTree;
class Function;

/// Data structure holding function info for kernels.
class KernelInfo {
  void updateForBB(const BasicBlock &BB, int64_t Direction,
                   OptimizationRemarkEmitter &ORE,
                   const TargetTransformInfo &TTI);

public:
  static KernelInfo getKernelInfo(Function &F, FunctionAnalysisManager &FAM);

  bool operator==(const KernelInfo &FPI) const {
    return std::memcmp(this, &FPI, sizeof(KernelInfo)) == 0;
  }

  bool operator!=(const KernelInfo &FPI) const { return !(*this == FPI); }

  /// If false, nothing was recorded here because the supplied function didn't
  /// appear in a module compiled for a GPU.
  bool IsValid = false;

  /// Whether the function has external linkage and is not a kernel function.
  bool ExternalNotKernel = false;

  /// OpenMP launch bounds.
  ///@{
  std::optional<int64_t> OmpTargetNumTeams;
  std::optional<int64_t> OmpTargetThreadLimit;
  ///@}

  /// AMDGPU launch bounds.
  ///@{
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsX;
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsY;
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsZ;
  std::optional<int64_t> AmdgpuFlatWorkGroupSizeMin;
  std::optional<int64_t> AmdgpuFlatWorkGroupSizeMax;
  std::optional<int64_t> AmdgpuWavesPerEuMin;
  std::optional<int64_t> AmdgpuWavesPerEuMax;
  ///@}

  /// NVPTX launch bounds.
  ///@{
  std::optional<int64_t> Maxclusterrank;
  std::optional<int64_t> Maxntidx;
  ///@}

  /// The number of alloca instructions inside the function, the number of those
  /// with allocation sizes that cannot be determined at compile time, and the
  /// sum of the sizes that can be.
  ///
  /// With the current implementation for at least some GPU archs,
  /// AllocasDyn > 0 might not be possible, but we report AllocasDyn anyway in
  /// case the implementation changes.
  int64_t Allocas = 0;
  int64_t AllocasDyn = 0;
  int64_t AllocasStaticSizeSum = 0;

  /// Number of direct/indirect calls (anything derived from CallBase).
  int64_t DirectCalls = 0;
  int64_t IndirectCalls = 0;

  /// Number of direct calls made from this function to other functions
  /// defined in this module.
  int64_t DirectCallsToDefinedFunctions = 0;

  /// Number of calls of type InvokeInst.
  int64_t Invokes = 0;

  /// Number of addrspace(0) memory accesses (via load, store, etc.).
  int64_t AddrspaceZeroAccesses = 0;
};

/// Analysis class for KernelInfo.
class KernelInfoAnalysis : public AnalysisInfoMixin<KernelInfoAnalysis> {
public:
  static AnalysisKey Key;

  using Result = const KernelInfo;

  KernelInfo run(Function &F, FunctionAnalysisManager &FAM) {
    return KernelInfo::getKernelInfo(F, FAM);
  }
};

/// Printer pass for KernelInfoAnalysis.
///
/// It just calls KernelInfoAnalysis, which prints remarks if they are enabled.
class KernelInfoPrinter : public PassInfoMixin<KernelInfoPrinter> {
public:
  explicit KernelInfoPrinter() {}

  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    AM.getResult<KernelInfoAnalysis>(F);
    return PreservedAnalyses::all();
  }

  static bool isRequired() { return true; }
};
} // namespace llvm
#endif // LLVM_ANALYSIS_KERNELINFO_H
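KernelInfo.h stores every launch bound as a `std::optional<int64_t>`, so a bound that is absent or malformed can simply stay empty. As a hedged illustration only, the sketch below parses a comma-separated `"min,max"` string into such optionals, assuming the encoding that AMDGPU function attributes such as `amdgpu-flat-work-group-size` use (for example `"1,256"`); it also accepts a lone `"min"`. The helpers `parseBound` and `parseMinMax` and the struct `MinMax` are hypothetical and are not part of this PR, which reads attributes through LLVM's own APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Parses a non-negative decimal integer. Returns std::nullopt for an empty
// field, any non-digit character, or more than 18 digits (which could
// overflow int64_t).
std::optional<int64_t> parseBound(const std::string &Field) {
  if (Field.empty() || Field.size() > 18)
    return std::nullopt;
  int64_t Value = 0;
  for (char C : Field) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + (C - '0');
  }
  return Value;
}

// Hypothetical pair of bounds; each member stays empty when the corresponding
// field is absent or malformed.
struct MinMax {
  std::optional<int64_t> Min;
  std::optional<int64_t> Max;
};

// Parses "min,max" or just "min". Each field is parsed independently, so a
// malformed Min does not prevent a well-formed Max from being recorded.
MinMax parseMinMax(const std::string &Value) {
  MinMax Result;
  std::string::size_type Comma = Value.find(',');
  // substr(0, npos) yields the whole string when there is no comma.
  Result.Min = parseBound(Value.substr(0, Comma));
  if (Comma != std::string::npos)
    Result.Max = parseBound(Value.substr(Comma + 1));
  return Result;
}
```

Returning an empty optional rather than a default number keeps "not specified" distinguishable from any real bound, which is the same distinction the header's `std::optional<int64_t>` fields preserve.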
3 changes: 3 additions & 0 deletions llvm/include/llvm/Target/TargetMachine.h
@@ -18,6 +18,7 @@
#include "llvm/IR/PassManager.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/PGOOptions.h"
#include "llvm/Target/CGPassBuilderOption.h"
@@ -27,6 +28,8 @@
#include <string>
#include <utility>

extern llvm::cl::opt<bool> KernelInfoEndLTO;

namespace llvm {

class AAManager;
1 change: 1 addition & 0 deletions llvm/lib/Analysis/CMakeLists.txt
@@ -78,6 +78,7 @@ add_llvm_component_library(LLVMAnalysis
  InstructionPrecedenceTracking.cpp
  InstructionSimplify.cpp
  InteractiveModelRunner.cpp
  KernelInfo.cpp
  LazyBranchProbabilityInfo.cpp
  LazyBlockFrequencyInfo.cpp
  LazyCallGraph.cpp