[KernelInfo] Implement new LLVM IR pass for GPU code analysis #102944
Merged

Diff shown: changes from 7 of the 57 commits
Commits (57, all by jdenny-ornl):

5a671f6 [KernelInfo] Implement new LLVM IR pass for GPU code analysis
a7656de Move docs to KernelInfo.rst
d92856e Move conditional outside registration call
6ac3f41 Use llvm::SmallString
6367ad7 Use TTI.getFlatAddressSpace for addrspace(0)
78446bb Avoid repetition between amdgpu and nvptx tests
fede524 Use named values in tests
4c30b8a Say flat address space instead of addrspace(0)
33f0d4d Cache the flat address space
a2a512c Link KernelInfo.rst from Passes.rst
de04ac4 Don't filter out cpus
ec5d2bd Include less in header
c06b905 Removed unused comparison operators
d83d22a Remove redundant null check
1649cf8 Move KernelInfo to KernelInfo.cpp, remove KernelInfoAnalysis
1a3c0ae Use printAsOperand not getName to identify instruction
ea89a81 Use printAsOperand to report indirect callee
8da602b Report inline assembly calls
45114fd Use llvm::SmallString
eea139c Use llvm::SmallString
8bf6e4e getKernelInfo -> emitKernelInfo because return is unused
d2ee05d Merge branch 'main' into kernel-info-pr
9b865f4 Merge branch 'main' into kernel-info-pr
39979f7 Merge branch 'main' into kernel-info-pr
62d494d Clean up launch bounds
e4d3fca Merge branch 'main' into kernel-info-pr
94d90d1 Adjust forEachLaunchBound param
762a217 Reuse Function::getFnAttributeAsParsedInteger
df66a3d Move forEachLaunchBound to TargetTransformInfo
5488764 Merge branch 'main' into kernel-info-pr
3f63d53 forEachLaunchBound -> collectLaunchBounds
3b6ce07 Merge branch 'main' into kernel-info-pr
feeaa37 Remove redundant private
b9b95a2 Merge branch 'main' into kernel-info-pr
116f1c9 Remove todos, as requested
2094465 Combine registerFullLinkTimeOptimizationLastEPCallback calls
39bce7c collectLaunchBounds -> collectKernelLaunchBounds
14345cf Spell kernel-info properties like their IR attributes
ad393d2 Replace -kernel-info-end-lto with -no-kernel-info-end-lto
d3beccf Apply clang-format
5a4b873 Avoid auto, as requested
571181b For function name, use debug info or keep @
a5ce547 Use anonymous namespace
4d60911 Remove currently unused capabilities, as requested
0c30e7c Rename test files without LLVM IR to .test
f5a6fbd Regenerate OpenMP tests from current clang
baad223 Include LLVM value name in alloca report
86f9683 Merge branch 'main' into kernel-info-pr
c9aebce Update expected amdgpu-max-num-workgroups default values
8982f8f Merge branch 'main' into kernel-info-pr
151bfb3 Regenerate OpenMP tests from current clang
ff33eb3 Merge branch 'main' into kernel-info-pr
bb9d5c2 Relocate and use llvm::omp::getDeviceKernels
0a347cf Extend test to cover dyn and non-entry allocas
2d321ce Revert "Relocate and use llvm::omp::getDeviceKernels"
b9447c0 Merge branch 'main' into kernel-info-pr
1f1ca6c Relocate and use OpenMPOpt.cpp's isKernelCC
New file: llvm/docs/KernelInfo.rst

@@ -0,0 +1,61 @@
==========
KernelInfo
==========

.. contents::
   :local:

Introduction
============

This LLVM IR pass reports various statistics for codes compiled for GPUs. The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them. The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass is disabled. For convenience, the command-line option
``-kernel-info-end-lto`` inserts it at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca). Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -kernel-info-end-lto

To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

The kernel-info pass can also be inserted into a specified LLVM pass pipeline
using ``-kernel-info-end-lto``, or it can be positioned explicitly in that
pipeline:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>,module(kernel-info)'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -kernel-info-end-lto -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes='lto<O2>,module(kernel-info)'
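The Remarks paragraph in KernelInfo.rst distinguishes summary statistics (such as the total size of static allocas) from individual occurrences (such as each alloca's source location). As a rough, LLVM-free illustration of that split, the sketch below models each alloca with a hypothetical `AllocaRecord` (value name, optional static size, source line), produces one message per alloca, and accumulates counters named after KernelInfo's `Allocas`, `AllocasDyn`, and `AllocasStaticSizeSum` fields. The record type, the `summarizeAllocas` helper, and the message text are assumptions made for illustration; they are not the pass's actual data structures or remark format.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one alloca: its value name, its size in bytes when
// known at compile time (std::nullopt for a dynamic size), and its source line.
struct AllocaRecord {
  std::string Name;
  std::optional<int64_t> StaticSize;
  unsigned Line;
};

// Summary statistics, named after KernelInfo's Allocas, AllocasDyn, and
// AllocasStaticSizeSum fields.
struct AllocaSummary {
  int64_t Allocas = 0;
  int64_t AllocasDyn = 0;
  int64_t AllocasStaticSizeSum = 0;
};

// Appends one "individual occurrence" message per alloca to Remarks and
// returns the accumulated summary statistics.
AllocaSummary summarizeAllocas(const std::vector<AllocaRecord> &Records,
                               std::vector<std::string> &Remarks) {
  AllocaSummary S;
  for (const AllocaRecord &R : Records) {
    ++S.Allocas;
    std::ostringstream OS;
    OS << "line " << R.Line << ": alloca %" << R.Name;
    if (R.StaticSize) {
      assert(*R.StaticSize >= 0 && "allocation sizes are non-negative");
      S.AllocasStaticSizeSum += *R.StaticSize;
      OS << " with static size " << *R.StaticSize;
    } else {
      ++S.AllocasDyn;
      OS << " with dynamic size";
    }
    Remarks.push_back(OS.str());
  }
  return S;
}
```

For instance, the records `{"buf", 64, 3}`, `{"idx", 4, 4}`, and `{"vla", std::nullopt, 7}` produce three per-alloca messages and a summary of Allocas = 3, AllocasDyn = 1, and AllocasStaticSizeSum = 68. Keeping both views lets a reader spot the largest contributors without losing the aggregate picture.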
New file: KernelInfo.h

@@ -0,0 +1,123 @@
//=- KernelInfo.h - Kernel Analysis -------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the KernelInfo, KernelInfoAnalysis, and KernelInfoPrinter
// classes used to extract function properties from a GPU kernel.
//
// See llvm/docs/KernelInfo.rst.
//===----------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_KERNELINFO_H
#define LLVM_ANALYSIS_KERNELINFO_H

#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/TargetTransformInfo.h"

namespace llvm {
class DominatorTree;
class Function;

/// Data structure holding function info for kernels.
class KernelInfo {
  void updateForBB(const BasicBlock &BB, int64_t Direction,
                   OptimizationRemarkEmitter &ORE,
                   const TargetTransformInfo &TTI);

public:
  static KernelInfo getKernelInfo(Function &F, FunctionAnalysisManager &FAM);

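  bool operator==(const KernelInfo &FPI) const {
    // Compare member by member: std::memcmp would also compare padding bytes
    // and the unspecified storage of disengaged std::optional members.
    return IsValid == FPI.IsValid &&
           ExternalNotKernel == FPI.ExternalNotKernel &&
           OmpTargetNumTeams == FPI.OmpTargetNumTeams &&
           OmpTargetThreadLimit == FPI.OmpTargetThreadLimit &&
           AmdgpuMaxNumWorkgroupsX == FPI.AmdgpuMaxNumWorkgroupsX &&
           AmdgpuMaxNumWorkgroupsY == FPI.AmdgpuMaxNumWorkgroupsY &&
           AmdgpuMaxNumWorkgroupsZ == FPI.AmdgpuMaxNumWorkgroupsZ &&
           AmdgpuFlatWorkGroupSizeMin == FPI.AmdgpuFlatWorkGroupSizeMin &&
           AmdgpuFlatWorkGroupSizeMax == FPI.AmdgpuFlatWorkGroupSizeMax &&
           AmdgpuWavesPerEuMin == FPI.AmdgpuWavesPerEuMin &&
           AmdgpuWavesPerEuMax == FPI.AmdgpuWavesPerEuMax &&
           Maxclusterrank == FPI.Maxclusterrank && Maxntidx == FPI.Maxntidx &&
           Allocas == FPI.Allocas && AllocasDyn == FPI.AllocasDyn &&
           AllocasStaticSizeSum == FPI.AllocasStaticSizeSum &&
           DirectCalls == FPI.DirectCalls &&
           IndirectCalls == FPI.IndirectCalls &&
           DirectCallsToDefinedFunctions ==
               FPI.DirectCallsToDefinedFunctions &&
           Invokes == FPI.Invokes &&
           AddrspaceZeroAccesses == FPI.AddrspaceZeroAccesses;
  }

  bool operator!=(const KernelInfo &FPI) const { return !(*this == FPI); }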

  /// If false, nothing was recorded here because the supplied function didn't
  /// appear in a module compiled for a GPU.
  bool IsValid = false;

  /// Whether the function has external linkage and is not a kernel function.
  bool ExternalNotKernel = false;

  /// OpenMP launch bounds.
  ///@{
  std::optional<int64_t> OmpTargetNumTeams;
  std::optional<int64_t> OmpTargetThreadLimit;
  ///@}

  /// AMDGPU launch bounds.
  ///@{
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsX;
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsY;
  std::optional<int64_t> AmdgpuMaxNumWorkgroupsZ;
  std::optional<int64_t> AmdgpuFlatWorkGroupSizeMin;
  std::optional<int64_t> AmdgpuFlatWorkGroupSizeMax;
  std::optional<int64_t> AmdgpuWavesPerEuMin;
  std::optional<int64_t> AmdgpuWavesPerEuMax;
  ///@}

  /// NVPTX launch bounds.
  ///@{
  std::optional<int64_t> Maxclusterrank;
  std::optional<int64_t> Maxntidx;
  ///@}

  /// The number of alloca instructions inside the function, the number of those
  /// with allocation sizes that cannot be determined at compile time, and the
  /// sum of the sizes that can be.
  ///
  /// With the current implementation for at least some GPU archs,
  /// AllocasDyn > 0 might not be possible, but we report AllocasDyn anyway in
  /// case the implementation changes.
  int64_t Allocas = 0;
  int64_t AllocasDyn = 0;
  int64_t AllocasStaticSizeSum = 0;

  /// Number of direct/indirect calls (anything derived from CallBase).
  int64_t DirectCalls = 0;
  int64_t IndirectCalls = 0;

  /// Number of direct calls made from this function to other functions
  /// defined in this module.
  int64_t DirectCallsToDefinedFunctions = 0;

  /// Number of calls of type InvokeInst.
  int64_t Invokes = 0;

  /// Number of addrspace(0) memory accesses (via load, store, etc.).
  int64_t AddrspaceZeroAccesses = 0;
};

/// Analysis class for KernelInfo.
class KernelInfoAnalysis : public AnalysisInfoMixin<KernelInfoAnalysis> {
public:
  static AnalysisKey Key;

  using Result = const KernelInfo;

  KernelInfo run(Function &F, FunctionAnalysisManager &FAM) {
    return KernelInfo::getKernelInfo(F, FAM);
  }
};

/// Printer pass for KernelInfoAnalysis.
///
/// It just calls KernelInfoAnalysis, which prints remarks if they are enabled.
class KernelInfoPrinter : public PassInfoMixin<KernelInfoPrinter> {
public:
  explicit KernelInfoPrinter() {}

  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    AM.getResult<KernelInfoAnalysis>(F);
    return PreservedAnalyses::all();
  }

  static bool isRequired() { return true; }
};
} // namespace llvm
#endif // LLVM_ANALYSIS_KERNELINFO_H
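The private `updateForBB(BB, Direction, ORE, TTI)` member in KernelInfo.h presumably follows the convention of LLVM's `FunctionPropertiesInfo::updateForBB`, where a basic block's contribution is added with `Direction == 1` and removed with `Direction == -1`. The sketch below is a minimal, LLVM-free model of that bookkeeping, assuming hypothetical `Inst`/`InstKind` types in place of LLVM instructions and covering only the call, invoke, and addrspace(0) counters. Following the header's comments, every call-like instruction is counted as either direct or indirect, and invokes are additionally counted on their own. It is not the pass's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, LLVM-free stand-ins for the instructions the pass inspects.
enum class InstKind { Call, Load, Store, Other };

struct Inst {
  InstKind Kind;
  bool IndirectCallee = false; // Call: the callee is not a known function
  bool IsInvoke = false;       // Call: models an InvokeInst
  unsigned AddrSpace = 1;      // Load/Store: address space of the pointer
};

using Block = std::vector<Inst>;

// A subset of KernelInfo's counters.
struct CallAndAccessCounts {
  int64_t DirectCalls = 0;
  int64_t IndirectCalls = 0;
  int64_t Invokes = 0;
  int64_t AddrspaceZeroAccesses = 0;

  // Adds (Direction == 1) or subtracts (Direction == -1) one block's
  // contribution, so the same routine can build totals or retract a block.
  void updateForBB(const Block &BB, int64_t Direction) {
    assert((Direction == 1 || Direction == -1) && "Direction must be +1 or -1");
    for (const Inst &I : BB) {
      switch (I.Kind) {
      case InstKind::Call:
        // Each call-like instruction is either direct or indirect; invokes
        // are also tallied separately.
        if (I.IndirectCallee)
          IndirectCalls += Direction;
        else
          DirectCalls += Direction;
        if (I.IsInvoke)
          Invokes += Direction;
        break;
      case InstKind::Load:
      case InstKind::Store:
        if (I.AddrSpace == 0)
          AddrspaceZeroAccesses += Direction;
        break;
      case InstKind::Other:
        break;
      }
    }
  }
};
```

Applying a block with `Direction == 1` and then again with `Direction == -1` returns every counter to zero, which is what makes a single signed routine sufficient for both building and incrementally updating the statistics.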