[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 #93365

Merged on Feb 12, 2025
69 commits
530eb98
Add profiling functions to libomptarget
EthanLuisMcDonough Dec 16, 2023
fb067d4
Fix PGO instrumentation for GPU targets
EthanLuisMcDonough Dec 16, 2023
7a0e0ef
Change global visibility on GPU targets
EthanLuisMcDonough Dec 21, 2023
fddc079
Make names global public on GPU
EthanLuisMcDonough Dec 23, 2023
e9db03c
Read and print GPU device PGO globals
EthanLuisMcDonough Dec 29, 2023
aa83bd2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Dec 29, 2023
e468760
Fix rebase bug
EthanLuisMcDonough Jan 3, 2024
ec18ce9
Refactor portions to be more idiomatic
EthanLuisMcDonough Jan 3, 2024
0872556
Reformat DeviceRTL prof functions
EthanLuisMcDonough Jan 3, 2024
94f47f3
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 3, 2024
62f31d1
Style changes + catch name error
EthanLuisMcDonough Jan 9, 2024
0c4bbeb
Add GPU PGO test
EthanLuisMcDonough Jan 18, 2024
c7ae2a7
Fix PGO test formatting
EthanLuisMcDonough Jan 18, 2024
9e66bfb
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 19, 2024
8bb2207
Refactor visibility logic
EthanLuisMcDonough Jan 19, 2024
9f13943
Add LLVM instrumentation support
EthanLuisMcDonough Jan 24, 2024
b28d4a9
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jan 24, 2024
23d7fe2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 14, 2024
0606f0d
Use explicit addrspace instead of unqual
EthanLuisMcDonough Feb 14, 2024
23f75b2
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 15, 2024
c1f9be3
Remove redundant namespaces
EthanLuisMcDonough Feb 16, 2024
721dac6
Merge branch 'main' into gpuprof
EthanLuisMcDonough Feb 16, 2024
6a3ae40
Clang format
EthanLuisMcDonough Feb 16, 2024
6866862
Use getAddrSpaceCast
EthanLuisMcDonough Feb 16, 2024
62a5ee1
Revert "Use getAddrSpaceCast"
EthanLuisMcDonough Feb 27, 2024
052394f
Revert "Use getAddrSpaceCast"
EthanLuisMcDonough Feb 27, 2024
612d5a5
Write PGO
EthanLuisMcDonough Mar 1, 2024
b8c9163
Fix tests
EthanLuisMcDonough Mar 14, 2024
4be80e5
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 7, 2024
b2fe222
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 7, 2024
7770b37
Fix params
EthanLuisMcDonough May 7, 2024
f6a1545
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 9, 2024
92260d8
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 9, 2024
1dbde8e
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 13, 2024
1278989
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 13, 2024
ed2a289
Merge branch 'main' into gpuprof_ptrcastfix
EthanLuisMcDonough May 13, 2024
aa895a1
Fix elf obj file
EthanLuisMcDonough Mar 19, 2024
2031e49
Add more addrspace casts for GPU targets
EthanLuisMcDonough May 7, 2024
5de6082
Merge branch 'gpuprof_ptrcastfix' into gpuprofwrite
EthanLuisMcDonough May 13, 2024
be6524b
Have test read from profraw instead of dump
EthanLuisMcDonough May 13, 2024
2ba27e8
Merge branch 'main' into gpuprof
EthanLuisMcDonough May 21, 2024
c754f7f
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 24, 2024
2b8eb29
Fix PGO test format
EthanLuisMcDonough May 25, 2024
67f3009
Refactor profile writer
EthanLuisMcDonough May 25, 2024
cee07bc
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough May 27, 2024
e8ad132
Fix refactor bug
EthanLuisMcDonough May 27, 2024
4c9f814
Make requested clang-format change
EthanLuisMcDonough May 28, 2024
9cddcf4
Merge branch 'main' into gpuprof
EthanLuisMcDonough Jun 1, 2024
53d6309
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Jun 1, 2024
344e357
Tighten PGO test requirements
EthanLuisMcDonough Jun 1, 2024
2f75142
Tighten PGO test requirements
EthanLuisMcDonough Jun 1, 2024
488cb4a
Apply requested formatting changes
EthanLuisMcDonough Jun 26, 2024
b90c015
Add memop function shim to DeviceRTL
EthanLuisMcDonough Jun 26, 2024
dc90a5c
Merge branch 'gpuprof' into gpuprofwrite
EthanLuisMcDonough Jun 27, 2024
c68c6e2
Make requested changes
EthanLuisMcDonough Jun 27, 2024
ca52c58
Only dump counters if PGODump flag is set
EthanLuisMcDonough Jun 27, 2024
0da7627
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Aug 10, 2024
ee4431a
Update requirements
EthanLuisMcDonough Aug 10, 2024
efe70ad
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Aug 10, 2024
fb699b6
Merge changes
EthanLuisMcDonough Aug 10, 2024
10e6c48
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Sep 4, 2024
1d0a961
Add llvm-profdata substitution to offload tests
EthanLuisMcDonough Oct 25, 2024
0ac2d5f
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Oct 25, 2024
c6b34ad
Prepend target prefix to basename
EthanLuisMcDonough Oct 28, 2024
94ed55b
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Nov 15, 2024
26f5428
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Dec 10, 2024
d9e864e
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Dec 27, 2024
55d673d
Merge branch 'main' into gpuprofwrite
EthanLuisMcDonough Feb 11, 2025
fa7589e
Move version check to front
EthanLuisMcDonough Feb 12, 2025
11 changes: 11 additions & 0 deletions compiler-rt/lib/profile/InstrProfiling.h
@@ -304,6 +304,17 @@ int __llvm_profile_get_padding_sizes_for_counters(
*/
void __llvm_profile_set_dumped(void);

/*!
* \brief Write custom target-specific profiling data to a separate file.
* Used by offload PGO.
*/
int __llvm_write_custom_profile(const char *Target,
const __llvm_profile_data *DataBegin,
const __llvm_profile_data *DataEnd,
const char *CountersBegin,
const char *CountersEnd, const char *NamesBegin,
const char *NamesEnd);

/*!
* This variable is defined in InstrProfilingRuntime.cpp as a hidden
* symbol. Its main purpose is to enable profile runtime user to
124 changes: 115 additions & 9 deletions compiler-rt/lib/profile/InstrProfilingFile.c
@@ -541,6 +541,17 @@ static FILE *getFileObject(const char *OutputName) {
return fopen(OutputName, "ab");
}

static void closeFileObject(FILE *OutputFile) {
if (OutputFile == getProfileFile()) {
fflush(OutputFile);
if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) {
lprofUnlockFileHandle(OutputFile);
}
} else {
fclose(OutputFile);
}
}

/* Write profile data to file \c OutputName. */
static int writeFile(const char *OutputName) {
int RetVal;
@@ -562,15 +573,7 @@ static int writeFile(const char *OutputName) {
initFileWriter(&fileWriter, OutputFile);
RetVal = lprofWriteData(&fileWriter, lprofGetVPDataReader(), MergeDone);

- if (OutputFile == getProfileFile()) {
- fflush(OutputFile);
- if (doMerging() && !__llvm_profile_is_continuous_mode_enabled()) {
- lprofUnlockFileHandle(OutputFile);
- }
- } else {
- fclose(OutputFile);
- }
-
+ closeFileObject(OutputFile);
return RetVal;
}

@@ -1359,4 +1362,107 @@ COMPILER_RT_VISIBILITY int __llvm_profile_set_file_object(FILE *File,
return 0;
}

int __llvm_write_custom_profile(const char *Target,
const __llvm_profile_data *DataBegin,
const __llvm_profile_data *DataEnd,
const char *CountersBegin,
const char *CountersEnd, const char *NamesBegin,
const char *NamesEnd) {
int ReturnValue = 0, FilenameLength, TargetLength;
char *FilenameBuf, *TargetFilename;
const char *Filename;

/* Save old profile data */
FILE *oldFile = getProfileFile();

// Temporarily suspend getting SIGKILL when the parent exits.
int PDeathSig = lprofSuspendSigKill();

if (lprofProfileDumped() || __llvm_profile_is_continuous_mode_enabled()) {
PROF_NOTE("Profile data not written to file: %s.\n", "already written");
if (PDeathSig == 1)
lprofRestoreSigKill();
return 0;
}

/* Check if there is llvm/runtime version mismatch. */
if (GET_VERSION(__llvm_profile_get_version()) != INSTR_PROF_RAW_VERSION) {
PROF_ERR("Runtime and instrumentation version mismatch : "
"expected %d, but get %d\n",
INSTR_PROF_RAW_VERSION,
(int)GET_VERSION(__llvm_profile_get_version()));
if (PDeathSig == 1)
lprofRestoreSigKill();
return -1;
}

/* Get current filename */
FilenameLength = getCurFilenameLength();
FilenameBuf = (char *)COMPILER_RT_ALLOCA(FilenameLength + 1);
Filename = getCurFilename(FilenameBuf, 0);

/* Check the filename. */
if (!Filename) {
PROF_ERR("Failed to write file : %s\n", "Filename not set");
if (PDeathSig == 1)
lprofRestoreSigKill();
return -1;
}

/* Allocate new space for our target-specific PGO filename */
TargetLength = strlen(Target);
TargetFilename =
(char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2);

/* Find file basename and path sizes */
int32_t DirEnd = FilenameLength - 1;
while (DirEnd >= 0 && !IS_DIR_SEPARATOR(Filename[DirEnd])) {
DirEnd--;
}
uint32_t DirSize = DirEnd + 1, BaseSize = FilenameLength - DirSize;

/* Prepend "TARGET." to current filename */
if (DirSize > 0) {
memcpy(TargetFilename, Filename, DirSize);
}
memcpy(TargetFilename + DirSize, Target, TargetLength);
TargetFilename[TargetLength + DirSize] = '.';
memcpy(TargetFilename + DirSize + 1 + TargetLength, Filename + DirSize,
BaseSize);
TargetFilename[FilenameLength + 1 + TargetLength] = 0;

/* Open and truncate target-specific PGO file */
FILE *OutputFile = fopen(TargetFilename, "w");

/* Bail out before replacing the profile file so the old one stays installed */
if (!OutputFile) {
PROF_ERR("Failed to open file : %s\n", TargetFilename);
if (PDeathSig == 1)
lprofRestoreSigKill();
return -1;
}
setProfileFile(OutputFile);

FreeHook = &free;
setupIOBuffer();

/* Write custom data */
ProfDataWriter fileWriter;
initFileWriter(&fileWriter, OutputFile);

/* Write custom data to the file */
ReturnValue = lprofWriteDataImpl(
&fileWriter, DataBegin, DataEnd, CountersBegin, CountersEnd, NULL, NULL,
lprofGetVPDataReader(), NULL, NULL, NULL, NULL, NamesBegin, NamesEnd, 0);
closeFileObject(OutputFile);

// Restore SIGKILL.
if (PDeathSig == 1)
lprofRestoreSigKill();

/* Restore old profiling file */
setProfileFile(oldFile);

return ReturnValue;
}

#endif
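
For reference, the renaming step in `__llvm_write_custom_profile` keeps the directory part of the current profile path and prepends `TARGET.` to the basename, which is why the updated test reads `%target_triple.llvm.profraw`. A minimal standalone sketch of that string handling follows. `prefix_basename_with_target` is a hypothetical helper, not part of compiler-rt; it uses `malloc` rather than `COMPILER_RT_ALLOCA` and treats only `/` as a separator, whereas the patch uses `IS_DIR_SEPARATOR`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only): returns a newly allocated string
 * "<dir><target>.<basename>", or NULL if the allocation fails.
 * Assumption: '/' is the only directory separator. */
static char *prefix_basename_with_target(const char *filename,
                                         const char *target) {
  assert(filename != NULL && target != NULL);
  size_t filename_len = strlen(filename);
  size_t target_len = strlen(target);

  /* Scan backwards for the last separator; dir_size includes it. */
  size_t dir_size = filename_len;
  while (dir_size > 0 && filename[dir_size - 1] != '/')
    dir_size--;
  size_t base_size = filename_len - dir_size;

  /* Directory + target + '.' + basename + terminating NUL. */
  char *out = malloc(filename_len + target_len + 2);
  if (!out)
    return NULL;

  memcpy(out, filename, dir_size);
  memcpy(out + dir_size, target, target_len);
  out[dir_size + target_len] = '.';
  memcpy(out + dir_size + target_len + 1, filename + dir_size, base_size);
  out[filename_len + target_len + 1] = '\0';
  return out;
}
```

Under these assumptions, a path such as `out/default.profraw` with target `amdgcn-amd-amdhsa` would map to `out/amdgcn-amd-amdhsa.default.profraw`; the caller owns the returned buffer and releases it with `free`.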
1 change: 1 addition & 0 deletions offload/include/Shared/Environment.h
@@ -30,6 +30,7 @@ enum class DeviceDebugKind : uint32_t {
FunctionTracing = 1U << 1,
CommonIssues = 1U << 2,
AllocationTracker = 1U << 3,
PGODump = 1U << 4,
};

struct DeviceEnvironmentTy {
12 changes: 10 additions & 2 deletions offload/plugins-nextgen/common/include/GlobalHandler.h
@@ -63,14 +63,22 @@ struct __llvm_profile_data {
#include "llvm/ProfileData/InstrProfData.inc"
};

+ extern "C" {
+ extern int __attribute__((weak)) __llvm_write_custom_profile(
+ const char *Target, const __llvm_profile_data *DataBegin,
+ const __llvm_profile_data *DataEnd, const char *CountersBegin,
+ const char *CountersEnd, const char *NamesBegin, const char *NamesEnd);
+ }
+
/// PGO profiling data extracted from a GPU device
struct GPUProfGlobals {
- SmallVector<uint8_t> NamesData;
- SmallVector<SmallVector<int64_t>> Counts;
+ SmallVector<int64_t> Counts;
SmallVector<__llvm_profile_data> Data;
+ SmallVector<uint8_t> NamesData;
Triple TargetTriple;

void dump() const;
+ Error write() const;
};

/// Subclass of GlobalTy that holds the memory for a global of \p Ty.
57 changes: 48 additions & 9 deletions offload/plugins-nextgen/common/src/GlobalHandler.cpp
@@ -206,7 +206,7 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
GlobalTy CountGlobal(NameOrErr->str(), Sym.getSize(), Counts.data());
if (auto Err = readGlobalFromDevice(Device, Image, CountGlobal))
return Err;
- DeviceProfileData.Counts.push_back(std::move(Counts));
+ DeviceProfileData.Counts.append(std::move(Counts));
} else if (NameOrErr->starts_with(getInstrProfDataVarPrefix())) {
// Read profiling data for this global variable
__llvm_profile_data Data{};
@@ -224,15 +224,14 @@ void GPUProfGlobals::dump() const {
<< "\n";

outs() << "======== Counters =========\n";
- for (const auto &Count : Counts) {
- outs() << "[";
- for (size_t i = 0; i < Count.size(); i++) {
- if (i == 0)
- outs() << " ";
- outs() << Count[i] << " ";
- }
- outs() << "]\n";
+ for (size_t i = 0; i < Counts.size(); i++) {
+ if (i > 0 && i % 10 == 0)
+ outs() << "\n";
+ else if (i != 0)
+ outs() << " ";
+ outs() << Counts[i];
}
+ outs() << "\n";

outs() << "========== Data ===========\n";
for (const auto &ProfData : Data) {
@@ -264,3 +263,43 @@ void GPUProfGlobals::dump() const {
Symtab.dumpNames(outs());
outs() << "===========================\n";
}

Error GPUProfGlobals::write() const {
if (!__llvm_write_custom_profile)
return Plugin::error("Could not find symbol __llvm_write_custom_profile. "
"The compiler-rt profiling library must be linked for "
"GPU PGO to work.");

size_t DataSize = Data.size() * sizeof(__llvm_profile_data),
CountsSize = Counts.size() * sizeof(int64_t);
__llvm_profile_data *DataBegin, *DataEnd;
char *CountersBegin, *CountersEnd, *NamesBegin, *NamesEnd;

// Initialize array of contiguous data. We need to make sure each section is
// contiguous so that the PGO library can compute deltas properly
SmallVector<uint8_t> ContiguousData(NamesData.size() + DataSize + CountsSize);

// Compute region pointers
DataBegin = (__llvm_profile_data *)(ContiguousData.data() + CountsSize);
DataEnd =
(__llvm_profile_data *)(ContiguousData.data() + CountsSize + DataSize);
CountersBegin = (char *)ContiguousData.data();
CountersEnd = (char *)(ContiguousData.data() + CountsSize);
NamesBegin = (char *)(ContiguousData.data() + CountsSize + DataSize);
NamesEnd = (char *)(ContiguousData.data() + CountsSize + DataSize +
NamesData.size());

// Copy data to contiguous buffer
memcpy(DataBegin, Data.data(), DataSize);
memcpy(CountersBegin, Counts.data(), CountsSize);
memcpy(NamesBegin, NamesData.data(), NamesData.size());

// Invoke compiler-rt entrypoint
int result = __llvm_write_custom_profile(TargetTriple.str().c_str(),
DataBegin, DataEnd, CountersBegin,
CountersEnd, NamesBegin, NamesEnd);
if (result != 0)
return Plugin::error("Error writing GPU PGO data to file");

return Plugin::success();
}
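
The layout built in `GPUProfGlobals::write()` is counters first, then data records, then names, all in one allocation, with the begin/end pointers derived from running offsets. A rough C sketch of the same packing follows. It assumes a simplified `record_t` in place of `__llvm_profile_data` (whose real fields come from `InstrProfData.inc`); `pack_profile` and `packed_profile_t` are illustrative names only and do not exist in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __llvm_profile_data (assumption: two fields). */
typedef struct {
  uint64_t name_hash;
  uint64_t func_hash;
} record_t;

/* Begin/end pointers for the three regions inside one allocation. */
typedef struct {
  uint8_t *storage; /* owning pointer; release with free() */
  char *counters_begin, *counters_end;
  record_t *data_begin, *data_end;
  char *names_begin, *names_end;
} packed_profile_t;

/* Packs counters, records, and names back to back, in that order.
 * Returns 0 on success and -1 if the allocation fails. */
static int pack_profile(packed_profile_t *out, const int64_t *counts,
                        size_t num_counts, const record_t *records,
                        size_t num_records, const uint8_t *names,
                        size_t names_size) {
  assert(out != NULL);
  size_t counts_size = num_counts * sizeof(int64_t);
  size_t data_size = num_records * sizeof(record_t);
  size_t total = counts_size + data_size + names_size;

  out->storage = malloc(total ? total : 1);
  if (!out->storage)
    return -1;

  /* Region pointers: each region starts where the previous one ends. */
  out->counters_begin = (char *)out->storage;
  out->counters_end = out->counters_begin + counts_size;
  out->data_begin = (record_t *)(out->storage + counts_size);
  out->data_end = out->data_begin + num_records;
  out->names_begin = (char *)(out->storage + counts_size + data_size);
  out->names_end = out->names_begin + names_size;

  /* Copy each source array into its region; skip empty regions. */
  if (counts_size)
    memcpy(out->counters_begin, counts, counts_size);
  if (data_size)
    memcpy(out->data_begin, records, data_size);
  if (names_size)
    memcpy(out->names_begin, names, names_size);
  return 0;
}
```

Placing the 8-byte counters first keeps the record region aligned in this sketch, since the counter region size is always a multiple of eight.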
10 changes: 8 additions & 2 deletions offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -861,8 +861,14 @@ Error GenericDeviceTy::deinit(GenericPluginTy &Plugin) {
if (!ProfOrErr)
return ProfOrErr.takeError();

- // TODO: write data to profiling file
- ProfOrErr->dump();
+ // Dump out profdata
+ if ((OMPX_DebugKind.get() & uint32_t(DeviceDebugKind::PGODump)) ==
+ uint32_t(DeviceDebugKind::PGODump))
+ ProfOrErr->dump();
+
+ // Write data to profiling file
+ if (auto Err = ProfOrErr->write())
+ return Err;
}

// Delete the memory manager before deinitializing the device. Otherwise,
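
The `PGODump` check in this hunk is an ordinary bitmask test: the dump runs only when bit 4 (`0x10`) is set in the debug mask, regardless of the other bits, while the profile write happens unconditionally. A small sketch with hypothetical C names that mirror the `DeviceDebugKind` values shown in this diff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C mirror of the DeviceDebugKind bits listed in this diff. */
enum {
  DEBUG_FUNCTION_TRACING = 1U << 1,
  DEBUG_COMMON_ISSUES = 1U << 2,
  DEBUG_ALLOCATION_TRACKER = 1U << 3,
  DEBUG_PGO_DUMP = 1U << 4
};

/* True when every bit of `kind` is set in `debug_mask`. */
static bool has_debug_kind(uint32_t debug_mask, uint32_t kind) {
  assert(kind != 0);
  return (debug_mask & kind) == kind;
}
```

With these values, a mask of `0x10` or `0x1C` enables the dump, while `0x0C` (common issues plus allocation tracker) does not.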
4 changes: 4 additions & 0 deletions offload/test/lit.cfg
@@ -112,8 +112,10 @@ config.available_features.add(config.libomptarget_current_target)
if config.libomptarget_has_libc:
    config.available_features.add('libc')

profdata_path = os.path.join(config.bin_llvm_tools_dir, "llvm-profdata")
if config.libomptarget_test_pgo:
    config.available_features.add('pgo')
    config.substitutions.append(("%profdata", profdata_path))

# Determine whether the test system supports unified memory.
# For CUDA, this is the case with compute capability 70 (Volta) or higher.
@@ -407,6 +409,8 @@ if config.test_fortran_compiler:
    config.available_features.add('flang')
    config.substitutions.append(("%flang", config.test_fortran_compiler))

config.substitutions.append(("%target_triple", config.libomptarget_current_target))

config.substitutions.append(("%openmp_flags", config.test_openmp_flags))
if config.libomptarget_current_target.startswith('nvptx') and config.cuda_path:
    config.substitutions.append(("%cuda_flags", "--cuda-path=" + config.cuda_path))
2 changes: 1 addition & 1 deletion offload/test/lit.site.cfg.in
@@ -1,6 +1,6 @@
@AUTO_GEN_COMMENT@

- config.bin_llvm_tools_dir = "@CMAKE_BINARY_DIR@/bin"
+ config.bin_llvm_tools_dir = "@LLVM_RUNTIME_OUTPUT_INTDIR@"
config.test_c_compiler = "@OPENMP_TEST_C_COMPILER@"
config.test_cxx_compiler = "@OPENMP_TEST_CXX_COMPILER@"
config.test_fortran_compiler="@OPENMP_TEST_Fortran_COMPILER@"
15 changes: 10 additions & 5 deletions offload/test/offloading/pgo1.c
@@ -1,12 +1,17 @@
- // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
- // RUN: -Xclang "-fprofile-instrument=clang"
- // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
- // RUN: --check-prefix="CLANG-PGO"
// RUN: %libomptarget-compile-generic -fprofile-generate \
// RUN: -Xclang "-fprofile-instrument=llvm"
- // RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic \
+ // RUN: env LLVM_PROFILE_FILE=llvm.profraw %libomptarget-run-generic 2>&1
+ // RUN: %profdata show --all-functions --counts \
+ // RUN: %target_triple.llvm.profraw | %fcheck-generic \
// RUN: --check-prefix="LLVM-PGO"

+ // RUN: %libomptarget-compile-generic -fprofile-instr-generate \
+ // RUN: -Xclang "-fprofile-instrument=clang"
+ // RUN: env LLVM_PROFILE_FILE=clang.profraw %libomptarget-run-generic 2>&1
+ // RUN: %profdata show --all-functions --counts \
+ // RUN: %target_triple.clang.profraw | %fcheck-generic \
+ // RUN: --check-prefix="CLANG-PGO"
+
// REQUIRES: gpu
// REQUIRES: pgo

1 change: 1 addition & 0 deletions openmp/docs/design/Runtimes.rst
@@ -1522,3 +1522,4 @@ debugging features are supported.
* Enable debugging assertions in the device. ``0x01``
* Enable diagnosing common problems during offloading. ``0x4``
* Enable device malloc statistics (amdgpu only). ``0x8``
* Dump device PGO counters (only if PGO on GPU is enabled). ``0x10``