
[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 #93365


Merged 69 commits on Feb 12, 2025


@EthanLuisMcDonough EthanLuisMcDonough commented May 25, 2024

This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes:

  • Introduces __llvm_write_custom_profile to the PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files.
  • Adds __llvm_write_custom_profile as a weak symbol to libomptarget so that it can write the collected data to a profraw file.
  • Adds a PGODump debug flag and only displays the dump when that flag is set.
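
The weak-symbol arrangement described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the libomptarget code: `demo_write_custom_profile` and `demo_try_write_profile` are hypothetical names, and the one-argument signature is an assumption, since the real prototype of `__llvm_write_custom_profile` is not shown in this description. It also assumes a GCC- or Clang-compatible compiler on an ELF platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a profile writer that lives in another library.
 * Declared weak: if nothing provides a definition at link time, its address
 * evaluates to NULL instead of producing an undefined-symbol error. */
__attribute__((weak)) int demo_write_custom_profile(const char *target);

/* Calls the writer only when a definition was linked in.
 * Returns the writer's result, or -1 when no writer is available. */
int demo_try_write_profile(const char *target) {
  assert(target != NULL);
  if (demo_write_custom_profile == NULL)
    return -1; /* profiling runtime not linked */
  return demo_write_custom_profile(target);
}
```

The point of the pattern is that the calling library still links and runs when the profiling runtime is absent, and only writes a profile when the runtime happens to be present.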

This PR formerly only supported -fprofile-instrument=clang. This commit adds support for -fprofile-instrument=llvm
Replace getPointerBitCastOrAddrSpaceCast with getAddrSpaceCast and allow no-op getAddrSpaceCast calls when types are identical
TODO: Fix tests
TargetFilename =
    (char *)COMPILER_RT_ALLOCA(FilenameLength + TargetLength + 2);

/* Prepend "TARGET." to current filename */

Can we make this handle file names with directory components? Otherwise, I end up with errors like:

LLVM Profile Error: Failed to open file : amdgcn-amd-amdhsa./home/jdenny/tmp/default_15853421304062701701_0.profraw


Seems to be fixed now. Thanks.
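
The directory-component problem raised in this thread can be illustrated with a small sketch. It is not the compiler-rt implementation: `demo_prefix_basename` is a hypothetical helper, it uses `malloc` rather than `COMPILER_RT_ALLOCA`, and it assumes `/` is the only path separator. The idea is to insert the `TARGET.` prefix in front of the file name rather than in front of the whole path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<dir>/<target>.<file>"
 * string, or NULL if allocation fails. The caller frees the result. */
char *demo_prefix_basename(const char *path, const char *target) {
  assert(path != NULL && target != NULL);

  /* Everything up to and including the last '/' is the directory part. */
  const char *slash = strrchr(path, '/');
  size_t dir_len = slash ? (size_t)(slash - path) + 1 : 0;
  size_t target_len = strlen(target);
  size_t base_len = strlen(path + dir_len);

  /* +1 for the '.' separator, +1 for the terminating NUL. */
  char *out = malloc(dir_len + target_len + 1 + base_len + 1);
  if (out == NULL)
    return NULL;

  memcpy(out, path, dir_len);                 /* directory part, if any */
  memcpy(out + dir_len, target, target_len);  /* target prefix */
  out[dir_len + target_len] = '.';
  /* Copy the file name together with its terminating NUL. */
  memcpy(out + dir_len + target_len + 1, path + dir_len, base_len + 1);
  return out;
}
```

With a path that has no directory part, the helper simply prepends the prefix, which matches the `amdgcn-amd-amdhsa.llvm.profraw` name used in the test log further down.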


@jdoerfert jdoerfert left a comment


LG, one minor comment

  if (PDeathSig == 1)
    lprofRestoreSigKill();
  return -1;
}

Move this up to the beginning.

@EthanLuisMcDonough EthanLuisMcDonough merged commit 9e5c136 into llvm:main Feb 12, 2025
8 checks passed

llvm-ci commented Feb 12, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building compiler-rt,offload,openmp at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15627

Here is the relevant piece of the build log for reference:
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/pgo1.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate      -Xclang "-fprofile-instrument=llvm"
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate -Xclang -fprofile-instrument=llvm
# note: command had no output on stdout or stderr
# RUN: at line 3
env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/pgo1.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c      --check-prefix="LLVM-PGO"
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c --check-prefix=LLVM-PGO
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c:59:14: error: LLVM-PGO: expected string not found in input
# | // LLVM-PGO: ======== Counters =========
# |              ^
# | <stdin>:1:1: note: scanning from here
# | Counters:
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/pgo1.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: Counters: 
# | check:59     X~~~~~~~~~ error: no match found
# |           2:  __omp_offloading_802_b388217_main_l27: 
# | check:59     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           3:  Hash: 0x03fd5b902019ff2d 
# | check:59     ~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           4:  Counters: 4 
# | check:59     ~~~~~~~~~~~~~
# |           5:  Block counts: [20, 10, 2, 1] 
# | check:59     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           6:  test1: 
# | check:59     ~~~~~~~~
# |           .
# |           .
# |           .
# | >>>>>>
# `-----------------------------
...

flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
…vm#76587  (llvm#93365)

@dtellenbach

Hi @EthanLuisMcDonough, this unfortunately breaks

on one of our bots. I think you're missing COMPILER_RT_VISIBILITY.

Please fix or revert for now. Thank you!

@qiongsiwu

Hi @EthanLuisMcDonough ! As @dtellenbach noted earlier, this PR is breaking two tests on our bots. Could you fix it as soon as possible? I will come back and revisit by 2pm PST today. If I don't hear back from you, I will revert this PR.

Really appreciate your attention to the issue!

@EthanLuisMcDonough

Hi @dtellenbach and @qiongsiwu. Thank you for letting me know about this. I'm currently investigating the issue and trying to reproduce it locally on Darwin. I'll try to fix this as soon as possible.

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
…vm#76587  (llvm#93365)

EthanLuisMcDonough added a commit that referenced this pull request Feb 17, 2025
This pull request fixes an issue that was introduced in #93365.
`__llvm_write_custom_profile` visibility was causing issues on Darwin.
This function needs to be publicly accessible in order to be accessed by
libomptarget, so this pull request makes `__llvm_write_custom_profile`
an explicitly exported symbol on Darwin. Tested on M3 and X86 macs.
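
The visibility distinction behind this fix can be sketched as follows. This is an illustration only: the `DEMO_*` macros and both functions are hypothetical, the sketch assumes a GCC- or Clang-compatible compiler, and the mechanism the actual fix uses to export the symbol on Darwin is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macros for the two visibility levels. */
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#define DEMO_EXPORTED __attribute__((visibility("default")))

/* Internal helper: usable inside this library, but not exported from a
 * shared object built from this file. */
DEMO_HIDDEN size_t demo_name_length(const char *name) {
  return strlen(name);
}

/* Entry point that another library is expected to resolve, so it keeps
 * default (exported) visibility. */
DEMO_EXPORTED size_t demo_public_entry(const char *name) {
  assert(name != NULL);
  return demo_name_length(name);
}
```

A function that must be reachable from a separate library, as the commit message says of `__llvm_write_custom_profile`, belongs in the second group.
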
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
…vm#76587  (llvm#93365)

EthanLuisMcDonough added a commit that referenced this pull request Mar 20, 2025
This pull request is the third part of an ongoing effort to extend PGO
instrumentation to GPU device code and depends on
#93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 20, 2025
…#94268)

@dtellenbach

dtellenbach commented Mar 20, 2025

@EthanLuisMcDonough I think your patch effectively introduces a dependency on libc because __llvm_write_custom_profile has __attribute__((used)) but calls e.g. atoi through setupIOBuffer.

In compiler-rt it's not safe to make that assumption because it potentially breaks embedded platforms. IMO it's also bad practice to force used symbols into a static archive. Would you please take another look and at least make the functionality dependent on offloading or something similar?

Thanks a lot!

@EthanLuisMcDonough

> @EthanLuisMcDonough I think your patch effectively introduces a dependency on libc because __llvm_write_custom_profile has __attribute__((used)) but calls e.g. atoi through setupIOBuffer.
>
> In compiler-rt it's not safe to make that assumption because it potentially breaks embedded platforms. IMO it's also bad practice to force used symbols into a static archive. Would you please take another look and at least make the functionality dependent on offloading or something similar?
>
> Thanks a lot!

Thank you for bringing this to my attention. I'll make sure to look into a fix for this.
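
One way to read the suggestion to "make the functionality dependent on offloading" is a compile-time guard, sketched below. This is an assumption about the shape of a possible fix, not what was eventually implemented: `DEMO_ENABLE_OFFLOAD_PROFILING`, `demo_parse_buffer_size`, and `demo_has_offload_writer` are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* DEMO_ENABLE_OFFLOAD_PROFILING is a hypothetical build flag, e.g. passed as
 * -DDEMO_ENABLE_OFFLOAD_PROFILING only when offloading support is built. */
#ifdef DEMO_ENABLE_OFFLOAD_PROFILING
/* `used` tells the compiler to emit this function even if it appears
 * unreferenced, so its libc dependency (atoi) comes along with it. The guard
 * limits that cost to builds that asked for offload profiling. */
__attribute__((used)) int demo_parse_buffer_size(const char *text) {
  assert(text != NULL);
  return atoi(text);
}
#define DEMO_HAS_OFFLOAD_WRITER 1
#else
#define DEMO_HAS_OFFLOAD_WRITER 0
#endif

/* Lets callers check whether the guarded code was compiled in. */
int demo_has_offload_writer(void) { return DEMO_HAS_OFFLOAD_WRITER; }
```

Built without the flag, the libc-calling function is not compiled at all, which is the property the embedded-platform concern asks for.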
