Skip to content

[profile] Add %b LLVM_PROFILE_FILE option for binary ID #123963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions clang/docs/SourceBasedCodeCoverage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,11 @@ directory structure will be created. Additionally, the following special
not specified (i.e the pattern is "%m"), it's assumed that ``N = 1``. The
merge pool specifier can only occur once per filename pattern.

* "%b" expands out to the binary ID (build ID). It can be used with "%Nm" to
avoid binary signature collisions. To use it, the program should be compiled
with the build ID linker option (``--build-id`` for GNU ld or LLD,
``/build-id`` for lld-link on Windows). Linux, Windows and AIX are supported.

* "%c" expands out to nothing, but enables a mode in which profile counter
updates are continuously synced to a file. This means that if the
instrumented program crashes, or is killed by a signal, perfect coverage
Expand Down
13 changes: 10 additions & 3 deletions clang/docs/UsersManual.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2880,7 +2880,8 @@ instrumentation:
environment variable to specify an alternate file. If non-default file name
is specified by both the environment variable and the command line option,
the environment variable takes precedence. The file name pattern specified
can include different modifiers: ``%p``, ``%h``, ``%m``, ``%t``, and ``%c``.
can include different modifiers: ``%p``, ``%h``, ``%m``, ``%b``, ``%t``, and
``%c``.

Any instance of ``%p`` in that file name will be replaced by the process
ID, so that you can easily distinguish the profile output from multiple
Expand All @@ -2902,11 +2903,11 @@ instrumentation:
``%p`` is that the storage requirement for raw profile data files is greatly
increased. To avoid issues like this, the ``%m`` specifier can used in the profile
name. When this specifier is used, the profiler runtime will substitute ``%m``
with a unique integer identifier associated with the instrumented binary. Additionally,
with an integer identifier associated with the instrumented binary. Additionally,
multiple raw profiles dumped from different processes that share a file system (can be
on different hosts) will be automatically merged by the profiler runtime during the
dumping. If the program links in multiple instrumented shared libraries, each library
will dump the profile data into its own profile data file (with its unique integer
will dump the profile data into its own profile data file (with its integer
id embedded in the profile name). Note that the merging enabled by ``%m`` is for raw
profile data generated by profiler runtime. The resulting merged "raw" profile data
file still needs to be converted to a different format expected by the compiler (
Expand All @@ -2916,6 +2917,12 @@ instrumentation:

$ LLVM_PROFILE_FILE="code-%m.profraw" ./code

Although rare, binary signatures used by the ``%m`` specifier can have
collisions. In this case, the ``%b`` specifier, which expands to the binary
ID (build ID in ELF and COFF), can be added. To use it, the program should be
compiled with the build ID linker option (``--build-id`` for GNU ld or LLD,
``/build-id`` for lld-link on Windows). Linux, Windows and AIX are supported.

See `this <SourceBasedCodeCoverage.html#running-the-instrumented-program>`_ section
about the ``%t``, and ``%c`` modifiers.

Expand Down
58 changes: 53 additions & 5 deletions compiler-rt/lib/profile/InstrProfilingFile.c
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@ typedef struct lprofFilename {
char Hostname[COMPILER_RT_MAX_HOSTLEN];
unsigned NumPids;
unsigned NumHosts;
unsigned NumBinaryIds;
/* When in-process merging is enabled, this parameter specifies
* the total number of profile data files shared by all the processes
* spawned from the same binary. By default the value is 1. If merging
Expand All @@ -88,8 +89,8 @@ typedef struct lprofFilename {
ProfileNameSpecifier PNS;
} lprofFilename;

static lprofFilename lprofCurFilename = {0, 0, 0, {0}, NULL,
{0}, 0, 0, 0, PNS_unknown};
static lprofFilename lprofCurFilename = {0, 0, 0, {0}, NULL, {0},
0, 0, 0, 0, PNS_unknown};

static int ProfileMergeRequested = 0;
static int getProfileFileSizeForMerging(FILE *ProfileFile,
Expand Down Expand Up @@ -790,7 +791,7 @@ static int checkBounds(int Idx, int Strlen) {
* lprofcurFilename structure. */
static int parseFilenamePattern(const char *FilenamePat,
unsigned CopyFilenamePat) {
int NumPids = 0, NumHosts = 0, I;
int NumPids = 0, NumHosts = 0, NumBinaryIds = 0, I;
char *PidChars = &lprofCurFilename.PidChars[0];
char *Hostname = &lprofCurFilename.Hostname[0];
int MergingEnabled = 0;
Expand Down Expand Up @@ -855,6 +856,16 @@ static int parseFilenamePattern(const char *FilenamePat,
FilenamePat);
return -1;
}
} else if (FilenamePat[I] == 'b') {
if (!NumBinaryIds++) {
/* Check if binary ID does not exist or if its size is 0. */
if (__llvm_write_binary_ids(NULL) <= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a code comment to indicate it reads the size of the binary ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

PROF_WARN("Unable to get binary ID for filename pattern %s. Using "
"the default name.",
FilenamePat);
return -1;
}
}
} else if (FilenamePat[I] == 'c') {
if (__llvm_profile_is_continuous_mode_enabled()) {
PROF_WARN("%%c specifier can only be specified once in %s.\n",
Expand Down Expand Up @@ -887,6 +898,7 @@ static int parseFilenamePattern(const char *FilenamePat,

lprofCurFilename.NumPids = NumPids;
lprofCurFilename.NumHosts = NumHosts;
lprofCurFilename.NumBinaryIds = NumBinaryIds;
return 0;
}

Expand Down Expand Up @@ -934,24 +946,53 @@ static void parseAndSetFilename(const char *FilenamePat,
* filename with PID and hostname substitutions. */
/* The length to hold uint64_t followed by 3 digits pool id including '_' */
#define SIGLEN 24
/* The length to hold 160-bit hash in hexadecimal form */
#define BINARY_ID_LEN 40
static int getCurFilenameLength(void) {
int Len;
if (!lprofCurFilename.FilenamePat || !lprofCurFilename.FilenamePat[0])
return 0;

if (!(lprofCurFilename.NumPids || lprofCurFilename.NumHosts ||
lprofCurFilename.TmpDir || lprofCurFilename.MergePoolSize))
lprofCurFilename.NumBinaryIds || lprofCurFilename.TmpDir ||
lprofCurFilename.MergePoolSize))
return strlen(lprofCurFilename.FilenamePat);

Len = strlen(lprofCurFilename.FilenamePat) +
lprofCurFilename.NumPids * (strlen(lprofCurFilename.PidChars) - 2) +
lprofCurFilename.NumHosts * (strlen(lprofCurFilename.Hostname) - 2) +
lprofCurFilename.NumBinaryIds * BINARY_ID_LEN +
(lprofCurFilename.TmpDir ? (strlen(lprofCurFilename.TmpDir) - 1) : 0);
if (lprofCurFilename.MergePoolSize)
Len += SIGLEN;
return Len;
}

typedef struct lprofBinaryIdsBuffer {
char String[BINARY_ID_LEN + 1];
int Length;
} lprofBinaryIdsBuffer;

/* Reads binary ID length and then its data, writes it into lprofBinaryIdsBuffer
* in hexadecimal form. */
static uint32_t binaryIdsStringWriter(ProfDataWriter *This,
ProfDataIOVec *IOVecs,
uint32_t NumIOVecs) {
if (NumIOVecs < 2 || IOVecs[0].ElmSize != sizeof(uint64_t))
return -1;
uint64_t BinaryIdLen = *(const uint64_t *)IOVecs[0].Data;
if (IOVecs[1].ElmSize != sizeof(uint8_t) || IOVecs[1].NumElm != BinaryIdLen)
return -1;
const uint8_t *BinaryIdData = (const uint8_t *)IOVecs[1].Data;
lprofBinaryIdsBuffer *Data = (lprofBinaryIdsBuffer *)This->WriterCtx;
for (uint64_t I = 0; I < BinaryIdLen; I++) {
Data->Length +=
snprintf(Data->String + Data->Length, BINARY_ID_LEN + 1 - Data->Length,
"%02hhx", BinaryIdData[I]);
}
return 0;
}

/* Return the pointer to the current profile file name (after substituting
* PIDs and Hostnames in filename pattern. \p FilenameBuf is the buffer
* to store the resulting filename. If no substitution is needed, the
Expand All @@ -965,7 +1006,8 @@ static const char *getCurFilename(char *FilenameBuf, int ForceUseBuf) {
return 0;

if (!(lprofCurFilename.NumPids || lprofCurFilename.NumHosts ||
lprofCurFilename.TmpDir || lprofCurFilename.MergePoolSize ||
lprofCurFilename.NumBinaryIds || lprofCurFilename.TmpDir ||
lprofCurFilename.MergePoolSize ||
__llvm_profile_is_continuous_mode_enabled())) {
if (!ForceUseBuf)
return lprofCurFilename.FilenamePat;
Expand All @@ -992,6 +1034,12 @@ static const char *getCurFilename(char *FilenameBuf, int ForceUseBuf) {
memcpy(FilenameBuf + J, lprofCurFilename.TmpDir, TmpDirLength);
FilenameBuf[J + TmpDirLength] = DIR_SEPARATOR;
J += TmpDirLength + 1;
} else if (FilenamePat[I] == 'b') {
lprofBinaryIdsBuffer Data = {{0}, 0};
ProfDataWriter Writer = {binaryIdsStringWriter, &Data};
__llvm_write_binary_ids(&Writer);
memcpy(FilenameBuf + J, Data.String, Data.Length);
J += Data.Length;
} else {
if (!getMergePoolSize(FilenamePat, &I))
continue;
Expand Down
40 changes: 40 additions & 0 deletions compiler-rt/test/profile/Linux/binary-id-path.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
// REQUIRES: linux
// RUN: split-file %s %t.dir
// RUN: %clang_profgen -Wl,--build-id=sha1 -o %t.dir/foo %t.dir/foo.c
// RUN: %clang_profgen -Wl,--build-id=sha1 -o %t.dir/bar %t.dir/bar.c

// Check that foo and bar have the same signatures.
// RUN: rm -rf %t.profdir
// RUN: env LLVM_PROFILE_FILE=%t.profdir/%m.profraw %run %t.dir/foo
// RUN: env LLVM_PROFILE_FILE=%t.profdir/%m.profraw %run %t.dir/bar 2>&1 | FileCheck %s --check-prefix=MERGE-ERROR

// Check that foo and bar have different binary IDs.
// RUN: rm -rf %t.profdir %t.profdata
// RUN: env LLVM_PROFILE_FILE=%t.profdir/%b.profraw %run %t.dir/foo
// RUN: env LLVM_PROFILE_FILE=%t.profdir/%b.profraw %run %t.dir/bar
// RUN: llvm-profdata merge -o %t.profdata %t.profdir
// RUN: llvm-profdata show --binary-ids %t.profdata | FileCheck %s --check-prefix=BINARY-ID

// Check fallback to the default name if binary ID is missing.
// RUN: %clang_profgen -Wl,--build-id=none -o %t.dir/foo %t.dir/foo.c
// RUN: env LLVM_PROFILE_FILE=%t.profdir/%b.profraw %run %t.dir/foo 2>&1 | FileCheck %s --check-prefix=MISSING

// MERGE-ERROR: LLVM Profile Error: Profile Merging of file {{.*}}.profraw failed: File exists

// BINARY-ID: Instrumentation level: Front-end
// BINARY-ID-NEXT: Total functions: 3
// BINARY-ID-NEXT: Maximum function count: 2
// BINARY-ID-NEXT: Maximum internal block count: 0
// BINARY-ID-NEXT: Binary IDs:
// BINARY-ID-NEXT: {{[0-9a-f]+}}
// BINARY-ID-NEXT: {{[0-9a-f]+}}

// MISSING: Unable to get binary ID for filename pattern {{.*}}.profraw. Using the default name.

//--- foo.c
int main(void) { return 0; }
void foo(void) {}

//--- bar.c
int main(void) { return 0; }
void bar(int *a) { *a += 10; }