Skip to content

[Sanitizer] Use % patterns in report paths #141820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 30, 2025
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 67 additions & 3 deletions compiler-rt/lib/sanitizer_common/sanitizer_file.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -96,12 +96,76 @@ static void RecursiveCreateParentDirs(char *path) {
}
}

/// Parse the report path \p pattern and copy the parsed path to \p dest.
///
/// * `%%` becomes `%`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why support this one? "%" isn't a common filename character

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for completeness. Is there any reason we shouldn't handle this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just didn't seem like it would be useful in practice, but I'm not strongly opposed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO having this for completeness is a good idea. As a user it would be infuriating if we didn't if I happened to need it for whatever reason.

/// * `%H` expands to the environment variable `HOME`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't seem to be supported on the PGO side, why is it more useful here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the main motivation for this PR. I'd like some way to resolve the home directory at runtime. I think it could also be useful on the PGO siee

/// * `%t` expands to the environment variable `TMPDIR`
/// * `%p` expands to the process ID (PID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the PID is already appended, is it useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I mainly implemented this for completeness compared to PGO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, again not strongly opposed

static void ParseAndSetPath(const char *pattern, char *dest,
const uptr dest_size) {
CHECK(pattern);
CHECK(dest);
CHECK_GT(dest_size, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why GT and not GE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dest_size = 1 then that only leave space for the null byte. I guess an empty path is valid, so maybe I should allow that

dest[0] = '\0';
uptr next_substr_start_idx = 0;
for (uptr i = 0; i < internal_strlen(pattern) - 1; i++) {
if (pattern[i] != '%')
continue;
int bytes_to_copy = i - next_substr_start_idx;
// Copy over previous substring.
CHECK_LT(internal_strlcat(dest, pattern + next_substr_start_idx,
internal_strlen(dest) + bytes_to_copy + 1),
dest_size);
const char *str_to_concat;
switch (pattern[++i]) {
case '%':
str_to_concat = "%";
break;
case 'H':
str_to_concat = GetEnv("HOME");
break;
case 't':
str_to_concat = GetEnv("TMPDIR");
break;
case 'p': {
// Use printf directly to write the PID since it's not a static string.
int remaining_capacity = dest_size - internal_strlen(dest);
int bytes_copied =
internal_snprintf(dest + internal_strlen(dest), remaining_capacity,
"%ld", internal_getpid());
CHECK_GT(bytes_copied, 0);
CHECK_LT(bytes_copied, remaining_capacity);
str_to_concat = "";
break;
}
default: {
// Invalid pattern: fallback to original pattern.
const char *message = "ERROR: Unexpected pattern: ";
WriteToFile(kStderrFd, message, internal_strlen(message));
WriteToFile(kStderrFd, pattern, internal_strlen(pattern));
WriteToFile(kStderrFd, "\n", internal_strlen("\n"));
CHECK_LT(internal_strlcpy(dest, pattern, dest_size), dest_size);
return;
}
}
CHECK(str_to_concat);
CHECK_LT(internal_strlcat(dest, str_to_concat, dest_size), dest_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming that LT is because of the trailing \0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, strlcat returns the size of the final string, not including the null byte.

https://linux.die.net/man/3/strlcat

next_substr_start_idx = i + 1;
}
CHECK_LT(internal_strlcat(dest, pattern + next_substr_start_idx, dest_size),
dest_size);
}

void ReportFile::SetReportPath(const char *path) {
if (path) {
uptr len = internal_strlen(path);
if (len > sizeof(path_prefix) - 100) {
Report("ERROR: Path is too long: %c%c%c%c%c%c%c%c...\n", path[0], path[1],
path[2], path[3], path[4], path[5], path[6], path[7]);
const char *message = "ERROR: Path is too long: ";
WriteToFile(kStderrFd, message, internal_strlen(message));
WriteToFile(kStderrFd, path, 8);
message = "...\n";
WriteToFile(kStderrFd, message, internal_strlen(message));
Die();
}
}
Expand All @@ -115,7 +179,7 @@ void ReportFile::SetReportPath(const char *path) {
} else if (internal_strcmp(path, "stdout") == 0) {
fd = kStdoutFd;
} else {
internal_snprintf(path_prefix, kMaxPathLength, "%s", path);
ParseAndSetPath(path, path_prefix, kMaxPathLength);
RecursiveCreateParentDirs(path_prefix);
}
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
// RUN: %clangxx -O2 %s -o %t

// Case 1: Try setting a path that is an invalid/inaccessible directory.
// RUN: not %env %run %t 2>&1 | FileCheck %s --check-prefix=ERROR1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment as to what each of these invocations is testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment next to the snprintf()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks but I meant up here - could you also add a 1-sentence comment before each run line as to what it is testing? (Reading the test from the top down I was initially confused about the "A" argument)


// Case 2: Try setting a path that is too large.
// RUN: not %env %run %t A 2>&1 | FileCheck %s --check-prefix=ERROR2

#include <sanitizer/common_interface_defs.h>
#include <stdio.h>

int main(int argc, char **argv) {
char buff[4096];
if (argc == 1) {
// Case 1
sprintf(buff, "%s/report", argv[0]);
// ERROR1: Can't create directory: {{.*}}
} else {
// Case 2
snprintf(buff, sizeof(buff), "%04095d", 42);
// ERROR2: Path is too long: 00000000...
}
__sanitizer_set_report_path(buff);
}
Original file line number Diff line number Diff line change
@@ -1,27 +1,42 @@
// Test __sanitizer_set_report_path and __sanitizer_get_report_path:
// RUN: %clangxx -O2 %s -o %t
// RUN: not %run %t 2>&1 | FileCheck %s
// RUN: %env HOME=%t.homedir TMPDIR=%t.tmpdir %run %t 2>%t.err | FileCheck %s
// RUN: FileCheck %s --input-file=%t.err --check-prefix=ERROR

#include <assert.h>
#include <sanitizer/common_interface_defs.h>
#include <stdio.h>
#include <string.h>

volatile int *null = 0;

int main(int argc, char **argv) {
char buff[1000];
char buff[4096];
sprintf(buff, "%s.report_path/report", argv[0]);
__sanitizer_set_report_path(buff);
assert(strncmp(buff, __sanitizer_get_report_path(), strlen(buff)) == 0);
// CHECK: {{.*}}.report_path/report.[[PID:[0-9]+]]
printf("%s\n", __sanitizer_get_report_path());

// Try setting again with an invalid/inaccessible directory.
char buff_bad[1000];
sprintf(buff_bad, "%s/report", argv[0]);
fprintf(stderr, "Expected bad report path: %s\n", buff_bad);
// CHECK: Expected bad report path: [[BADPATH:.*]]/report
__sanitizer_set_report_path(buff_bad);
assert(strncmp(buff, __sanitizer_get_report_path(), strlen(buff)) == 0);
}
strcpy(buff, "%H/foo");
__sanitizer_set_report_path(buff);
// CHECK: [[T:.*]].homedir/foo.[[PID]]
printf("%s\n", __sanitizer_get_report_path());

// CHECK: ERROR: Can't create directory: [[BADPATH]]
strcpy(buff, "%t/foo");
__sanitizer_set_report_path(buff);
// CHECK: [[T]].tmpdir/foo.[[PID]]
printf("%s\n", __sanitizer_get_report_path());

strcpy(buff, "%H/%p/%%foo");
__sanitizer_set_report_path(buff);
// CHECK: [[T]].homedir/[[PID]]/%foo.[[PID]]
printf("%s\n", __sanitizer_get_report_path());

strcpy(buff, "%%foo%%bar");
__sanitizer_set_report_path(buff);
// CHECK: %foo%bar.[[PID]]
printf("%s\n", __sanitizer_get_report_path());

strcpy(buff, "%%foo%ba%%r");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest removing the "%%" from this one to make it clearer that what is being tested is the "%b".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't parse a path, we fallback to the original path. I want to test that the other %% isn't parsed to %

__sanitizer_set_report_path(buff);
// ERROR: Unexpected pattern: %%foo%ba%%r
// CHECK: %%foo%ba%%r.[[PID]]
printf("%s\n", __sanitizer_get_report_path());
}
Loading