-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[llvm-exegesis][AArch64] Check for PAC keys before disabling them #138643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-tools-llvm-exegesis Author: Abhilash Majumder (abhilash1910) Changes[llvm-exegesis][AArch64] Check for PAC keys before disabling them. Tries to resolve aarch64 error arising from accessible PAC keys from PR: Disable pauth and ldgm as unsupported instructions fixed. The suspected reason is in some systems, the pac keys are present and need to be checked before setting them. This is followup to merge the changes with following changes to attempt to fixup the arch64 failure. Changes: Apply checks to verify if PAC is enabled in some systems and disabling them . @lakshayk-nv please help to verify and review. Full diff: https://github.com/llvm/llvm-project/pull/138643.diff 1 Files Affected:
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index a1eb5a46f21fc..d1da6f41461a7 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -207,16 +207,26 @@ class ExegesisAArch64Target : public ExegesisTarget {
if (isPointerAuth(Opcode)) {
#if defined(__aarch64__) && defined(__linux__)
+
+ // Fix for some systems where PAC/PAG keys are present but is
+ // explicitly getting disabled
+ unsigned long pac_keys = 0;
+ if (prctl(PR_PAC_GET_ENABLED_KEYS, &pac_keys, 0, 0, 0) < 0) {
+ return "Failed to get PAC key status";
+ }
+
// Disable all PAC keys. Note that while we expect the measurements to
// be the same with PAC keys disabled, they could potentially be lower
// since authentication checks are bypassed.
- if (prctl(PR_PAC_SET_ENABLED_KEYS,
- PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY |
- PR_PAC_APDBKEY, // all keys
- 0, // disable all
- 0, 0) < 0) {
- return "Failed to disable PAC keys";
- }
+ if (pac_keys != 0) {
+ if (prctl(PR_PAC_SET_ENABLED_KEYS,
+ PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY |
+ PR_PAC_APDBKEY, // all keys
+ 0, // disable all
+ 0, 0) < 0) {
+ return "Failed to disable PAC keys";
+ }
+ }
#else
return "Unsupported opcode: isPointerAuth";
#endif
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run clang-format
over the patch to fix the code formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is correct, the problem is that llvm-exegesis may fail with an error like "AUTIA: Failed to disable PAC keys", depending on the CPU features. It looks like I have managed to reproduce this issue in QEMU (in full-system emulation mode), the results are below.
As far as I see, PAuth-related prctl
s are not documented in man pages well, but thankfully the kernel sources are quite readable on themselves:
prctl
entry point is in kernel/sys.c, and here are relevant operations- the requests are ultimately processed in arch/arm64/kernel/pointer_auth.c
- according to the Caveats section of
man prctl
, as well as toman PR_PAC_RESET_KEYS
, it is probably better to passprctl
arguments aslong
Each configuration was tested by running
strace -f -e prctl \
/path/to/llvm-project/build-aarch64/bin/llvm-exegesis \
-mtriple=aarch64-linux-gnu \
-mcpu=neoverse-v2 \
-mode=latency \
--opcode-name=AUTIA \
--use-dummy-perf-counters
On the main branch (commit cd6c4b6), with emulated CPU set to neoverse-v1,pauth-impdef=on
(Neoverse-V2 is not supported in QEMU 9.2.1, but Neoverse-V1 seems to be enough to reproduce the issue. Similarly, pauth-impdef=on
should not affect the issue, but makes emulation faster) I got
prctl(PR_PAC_SET_ENABLED_KEYS, PR_PAC_APIAKEY|PR_PAC_APIBKEY|PR_PAC_APDAKEY|PR_PAC_APDBKEY, 0, 0, 0) = 0
---
mode: latency
key:
instructions:
- 'AUTIA X23 X23 X15'
config: ''
register_initial_values:
- 'X23=0x0'
- 'X15=0x0'
cpu_name: neoverse-v2
llvm_triple: aarch64-unknown-linux-gnu
min_instructions: 10000
measurements: []
error: ''
info: Repeating a single implicitly serial instruction
assembled_snippet: F70F1FF8170080D20F0080D2F711C1DAF711C1DAF711C1DAF711C1DAF70741F8C0035FD6
...
+++ exited with 0 +++
and setting CPU to cortex-a72
gives this output:
prctl(PR_PAC_SET_ENABLED_KEYS, PR_PAC_APIAKEY|PR_PAC_APIBKEY|PR_PAC_APDAKEY|PR_PAC_APDBKEY, 0, 0, 0) = -1 EINVAL (Invalid argument)
AUTIA: Failed to disable PAC keys
+++ exited with 0 +++
(by the way, return code is 0, which is a bit surprising...). This is probably because of if (!system_supports_address_auth()) return -EINVAL;
check in the kernel, assuming system_supports_address_auth()
means "FEAT_PAUTH was detected at run-time" (instead of "the kernel was compiled with PAuth support" or something like this).
This PR (commit 2a9af0f) produces an error like the following with both CPU models:
prctl(PR_PAC_GET_ENABLED_KEYS, 0x7ff2234598, 0, 0, 0) = -1 EINVAL (Invalid argument)
AUTIA: Failed to get PAC key status
+++ exited with 0 +++
Turned out, all arguments should be set to 0 and the bitmask is returned as the return value of prctl
.
I applied a quick fix:
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index 7e6382693c3e..3d59955177cb 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -213,8 +213,8 @@ private:
// For systems without PAC, this is a No-op but with PAC, it is
// safer to check the existing key state and then disable/enable them.
// Hence the guard for switching.
- unsigned long PacKeys = 0;
- if (prctl(PR_PAC_GET_ENABLED_KEYS, &PacKeys, 0, 0, 0) < 0) {
+ unsigned long PacKeys = prctl(PR_PAC_GET_ENABLED_KEYS, 0, 0, 0, 0);
+ if ((long)PacKeys < 0) {
return "Failed to get PAC key status";
}
With CPU set to neoverse-v1,pauth-impdef=on
the following output is produced:
prctl(PR_PAC_GET_ENABLED_KEYS, 0, 0, 0, 0) = 0xf (PR_PAC_APIAKEY|PR_PAC_APIBKEY|PR_PAC_APDAKEY|PR_PAC_APDBKEY)
prctl(PR_PAC_SET_ENABLED_KEYS, PR_PAC_APIAKEY|PR_PAC_APIBKEY|PR_PAC_APDAKEY|PR_PAC_APDBKEY, 0, 0, 0) = 0
---
mode: latency
key:
instructions:
- 'AUTIA X23 X23 X0'
config: ''
register_initial_values:
- 'X23=0x0'
- 'X0=0x0'
cpu_name: neoverse-v2
llvm_triple: aarch64-unknown-linux-gnu
min_instructions: 10000
measurements: []
error: ''
info: Repeating a single implicitly serial instruction
assembled_snippet: F70F1FF8170080D2000080D21710C1DA1710C1DA1710C1DA1710C1DAF70741F8C0035FD6
...
+++ exited with 0 +++
but with cortex-a72
it fails with:
prctl(PR_PAC_GET_ENABLED_KEYS, 0, 0, 0, 0) = -1 EINVAL (Invalid argument)
AUTIA: Failed to get PAC key status
+++ exited with 0 +++
Probably, EINVAL
is always returned when FEAT_PAUTH is not supported by the CPU.
Since this has been broken for a while now can we revert the commit that introduced this failure while the fix is being worked on? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abhilash1910 Thanks for the updates, I left a few comments on prctl_wrapper
so far, will re-test the patch in QEMU slightly later.
Co-authored-by: Anatoly Trosinenko <[email protected]>
Co-authored-by: Anatoly Trosinenko <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just realized that the original issue was probably related to FEAT_FPAC
(by the way, support for FEAT_FPAC
is among the differences between emulated Neoverse-V1 that I used for testing and Neoverse-V2 mentioned it tests). I'm not sure if QEMU defines any "real" CPU model with FEAT_FPAC, but the max
pseudo CPU model should work, though.
At the commit fd254b0, I get
AUTIA: PAuth not supported on this system
for cortex-a72
emulated CPU model and successfully executed snippets for neoverse-v1,pauth-impdef=on
and max,pauth-impdef=on
. If multiple opcodes are passed like in --opcode-name=AUTIASP,AUTIA
, two prctl
system calls ("get" and then "set") are performed for the first opcode and one more "get" prctl
is performed for each subsequent opcode (or exactly one failing prctl
per pauth opcode in case of cortex-a72
, of course).
On the other hand, when executed on cortex-a72
emulated CPU, llvm-exegesis -mtriple=aarch64-linux-gnu -mcpu=neoverse-v2 -mode=latency --opcode-name=AUTIA --benchmark-phase=assemble-measured-code
returns AUTIA: PAuth not supported on this system
, even though no snippet code is to be executed on the CPU, which is not correct.
It looks like the high-level logic of this code should be improved (and the description in getIgnoredOpcodeReasonOrNull
updated). As far as I see, here are the issues that we want to avoid:
- if the CPU implements FEAT_PAuth with FEAT_FPAC, any snippet involving authentication instruction is likely to crash. As far as I understand, this does not crash the entire
llvm-exegsis
process, as it is able to handle signals that are normally fatal, but at least it reports errors that may be unexpected by the user. - if the CPU does not implement FEAT_PAuth at all,
llvm-exegesis
still should be able to perform all tasks not involving actual snippet execution- it is even possible to run snippet generation (but not execution, of course) for the same instructions on x86_64 computer!
- error diagnostic is a plus
Given the above points, a reasonable solution could probably be keeping the approach implemented by this PR (request the enabled keys and, if any key is currently enabled, disable everything). But no errors should probably be reported on failure, not even "Unsupported opcode: isPointerAuth" on non-AArch64 systems. Additionally, it seems a good idea to give the user an option to opt out of these prctl
s altogether.
getIgnoredOpcodeReasonOrNull
is probably intended to return error messages for the opcodes whose benchmarking is meaningless unless some custom support is implemented someday. It seems to be called even if merely generating a snippet on a foreign CPU architecture- disabling PAuth keys as a side effect of
getIgnoredOpcodeReasonOrNull
call is probably OK, provided this does not crash llvm-exegesis- I guess, this could be a problem if
llvm-exegesis
executable or any shared object up the call stack is built with pac-ret - this probably only applies to libc stuff (including thread entry functions, if any)
- I guess, this could be a problem if
- a failure to disable the keys when FEAT_FPAC is implemented, on the other hand, is probably not too harmful, as something like
error: 'snippet crashed while running: Illegal instruction'
would be printed in the report- as a reasonable tradeoff, you could return
nullptr
silently ifprctl(PR_PAC_GET_ENABLED_KEYS)
either returns 0 or setserrno
toEINVAL
(to rule out all "not applicable" cases we know about), but return an error message if the secondprctl
call was performed, and it failed (as it likely indicates we do something wrong)
- as a reasonable tradeoff, you could return
- I doubt it is safe to assume every CPU would have the same timings with keys being enabled vs. disabled
- it may be reasonable to perform
prctl
calls to disable all the keys by default, but even then it is probably a good idea to print a free-form warning to the console (like it is already done by some other part of llvm-exegesis for AUTIASP:setRegTo is not implemented, results will be unreliable
) - or the default behavior could be to keep everything as-is and disable the keys if a special command line option is given
- it may be reasonable to perform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable enough to me at this point.
Please get approval from someone who actually understands pointer authentication before merging though.
The linux PAC launch path was explained to me on discord and my concerns were cleared up, I can't speak for the PR code itself, my concerns were with how the actual process launch was being handled structurally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid there is an accessibility issue with AArch64DisablePacControl
in this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates! Except for tiny comments here and there, the code looks good to me.
I tested it in QEMU with three different CPU models (no PAuth, PAuth without FPAC, PAuth with FPAC), and it behaves as expected (notably, when the simulated CPU is set to support PAuth with FPAC, when I ask llvm-exegesis to actually execute the snippets, they are executed successfully by default and crash on SIGILL when I pass --aarch64-disable-pac-control
- exactly as expected).
@abhilash1910 I applied the following change on top of this PR in order to test with
The first execution (line 4) completed successfully. The full log is available at: https://download.copr.fedorainfracloud.org/results/tuliom/llvm-snapshots-big-merge-20250604-pac/fedora-rawhide-aarch64/09136603-llvm/builder-live.log.gz |
[llvm-exegesis][AArch64] Check for PAC keys before disabling them.
Tries to resolve aarch64 error arising from accessible PAC keys from PR: Disable pauth and ldgm as unsupported instructions fixed. The suspected reason is in some systems, the pac keys are present and need to be checked before setting them.
This is followup to merge the changes with following changes to attempt to fixup the arch64 failure.
Changes:
Apply checks to verify if PAC is enabled in some systems and disabling them .
@lakshayk-nv please help to verify and review.
cc @sjoerdmeijer @davemgreen