-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[ASan] Add metadata to renamed instructions so ASan doesn't use the i… #119387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ASan] Add metadata to renamed instructions so ASan doesn't use the i… #119387
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you intend to add a test before posting review ? Otherwise LGTM!
Ah, yes! I'm also planning on getting some opinions on an RFC in case I have missed something |
01be0de
to
8781ff2
Compare
The failing tests (that were not failing before my changes) are doing so because of additional metadata in the DWARF output. I'm waiting until after the exact metadata is finalized before fixing that. I don't want to add something to the tests that will need to be changed later if the metadata is renamed/altered. |
02ab9e5
to
c15ec2f
Compare
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: None (gbMattN) Changes…ncorrect name Full diff: https://github.com/llvm/llvm-project/pull/119387.diff 4 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 5fccc9cbb37ec1..d8fdacf30e12e3 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -137,6 +137,14 @@ llvm::AllocaInst *CodeGenFunction::CreateTempAlloca(llvm::Type *Ty,
Alloca =
new llvm::AllocaInst(Ty, CGM.getDataLayout().getAllocaAddrSpace(),
ArraySize, Name, AllocaInsertPt->getIterator());
+ if (Alloca->getName() != Name.str() &&
+ SanOpts.Mask & SanitizerKind::Address) {
+
+ llvm::LLVMContext &ctx = Alloca->getContext();
+ llvm::MDString *trueNameMetadata = llvm::MDString::get(ctx, Name.str());
+ llvm::MDTuple *tuple = llvm::MDTuple::get(ctx, trueNameMetadata);
+ Alloca->setMetadata(llvm::LLVMContext::MD_unaltered_name, tuple);
+ }
if (Allocas) {
Allocas->Add(Alloca);
}
diff --git a/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp b/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp
new file mode 100644
index 00000000000000..f4d9ad5f6ea5f7
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp
@@ -0,0 +1,13 @@
+// RUN: %clangxx_asan -O0 %s -o %t
+// RUN: not %run %t 2>&1 | FileCheck %s
+
+int main()
+{
+ int x;
+ {
+ int x;
+ delete &x;
+ }
+}
+
+// CHECK: [32, 36) 'x'
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index df572e8791e13b..41fa34bf09ff65 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -53,3 +53,4 @@ LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
+LLVM_FIXED_MD_KIND(MD_unaltered_name, "unaltered.name", 42)
diff --git a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
index cb84588318496c..87f79bdaa16429 100644
--- a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
@@ -3430,13 +3430,15 @@ void FunctionStackPoisoner::processStaticAllocas() {
SmallVector<ASanStackVariableDescription, 16> SVD;
SVD.reserve(AllocaVec.size());
for (AllocaInst *AI : AllocaVec) {
- ASanStackVariableDescription D = {AI->getName().data(),
- ASan.getAllocaSizeInBytes(*AI),
- 0,
- AI->getAlign().value(),
- AI,
- 0,
- 0};
+ const char *Name = AI->getName().data();
+ if (AI->hasMetadata(LLVMContext::MD_unaltered_name)) {
+ MDTuple *tuple =
+ dyn_cast<MDTuple>(AI->getMetadata(LLVMContext::MD_unaltered_name));
+ Name = dyn_cast<MDString>(tuple->getOperand(0))->getString().data();
+ }
+ ASanStackVariableDescription D = {
+ Name, ASan.getAllocaSizeInBytes(*AI), 0, AI->getAlign().value(), AI, 0,
+ 0};
SVD.push_back(D);
}
|
@llvm/pr-subscribers-compiler-rt-sanitizer Author: None (gbMattN) Changes…ncorrect name Full diff: https://github.com/llvm/llvm-project/pull/119387.diff 4 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 5fccc9cbb37ec1..d8fdacf30e12e3 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -137,6 +137,14 @@ llvm::AllocaInst *CodeGenFunction::CreateTempAlloca(llvm::Type *Ty,
Alloca =
new llvm::AllocaInst(Ty, CGM.getDataLayout().getAllocaAddrSpace(),
ArraySize, Name, AllocaInsertPt->getIterator());
+ if (Alloca->getName() != Name.str() &&
+ SanOpts.Mask & SanitizerKind::Address) {
+
+ llvm::LLVMContext &ctx = Alloca->getContext();
+ llvm::MDString *trueNameMetadata = llvm::MDString::get(ctx, Name.str());
+ llvm::MDTuple *tuple = llvm::MDTuple::get(ctx, trueNameMetadata);
+ Alloca->setMetadata(llvm::LLVMContext::MD_unaltered_name, tuple);
+ }
if (Allocas) {
Allocas->Add(Alloca);
}
diff --git a/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp b/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp
new file mode 100644
index 00000000000000..f4d9ad5f6ea5f7
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/shadowed-stack-serialization.cpp
@@ -0,0 +1,13 @@
+// RUN: %clangxx_asan -O0 %s -o %t
+// RUN: not %run %t 2>&1 | FileCheck %s
+
+int main()
+{
+ int x;
+ {
+ int x;
+ delete &x;
+ }
+}
+
+// CHECK: [32, 36) 'x'
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index df572e8791e13b..41fa34bf09ff65 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -53,3 +53,4 @@ LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
+LLVM_FIXED_MD_KIND(MD_unaltered_name, "unaltered.name", 42)
diff --git a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
index cb84588318496c..87f79bdaa16429 100644
--- a/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
@@ -3430,13 +3430,15 @@ void FunctionStackPoisoner::processStaticAllocas() {
SmallVector<ASanStackVariableDescription, 16> SVD;
SVD.reserve(AllocaVec.size());
for (AllocaInst *AI : AllocaVec) {
- ASanStackVariableDescription D = {AI->getName().data(),
- ASan.getAllocaSizeInBytes(*AI),
- 0,
- AI->getAlign().value(),
- AI,
- 0,
- 0};
+ const char *Name = AI->getName().data();
+ if (AI->hasMetadata(LLVMContext::MD_unaltered_name)) {
+ MDTuple *tuple =
+ dyn_cast<MDTuple>(AI->getMetadata(LLVMContext::MD_unaltered_name));
+ Name = dyn_cast<MDString>(tuple->getOperand(0))->getString().data();
+ }
+ ASanStackVariableDescription D = {
+ Name, ASan.getAllocaSizeInBytes(*AI), 0, AI->getAlign().value(), AI, 0,
+ 0};
SVD.push_back(D);
}
|
0}; | ||
const char *Name = AI->getName().data(); | ||
if (AI->hasMetadata(LLVMContext::MD_unaltered_name)) { | ||
MDTuple *tuple = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the result of those dyn_cast are non null ? Otherwise LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only add this metadata in one place, and expect it to have a MDString in it. If dyn_cast fails and we try doing this to a nullptr
, LLVM will exit with a stack trace and request for a bug report, which I think is what we'd want to happen
@vitalybuka manually pinging you for a last review since you were active on the bugzilla ticket |
6ca3a9e
to
3a5c89a
Compare
@@ -53,3 +53,4 @@ LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38) | |||
LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39) | |||
LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40) | |||
LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41) | |||
LLVM_FIXED_MD_KIND(MD_unaltered_name, "unaltered.name", 42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it is worth adding a new metadata kind to work around not using debug info?
If we need new metadata, it should be documented and also needs to be checked by the verifier, but perhaps it would be possible to use https://llvm.org/docs/LangRef.html#annotation-metadata instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! I hadn't noticed that metadata type. I've switched to using it
AI, | ||
0, | ||
0}; | ||
const char *Name = AI->getName().data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is possible to use StringRef here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is!
0}; | ||
const char *Name = AI->getName().data(); | ||
if (AI->hasMetadata(LLVMContext::MD_unaltered_name)) { | ||
MDTuple *tuple = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDTuple *tuple = | |
MDTuple *Tuple = |
dyn_cast<MDTuple>(AI->getMetadata(LLVMContext::MD_unaltered_name)); | ||
Name = dyn_cast<MDString>(tuple->getOperand(0))->getString().data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should those be casts? If the dyn_cast can return nullptr, you need to check the return value
clang/lib/CodeGen/CGExpr.cpp
Outdated
@@ -137,6 +137,10 @@ llvm::AllocaInst *CodeGenFunction::CreateTempAlloca(llvm::Type *Ty, | |||
Alloca = | |||
new llvm::AllocaInst(Ty, CGM.getDataLayout().getAllocaAddrSpace(), | |||
ArraySize, Name, AllocaInsertPt->getIterator()); | |||
if (Alloca->getName() != Name.str() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point of the Alloca->getName() != Name.str()
check; it will almost always fail. (In release builds of clang, we suppress instruction names, so Alloca->getName()
is just the empty string.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works with Release compiler.
https://godbolt.org/z/93ojPGM64
[32, 36) 'x' (line 7) <== Memory access at offset 32 is inside this variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point of the
Alloca->getName() != Name.str()
check; it will almost always fail. (In release builds of clang, we suppress instruction names, soAlloca->getName()
is just the empty string.)
I'm still getting names from Allocas when I build in release mode, is there some other flag that does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we have some code that specifically messes with the default when asan is enabled.
Res.getCodeGenOpts().DiscardValueNames &= |
If we have to support metadata anyway, I'd prefer to use the metadata unconditionally, instead of messing with LLVM IR instruction names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efriedma-quic To clarify we do:
- set metadata always
- remove DiscardValueNames hack (in a different future patch please, as this may cause regressions if we loose metdata more often than name)
LGTM if https://llvm-compile-time-tracker.com/ fail to notice a difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they want the Alloca->getName() != Name.str()
check removed, so that when running ASan, we emit this name metadata on every AllocaInst
I thought it would be nice to reduce the amount of additional metadata emitted, but I don't have particularly strong feelings on the matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the metadata unless asan/msan is enabled. Debug info stores names in its own way, and I don't think we need the names for anything else... at least, I can't think of any other use.
The Alloca->getName() != Name.str()
isn't going to work reliably: for example, inlining will rename values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both for clarification, I misinterpreted the comment.
Suggested approach is definitely LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alloca->getName() != Name.str()
is still here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misunderstood. Check has been removed!
I don't see (from description) why it's needed. |
A reproducer of the issue can be seen failing on godbolt here I forgot to put this at the top when I undrafted the pull request, apologies for the confusion. |
for (int i = 0; i < Tuple->getNumOperands(); i++) { | ||
if (auto stringMetadata = dyn_cast<MDString>(Tuple->getOperand(i))) { | ||
if (stringMetadata->getString() == "alloca_name_altered") { | ||
Name = ((MDString *)Tuple->getOperand(i + 1).get())->getString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break or better return here
for (int i = 0; i < Annotation->getNumOperands(); i++) { | ||
if (auto Tuple = dyn_cast<MDTuple>(Annotation->getOperand(i))) { | ||
for (int i = 0; i < Tuple->getNumOperands(); i++) { | ||
if (auto stringMetadata = dyn_cast<MDString>(Tuple->getOperand(i))) { | ||
if (stringMetadata->getString() == "alloca_name_altered") { | ||
return ((MDString *)Tuple->getOperand(i + 1).get())->getString(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an easy way to do this :)
Might be good to use some early continues to reduce the nesting here.
if (AI->hasMetadata(LLVMContext::MD_annotation)) { | ||
MDTuple *Annotation = | ||
(MDTuple *)AI->getMetadata(LLVMContext::MD_annotation); | ||
for (int i = 0; i < Annotation->getNumOperands(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we directly iterate over the operands?
// recorded as an annotation. | ||
if (AI->hasMetadata(LLVMContext::MD_annotation)) { | ||
MDTuple *Annotation = | ||
(MDTuple *)AI->getMetadata(LLVMContext::MD_annotation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use old-style casts, this should probably be cast<>
(MDTuple *)AI->getMetadata(LLVMContext::MD_annotation); | ||
for (int i = 0; i < Annotation->getNumOperands(); i++) { | ||
if (auto Tuple = dyn_cast<MDTuple>(Annotation->getOperand(i))) { | ||
for (int i = 0; i < Tuple->getNumOperands(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local variables usually start with upper case in LLVM (here and below)
@@ -3391,6 +3391,27 @@ static void findStoresToUninstrumentedArgAllocas( | |||
} | |||
} | |||
|
|||
StringRef getAllocaName(AllocaInst *AI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringRef getAllocaName(AllocaInst *AI) { | |
static StringRef getAllocaName(AllocaInst *AI) { |
for (int i = 0; i < Tuple->getNumOperands(); i++) { | ||
if (auto stringMetadata = dyn_cast<MDString>(Tuple->getOperand(i))) { | ||
if (stringMetadata->getString() == "alloca_name_altered") { | ||
return ((MDString *)Tuple->getOperand(i + 1).get())->getString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old style cast
if (LocalEscapeCall) | ||
LocalEscapeCall->moveBefore(InsBefore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think I was overzealous on the clang-format
clang/lib/CodeGen/CGExpr.cpp
Outdated
llvm::errs() << "Alloca name '" << Alloca->getName() << "'\n"; | ||
llvm::errs() << "NAME str '" << Name.str() << "'\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over dbg ?
80a2b79
to
305390c
Compare
8aa5a34
to
fb5ad58
Compare
fb5ad58
to
b0845c9
Compare
Maybe the test needs to be relaxed a bit because of stack layout differences in other OS targets? Although I'm not sure why they're different. See https://lab.llvm.org/buildbot/#/builders/186/builds/7896:
|
The patch also breaks both the Solaris/sparcv9 and Solaris/amd64 bots. |
I'm also hitting this failure; in my case it's cropping up on i686 windows: https://github.com/mstorsjo/llvm-mingw/actions/runs/14255796193/job/39971583775 |
You're right, I don't think there is a reason to check the stack location in this test, just how it was serialized. I'll push a change and hopefully this'll be fixed. Thanks for the notes! |
Thanks for the fix in 4da5e9d, that does seem to have fixed the issue on i686 windows at least! |
Both Solaris bots are back green again. Thanks. |
…ncorrect name
Clang needs variables to be represented with unique names. This means that if a variable shadows another, its given a different name internally to ensure it has a unique name. If ASan tries to use this name when printing an error, it will print the modified unique name, rather than the variable's source code name
Fixes #47326