-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Safe Buffers] Serialize unsafe_buffer_usage pragmas #92031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules Author: Ziqing Luo (ziqingluo-90) ChangesA pair of The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized. During de-serialization, regions from loaded files are stored by their source files. When later one queries if a loaded location The reported issue at upstream: #90501 Full diff: https://github.com/llvm/llvm-project/pull/92031.diff 12 Files Affected:
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index e89b4a2c5230e..8d6884ebe7597 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -2883,11 +2883,15 @@ class Preprocessor {
/// otherwise.
SourceLocation CurrentSafeBufferOptOutStart; // It is used to report the start location of an never-closed region.
- // An ordered sequence of "-Wunsafe-buffer-usage" opt-out regions in one
- // translation unit. Each region is represented by a pair of start and end
- // locations. A region is "open" if its' start and end locations are
+ using SafeBufferOptOutMapTy =
+ SmallVector<std::pair<SourceLocation, SourceLocation>, 16>;
+ // An ordered sequence of "-Wunsafe-buffer-usage" opt-out regions in this
+ // translation unit. Each region is represented by a pair of start and
+ // end locations. A region is "open" if its' start and end locations are
// identical.
- SmallVector<std::pair<SourceLocation, SourceLocation>, 8> SafeBufferOptOutMap;
+ SafeBufferOptOutMapTy SafeBufferOptOutMap;
+ // `SafeBufferOptOutMap`s of loaded files:
+ llvm::DenseMap<FileID, SafeBufferOptOutMapTy> LoadedSafeBufferOptOutMap;
public:
/// \return true iff the given `Loc` is in a "-Wunsafe-buffer-usage" opt-out
@@ -2918,6 +2922,16 @@ class Preprocessor {
/// opt-out region
bool isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc);
+ /// \return a sequence of SourceLocations representing ordered opt-out regions
+ /// specified by
+ /// `\#pragma clang unsafe_buffer_usage begin/end`s of this translation unit.
+ SmallVector<SourceLocation, 64> serializeSafeBufferOptOutMap() const;
+
+ /// \param SrcLocSeqs a sequence of SourceLocations deserialized from a
+ /// record of code `PP_UNSAFE_BUFFER_USAGE`.
+ void setDeserializedSafeBufferOptOutMap(
+ const SmallVectorImpl<SourceLocation> &SrcLocSeqs);
+
private:
/// Helper functions to forward lexing to the actual lexer. They all share the
/// same signature.
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index dcfa4ac0c1967..d1a0eba943039 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -775,6 +775,9 @@ enum ASTRecordTypes {
/// Record code for lexical and visible block for delayed namespace in
/// reduced BMI.
DELAYED_NAMESPACE_LEXICAL_VISIBLE_RECORD = 68,
+
+ /// Record code for \#pragma clang unsafe_buffer_usage begin/end
+ PP_UNSAFE_BUFFER_USAGE = 69,
};
/// Record types used within a source manager block.
diff --git a/clang/lib/Lex/Preprocessor.cpp b/clang/lib/Lex/Preprocessor.cpp
index 0b70192743a39..6a41e3d4138aa 100644
--- a/clang/lib/Lex/Preprocessor.cpp
+++ b/clang/lib/Lex/Preprocessor.cpp
@@ -58,6 +58,7 @@
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Capacity.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MemoryBuffer.h"
@@ -1483,26 +1484,41 @@ void Preprocessor::emitFinalMacroWarning(const Token &Identifier,
}
bool Preprocessor::isSafeBufferOptOut(const SourceManager &SourceMgr,
- const SourceLocation &Loc) const {
- // Try to find a region in `SafeBufferOptOutMap` where `Loc` is in:
- auto FirstRegionEndingAfterLoc = llvm::partition_point(
- SafeBufferOptOutMap,
- [&SourceMgr,
- &Loc](const std::pair<SourceLocation, SourceLocation> &Region) {
- return SourceMgr.isBeforeInTranslationUnit(Region.second, Loc);
- });
+ const SourceLocation &Loc) const {
+ // The lambda that tests if a `Loc` is in an opt-out region given one opt-out
+ // region map:
+ auto TestInMap = [&SourceMgr](const SafeBufferOptOutMapTy &Map,
+ const SourceLocation &Loc) -> bool {
+ // Try to find a region in `SafeBufferOptOutMap` where `Loc` is in:
+ auto FirstRegionEndingAfterLoc = llvm::partition_point(
+ Map, [&SourceMgr,
+ &Loc](const std::pair<SourceLocation, SourceLocation> &Region) {
+ return SourceMgr.isBeforeInTranslationUnit(Region.second, Loc);
+ });
+
+ if (FirstRegionEndingAfterLoc != Map.end()) {
+ // To test if the start location of the found region precedes `Loc`:
+ return SourceMgr.isBeforeInTranslationUnit(
+ FirstRegionEndingAfterLoc->first, Loc);
+ }
+ // If we do not find a region whose end location passes `Loc`, we want to
+ // check if the current region is still open:
+ if (!Map.empty() && Map.back().first == Map.back().second)
+ return SourceMgr.isBeforeInTranslationUnit(Map.back().first, Loc);
+ return false;
+ };
- if (FirstRegionEndingAfterLoc != SafeBufferOptOutMap.end()) {
- // To test if the start location of the found region precedes `Loc`:
- return SourceMgr.isBeforeInTranslationUnit(FirstRegionEndingAfterLoc->first,
- Loc);
- }
- // If we do not find a region whose end location passes `Loc`, we want to
- // check if the current region is still open:
- if (!SafeBufferOptOutMap.empty() &&
- SafeBufferOptOutMap.back().first == SafeBufferOptOutMap.back().second)
- return SourceMgr.isBeforeInTranslationUnit(SafeBufferOptOutMap.back().first,
- Loc);
+ if (SourceMgr.isLocalSourceLocation(Loc))
+ return TestInMap(SafeBufferOptOutMap, Loc);
+
+ // Expand macros (if any) until it gets to a file location:
+ SourceLocation FLoc = SourceMgr.getFileLoc(Loc);
+ FileID FlocFID = SourceMgr.getDecomposedLoc(FLoc).first;
+ auto LoadedMap = LoadedSafeBufferOptOutMap.find(FlocFID);
+
+ if (LoadedMap != LoadedSafeBufferOptOutMap.end())
+ return TestInMap(LoadedMap->getSecond(), Loc);
+ // FIXME: think out if this is unreachable?
return false;
}
@@ -1551,6 +1567,58 @@ bool Preprocessor::isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc) {
return InSafeBufferOptOutRegion;
}
+SmallVector<SourceLocation, 64>
+Preprocessor::serializeSafeBufferOptOutMap() const {
+ assert(!InSafeBufferOptOutRegion &&
+ "Attempt to serialize safe buffer opt-out regions before file being "
+ "completely preprocessed");
+
+ SmallVector<SourceLocation, 64> SrcSeq;
+
+ for (const auto &[begin, end] : SafeBufferOptOutMap) {
+ SrcSeq.push_back(begin);
+ SrcSeq.push_back(end);
+ }
+ // Only `SafeBufferOptOutMap` gets serialized. No need to serialize
+ // `LoadedSafeBufferOptOutMap` because if this TU loads a pch/module, every
+ // pch/module in the pch-chain/module-DAG will be loaded one by one in order.
+ // It means that for each loading pch/module m, it just needs to load m's own
+ // `SafeBufferOptOutMap`.
+ return SrcSeq;
+}
+
+void Preprocessor::setDeserializedSafeBufferOptOutMap(
+ const SmallVectorImpl<SourceLocation> &SourceLocations) {
+ auto It = SourceLocations.begin();
+
+ assert(SourceLocations.size() % 2 == 0 &&
+ "ill-formed SourceLocation sequence");
+ while (It != SourceLocations.end()) {
+ SourceLocation begin = *It++;
+ SourceLocation end = *It++;
+ SourceLocation FileLoc = SourceMgr.getFileLoc(begin);
+ FileID FID = SourceMgr.getDecomposedLoc(FileLoc).first;
+
+ if (FID.isInvalid()) {
+ // I suppose this should not happen:
+ assert(false && "Attempted to read a safe buffer opt-out region whose "
+ "begin location is associated to an invalid File ID.");
+ break;
+ }
+ assert(!SourceMgr.isLocalFileID(FID) && "Expected a pch/module file");
+ // Here we assume that
+ // `SourceMgr.getFileLoc(begin) == SourceMgr.getFileLoc(end)`.
+ // Though it may not hold in very rare and strange cases, i.e., a pair of
+ // opt-out pragams written in different files but are always used in a way
+ // that they match in a TU (otherwise PP will complaint).
+ // FIXME: PP should complaint about cross file safe buffer opt-out pragmas.
+ assert(FID == SourceMgr.getDecomposedLoc(SourceMgr.getFileLoc(end)).first &&
+ "Maybe a safe buffer opt-out region starts and ends at different "
+ "files.");
+ LoadedSafeBufferOptOutMap[FID].emplace_back(begin, end);
+ }
+}
+
ModuleLoader::~ModuleLoader() = default;
CommentHandler::~CommentHandler() = default;
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 43b69045bb054..4e22b3e1187c0 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3586,6 +3586,17 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
break;
}
+ case PP_UNSAFE_BUFFER_USAGE: {
+ if (!Record.empty()) {
+ SmallVector<SourceLocation, 64> SrcLocs;
+ unsigned Idx = 0;
+ while (Idx < Record.size())
+ SrcLocs.push_back(ReadSourceLocation(F, Record, Idx));
+ PP.setDeserializedSafeBufferOptOutMap(SrcLocs);
+ }
+ break;
+ }
+
case PP_CONDITIONAL_STACK:
if (!Record.empty()) {
unsigned Idx = 0, End = Record.size() - 1;
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 30195868ca996..3203173640d18 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -911,6 +911,7 @@ void ASTWriter::WriteBlockInfoBlock() {
RECORD(PP_CONDITIONAL_STACK);
RECORD(DECLS_TO_CHECK_FOR_DEFERRED_DIAGS);
RECORD(PP_ASSUME_NONNULL_LOC);
+ RECORD(PP_UNSAFE_BUFFER_USAGE);
// SourceManager Block.
BLOCK(SOURCE_MANAGER_BLOCK);
@@ -2439,6 +2440,12 @@ void ASTWriter::WritePreprocessor(const Preprocessor &PP, bool IsModule) {
Record.clear();
}
+ // Write the safe buffer opt-out region map in PP
+ for (SourceLocation &S : PP.serializeSafeBufferOptOutMap())
+ AddSourceLocation(std::move(S), Record);
+ Stream.EmitRecord(PP_UNSAFE_BUFFER_USAGE, Record);
+ Record.clear();
+
// Enter the preprocessor block.
Stream.EnterSubblock(PREPROCESSOR_BLOCK_ID, 3);
diff --git a/clang/test/Modules/Inputs/SafeBuffers/base.h b/clang/test/Modules/Inputs/SafeBuffers/base.h
new file mode 100644
index 0000000000000..660df17998a48
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/base.h
@@ -0,0 +1,9 @@
+#ifdef __cplusplus
+int base(int *p) {
+ int x = p[5];
+#pragma clang unsafe_buffer_usage begin
+ int y = p[5];
+#pragma clang unsafe_buffer_usage end
+ return x + y;
+}
+#endif
diff --git a/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap b/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap
new file mode 100644
index 0000000000000..a4826c6514b98
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/safe_buffers_test.modulemap
@@ -0,0 +1,10 @@
+module safe_buffers_test_base {
+ header "base.h"
+}
+
+module safe_buffers_test_optout {
+ explicit module test_sub1 { header "test_sub1.h" }
+ explicit module test_sub2 { textual header "test_sub2.h" }
+ use safe_buffers_test_base
+}
+
diff --git a/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h b/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h
new file mode 100644
index 0000000000000..c2b105c6e748a
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/test_sub1.h
@@ -0,0 +1,20 @@
+#include "base.h"
+
+#ifdef __cplusplus
+int sub1(int *p) {
+ int x = p[5];
+#pragma clang unsafe_buffer_usage begin
+ int y = p[5];
+#pragma clang unsafe_buffer_usage end
+ return x + y;
+}
+
+template <typename T>
+T sub1_T(T *p) {
+ T x = p[5];
+#pragma clang unsafe_buffer_usage begin
+ T y = p[5];
+#pragma clang unsafe_buffer_usage end
+ return x + y;
+}
+#endif
diff --git a/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h b/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h
new file mode 100644
index 0000000000000..11eb9591ab600
--- /dev/null
+++ b/clang/test/Modules/Inputs/SafeBuffers/test_sub2.h
@@ -0,0 +1,11 @@
+#include "base.h"
+
+#ifdef __cplusplus
+int sub2(int *p) {
+ int x = p[5];
+#pragma clang unsafe_buffer_usage begin
+ int y = p[5];
+#pragma clang unsafe_buffer_usage end
+ return x + y;
+}
+#endif
diff --git a/clang/test/Modules/safe_buffers_optout.cpp b/clang/test/Modules/safe_buffers_optout.cpp
new file mode 100644
index 0000000000000..dd8b1cca5ec89
--- /dev/null
+++ b/clang/test/Modules/safe_buffers_optout.cpp
@@ -0,0 +1,39 @@
+// RUN: rm -rf %t
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodule-name=safe_buffers_test_base -x c++ %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20\
+// RUN: -o %t/safe_buffers_test_base.pcm -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodule-name=safe_buffers_test_optout -x c++ %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20\
+// RUN: -o %t/safe_buffers_test_optout.pcm -fmodule-file=%t/safe_buffers_test_base.pcm -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -verify -fmodule-file=%t/safe_buffers_test_optout.pcm -I %S/Inputs/SafeBuffers %s\
+// RUN: -std=c++20 -Wunsafe-buffer-usage
+
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodules-cache-path=%t -fmodule-name=safe_buffers_test_base -x c++\
+// RUN: %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20 -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -emit-module -fmodules-cache-path=%t -fmodule-name=safe_buffers_test_optout -x c++\
+// RUN: %S/Inputs/SafeBuffers/safe_buffers_test.modulemap -std=c++20 -Wunsafe-buffer-usage
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -verify -fmodules-cache-path=%t -I %S/Inputs/SafeBuffers %s -x c++\
+// RUN: -std=c++20 -Wunsafe-buffer-usage
+
+#include "test_sub1.h"
+#include "test_sub2.h"
+
+// Testing safe buffers opt-out region serialization with modules: this
+// file loads 2 submodules from top-level module
+// `safe_buffers_test_optout`, which uses another top-level module
+// `safe_buffers_test_base`. (So the module dependencies form a DAG.)
+
+// [email protected]:3{{unsafe buffer access}}
+// [email protected]:3{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub1.h:5{{unsafe buffer access}}
+// expected-note@test_sub1.h:5{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub1.h:14{{unsafe buffer access}}
+// expected-note@test_sub1.h:14{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@test_sub2.h:5{{unsafe buffer access}}
+// expected-note@test_sub2.h:5{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int foo(int * p) {
+ int x = p[5]; // expected-warning{{unsafe buffer access}} expected-note{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+#pragma clang unsafe_buffer_usage begin
+ int y = p[5];
+#pragma clang unsafe_buffer_usage end
+ sub1_T(p); // instantiate template
+ return base(p) + sub1(p) + sub2(p);
+}
diff --git a/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp
new file mode 100644
index 0000000000000..04ac77b3f342e
--- /dev/null
+++ b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch-complex.cpp
@@ -0,0 +1,72 @@
+// A more complex example than warn-unsafe-buffer-usage-pragma-pch.cpp:
+// MAIN - includes INC_H_1
+// \ loads PCH_H_1 - includes INC_H_2
+// \ loads PCH_H_2
+
+// Test with PCH
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -emit-pch -o %t-1 -DPCH_H_2 %s
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -include-pch %t-1 -emit-pch -o %t-2 -DPCH_H_1 %s
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include-pch %t-2 -DMAIN -verify %s
+
+#define UNSAFE_BEGIN _Pragma("clang unsafe_buffer_usage begin")
+#define UNSAFE_END _Pragma("clang unsafe_buffer_usage end")
+
+
+#ifdef INC_H_1
+#undef INC_H_1
+int a(int *s) {
+ s[2]; // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+ return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef INC_H_2
+#undef INC_H_2
+int b(int *s) {
+ s[2]; // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+ return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef PCH_H_1
+#undef PCH_H_1
+#define INC_H_2
+#include "warn-unsafe-buffer-usage-pragma-pch-complex.cpp"
+
+int c(int *s) {
+ s[2]; // <- expected warning here
+UNSAFE_BEGIN
+ return s[1];
+UNSAFE_END
+}
+#endif
+
+#ifdef PCH_H_2
+#undef PCH_H_2
+int d(int *s) {
+ s[2]; // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+ return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+#endif
+
+#ifdef MAIN
+#undef MAIN
+#define INC_H_1
+#include "warn-unsafe-buffer-usage-pragma-pch-complex.cpp"
+
+// expected-warning@-45{{unsafe buffer access}} expected-note@-45{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-36{{unsafe buffer access}} expected-note@-36{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-24{{unsafe buffer access}} expected-note@-24{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+// expected-warning@-15{{unsafe buffer access}} expected-note@-15{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int main() {
+ int s[] = {1, 2, 3};
+ return a(s) + b(s) + c(s) + d(s);
+}
+#undef MAIN
+#endif
diff --git a/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp
new file mode 100644
index 0000000000000..abd3f0ffe9565
--- /dev/null
+++ b/clang/test/SemaCXX/warn-unsafe-buffer-usage-pragma-pch.cpp
@@ -0,0 +1,27 @@
+// The original example from https://github.com/llvm/llvm-project/issues/90501
+
+// Test without PCH
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include %s -verify %s
+// Test with PCH
+// RUN: %clang_cc1 -Wno-unused-value -std=c++20 -emit-pch -o %t %s
+// RUN: %clang_cc1 -Wno-unused-value -Wunsafe-buffer-usage -std=c++20 -include-pch %t -verify %s
+
+#ifndef A_H
+#define A_H
+
+int a(int *s) {
+ s[2]; // <- expected warning here
+#pragma clang unsafe_buffer_usage begin
+ return s[1];
+#pragma clang unsafe_buffer_usage end
+}
+
+#else
+// expected-warning@-7{{unsafe buffer access}}
+// expected-note@-8{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}}
+int main() {
+ int s[] = {1, 2, 3};
+ return a(s);
+}
+
+#endif
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very happy that this is going somewhere! Everything makes sense to me but I also don't know a lot about this stuff.
During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized.
Hmm how does this work? When a precompiled header includes another precompiled header, do they both get loaded independently and then you get to combine the maps?
@@ -1551,6 +1567,58 @@ bool Preprocessor::isPPInSafeBufferOptOutRegion(SourceLocation &StartLoc) { | |||
return InSafeBufferOptOutRegion; | |||
} | |||
|
|||
SmallVector<SourceLocation, 64> | |||
Preprocessor::serializeSafeBufferOptOutMap() const { | |||
assert(!InSafeBufferOptOutRegion && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We require the pragma to be closed before the end of TU right? And that's a hard error, not a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in this case we need to support the situation when the warning was ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I'm sorry, they are not warnings, they are errors:
llvm-project/clang/lib/Lex/Pragma.cpp
Line 1265 in c796900
PP.Diag(Loc, diag::err_pp_double_begin_pragma_unsafe_buffer_usage); |
For loading multiple PCHs, it is a chain. Suppose It is the same story for modules, except that modules form a DAG instead of a chain. Clang will load each module file in an order with respect to the DAG. So for each module/PCH emitted by clang, only their own opt-out regions need to be serialized. |
I am curious so I will ask: why did this take an approach of serializing the output of the pragmas (the maps) instead of serializing the pragmas themselves as AST nodes, so that the maps would be constructed in the same way as from parsing the text directly? that's how other pragmas were serialized that I have seen. |
These pragmas don't really translate very well into the AST. Similarly to
We could support it in the same way that clang supports Duff's Device but I don't think this is really a good idea. We ultimately want them to be textual, a preprocessor-level construct, because that's what security audits are looking for: an extremely clear sign that this is where unchecked pointer operations live. So that they could tell it without looking at anything else in the code. |
yeah, |
Thanks for the explanation :) |
Throw an error if clang deserializes a safe buffer opt-out region that begins and ends at different files.
Just realized how stupid I am (O_o)---the current implementation only compares source locations with opt-out regions that are in the same source file, for all source locations from loaded files. This is not going to work for cases like below.
Add [WIP] back to this PR. More work is needed. |
Figured out how to identify a loaded AST from a source location. With that, we can eliminate the restriction that a de-serialized safe buffer opt-out region cannot span over multiple files. (Of course those files need to be in the same TU.)
Learned more about how loaded files & ASTs are managed in clang. Now opt-out regions in a loaded AST can span over multiple files (within one TU). They now behave in a consistent way when they are from loaded files or local files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialization changes look good to me. I'll leave the formal approve chance to others.
// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -verify -fmodules-cache-path=%t -fmodule-map-file=%S/Inputs/SafeBuffers/safe_buffers_test.modulemap -I %S/Inputs/SafeBuffers %s\ | ||
// RUN: -x c++ -std=c++20 -Wunsafe-buffer-usage | ||
|
||
#include "test_sub1.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we prefer using split-file
in tests if there are multiple inputs. You can find examples in the same folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment! split-file
is really neat!!
@@ -2439,6 +2440,12 @@ void ASTWriter::WritePreprocessor(const Preprocessor &PP, bool IsModule) { | |||
Record.clear(); | |||
} | |||
|
|||
// Write the safe buffer opt-out region map in PP | |||
for (SourceLocation &S : PP.serializeSafeBufferOptOutMap()) | |||
AddSourceLocation(std::move(S), Record); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we won't move source location.
if (!Map.empty() && Map.back().first == Map.back().second) | ||
return SourceMgr.isBeforeInTranslationUnit(Map.back().first, Loc); | ||
return false; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok @ziqingluo-90 has just walked me through this offline and I think I understand what's happening now.
What was unobvious to me: shouldn't we TestInMap
over the entire stack of includes? And the answer is:
- Regular textual
#include
s will just show up in the current map normally; there's no need to go up the stack. - PCHes are going to be included with the
-include-pch
flag, not directly. There's not really much of a stack going on at this point and there's no way to "put pragmas around the compiler flag". - In case of modules, we don't care what this function returns deep inside a module, because we probably shouldn't be analyzing that code in the first place. Instead, the warning should be naturally enabled in the clang invocation that builds the module itself, and that's where the warning would be emitted, taking only the current module into account.
- As a matter of fact, as of today we're analyzing that nested code. But we shouldn't be, that's not how modules are intended to work. We consider it a bug and have a separate plan to fix this. Right now this means that the warning will be emitted multiple times for the same line of code (both by module compilation and by .cpp file compilation, and possibly by everything in between). And the worst that could happen if this function misbehaves, is that some of these duplicates will disappear. Which is annoying but harmless.
Let's add a comment explaining this!
|
||
auto [FID, Ignore] = getDecomposedLoc(Loc); | ||
const FileID *FirstFID = | ||
llvm::lower_bound(LoadedSLocEntryAllocBegin, FID, std::greater<FileID>{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite magical to me because I don't understand how FileIDs are assigned and in what order they show up in the LoadedSLocEntryAllocBegin
list. But that's probably ok, this looks like completely normal SourceManager
code to me.
Ok I think I'm completely happy with the patch now. It makes perfect sense to me and appears to be doing the right thing. Thank you @ziqingluo-90 for figuring this out! Let's add that comment and land. |
clang/lib/Basic/SourceManager.cpp
Outdated
const FileID *FirstFID = | ||
llvm::lower_bound(LoadedSLocEntryAllocBegin, FID, std::greater<FileID>{}); | ||
|
||
assert(FirstFID && | ||
"The failure to find the first FileID of a " | ||
"loaded AST from a loaded source location was unexpected."); | ||
return *FirstFID; | ||
return FirstFID->getHashValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm can we go back to returning FileID
here? We really shouldn't rely on the fact that the hash value is equal to the actual thing we're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok LGTM!! Thanks again for figuring this all out!
The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. For de-serialization, regions from loaded files are stored by their ASTs. When later one queries if a loaded location L is in an opt-out region, PP looks up the regions of the loaded AST where L is at. (Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a warning-opt-out region. The begin and end locations (opt-out regions) are stored in preprocessor instances (PP) and will be queried by the `-Wunsafe-buffer-usage` analyzer.) The reported issue at upstream: llvm#90501 rdar://124035402 (cherry picked from commit 2e7b95e)
…lease/6.0 🍒[Safe Buffers] Serialize unsafe_buffer_usage pragmas (llvm#92031)
The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. For de-serialization, regions from loaded files are stored by their ASTs. When later one queries if a loaded location L is in an opt-out region, PP looks up the regions of the loaded AST where L is at. (Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a warning-opt-out region. The begin and end locations (opt-out regions) are stored in preprocessor instances (PP) and will be queried by the `-Wunsafe-buffer-usage` analyzer.) The reported issue at upstream: llvm#90501 rdar://124035402
A pair of
#pragma clang unsafe_buffer_usage begin/end
pragmas marks a warning-opt-out region. The begin and end locations (opt-out regions) are stored in preprocessor instances (PP) and will be queried by the-Wunsafe-buffer-usage
analyzer.The commit adds serialization and de-serialization implementations for the stored regions. Basically, the serialized representation of the regions of a PP is a (ordered) sequence of source location encodings. During serialization, it only serializes regions of the current translation unit. Regions from loaded files are not serialized. During de-serialization, regions from loaded files are stored by their source files. When later one queries if a loaded location
l
is in an opt-out region, PP looks up regions by the source file ofl
.The reported issue at upstream: #90501
rdar://124035402