Skip to content

[clang][modules] Fix use-after-free in header serialization #96356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

jansvoboda11
Copy link
Contributor

With the pruning of unused module map files disabled (-fno-modules-prune-non-affecting-module-map-files), HeaderFileInfo no longer gets deserialized before ASTWriter::WriteHeaderSearch(). This function then interleaves the stores of references to KnownHeader with their lazy deserialization. Lazy deserialization may cause reallocation of ModuleMap::Headers entries (including its SmallVector<KnownHeader, 1> values) thus making previously-stored ArrayRef<KnownHeader> dangling. This patch fixes that situation by storing a copy instead.

With the pruning of unused module map files disabled (`-fno-modules-prune-non-affecting-module-map-files`), `HeaderFileInfo` no longer gets deserialized before `ASTWriter::WriteHeaderSearch()`. This function then interleaves the stores of references to `KnownHeader` with their lazy deserialization. Lazy deserialization may cause reallocation of `ModuleMap::Headers` entries (including its `SmallVector<KnownHeader, 1>` values) thus making previously-stored `ArrayRef<KnownHeader>` dangling. This patch fixes that situation by storing a copy instead.
@jansvoboda11 jansvoboda11 requested a review from vsapsai June 21, 2024 20:29
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Jun 21, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Jan Svoboda (jansvoboda11)

Changes

With the pruning of unused module map files disabled (-fno-modules-prune-non-affecting-module-map-files), HeaderFileInfo no longer gets deserialized before ASTWriter::WriteHeaderSearch(). This function then interleaves the stores of references to KnownHeader with their lazy deserialization. Lazy deserialization may cause reallocation of ModuleMap::Headers entries (including its SmallVector&lt;KnownHeader, 1&gt; values) thus making previously-stored ArrayRef&lt;KnownHeader&gt; dangling. This patch fixes that situation by storing a copy instead.


Full diff: https://github.com/llvm/llvm-project/pull/96356.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTWriter.cpp (+8-2)
  • (added) clang/test/Modules/use-after-free-2.c (+180)
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index e6a58dcc61e3f..95b1165154c05 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -1973,9 +1973,15 @@ namespace {
         llvm::PointerIntPair<Module *, 2, ModuleMap::ModuleHeaderRole>;
 
     struct data_type {
-      const HeaderFileInfo &HFI;
+      data_type(const HeaderFileInfo &HFI, bool AlreadyIncluded,
+                ArrayRef<ModuleMap::KnownHeader> KnownHeaders,
+                UnresolvedModule Unresolved)
+          : HFI(HFI), AlreadyIncluded(AlreadyIncluded),
+            KnownHeaders(KnownHeaders), Unresolved(Unresolved) {}
+
+      HeaderFileInfo HFI;
       bool AlreadyIncluded;
-      ArrayRef<ModuleMap::KnownHeader> KnownHeaders;
+      SmallVector<ModuleMap::KnownHeader, 1> KnownHeaders;
       UnresolvedModule Unresolved;
     };
     using data_type_ref = const data_type &;
diff --git a/clang/test/Modules/use-after-free-2.c b/clang/test/Modules/use-after-free-2.c
new file mode 100644
index 0000000000000..0c89c759bcb75
--- /dev/null
+++ b/clang/test/Modules/use-after-free-2.c
@@ -0,0 +1,180 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+//--- A.modulemap
+module A {
+  header "A.h"
+
+  textual header "A00.h"
+  textual header "A01.h"
+  textual header "A02.h"
+  textual header "A03.h"
+  textual header "A04.h"
+  textual header "A05.h"
+  textual header "A06.h"
+  textual header "A07.h"
+  textual header "A08.h"
+  textual header "A09.h"
+
+  textual header "A10.h"
+  textual header "A11.h"
+  textual header "A12.h"
+  textual header "A13.h"
+  textual header "A14.h"
+  textual header "A15.h"
+  textual header "A16.h"
+  textual header "A17.h"
+  textual header "A18.h"
+  textual header "A19.h"
+
+  textual header "A20.h"
+  textual header "A21.h"
+  textual header "A22.h"
+  textual header "A23.h"
+  textual header "A24.h"
+  textual header "A25.h"
+  textual header "A26.h"
+  textual header "A27.h"
+  textual header "A28.h"
+  textual header "A29.h"
+
+  textual header "A30.h"
+  textual header "A31.h"
+  textual header "A32.h"
+  textual header "A33.h"
+  textual header "A34.h"
+  textual header "A35.h"
+  textual header "A36.h"
+  textual header "A37.h"
+  textual header "A38.h"
+  textual header "A39.h"
+
+  textual header "A40.h"
+  textual header "A41.h"
+  textual header "A42.h"
+  textual header "A43.h"
+  textual header "A44.h"
+  textual header "A45.h"
+}
+//--- A.h
+
+//--- A00.h
+//--- A01.h
+//--- A02.h
+//--- A03.h
+//--- A04.h
+//--- A05.h
+//--- A06.h
+//--- A07.h
+//--- A08.h
+//--- A09.h
+
+//--- A10.h
+//--- A11.h
+//--- A12.h
+//--- A13.h
+//--- A14.h
+//--- A15.h
+//--- A16.h
+//--- A17.h
+//--- A18.h
+//--- A19.h
+
+//--- A20.h
+//--- A21.h
+//--- A22.h
+//--- A23.h
+//--- A24.h
+//--- A25.h
+//--- A26.h
+//--- A27.h
+//--- A28.h
+//--- A29.h
+
+//--- A30.h
+//--- A31.h
+//--- A32.h
+//--- A33.h
+//--- A34.h
+//--- A35.h
+//--- A36.h
+//--- A37.h
+//--- A38.h
+//--- A39.h
+
+//--- A40.h
+//--- A41.h
+//--- A42.h
+//--- A43.h
+//--- A44.h
+//--- A45.h
+
+//--- B.modulemap
+module B { header "B.h" }
+//--- B.h
+#include "A.h"
+
+//--- C.modulemap
+module C { header "C.h" }
+//--- C.h
+#include "A00.h"
+#include "A01.h"
+#include "A02.h"
+#include "A03.h"
+#include "A04.h"
+#include "A05.h"
+#include "A06.h"
+#include "A07.h"
+#include "A08.h"
+#include "A09.h"
+
+#include "A10.h"
+#include "A11.h"
+#include "A12.h"
+#include "A13.h"
+#include "A14.h"
+#include "A15.h"
+#include "A16.h"
+#include "A17.h"
+#include "A18.h"
+#include "A19.h"
+
+#include "A20.h"
+#include "A21.h"
+#include "A22.h"
+#include "A23.h"
+#include "A24.h"
+#include "A25.h"
+#include "A26.h"
+#include "A27.h"
+#include "A28.h"
+#include "A29.h"
+
+#include "A30.h"
+#include "A31.h"
+#include "A32.h"
+#include "A33.h"
+#include "A34.h"
+#include "A35.h"
+#include "A36.h"
+#include "A37.h"
+#include "A38.h"
+#include "A39.h"
+
+#include "A40.h"
+#include "A41.h"
+#include "A42.h"
+#include "A43.h"
+#include "A44.h"
+#include "A45.h"
+
+#include "B.h"
+
+// RUN: %clang_cc1 -fmodules -fno-modules-prune-non-affecting-module-map-files \
+// RUN:   -emit-module %t/A.modulemap -fmodule-name=A -o %t/A.pcm
+// RUN: %clang_cc1 -fmodules -fno-modules-prune-non-affecting-module-map-files \
+// RUN:   -emit-module %t/B.modulemap -fmodule-name=B -o %t/B.pcm \
+// RUN:   -fmodule-file=A=%t/A.pcm -fmodule-map-file=%t/A.modulemap
+// RUN: %clang_cc1 -fmodules -fno-modules-prune-non-affecting-module-map-files \
+// RUN:   -emit-module %t/C.modulemap -fmodule-name=C -o %t/C.pcm \
+// RUN:   -fmodule-file=B=%t/B.pcm -fmodule-map-file=%t/B.modulemap

@vsapsai
Copy link
Collaborator

vsapsai commented Jun 28, 2024

Non-representative check for changes in the memory consumption doesn't show anything interesting for the added test case. Maximum resident set size and peak memory footprint with the change and without it are pretty close to each other, no statistic, just eyeballing.

My main concern with this approach is I'm not sure that after fixing the memory issue we don't have a remaining logic bug. Because we are still modifying ModuleMap::Headers during ASTWriter::WriteHeaderSearch iteration and that looks suspicious. Need to think how to make sure the discovered unexpected modification isn't causing other problems.

@jansvoboda11
Copy link
Contributor Author

My main concern with this approach is I'm not sure that after fixing the memory issue we don't have a remaining logic bug. Because we are still modifying ModuleMap::Headers during ASTWriter::WriteHeaderSearch iteration and that looks suspicious. Need to think how to make sure the discovered unexpected modification isn't causing other problems.

How do you suggest we verify this? I don't see Clang storing reference to the KnownHeader vector anywhere else.

@jansvoboda11
Copy link
Contributor Author

Ping @vsapsai.

@vsapsai
Copy link
Collaborator

vsapsai commented Jul 3, 2024

Looked at this more and haven't found anything sketchy. ASTWriter::WriteHeaderSearch writes the same content both with and without -fno-modules-prune-non-affecting-module-map-files. Preprocessor::alreadyIncluded is called in reasonable places (as for me, ASTWriter is the most questionable spot).

As I cannot find any other risks, ensuring memory safety looks good to me.

@jansvoboda11 jansvoboda11 merged commit 0387a86 into llvm:main Jul 8, 2024
10 checks passed
@jansvoboda11 jansvoboda11 deleted the fix-use-after-free branch July 8, 2024 16:26
jansvoboda11 added a commit to swiftlang/llvm-project that referenced this pull request Jul 8, 2024
With the pruning of unused module map files disabled
(`-fno-modules-prune-non-affecting-module-map-files`), `HeaderFileInfo`
no longer gets deserialized before `ASTWriter::WriteHeaderSearch()`.
This function then interleaves the stores of references to `KnownHeader`
with their lazy deserialization. Lazy deserialization may cause
reallocation of `ModuleMap::Headers` entries (including its
`SmallVector<KnownHeader, 1>` values) thus making previously-stored
`ArrayRef<KnownHeader>` dangling. This patch fixes that situation by
storing a copy instead.

(cherry picked from commit 0387a86)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants