Skip to content

[clang] use relative paths for builtin headers during module compilation #68023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 27, 2023

Conversation

rmaz
Copy link
Contributor

@rmaz rmaz commented Oct 2, 2023

When including builtin headers as part of a system module, ensure we use absolute paths to those headers. Otherwise the module will fail to compile when specifying relative resource directories.

@rmaz rmaz requested a review from benlangmuir October 2, 2023 19:24
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Oct 2, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Changes

When including builtin headers as part of a system module, ensure we use absolute paths to those headers. Otherwise the module will fail to compile when specifying relative resource directories.


Full diff: https://github.com/llvm/llvm-project/pull/68023.diff

3 Files Affected:

  • (modified) clang/lib/Lex/ModuleMap.cpp (+4)
  • (added) clang/test/Modules/Inputs/builtin-headers/module.modulemap (+3)
  • (added) clang/test/Modules/relative-resource-dir.m (+8)
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index e8437572ebf4bf6..75dd76a63063274 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -348,6 +348,10 @@ bool ModuleMap::resolveAsBuiltinHeader(
   if (!File)
     return false;
 
+  // Ensure the path to the module directory is absolute, otherwise
+  // builtin headers will fail to resolve when using relative resource
+  // directory paths without a -I.
+  llvm::sys::fs::make_absolute(Path);
   auto Role = headerKindToRole(Header.Kind);
   Module::Header H = {Header.FileName, std::string(Path.str()), *File};
   addHeader(Mod, H, Role);
diff --git a/clang/test/Modules/Inputs/builtin-headers/module.modulemap b/clang/test/Modules/Inputs/builtin-headers/module.modulemap
new file mode 100644
index 000000000000000..78a5b730dc6a925
--- /dev/null
+++ b/clang/test/Modules/Inputs/builtin-headers/module.modulemap
@@ -0,0 +1,3 @@
+module ModuleWithBuiltinHeader [system] {
+    header "float.h"
+}
\ No newline at end of file
diff --git a/clang/test/Modules/relative-resource-dir.m b/clang/test/Modules/relative-resource-dir.m
new file mode 100644
index 000000000000000..1d38fe922e71849
--- /dev/null
+++ b/clang/test/Modules/relative-resource-dir.m
@@ -0,0 +1,8 @@
+// RUN: EXPECTED_RESOURCE_DIR=`%clang -print-resource-dir` && \
+// RUN:  mkdir -p %t && rm -rf %t/resource-dir && \
+// RUN:  cp -R $EXPECTED_RESOURCE_DIR %t/resource-dir
+// RUN: cd %t && %clang -cc1 -x objective-c -fmodules -fmodule-format=obj \
+// RUN:   -fimplicit-module-maps -fmodules-cache-path=%t.mcp \
+// RUN:   -fbuiltin-headers-in-system-modules -resource-dir resource-dir \
+// RUN:   -emit-module %S/Inputs/builtin-headers/module.modulemap \
+// RUN:   -fmodule-name=ModuleWithBuiltinHeader -o %t.pcm

@rmaz rmaz requested a review from jansvoboda11 October 2, 2023 19:26
@rmaz rmaz force-pushed the absbuiltinhdrpath branch 2 times, most recently from a44aa3a to 3617539 Compare October 3, 2023 15:03
Copy link
Collaborator

@benlangmuir benlangmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me, but I would like @jansvoboda11 to also look at it.

@jansvoboda11
Copy link
Contributor

This is now always storing an absolute path into Header::PathRelativeToRootModuleDirectory for built-in headers. I guess that's not much more surprising than the "as-written" resource directory prepended to the header name we stored previously, but I think this should be clarified somehow (either by documentation, or by renaming the member).

My bigger concern is that we now generate absolute paths into the <module-includes> buffer which gets serialized into the PCM file. I believe that some folks using Clang modules rely on them being "relocatable", i.e. having no absolute paths serialized. I believe this patch would break that use-case.

I'm not sure what the best solution here... Maybe prepending #pragma before #include of each builtin header?

@rmaz
Copy link
Contributor Author

rmaz commented Oct 5, 2023

This is now always storing an absolute path into Header::PathRelativeToRootModuleDirectory for built-in headers

Wouldn't this always have been the case? I'll check shortly with a resource dir picked up via toolchain bundle layout.

My bigger concern is that we now generate absolute paths into the buffer which gets serialized into the PCM file. I believe that some folks using Clang modules rely on them being "relocatable", i.e. having no absolute paths serialized. I believe this patch would break that use-case.

This is the use case I am trying to support, via -fmodule-file-home-is-cwd and with the resource dir as a subfolder of the cwd. This works currently, but we require adding a -I. to module compilation or modules with builtin headers will not compile successfully. If the resource dir is outside of the cwd then you would have to have an absolute path here anyway, wouldn't you?

I guess the alternative would be to set PathRelativeToRootModuleDirectory relative to the resource dir module directory instead, and rely on the resource dir being added to the header search paths to make lookup succeed. This seems confusing too though, as you would expect the PathRelativeToRootModuleDirectory to be the module being compiled, not a different module.

@rmaz
Copy link
Contributor Author

rmaz commented Oct 5, 2023

Maybe prepending #pragma before #include of each builtin header?

Could you expand on this? Not sure I get the idea.

@rmaz
Copy link
Contributor Author

rmaz commented Oct 5, 2023

Wouldn't this always have been the case? I'll check shortly with a resource dir picked up via toolchain bundle layout.

Looks like yes:

Process 77520 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000109b568fd clang`clang::ModuleMap::resolveAsBuiltinHeader(this=0x000000011b02ac70, Mod=0x000000011b03d600, Header=0x00007ff7bfefabd0) at ModuleMap.cpp:348:8
   345 	  SmallString<128> Path;
   346 	  llvm::sys::path::append(Path, BuiltinIncludeDir->getName(), Header.FileName);
   347 	  auto File = SourceMgr.getFileManager().getOptionalFileRef(Path);
-> 348 	  if (!File)
   349 	    return false;
   350 	
   351 	  // Ensure the path to the module directory is absolute, otherwise
Target 0: (clang) stopped.
(lldb) p Path.str()
(llvm::StringRef)  (Data = "/Users/rhow/local/github/llvmbuilddbg.noindex/lib/clang/18/include/float.h", Length = 74)

This was debugging:

clang -cc1 -x objective-c -fmodules -fmodule-format=raw -fimplicit-module-maps -fmodules-cache-path=/tmp/mcp -fbuiltin-headers-in-system-modules -emit-module clang/test/Modules/Inputs/builtin-headers/module.modulemap -fmodule-name=ModuleWithBuiltinHeader -o /tmp/o.pcm

@jansvoboda11
Copy link
Contributor

If the resource dir is outside of the cwd then you would have to have an absolute path here anyway, wouldn't you?

Yes. But my understanding was that's not what you're interested in. I thought you're trying to fix the situation where your resource dir is in the CWD, so that's the context I'm assuming. Is that correct?

This is now always storing an absolute path into Header::PathRelativeToRootModuleDirectory for built-in headers

Wouldn't this always have been the case? I'll check shortly with a resource dir picked up via toolchain bundle layout.

I was considering your test case, where we'd previously store "resource-dir/include/float.h" into Header::PathRelativeToRootModuleDirectory (knowing that ends up causing failures down the line). Now we're essentially guaranteeing that for built-in headers, the path is absolute ("/path/to/cwd/resource-dir/include/float.h"), contrary to what the member name suggests. I find that a bit confusing.

This is the use case I am trying to support, via -fmodule-file-home-is-cwd ...

But -fmodule-file-home-is-cwd will not go as far as fixing up the contents of SourceManager buffers. So while all the paths you serialize into the PCM will get "relativized", the <module-includes> buffer would still contain the absolute path in the form of:

#include "/path/to/cwd/resource-dir/include/float.h"

If you originally compiled the PCM on machine A that has CWD set to "/path/to/cwd", you could probably still successfully load it on machine B with CWD set to "/blah". But if you re-compiled the PCM on machine B, the output would not be the same, since the PCM would now embed this into the <module-includes> buffer instead:

#include "/blah/resource-dir/include/float.h"

Is that not a concern for your use-case?

Maybe prepending #pragma before #include of each builtin header?

Could you expand on this? Not sure I get the idea.

I was going to suggest keeping Header::PathRelativeToRootModuleDirectory as "float.h", and generating this into the <module-includes> buffer:

#pragma prefix the following header path with the builtin includes directory
#include "float.h"

This would ensure both machines A and B produce identical PCM files.

@rmaz
Copy link
Contributor Author

rmaz commented Oct 5, 2023

I thought you're trying to fix the situation where your resource dir is in the CWD, so that's the context I'm assuming. Is that correct?

Yes.

I was considering your test case, where we'd previously store "resource-dir/include/float.h" into Header::PathRelativeToRootModuleDirectory

Oh I see, being a bit slow there.

But -fmodule-file-home-is-cwd will not go as far as fixing up the contents of SourceManager buffers. So while all the paths you serialize into the PCM will get "relativized", the buffer would still contain the absolute path

Ah, didn't consider that, yes that is an issue you are right.

I was going to suggest keeping Header::PathRelativeToRootModuleDirectory as "float.h", and generating this into the buffer

Wouldn't this work without the pragma? Isn't the resource directory always added to the header search paths, so the header should be found correctly if we set PathRelativeToRootModuleDirectory relative to the resource dir for builtin headers?

@jansvoboda11
Copy link
Contributor

I was going to suggest keeping Header::PathRelativeToRootModuleDirectory as "float.h", and generating this into the buffer

Wouldn't this work without the pragma? Isn't the resource directory always added to the header search paths, so the header should be found correctly if we set PathRelativeToRootModuleDirectory relative to the resource dir for builtin headers?

Yeah, the driver adds -internal-isystem <resource-dir>/include to the -cc1 invocation (on Darwin). From your test-case I assumed you wanted the compilation to succeed even without this include path. If that's not the case then yes, you can simply put #include "float.h" into the <module-includes> buffer by storing UnresolvedHeaderDirective::FileName to Header::PathRelativeToRootModuleDirectory and then fixing the module map header resolution by doing this in ModuleMap::findHeader():

auto Directory = Header.HasBuiltinHeader ? BuiltinIncludeDir : M->Directory;

Which effectively mimics what we already do in ModuleMap::resolveAsBuiltinHeader().

@rmaz
Copy link
Contributor Author

rmaz commented Oct 6, 2023

I had a go at this approach this morning, it mostly works but fails for the case where we have a module with a textual header with the same name as a builtin header. When using relative paths for the builtin headers we now prefer the module's textual header instead of the builtin header. This fails the following test case:
https://github.com/llvm/llvm-project/blob/main/clang/test/Modules/cstd.m#L27-L29

I believe this is because:

What would you suggest as the best solution here? Do you think its reasonable to always add the builtin header directory to the front of the Includers ahead of the module build directory?

@rmaz rmaz force-pushed the absbuiltinhdrpath branch from 3617539 to d824f38 Compare October 6, 2023 22:12
@rmaz rmaz changed the title [clang] use absolute path for builtin headers during module compilation [clang] use relative paths for builtin headers during module compilation Oct 6, 2023
@rmaz rmaz force-pushed the absbuiltinhdrpath branch from d824f38 to 9d5ae4b Compare October 6, 2023 22:21
@rmaz
Copy link
Contributor Author

rmaz commented Oct 6, 2023

How about the updated approach? This will use a different include dir based on if the header is a builtin header name or not, which also solves the issue of having to pass the extra include path to the frontend.

@rmaz
Copy link
Contributor Author

rmaz commented Oct 16, 2023

@jansvoboda11 still good with this change, or is there a less invasive way to get this done?

@jansvoboda11
Copy link
Contributor

@jansvoboda11 still good with this change, or is there a less invasive way to get this done?

Hi, sorry for being unresponsive, I'm out of office until the end of the week. This looks reasonable to me, but I'd like to run a smoke test with the Darwin SDK first. I can get back to you early next week. Sounds good?

auto Directory = M->Directory;
// Search for the header file within the module's home directory
// or the builtin include dir if this is a builtin header.
auto Directory = Header.HasBuiltinHeader ? BuiltinIncludeDir : M->Directory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't, no. The header ends up resolved against the resource dir as expected:

<SUBMODULE_BLOCK NumWords=63 BlockCodeSize=5>
    <SUBMODULE_METADATA op0=1 op1=0/>
    <SUBMODULE_DEFINITION abbrevid=4 op0=1 op1=0 op2=0 op3=18 op4=0 op5=0 op6=1 op7=0 op8=0 op9=0 op10=0 op11=0 op12=0 op13=1/> blob data = 'ModuleWithBuiltinHeader'
    <SUBMODULE_HEADER abbrevid=6/> blob data = 'float.h'
    <SUBMODULE_TOPHEADER abbrevid=7/> blob data = '/home/llvmbuilddbg.noindex/tools/clang/test/Modules/Output/relative-resource-dir.m.tmp/resource-dir/include/float.h'
  </SUBMODULE_BLOCK>

@jansvoboda11
Copy link
Contributor

jansvoboda11 commented Oct 20, 2023

Our downstream test ClangScanDeps/modules-include-tree-prefix-map.c started failing with the newer version of this PR, so CC'ing @benlangmuir to take another look.

@rmaz rmaz force-pushed the absbuiltinhdrpath branch from 9d5ae4b to 5e3c631 Compare October 20, 2023 19:58
@benlangmuir
Copy link
Collaborator

Thanks for your patience, I should have looked at this sooner. The change to our downstream test looks expected; the <module-includes> buffer changed

-#import "/^tc/lib/clang/18/include/stdbool.h"
+#import "stdbool.h"

But the actual resolution of which header is used is identical.

I'm happy with this change; @jansvoboda11 were you looking at anything else or is this okay for @rmaz to land?

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming, Ben! I don't have other concerns, this should be good to land.

@rmaz
Copy link
Contributor Author

rmaz commented Oct 25, 2023

Thanks for the detailed reviews all, will get this rebased and landed.

@rmaz rmaz force-pushed the absbuiltinhdrpath branch from 5e3c631 to 87bbfac Compare October 25, 2023 19:02
When including builtin headers as part of a system module, serialize
them relative to the builtin include dir. To handle later lookup
add a method to provide the correct base directory. This makes it
possible to compile modules including builtin headers with
relative resource dirs.
@rmaz rmaz force-pushed the absbuiltinhdrpath branch from 87bbfac to cf4ae97 Compare October 26, 2023 14:41
@rmaz rmaz merged commit 396b562 into llvm:main Oct 27, 2023
@rmaz rmaz deleted the absbuiltinhdrpath branch October 27, 2023 20:17
facebook-github-bot pushed a commit to facebook/buck2-prelude that referenced this pull request Nov 6, 2023
Summary: Add a swift_toolchain attribute to identify if the toolchain supports relative resource directories in module output. This requires llvm/llvm-project#68023.

Reviewed By: milend

Differential Revision: D50943679

fbshipit-source-id: c944f3a5e3879dd91605ad9df008b5784bdfcda1
facebook-github-bot pushed a commit to facebook/ocamlrep that referenced this pull request Nov 6, 2023
Summary: Add a swift_toolchain attribute to identify if the toolchain supports relative resource directories in module output. This requires llvm/llvm-project#68023.

Reviewed By: milend

Differential Revision: D50943679

fbshipit-source-id: c944f3a5e3879dd91605ad9df008b5784bdfcda1
facebook-github-bot pushed a commit to facebook/buck2 that referenced this pull request Nov 6, 2023
Summary: Add a swift_toolchain attribute to identify if the toolchain supports relative resource directories in module output. This requires llvm/llvm-project#68023.

Reviewed By: milend

Differential Revision: D50943679

fbshipit-source-id: c944f3a5e3879dd91605ad9df008b5784bdfcda1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants