[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) #127749

Merged
merged 1 commit into from
May 23, 2025

Conversation

bd1976bris
Collaborator

This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM.

DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (--thinlto-index-only), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system as it must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure. In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.

This PR provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. It implements basic support for distributing backend ThinLTO compilation jobs via an external process that coordinates with a distribution system (such as Incredibuild). A JSON interface is used for communication with this external process. The JSON interface allows new distribution systems to be supported without modifying LLVM. Please see the documentation added in this PR: llvm/docs/dtlto.rst for more details.

Subsequent commits will add:

  • Support for the ThinLTO cache.
  • Support for more LTO configuration states e.g., basic block sections.
  • Performance improvements, for example, improving the performance of the temporary file removal.

RFC: Integrated Distributed ThinLTO RFC
For the design of the DTLTO feature, see: #126654.

@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:support llvm:transforms labels Feb 19, 2025

github-actions bot commented Feb 19, 2025

✅ With the latest revision this PR passed the Python code formatter.

github-actions bot commented Feb 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@bd1976bris
Collaborator Author

I force-pushed an update; the code is equivalent. Reasons for the force-push:

  1. To split up the change into multiple smaller steps. This should facilitate review.
  2. To apply clang-format to everything (I decided that getting the GitHub checks green was more important than maintaining the existing coding style).

@bd1976bris bd1976bris requested review from MaskRay and removed request for MaskRay February 19, 2025 22:45
@bd1976bris
Collaborator Author

In offline review it was suggested that the distributor scripts could benefit from module docstrings and the preferred "with" statement idiom for opening files. I made those changes and also spotted some nits in the tests which I have corrected. I'm adding new commits rather than updating the original commits and force-pushing as I think that force-pushing might cause turbulence for reviewers. Please let me know if you would prefer force-pushing.

bd1976bris added a commit to bd1976bris/llvm-project that referenced this pull request Feb 25, 2025
- Remove linker version.

- Make AddBuffer a default parameter for LTO::run() to minimize changes
  at call sites.

- Format code with clang-format, even if it deviates slightly from the
  surrounding style. I decided that CI passing for the PRs was more
  important than matching the existing style.

- Order distributor command-line options correctly.

- Remove dead code in llvm-lto2.cpp.

- Add docstrings to Python distributors.

- Update the Python mock.py distributor to validate JSON.

- Organize LLVM DTLTO tests by moving them into a dedicated "dtlto"
  directory, tidying existing tests and adding new ones.

- Implement fix for more than one LTO partition (spotted by MaskRay).
Contributor

@teresajohnson teresajohnson left a comment

Sorry for the delay! I have some comments and questions below. Thanks for splitting out the LLVM portions.

@@ -0,0 +1,3 @@
Tests for DTLTO (integrated distributed ThinLTO) functionality.

These are integration tests as DTLTO invokes `clang` for code-generation.
Contributor

Is the use of llvm-lto and opt here just to do testing before the clang and lld patches go in? If so, do we need cross project testing right now? Wondering if it would be better to just stick with the testing you have in this patch that is under llvm/test, and add the cross project testing using only external tools once those patches are in.

Collaborator Author

This test checks that the DTLTO code that creates Clang command lines generates the expected options and that those options are accepted by clang. I wrote it to use llvm-lto and opt so it could be part of this commit before the clang and lld patches go in. However, I think it is a nice principle to only use external tools in cross-project-tests. I have removed the test from this PR with the intention of adding it back later. Additionally if we can use your idea (#127749 (comment)) of storing the Clang -c step options and applying them to the remote backend compilations then this test will be rewritten to test that behavior instead.

contribute. Distributors can use this to label build jobs for informational
purposes.

- **Linker's version string**.
Contributor

I thought I saw that this got removed

Collaborator Author

Apologies I hadn't pulled in the documentation changes that were done on: #126654. Fixed now.

the same information that is emitted into import files for ThinLTO.

- The **input files** required for each job.
- The per-job set of files required for backend compilation, such as bitcode
Contributor

Does the list of bitcode files include the bitcode files being imported from? If so, is the imports list described earlier needed?

Collaborator Author

The JSON format has been simplified now. The imports field has been removed and there is now just an "inputs" field which contains all the inputs specific to a compilation job including the bitcode modules from which imports will be made. I have simplified the documentation to match. Thanks.

; RUN: rm -rf %t && split-file %s %t && cd %t

;; Generate bitcode files with a summary index.
; RUN: opt -thinlto-bc x86_64-unknown-linux-gnu.ll -o x86_64-unknown-linux-gnu.bc
Contributor

Why not use clang with input cc files instead of opt with input ll files for the cross-project tests (related to my earlier comment about whether this should wait and just use external tools)?

Collaborator Author

Clang can be used here. Thanks. However, I have removed this test for now. See earlier comment: #127749 (comment)

invocation is contributing to, to aid the user in identifying them:

- **JSON Job Description File**:
- Format: `dtlto.<UID>.dist-file.json`
Contributor

What is UID in this context? Should this be PID?

Collaborator Author

@bd1976bris bd1976bris Mar 5, 2025

I have changed this to be PID now. Note that I expect the naming strategy to change in the future as we get feedback from more distribution systems. I have some concerns that mixing the PID into the filenames might prevent the caching systems for some distribution systems from working effectively.

Collaborator Author

I have simplified the documentation now and this section has been removed. I don't want users relying on the naming scheme for temporary files which might be subject to change when I look at optimizing cache utilization.

RUN: not %{command} -save-temps 2>&1 \
RUN: | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=NOINDEXFILES
NOINDEXFILES-NOT: imports
Contributor

Redundant with earlier line

Collaborator Author

Thanks. Removed the first check leaving only the -save-temps one.

# Check that imports files are created with -thinlto-emit-imports.
RUN: not %{command} -thinlto-emit-imports 2>&1 \
RUN: | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=INDEXFILES
Contributor

Similar to above suggest s/INDEX/IMPORT/

Collaborator Author

Thanks. Fixed.

@@ -97,6 +97,12 @@ static cl::opt<bool>
"specified with -thinlto-emit-indexes or "
"-thinlto-distributed-indexes"));

static cl::opt<bool> DTLTO("dtlto", cl::desc("Perform DTLTO"));
Contributor

Rather than requiring a separate bool flag, consider checking whether dtlto-distributor was specified?

Collaborator Author

Nice. Done.


if __name__ == "__main__":
json_arg = sys.argv[-1]
distributor_args = sys.argv[1:-1]
Contributor

suggest calling this input_file instead of distributor_args to be clearer

Collaborator Author

Thanks. Done now.
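For context, the argv convention the snippet above follows (JSON job-description file as the last argument, distributor-specific options in between) can be sketched as a hypothetical helper, with the suggested `input_file` rename applied; this is an illustration, not the actual mock.py code:

```python
import json

def parse_distributor_argv(argv):
    """Split a distributor's argv: by the convention shown above, the
    JSON job-description file is the last argument, and everything
    between the program name and it is a distributor-specific option."""
    input_file = argv[-1]
    distributor_args = argv[1:-1]
    with open(input_file) as f:
        jobs = json.load(f)
    return distributor_args, jobs
```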

// shard, import list, etc.) to allow for the codegen to be performed
// externally. This backend's `wait` function then invokes an external
// distributor process to do backend compilations.
class OutOfProcessThinBackend : public CGThinBackend {
Contributor

In practice, how much does this share with the original InProcessThinBackend? I am wondering if the approach of having them derive from the same refactored CGThinBackend is useful.

Collaborator Author

Very little. Just some state and a few lines which handle CfiFunctionDefs. It's still nice to share these parts though. Should I remove CGThinBackend? What is your preference?

Contributor

I was thinking about this: the backend thread handling is quite similar between this new backend and the write-indexes backend (which invokes emitFiles), i.e. the latter could probably be refactored to use the new Job structure you have introduced here. It isn't clear to me which existing backend has more in common with the new one. Do you eventually plan to share the Cache from the in-process backend?

Collaborator Author

I was thinking about this: the backend thread handling is quite similar between this new backend and the write-indexes backend

In the sense of using the Cache and also adding the generated object files to the link, it is similar to the in process backend. However, in the sense of primarily using emitFiles to generate the pre-job summary index files, it is similar to the write indexes backend.

Do you eventually plan to share the Cache from the in process backend?

Eventually we will add support for the Cache that is used in the in process backend. We have found that, in some cases, this is more performant than the distribution system caching.

bd1976bris added a commit to bd1976bris/llvm-project that referenced this pull request Mar 3, 2025
- Pull in documentation improvements.
@bd1976bris
Collaborator Author

@teresajohnson currently you are the only reviewer invited to this PR, as I wanted to get your input before moving forward. Some of your comments, e.g. #127749 (comment), are design-level comments, and there is also ongoing design discussion on the design PR; I want to make sure that I'm not excluding other parties. Should I invite more reviewers here, or discuss the issues you have raised on the other PR?

@teresajohnson
Contributor

@teresajohnson currently you are the only reviewer invited to this PR, as I wanted to get your input before moving forward. Some of your comments, e.g. #127749 (comment), are design-level comments, and there is also ongoing design discussion on the design PR; I want to make sure that I'm not excluding other parties. Should I invite more reviewers here, or discuss the issues you have raised on the other PR?

I'm fine with adding in other reviewers here, or, for that comment in particular, with continuing the discussion on the other PR. I'll try to answer that one here as well as on the other PR, as I have a slightly different perspective on it than MaskRay. I'm away at a conference, so my responses will be slow this week.

bd1976bris added a commit to bd1976bris/llvm-project that referenced this pull request Mar 5, 2025
…lang and lld patches go in? If so, do we need cross project testing right now? Wondering if it would be better to just stick with the testing you have in this patch that is under llvm/test, and add the cross project testing using only external tools once those patches are in.

This test checks that the DTLTO code that creates Clang command lines generates the expected options and that those options are accepted by clang. I wrote it to use llvm-lto and opt so it could be part of this commit before the clang and lld patches go in. However, I think it is a nice principle to only use external tools in cross-project-tests. I have removed the test from this PR with the intention of adding it back later. Additionally, if we can use your idea (llvm#127749 (comment)) of storing the Clang `-c` step options and applying them to the remote backend compilations then this test will be rewritten to test that behaviour instead.
@bd1976bris
Collaborator Author

I'm fine with adding in other reviewers here, or, for that comment in particular, with continuing the discussion on the other PR. I'll try to answer that one here as well as on the other PR, as I have a slightly different perspective on it than MaskRay. I'm away at a conference, so my responses will be slow this week.

Great :) I have added a comment publicizing this PR (#127749) on the design PR (#126654) now and added relevant reviewers. Please see: #126654 (comment).

Collaborator

@cachemeifyoucan cachemeifyoucan left a comment

I haven't read the bulk of the implementation yet, but I'll start with some comments.

Limitations
-----------

The current implementation of DTLTO has the following limitations:
Collaborator

I feel like this should be a TODO list, since the items on the list are all doable and thus not limitations by design.

"primary_input": ["dtlto.o"],
"summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
"primary_output": ["dtlto.1.51232.native.o"],
"imports": [],
Collaborator

I think it would be helpful to show an example with more than one job, so there is an example of how cross-referencing works.

"-O3", "-fprofile-sample-use=my.profdata",
"-o", ["primary_output", 0],
"-c", "-x", "ir", ["primary_input", 0],
["summary_index", "-fthinlto-index=", 0],
Collaborator

Is this a standard JSON format currently used by a distributor you are targeting? If you have an OpenAPI doc, that would be even better.

I feel the format is a bit weird. It is an array containing different types, and the sub-arrays that provide indexes are not the same length, and the item at the same offset is not always the same thing. Doesn't that make a decoder harder to write?

Collaborator Author

Is this a standard JSON format currently used by a distributor you are targeting? If you have an OpenAPI doc, that would be even better.

It's my own format, although it has been shown to contain enough information to drive distribution via real-world distribution systems; we have tested with Incredibuild and reclient+buildfarm (both of which are available to test for free).

I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.

I feel the format is a bit weird. It is an array containing different types, and the sub-arrays that provide indexes are not the same length, and the item at the same offset is not always the same thing. Doesn't that make a decoder harder to write?

I don't think that decoding is particularly difficult (see, for example: local.py.)

I'm not attached to the format of the JSON as long as it can do what we need. Here are the goals/constraints I came up with for a JSON interface:

  • Be as minimal as possible.
  • Don't use spaces in field names.
    • This allows for programmatic access, such as with member access syntax. e.g., .summary_index.
  • Allow the codegen tool to be replaced without a schema update.
  • Don't depend on platform- or shell-specific shlexing of strings.
    • Think about handling quotes, spaces, etc.
  • Support filenames appearing inside arguments
    • e.g., -fthinlto-index=summary.index.
  • Decouple LLVM from distributors to the largest degree possible.
    • This is to try to ensure that if the command line for the codegen tool changes, the distributor(s) shouldn't need updating.
  • Explicitly provide the primary input file, summary index files, and primary output file.
    • This is to help with supporting distribution systems such as FASTBuild, distcc, and Icecream, which lack sandboxing and might require the paths to be manipulated.

In the current format, the sub-arrays are used to represent references to per-job filenames so that a command line can be stored that doesn't contain filenames. This ensures that the contents of the command line remain transparent to the distributor, allowing another code-generation tool to replace Clang without updating the distributors. It also means that the distributor has access to the filenames (as they are stored separately) and can manipulate them, if required, without needing to change the command line.
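To make that concrete, here is a hypothetical two-job description in the referencing style discussed above. The field names follow the documentation excerpts quoted in this review; the filenames and flags are made up for illustration:

```python
# Hypothetical two-job description. The common "args" hold no filenames,
# only references: an array whose first element names a per-job array and
# whose remaining elements are literal strings or indexes into it.
# All filenames below are illustrative.
job_description = {
    "common": {
        "args": [
            "clang", "-O3",
            "-o", ["primary_output", 0],
            "-c", "-x", "ir", ["primary_input", 0],
            ["summary_index", "-fthinlto-index=", 0],
        ],
    },
    "jobs": [
        {
            "primary_input": ["t1.o"],
            "summary_index": ["t1.1.1234.native.o.thinlto.bc"],
            "primary_output": ["t1.1.1234.native.o"],
            "imports": ["t2.o"],  # job 1 imports from job 2's module
        },
        {
            "primary_input": ["t2.o"],
            "summary_index": ["t2.2.1234.native.o.thinlto.bc"],
            "primary_output": ["t2.2.1234.native.o"],
            "imports": [],
        },
    ],
}
```

The distributor never needs to parse the command line itself: it reads filenames from the per-job arrays, and the code-generation tool can change without the distributor noticing.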

Collaborator

I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.

My comment about OpenAPI is that if it already exists because the format exists, it would be good to link it. Otherwise, I don't think it is a high-priority task for now.

For the format, I don't have strong attachments either, but here is what I think can be improved:

  • For the mixed types, you have to parse by traversing JSON values and checking their types. You can't really write a data type in languages like C++ that maps directly to the JSON object, which is why I say the parsing is a bit hard.
  • Maybe it helps with a better example; I don't see what the last 0 in the array means. Can it ever be non-zero? Looking at local.py only makes me more confused. An example with more than one input would definitely help here.
  • It is also unclear to me how to append different flags (for example, different targets, as shown in the code review) to different jobs. Can the target option be in the job section? Does the distributor treat all elements in the jobs' lists as inputs whose paths it needs to fix?

If you ask me to design, I might do something like:

  • The common section has an argument list that is purely strings, with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
  • The job section will contain a specific list of input files and output files, where the distributor can fix their paths and handle the file movement. It will also come with a list of additionalOption objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output list.

Also, if equals-style flags like -fthinlto-index= are just a special case, we can always create an alias for that, if it isn't worth the effort to support.

Collaborator Author

My comment about OpenAPI is that if it already exists because the format exists, it would be good to link it. Otherwise, I don't think it is a high-priority task for now.

Thanks for clarifying!

  • For the mixed types, you have to parse by traversing JSON values and checking their types. You can't really write a data type in languages like C++ that maps directly to the JSON object, which is why I say the parsing is a bit hard.

I understand this point, but I feel that expanding the references in C++ code is not taxing. I also don't think that any referencing scheme would lend itself to mapping directly to C++ data types.

  • Maybe it helps with a better example; I don't see what the last 0 in the array means. Can it ever be non-zero? Looking at local.py only makes me more confused. An example with more than one input would definitely help here.

It's just a flexible referencing scheme to try to minimise the likelihood of ever needing to change it.

The expansion rule is specified here: https://github.com/bd1976bris/llvm-project/blob/DTLTO_llvm_only/llvm/docs/DTLTO.rst#command-line-expansion-example.

For a reference like: ["additional_inputs", "--", 2, "=", 0, ":", 1] and a corresponding per job array like:

{
    ...
    "additional_inputs": ["first", "second", "third"]
}

Expansion yields: "--third=first:second"

I don't have a real-world use-case where this is important though. For the cases I have currently we only need to index into the first entry of the per-job arrays, and to handle the special case of equal flags like -fthinlto-index=. I just came up with something that should be able to handle any conceivable option.
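That expansion rule can be sketched in a few lines of Python; this mirrors the rule as described above, not the exact implementation in the PR:

```python
def expand_reference(ref, job):
    """Expand a command-line reference array against a per-job entry.

    The first element names a per-job array; each remaining element is
    either a literal string or an integer index into that array, and the
    expanded pieces are concatenated into a single argument.
    """
    values = job[ref[0]]
    return "".join(p if isinstance(p, str) else values[p] for p in ref[1:])

job = {"additional_inputs": ["first", "second", "third"]}
print(expand_reference(["additional_inputs", "--", 2, "=", 0, ":", 1], job))
# → --third=first:second
```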

I will improve the documentation with a case that uses a non-zero index. That is, assuming we don't change the way references work (see the other parts of this comment).

  • It is also unclear to me how to append different flags (for example, different targets, as shown in the code review) to different jobs. Can the target option be in the job section? Does the distributor treat all elements in the jobs' lists as inputs whose paths it needs to fix?

This isn't currently possible. If it was needed my plan to support this was just to add an "args" array to the per-job entries that would be handled in the same manner as the current "args" array in the "common" object.

If you ask me to design, I might do something like:

  • The common section has an argument list that is purely strings, with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
  • The job section will contain a specific list of input files and output files, where the distributor can fix their paths and handle the file movement. It will also come with a list of additionalOption objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output list.

Also, if equals-style flags like -fthinlto-index= are just a special case, we can always create an alias for that, if it isn't worth the effort to support.

Looks great 👍 I will rework along these lines. However, before drawing too much of your attention to the JSON format, can I first ask whether you (and other reviewers) are happy with a JSON interface? There are potentially some advantages to a plugin interface. I made a comment on this here: #126654 (comment).

Collaborator Author

@cachemeifyoucan, I have explored several different JSON formats with different ways of referencing between filenames and the command line arguments. However, the ideas I tested made the JSON less understandable and the referencing less flexible than the initial JSON format discussed above.

I am now proposing a simpler format that eliminates references for filenames. Instead, it simply lists the input/output filenames and command-line arguments in arrays of strings. While this results in some string duplication, it only increases the JSON file size by 6% (compared to the initial scheme) in my tests on a Clang link. As well as simplicity and readability, an additional benefit is that consumers have less work to do when composing command lines.

With this format, modifying filenames programmatically would be more challenging for a consumer (distributor). We initially theorized this might be necessary for supporting certain distribution systems, but in my testing with SN-DBS, Incredibuild, and Reclient (see the description of #126654), it has not been required. I have no firm evidence that it will be a requirement in the future.

I have updated the implementation and documentation to use this simpler format. @teresajohnson and @MaskRay, I’d appreciate your input on this change. Of course, anyone is welcome to comment, but I’ve directly referenced those who have previously expressed opinions on the JSON format in this PR.
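A hypothetical sketch of the simpler shape being proposed, with plain string arrays and filenames repeated verbatim; the field names and filenames here are illustrative assumptions, not the final schema:

```python
# Illustrative sketch only: inputs/outputs and command-line arguments are
# plain arrays of strings, duplicating filenames instead of referencing
# them by index. All names are assumptions, not the final schema.
job_description = {
    "common": {
        "args": ["clang", "-O3", "-c", "-x", "ir"],
    },
    "jobs": [
        {
            "inputs": ["t1.o", "t1.1.1234.native.o.thinlto.bc", "t2.o"],
            "outputs": ["t1.1.1234.native.o"],
            "args": [
                "t1.o",
                "-fthinlto-index=t1.1.1234.native.o.thinlto.bc",
                "-o", "t1.1.1234.native.o",
            ],
        },
    ],
}

# A consumer builds a full command line by simple concatenation.
command_line = job_description["common"]["args"] + job_description["jobs"][0]["args"]
```

Note that per-job options fall out naturally: each job carries its own `args` array that is appended to the common one.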

Collaborator

LGTM. I don't have a strong opinion, and I don't intend to block the review on the JSON format design.

Collaborator Author

Thanks @cachemeifyoucan. Your comments on the JSON format were valuable and the new format makes this feature simpler and more approachable. Adding different options to different jobs is now possible as well.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator

I was really hoping we would have only one triple for the build, because what does it even mean to cross-import between different triples? It would be good to document that you need a per-module triple to account for the versions field.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator

Also, we usually don't use DenseMap<StringRef>; maybe StringMap is better.

"thinlto-distributor-arg",
cl::desc("Additional arguments to pass to the ThinLTO distributor"));

cl::opt<std::string> ThinLTORemoteCompiler(
Collaborator

Are those cl::opt debugging options, or real options that users need to pass in? We should avoid -mllvm options in production usage.

Collaborator Author

Are those cl::opt debugging options, or real options that users need to pass in? We should avoid -mllvm options in production usage.

The use of -mllvm options avoids the need to implement a similar set of options for DTLTO in each LLD port (and also in llvm-lto2). See: #126654 (comment). I don't have a strong opinion though. Perhaps @MaskRay would like to comment on this.

Contributor

I think ultimately the goal for options in production usage should be lld options. I'm not sure whether it is more churn to implement there from the get-go vs starting with cl::opt and transitioning later. At the least there should be a TODO here to convert these options

Collaborator Author

I'm happy to remove the cl::opt options and instead pass extra state when configuring the new backend. I will implement that in this PR.

Member

I understand that -mllvm options are considered internal and for production usage we should recommend proper driver and linker options. It's annoying though to update llvm-lto2 and every ELF port....

Collaborator Author

Using -mllvm options was a nice reduction in the number of changes required in different tools, which was helpful to reduce the size of the initial PR. However, I think there is agreement now that proper options are required. Note that we don't have to update all the LLD ports at once. We can add DTLTO support to the different LLD ports incrementally. Perhaps we can start with just the ELF LLD port (chosen as it is the LLD port I'm most familiar with)?


#--- t1.ll

target triple = "x86_64-unknown-linux-gnu"
Collaborator

I prefer all the tests to be written with a .ll suffix, using bitcode comment style, rather than a .test suffix. They will get the correct syntax highlighting, which makes them easier to read.

Collaborator Author

I prefer all the tests to be written with a .ll suffix, using bitcode comment style, rather than a .test suffix. They will get the correct syntax highlighting, which makes them easier to read.

Sounds good. Although, I suppose that an alternative POV is that it's not correct because these files are not .ll files. I don't mind doing this if you feel strongly about it.

Collaborator

This is a .ll file because every line of code that is not a RUN/CHECK/comment line is LLVM IR assembly. Even if we split the file, we still name the combined file with the same suffix.

Collaborator Author

Ok. I'll do this.

@cachemeifyoucan (Collaborator) left a comment

Generally the approach looks fine to me.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator

Single triple from linker sounds good to me.

// distributor program.
// TODO: If this strategy of deriving options proves insufficient, alternative
// approaches should be considered, such as:
// - A serialization/deserialization format for LTO configuration.
Collaborator

I am not a fan of storing command-line options, especially if you want some bitcode compatibility (-cc1 flags have no compatibility guarantee). That means the object files and future static library support will require rev-locking, which might or might not be what you want.

I think most of the target options should be available in the bitcode as attributes, and for those that are not, we should add more of them into the bitcode. Converting some options out from the CodeGen options here is unfortunate, but I can accept that. Hacking other options into LTO code generation is always just a hack and was never officially supported.

inconvertibleErrorCode());
}

for (auto &Job : Jobs) {
Collaborator

Maybe use the thread pool again here to read the files in parallel?

Also, why specifically pass the AddBuffer function all the way here, since AddStream pretty much does the same thing?

Collaborator Author

Maybe use the thread pool again here to read the files in parallel?

My preference (but please feel free to disagree) would be to keep the code simple for the initial commit and then add optimizations later. It will be worthwhile eventually to try to load and delete these temporary files in as optimal a manner as possible. The optimizations I have included in this PR are those that require some sort of structural change, which brings us neatly onto...

Also, why specifically pass the AddBuffer function all the way here, since AddStream pretty much does the same thing?

AddBuffer should be a more performant way to add a file from disk (versus AddStream). However, it's currently wrapped in the optional cache handling machinery and not exposed. Passing it through causes some turbulence, so I wanted to present this to reviewers to get comments. This hasn't worried any of the reviewers particularly - so I will revert to just using AddStream for the initial commit. The AddBuffer changes can be introduced later with performance analysis to justify them.

Collaborator Author

Thanks. Done now.

return Error::success();
}

namespace {
class InProcessThinBackend : public ThinBackendProc {
class CGThinBackend : public ThinBackendProc {
Member

The base class needs a comment

CHECK-NEXT: 0
CHECK-NEXT: ]
CHECK: "--target=x86_64-unknown-linux-gnu"
CHECK: "-O2",
Member

This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...

The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability.
I sometimes extract a command from CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.

@bd1976bris (Collaborator Author) Mar 19, 2025

This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...

This test is checking pretty-printed JSON. The JSON generated for the distributors is more compact.

The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability.
I sometimes extract a command from CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.

I understand the point about the advantages of the "command" form. However, to aid consumers of the JSON I strongly want to avoid the complexities of quoting for different potential shells on various platforms. I could add a utility script to consume the JSON and print the shell quoted commands to make it easy to run the individual commands?
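Such a utility could be a small Python script. A minimal sketch (the top-level ``jobs`` key and per-job ``arguments`` field here are assumptions made for illustration, not the final schema):

```python
import json
import shlex
import sys

def print_commands(path):
    """Print each job from a distributor JSON file as a copy-pastable,
    POSIX-shell-quoted command line."""
    with open(path) as f:
        data = json.load(f)
    for job in data.get("jobs", []):
        # shlex.join quotes each argument for a POSIX shell.
        print(shlex.join(job["arguments"]))

if __name__ == "__main__" and len(sys.argv) > 1:
    print_commands(sys.argv[1])
```

This keeps the machine-readable "arguments" form in the JSON while still giving users an easy way to re-run individual commands in a shell.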

@romanova-ekaterina
Contributor

As mentioned elsewhere on this thread, we have an alternative "no backend" design and implementation.

Please find it here: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

Here's a summary:

  1. An out-of-process (DTLTO) backend is not used in this implementation.
  2. Both regular and thin archive members are supported in a way that's minimally invasive to LTO. This is in contrast to the approach whereby a new backend is introduced, since archive support for DTLTO cannot be implemented within a backend. Support for (non-thin) archives would require changes to the linker or LTO.cpp.
  3. Only COFF and ELF formats are supported in the initial commit. Support for all other formats is implemented, but it will be submitted later.
  4. The footprint on lld and llvm/lto is very small.
  5. Plugins (shared libraries) are used in this implementation rather than external processes/python scripts.
  6. Plugins allow direct calls and two-way communication between LLVM and the DBS, while external processes require parameter serialization and thus allow only one-way communication with the DBS. Plugins allow passing memory buffers to the DBS client and getting output back from the DBS in memory (which saves two file writes and two file reads). Note: We have a prototype implemented for the distcc distribution system that allows a direct memory-to-memory interface, which we can share on request. A direct memory-to-memory interface could not be implemented with a one-way external-process interface.
  7. Throughput is one of the two main factors that contribute to distributed build system efficiency. Load balancing is the other. The plugin interface could be extended to integrate more closely with the DBS, allowing more advanced things such as querying the internal state of the DBS to improve load balancing.

If there are aspects that people prefer about this approach, we can consider bringing them in to this PR. Or if the "no backend" design is a preferred starting point, we can turn it into a Pull Request and take across parts of this one.

@bd1976bris
Collaborator Author

Thanks for the review comments so far. This PR has now been updated to use a simpler JSON format. I have had to rework the documentation for that simpler format, and in the process I have addressed all the documentation suggestions that were made either here or on the design PR. The documentation and indeed the implementation is much simplified, and I believe all outstanding review comments are now addressed. Please take another look.

@MaskRay (Member) left a comment

This looks reasonable to me. The LTO part might need @teresajohnson 's approval.

@teresajohnson
Contributor

This looks reasonable to me. The LTO part might need @teresajohnson 's approval.

I'm OOO but will wrap up the review asap probably early next week.

@kbelochapka
Collaborator

This looks reasonable to me. The LTO part might need @teresajohnson 's approval.

@teresajohnson , @MaskRay , @cachemeifyoucan ,

Could you please look at the alternative implementation of DTLTO (with no DTLTO backend and with plugins) and compare it with the DTLTO implementation described in this PR (with out-of-process DTLTO backend and external processes/python scripts) before approving this PR?

Here is the link to this branch:
https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

It's a fully working DTLTO implementation, although it has been minimized to make it suitable for a code review. We spent considerable time working on this project, but we haven't received any comments/feedback about it. The link to this branch was posted as a comment in this PR, but might have been lost among the hundreds of other comments.

Katya Romanova (@romanova-ekaterina) already described the main ideas behind the alternative implementation and the differences between the two implementations in one of the comments on this PR, but I will reiterate them here:

(1) Very little footprint on the lld and llvm sources.
(2) Easy to use; only one command-line option (--distribute=) to control it.
(3) Both regular and thin archive members are fully supported. This contrasts with the DTLTO backend solution, since archive support for DTLTO cannot be implemented within an LTO backend.
(4) Only COFF and ELF formats are supported in the initial commit. Support for all other formats is implemented, but it will be submitted later.
(5) Plugins (shared libraries) are used in this implementation rather than external processes/python scripts. A plugin that supports external scripts was implemented but was not included in the initial review. Plugins can be statically linked into the lld executable. We have several plugins implemented that support the most popular distributed build systems: DistCC, IceCream, FastBuild, Incredibuild, Re-Client, HomeCC.
(6) Plugins allow direct calls and two-way communication between LLVM and the DBS, while external processes require parameter serialization and allow only one-way communication. For example, one of the advantages of using plugins is the ability to pass memory buffers to the DBS client and get output back from the DBS in memory (which saves two file writes and two file reads). Note: We have a plugin prototype implemented for the DistCC distribution system which implements a direct memory-to-memory interface with the DistCC client (we can share it on request). A direct memory-to-memory interface could not be implemented with a one-way external-process interface.
(7) Throughput is one of the two main factors that contribute to distributed build system efficiency. Load balancing is the other. The plugin interface could be integrated more closely with a DBS, allowing more advanced compilation-job scheduling based on the current distributed build system state, which in turn would improve load balancing.

@teresajohnson
Contributor

Could you please look at the alternative implementation of DTLTO (with no DTLTO backend and with plugins) and compare it with the DTLTO implementation described in this PR (with an out-of-process DTLTO backend and external processes/python scripts) before approving this PR?

This approach does simplify the lld and LTO side, but I'd like to get thoughts from @MaskRay, as this changes the interfacing with lld. The downside is that it then also requires support for a larger new DTLTO library (it looks like the original patch only contains a Local plugin - do each of the other distributed build systems need to replicate a lot of that functionality?).

@tru
Collaborator

tru commented Apr 9, 2025

I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.

I am happy for this PR to move forward at this point. We probably need some changes in order to be more cache friendly, but I think that can come later.

@romanova-ekaterina
Contributor

romanova-ekaterina commented Apr 10, 2025

Hi Teresa,

This approach does simplify the lld and LTO side, but I'd like to get thoughts from @MaskRay, as this changes the interfacing with lld. The downside is that it then also requires support for a larger new DTLTO library (it looks like the original patch only contains a Local plugin - do each of the other distributed build systems need to replicate a lot of that functionality?).

No, replication of the functionality will not be needed for plugins in general. All common functionality will be in the DTLTO static library (DTLTO.lib), which will be statically linked with lld.

Plugins do not need to be linked with the DTLTO static library. Plugins could be implemented in any programming language that allows creating shared libraries.

@romanova-ekaterina
Contributor

I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.

One thing might be worth mentioning about the two different solutions. The first solution has a DTLTO backend, which generates a JSON file and passes it to an external process (e.g., a python script). The second solution doesn't have a DTLTO backend; it's basically a tiny layer between the linker and a DBS-specific plugin, which calls the plugin's exported functions.

However, we could have a hybrid (mix and match) solution: e.g., a "no DTLTO backend" approach that invokes an external process.

@bd1976bris
Collaborator Author

I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.

I am happy for this PR to move forward at this point. We probably need some changes in order to be more cache friendly, but I think that can come later.

While working with Tobias, I back-ported this PR to LLVM 19 and wired up the Clang and LLD side changes.

In case it helps anyone else, that can be found here: https://github.com/bd1976bris/llvm-project/commits/dtlto_llvm19/.

@cachemeifyoucan
Collaborator

I don't have a strong opinion on either implementation. If I have to pick one, I like this PR better. I feel like the alternative will require keeping more implementation details of the build system in the LLVM tree. That is not a problem in itself, but I don't know how stable those APIs and implementation details are, so it might be a problem for integration.

On the other hand, I see that the JSON interface from this PR is very simple and stable (maybe we should add a version in there just to be safe). I don't think the LTO interface is bloated after the change, and it is not stable anyway. Thus I slightly prefer this patch.

@romanova-ekaterina
Contributor

I don't have a strong opinion on either implementation. If I have to pick one, I like this PR better. I feel like the alternative will require keeping more implementation details of the build system in the LLVM tree. That is not a problem in itself, but I don't know how stable those APIs and implementation details are, so it might be a problem for integration.

On the other hand, I see that the JSON interface from this PR is very simple and stable (maybe we should add a version in there just to be safe). I don't think the LTO interface is bloated after the change, and it is not stable anyway. Thus I slightly prefer this patch.

Actually, neither of the solutions will keep implementation details of the distribution build systems in the LLVM tree.

The plugins or external processes will either be in a separate GitHub repository or will be placed in a repository for particular distribution build systems (e.g. the DTLTO-related plugin for distcc will be added to the distcc repository). The location will be discussed with upstream and decided.

Also, it's worth noting that both interfaces could be supported at the same time in DTLTO: the external-process interface for those who prefer simplicity, or the plugin interface for those who need performance. To understand the performance advantage that the plugin interface could provide, please refer to items (5) and (6) in the comment that kbelochapka provided:
#127749 (comment)

Or we could have a hybrid (mix and match) solution: e.g., a "no DTLTO backend" (from the alternative solution) that invokes an external process (from the main solution), or vice versa.

The main decision we are looking for reviewers' input on is whether they prefer having the DTLTO code in a ThinLTO backend (main solution) or not in a backend (alternative solution). To understand the main differences, please refer to items (1), (2), and (3) in the comment that kbelochapka provided:
#127749 (comment)

@bd1976bris
Collaborator Author

I would like to summarize the state of this PR as I understand it. Firstly, some reviewers have indicated that this PR is acceptable in its current state. However, there is an alternative proposal: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend. We somehow need to come to a conclusion, and I think an appeal to authority (fallacy or not) is the way :)

@MaskRay – can I please ask you to reply to @teresajohnson's comment: #127749 (comment). @tru has created a distributor that works with his organization's fork of FASTBuild, using the JSON interface. If you prefer the no-backend approach, then the JSON interface from this PR could be merged into that approach.

Thanks.

@bd1976bris
Collaborator Author

ping.

@teresajohnson
Contributor

teresajohnson commented May 16, 2025

I would like to summarize the state of this PR as I understand it. Firstly, some reviewers have indicated that this PR is acceptable in its current state. However, there is an alternative proposal: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend. We somehow need to come to a conclusion, and I think an appeal to authority (fallacy or not) is the way :)

@MaskRay – can I please ask you to reply to @teresajohnson's comment: #127749 (comment). @tru has created a distributor that works with his organization's fork of FASTBuild, using the JSON interface. If you prefer the no-backend approach, then the JSON interface from this PR could be merged into that approach.

Thanks.

ping @MaskRay. But overall, this is not a lot of code in LTO, and like @cachemeifyoucan I have a slight preference for this one. It is good to hear that it has already been adopted successfully by another user.

I just went through again and have a few comments, but overall it is in good shape.

StringRef ModuleID;
StringRef NativeObjectPath;
StringRef SummaryIndexPath;
ImportsFilesContainer ImportFiles;
Contributor

Nit: make the spelling consistent, probably "ImportsFiles" everywhere (plural Imports).

Collaborator Author

Done. Thanks.

// Load the native object from a file into a memory buffer
// and store its contents in the output buffer.
auto ObjFileMbOrErr =
MemoryBuffer::getFile(Job.NativeObjectPath, false, false);
Contributor

nit: document constant args

Collaborator Author

Done. Thanks.

module being compiled.
- The second entry in each job's ``inputs`` array is the corresponding
individual summary index file.
- The first entry in each job's ``outputs`` array is the primary output object
Contributor

What is the second outputs file, can there be more than 2 outputs, and how are they ordered?

Collaborator Author

This section documents the reserved entries for these arrays. Only the first entry is reserved and specifies the primary output file. There is no guaranteed order for the other output files.

The primary output file is a reserved entry because some distribution systems make use of this path - e.g., to provide a meaningful user label for compilation jobs. Provision has also been made for additional output files, as some distribution systems only transport explicitly specified files back to the host machine.

The current DTLTO code does not use more than one output file. However, in the future, if LTO options are supported that imply additional output files, those files will be added to this array.
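To make the reserved-entry scheme concrete, here is a minimal sketch of a "local" distributor in Python that runs every job on the host instead of distributing it. Only the ``jobs``/``inputs``/``outputs`` structure is taken from the description above; the per-job ``args`` field is a hypothetical stand-in for however the command line is conveyed (see llvm/docs/dtlto.rst for the actual format):

```python
import json
import subprocess
import sys

def run_jobs(json_path):
    """Run every backend compilation job locally, in sequence. A real
    distributor would instead forward the jobs (with their "inputs" and
    "outputs" file lists) to a distribution system and run them in
    parallel."""
    with open(json_path) as f:
        data = json.load(f)
    for job in data["jobs"]:
        # Any failure must surface so that the link can be aborted.
        subprocess.run(job["args"], check=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    run_jobs(sys.argv[1])
```

A distribution system would additionally use the first ``outputs`` entry as the user-visible job label and transport all listed output files back to the host.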

Contributor

Maybe just note some of this in a comment (specifically about only the first entry being a reserved name, and the part from the last paragraph above about supporting LTO options implying additional output files).

Collaborator Author

Thanks. Done.

@@ -1443,6 +1462,23 @@ class InProcessThinBackend : public ThinBackendProc {
CfiFunctionDecls.insert(
GlobalValue::getGUID(GlobalValue::dropLLVMManglingEscape(Name)));
}
};

class InProcessThinBackend : public CGThinBackend {
Contributor

Can you add a comment to this class for completeness?

Collaborator Author

Done. Thanks.

ModulePath,
Saver.save(ObjFilePath.str()),
Saver.save(ObjFilePath.str() + ".thinlto.bc"),
{}};
Contributor

Add a comment that this is filled in by emitFiles below

Collaborator Author

Done. Thanks.

J = {Task,
ModulePath,
Saver.save(ObjFilePath.str()),
Saver.save(ObjFilePath.str() + ".thinlto.bc"),
Contributor

The other invocations write the summary index to the ModulePath + ".thinlto.bc" - is there a reason for making this different? Thinking the emitFiles interfaces could be commoned a bit more if they were the same.

Collaborator Author

For DTLTO, we need to use unique paths for these files so that two links that both use the same bitcode module can't collide. We want this to be handled automatically - without the user having to specify additional linking options. Can I ask how Bazel handles this situation?

Contributor

We use the lto::getThinLTOOutputFile function (guided by the --thinlto-prefix-replace= flag) to change part of the original module path prefix to a different path prefix, so that the output files are put into a directory subtree within a target-specific output directory. The native objects are eventually put there too, alongside their associated .thinlto.bc summary index files, by the clang LTO backend invocations.

I guess here it is a little different because these are intermediate files that the user doesn't know or care about, so it shouldn't matter that this code doesn't support --thinlto-prefix-replace=, for example.
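For reference, the path mapping that ``--thinlto-prefix-replace=`` performs can be modelled roughly as follows; this is a simplified sketch of ``lto::getThinLTOOutputFile`` (the real function also creates any missing directories):

```python
def thinlto_output_file(path, prefix_replace):
    """Map a module path using an "oldprefix;newprefix" replacement:
    if the path starts with the old prefix, swap it for the new one;
    otherwise return the path unchanged (simplified model)."""
    old, _, new = prefix_replace.partition(";")
    if old and path.startswith(old):
        return new + path[len(old):]
    return path

# e.g. thinlto_output_file("src/lib/a.o", "src/;out/") -> "out/lib/a.o"
```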

// Write sharded indices to SummaryPath, (optionally) imports to disk, and
// (optionally) record imports in ImportsFiles.
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
StringRef ModulePath, StringRef SummaryPath,
Contributor

Nit: switch the order of SummaryPath and NewModulePath args to make it closer to the other interface and easier to compare.

Collaborator Author

Thanks. Done.

{}};

assert(ModuleToDefinedGVSummaries.count(ModulePath));
BackendThreadPool.async(
Contributor

Maybe just document that here.

@teresajohnson (Contributor) left a comment

lgtm


This patch introduces initial support for Integrated Distributed
ThinLTO (DTLTO) in LLVM.

DTLTO enables distribution of backend ThinLTO compilation jobs via
external distribution systems, such as Incredibuild. The existing
support for distributing ThinLTO compilations uses separate thin-link
(--thinlto-index-only), backend compilation, and link steps coordinated
by a modern build system, like Bazel. This "Bazel-style" distributed
ThinLTO requires a modern build system to manage dynamically discovered
dependencies specified in the summary index file shards. However,
adopting a modern build system can be prohibitive for users with
established build infrastructure. In contrast, DTLTO manages
distribution within LLVM during the traditional link step. This
approach allows DTLTO to work with any build process that supports
in-process ThinLTO.

This patch provides a minimal but functional implementation of DTLTO,
which will be expanded in subsequent commits. It adds support for
delegating the backend ThinLTO jobs to an external process (the
distributor), which in turn coordinates distribution through a system
such as Incredibuild. A JSON interface is used for communication with
the distributor. The JSON interface allows new distribution systems to
be supported without modifying LLVM. Please see the documentation added
in this patch: llvm/docs/dtlto.rst for more details.

Subsequent LLVM commits will add:
- Support for the ThinLTO cache.
- Support for more LTO configuration states, e.g., basic block sections.
- Performance improvements, such as more efficient removal of temporary
  files.

RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
For the design of the DTLTO feature, see: llvm#126654.
@MaskRay
Member

MaskRay commented May 23, 2025

Sorry, I don’t have the bandwidth to dive into this patch again in detail, but from a quick look, it seems solid!

@bd1976bris bd1976bris merged commit 6520b21 into llvm:main May 23, 2025
10 of 12 checks passed
@bd1976bris
Collaborator Author

Thanks to everyone for the review effort here.

I’m currently on PTO this week, but once I’m back, I’ll post the Clang and LLD patches for review.

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
This patch adds initial support for Integrated Distributed ThinLTO
(DTLTO) in LLVM, which manages distribution internally during the
traditional link step. This enables compatibility with any build
system that supports in-process ThinLTO. In contrast, existing
approaches to distributed ThinLTO, which split the thin-link
(--thinlto-index-only), backend compilation, and final link into
separate steps, require build system support, e.g. Bazel.

This patch implements the core DTLTO mechanism, which enables
delegation of ThinLTO backend jobs to an external process (the
distributor). The distributor can then manage job distribution through
systems like Incredibuild. A generic JSON interface is used to
communicate with the distributor, allowing for the creation of new
distributors (and thus integration with different distribution
systems) without modifying LLVM.

Please see llvm/docs/dtlto.rst for more details.

RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: llvm#126654
Labels
llvm:support llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
9 participants