[LinkerWrapper] Add an overriding option for debugging #91984

Merged: 1 commit, May 14, 2024
38 changes: 38 additions & 0 deletions clang/docs/ClangLinkerWrapper.rst
@@ -46,6 +46,8 @@ only for the linker wrapper will be forwarded to the wrapped linker job.
    -l <libname>           Search for library <libname>
    --opt-level=<O0, O1, O2, or O3>
                           Optimization level for LTO
    --override-image=<kind=file>
                           Uses the provided file as if it were the output of the device link step
    -o <path>              Path to file to write output
    --pass-remarks-analysis=<value>
                           Pass remarks for LTO
@@ -87,6 +89,42 @@ other. Generally, this requires that the target triple and architecture match.
An exception is made when the architecture is listed as ``generic``, which will
cause it to be linked with any other device code with the same target triple.

Debugging
=========

The linker wrapper performs many steps internally, such as input matching,
symbol resolution, and image registration, which can make it difficult to
debug. Its behavior is controlled mostly through metadata, described in the
`clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
Intermediate output can be obtained from the linker wrapper using the
``--save-temps`` flag. These files can then be modified.

.. code-block:: sh

  $> clang openmp.c -fopenmp --offload-arch=gfx90a -c
  $> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps
  $> # Modify temp files.
  $> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o

This overrides one of the input files by replacing its embedded offloading
metadata with a user-modified version. The approach becomes cumbersome when
there are multiple input files. For a much larger hammer, use the
``--override-image=<kind>=<file>`` flag.

In the following example, we use ``--save-temps`` to obtain the LLVM-IR just
before the backend runs, modify it to test altered behavior, and compile it to
a binary. Passing that binary to the linker wrapper makes it ignore all
embedded metadata and use the provided image as if it were the result of the
device linking phase.

.. code-block:: sh

  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps
  $> # Modify temp files.
  $> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out
  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--override-image=openmp=a.out

Example
=======

7 changes: 7 additions & 0 deletions clang/test/Driver/linker-wrapper.c
@@ -226,3 +226,10 @@ __attribute__((visibility("protected"), used)) int x;
// RELOCATABLE-LINK-CUDA: fatbinary{{.*}} -64 --create {{.*}}.fatbin --image=profile=sm_89,file={{.*}}.img
// RELOCATABLE-LINK-CUDA: /usr/bin/ld.lld{{.*}}-r
// RELOCATABLE-LINK-CUDA: llvm-objcopy{{.*}}a.out --remove-section .llvm.offloading

// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o
// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld --override-image=openmp=%t.o %t.o -o a.out 2>&1 \
// RUN: | FileCheck %s --check-prefix=OVERRIDE
// OVERRIDE-NOT: clang
// OVERRIDE: /usr/bin/ld
43 changes: 43 additions & 0 deletions clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1149,6 +1149,39 @@ DerivedArgList getLinkerArgs(ArrayRef<OffloadFile> Input,
  return DAL;
}

Error handleOverrideImages(
    const InputArgList &Args,
    DenseMap<OffloadKind, SmallVector<OffloadingImage>> &Images) {
  for (StringRef Arg : Args.getAllArgValues(OPT_override_image)) {
    OffloadKind Kind = getOffloadKind(Arg.split("=").first);
    StringRef Filename = Arg.split("=").second;

    ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
        MemoryBuffer::getFileOrSTDIN(Filename);
    if (std::error_code EC = BufferOrErr.getError())
      return createFileError(Filename, EC);

    Expected<std::unique_ptr<ObjectFile>> ElfOrErr =
        ObjectFile::createELFObjectFile(**BufferOrErr,
                                        /*InitContent=*/false);
    if (!ElfOrErr)
      return ElfOrErr.takeError();
    ObjectFile &Elf = **ElfOrErr;

    OffloadingImage TheImage{};
    TheImage.TheImageKind = IMG_Object;
    TheImage.TheOffloadKind = Kind;
    TheImage.StringData["triple"] =
        Args.MakeArgString(Elf.makeTriple().getTriple());
    if (std::optional<StringRef> CPU = Elf.tryGetCPUName())
      TheImage.StringData["arch"] = Args.MakeArgString(*CPU);
    TheImage.Image = std::move(*BufferOrErr);

    Images[Kind].emplace_back(std::move(TheImage));
  }
  return Error::success();
}

/// Transforms all the extracted offloading input files into an image that can
/// be registered by the runtime.
Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
@@ -1158,6 +1191,12 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(

  std::mutex ImageMtx;
  DenseMap<OffloadKind, SmallVector<OffloadingImage>> Images;

  // Initialize the images with any overriding inputs.
  if (Args.hasArg(OPT_override_image))
    if (Error Err = handleOverrideImages(Args, Images))
      return std::move(Err);

  auto Err = parallelForEachError(LinkerInputFiles, [&](auto &Input) -> Error {
    llvm::TimeTraceScope TimeScope("Link device input");

@@ -1439,6 +1478,10 @@ Expected<SmallVector<SmallVector<OffloadFile>>>
getDeviceInput(const ArgList &Args) {
  llvm::TimeTraceScope TimeScope("ExtractDeviceCode");

  // Skip all the input if the user is overriding the output.
  if (Args.hasArg(OPT_override_image))
    return SmallVector<SmallVector<OffloadFile>>();

  StringRef Root = Args.getLastArgValue(OPT_sysroot_EQ);
  SmallVector<StringRef> LibraryPaths;
  for (const opt::Arg *Arg : Args.filtered(OPT_library_path, OPT_libpath))
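
Note on the `kind=file` value: `handleOverrideImages` splits each argument value at the first `=`, so everything after it is taken as the filename. The following is a rough, standalone sketch of that parsing using only the C++ standard library, not the LLVM code itself. The names `Kind`, `parseKind`, `splitFirst`, `OverrideSpec`, and `parseOverride` are hypothetical, the accepted kind spellings are an assumption, and the up-front rejection of unknown kinds and empty filenames is added for illustration only.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the wrapper's offload kinds (assumption: the
// real enum and its spellings live in LLVM and may differ).
enum class Kind { None, OpenMP, CUDA, HIP };

// Illustrative mapping from the textual kind to the enum above.
inline Kind parseKind(std::string_view Name) {
  if (Name == "openmp") return Kind::OpenMP;
  if (Name == "cuda") return Kind::CUDA;
  if (Name == "hip") return Kind::HIP;
  return Kind::None;
}

// Splits at the FIRST separator. If the separator is absent, the whole
// string becomes the first half and the second half is empty.
inline std::pair<std::string_view, std::string_view>
splitFirst(std::string_view S, char Sep) {
  std::size_t Pos = S.find(Sep);
  if (Pos == std::string_view::npos)
    return {S, std::string_view{}};
  return {S.substr(0, Pos), S.substr(Pos + 1)};
}

struct OverrideSpec {
  Kind TheKind;
  std::string Filename;
};

// Parses "<kind>=<file>". Returns std::nullopt for an unknown kind or an
// empty filename (an extra check added in this sketch).
inline std::optional<OverrideSpec> parseOverride(std::string_view Value) {
  auto [KindName, File] = splitFirst(Value, '=');
  Kind K = parseKind(KindName);
  if (K == Kind::None || File.empty())
    return std::nullopt;
  return OverrideSpec{K, std::string(File)};
}
```

Because the split happens at the first `=`, a filename that itself contains `=` is kept whole in this sketch: `parseOverride("openmp=dir/a=b.out")` yields the filename `dir/a=b.out`.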
4 changes: 4 additions & 0 deletions clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
@@ -74,6 +74,10 @@ def wrapper_jobs : Joined<["--"], "wrapper-jobs=">,
  Flags<[WrapperOnlyOption]>, MetaVarName<"<number>">,
  HelpText<"Sets the number of parallel jobs to use for device linking">;

def override_image : Joined<["--"], "override-image=">,
  Flags<[WrapperOnlyOption]>, MetaVarName<"<kind=file>">,
  HelpText<"Uses the provided file as if it were the output of the device link step">;

// Flags passed to the device linker.
def arch_EQ : Joined<["--"], "arch=">,
  Flags<[DeviceOnlyOption, HelpHidden]>, MetaVarName<"<arch>">,
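
Note on the `Joined` option kind: `override_image` is declared with the prefix `--` and the name `override-image=`, so its value is whatever follows that exact spelling in the same argument. The sketch below models this with the C++ standard library only. It is a simplification under stated assumptions, not LLVM's option parser, and `matchJoined` is a hypothetical helper name.

```cpp
#include <optional>
#include <string_view>

// Returns the joined value if Arg is exactly Prefix + Name + <value>,
// otherwise std::nullopt. The value may be empty.
inline std::optional<std::string_view>
matchJoined(std::string_view Arg, std::string_view Prefix,
            std::string_view Name) {
  // The argument must start with the option prefix, e.g. "--".
  if (Arg.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  Arg.remove_prefix(Prefix.size());

  // The option name, including its trailing '=', must follow immediately.
  if (Arg.substr(0, Name.size()) != Name)
    return std::nullopt;
  Arg.remove_prefix(Name.size());

  // Everything left over is the joined value, e.g. "openmp=a.out".
  return Arg;
}
```

Under this model, `--override-image=openmp=a.out` produces the value `openmp=a.out`, while a near-miss spelling such as `--override=image=openmp=a.out` does not match the option at all.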