Skip to content

[LTO] Improve diagnostics handling when parsing module-level inline assembly #75726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 18, 2023

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Dec 17, 2023

Non-LTO compiles set the buffer name to ""
(AsmPrinter::addInlineAsmDiagBuffer) and pass diagnostics to
ClangDiagnosticHandler (through the MCContext handler in
MachineModuleInfoWrapperPass::doInitialization) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.

% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0

CollectAsmSymbols parses inline assembly and is transitively called by
both ModuleSummaryIndexAnalysis::run and WriteBitcodeToFile, leading
to duplicate diagnostics.

This patch updates CollectAsmSymbols to be similar to non-LTO
compiles.

% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1

The HasErrors check does not prevent duplicate warnings but assembler
warnings are very uncommon.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir llvm:binary-utilities labels Dec 17, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 17, 2023

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-macho
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-clang

Author: Fangrui Song (MaskRay)

Changes

Non-LTO compiles set the buffer name to "<inline asm>"
(AsmPrinter::addInlineAsmDiagBuffer) and pass diagnostics to
ClangDiagnosticHandler (through the MCContext handler in
MachineModuleInfoWrapperPass::doInitialization) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.

% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
&lt;inline asm&gt;:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0

CollectAsmSymbols parses inline assembly and is transitively called by
both ModuleSummaryIndexAnalysis::run and WriteBitcodeToFile, leading
to duplicate diagnostics.

This patch updates CollectAsmSymbols to be similar to non-LTO
compiles.

% clang -c -flto=thin a.c; echo $?
&lt;inline asm&gt;:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1

The HasErrors check does not prevent duplicate warnings but assembler
warnings are very uncommon.


Full diff: https://github.com/llvm/llvm-project/pull/75726.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+2)
  • (modified) clang/test/CodeGen/invalid_global_asm.c (+5)
  • (modified) llvm/include/llvm/IR/DiagnosticHandler.h (+1)
  • (modified) llvm/lib/Object/ModuleSymbolTable.cpp (+15-1)
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index 753a8fd74fa696..4121a3709bc3af 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -418,6 +418,8 @@ void BackendConsumer::anchor() { }
 } // namespace clang
 
 bool ClangDiagnosticHandler::handleDiagnostics(const DiagnosticInfo &DI) {
+  if (DI.getSeverity() == DS_Error)
+    HasErrors = true;
   BackendCon->DiagnosticHandlerImpl(DI);
   return true;
 }
diff --git a/clang/test/CodeGen/invalid_global_asm.c b/clang/test/CodeGen/invalid_global_asm.c
index 5b7e8b43d752d8..d5645f7fc92bf4 100644
--- a/clang/test/CodeGen/invalid_global_asm.c
+++ b/clang/test/CodeGen/invalid_global_asm.c
@@ -1,5 +1,10 @@
 // REQUIRES: arm-registered-target
 // RUN: not %clang_cc1 -emit-obj -triple armv6-unknown-unknown -o %t %s 2>&1 | FileCheck %s
+// RUN: not %clang_cc1 -emit-obj -triple armv6-unknown-unknown -flto -o %t %s 2>&1 | FileCheck %s
+
+/// Test the diagnostic behavior considering the whole system including the driver.
+// RUN: not %clang --target=armv6-unknown-unknown -c -flto=thin -o %t %s 2>&1 | FileCheck %s
 #pragma clang diagnostic ignored "-Wmissing-noreturn"
 __asm__(".Lfoo: movw r2, #:lower16:.Lbar - .Lfoo");
 // CHECK: <inline asm>:1:8: error: instruction requires: armv6t2
+// CHECK-NOT: error:
diff --git a/llvm/include/llvm/IR/DiagnosticHandler.h b/llvm/include/llvm/IR/DiagnosticHandler.h
index 55e5e5975808d9..db7d7444f75f05 100644
--- a/llvm/include/llvm/IR/DiagnosticHandler.h
+++ b/llvm/include/llvm/IR/DiagnosticHandler.h
@@ -23,6 +23,7 @@ class DiagnosticInfo;
 /// which remarks are enabled.
 struct DiagnosticHandler {
   void *DiagnosticContext = nullptr;
+  bool HasErrors = false;
   DiagnosticHandler(void *DiagContext = nullptr)
       : DiagnosticContext(DiagContext) {}
   virtual ~DiagnosticHandler() = default;
diff --git a/llvm/lib/Object/ModuleSymbolTable.cpp b/llvm/lib/Object/ModuleSymbolTable.cpp
index ab073e18cb4668..07f76688fa43e7 100644
--- a/llvm/lib/Object/ModuleSymbolTable.cpp
+++ b/llvm/lib/Object/ModuleSymbolTable.cpp
@@ -16,6 +16,7 @@
 #include "RecordStreamer.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalAlias.h"
 #include "llvm/IR/GlobalValue.h"
@@ -68,6 +69,11 @@ void ModuleSymbolTable::addModule(Module *M) {
 static void
 initializeRecordStreamer(const Module &M,
                          function_ref<void(RecordStreamer &)> Init) {
+  // This function may be called twice, once for ModuleSummaryIndexAnalysis and
+  // the other when writing the IR symbol table. If parsing inline assembly has
+  // caused errors in the first run, suppress the second run.
+  if (M.getContext().getDiagHandlerPtr()->HasErrors)
+    return;
   StringRef InlineAsm = M.getModuleInlineAsm();
   if (InlineAsm.empty())
     return;
@@ -95,7 +101,8 @@ initializeRecordStreamer(const Module &M,
   if (!MCII)
     return;
 
-  std::unique_ptr<MemoryBuffer> Buffer(MemoryBuffer::getMemBuffer(InlineAsm));
+  std::unique_ptr<MemoryBuffer> Buffer(
+      MemoryBuffer::getMemBuffer(InlineAsm, "<inline asm>"));
   SourceMgr SrcMgr;
   SrcMgr.AddNewSourceBuffer(std::move(Buffer), SMLoc());
 
@@ -115,6 +122,13 @@ initializeRecordStreamer(const Module &M,
   if (!TAP)
     return;
 
+  MCCtx.setDiagnosticHandler([&](const SMDiagnostic &SMD, bool IsInlineAsm,
+                                 const SourceMgr &SrcMgr,
+                                 std::vector<const MDNode *> &LocInfos) {
+    M.getContext().diagnose(
+        DiagnosticInfoSrcMgr(SMD, M.getName(), IsInlineAsm, /*LocCookie=*/0));
+  });
+
   // Module-level inline asm is assumed to use At&t syntax (see
   // AsmPrinter::doInitialization()).
   Parser->setAssemblerDialect(InlineAsm::AD_ATT);

@llvmbot
Copy link
Member

llvmbot commented Dec 17, 2023

@llvm/pr-subscribers-llvm-binary-utilities

Author: Fangrui Song (MaskRay)

Changes

Non-LTO compiles set the buffer name to "<inline asm>"
(AsmPrinter::addInlineAsmDiagBuffer) and pass diagnostics to
ClangDiagnosticHandler (through the MCContext handler in
MachineModuleInfoWrapperPass::doInitialization) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.

% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
&lt;inline asm&gt;:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0

CollectAsmSymbols parses inline assembly and is transitively called by
both ModuleSummaryIndexAnalysis::run and WriteBitcodeToFile, leading
to duplicate diagnostics.

This patch updates CollectAsmSymbols to be similar to non-LTO
compiles.

% clang -c -flto=thin a.c; echo $?
&lt;inline asm&gt;:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1

The HasErrors check does not prevent duplicate warnings but assembler
warnings are very uncommon.


Full diff: https://github.com/llvm/llvm-project/pull/75726.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+2)
  • (modified) clang/test/CodeGen/invalid_global_asm.c (+5)
  • (modified) llvm/include/llvm/IR/DiagnosticHandler.h (+1)
  • (modified) llvm/lib/Object/ModuleSymbolTable.cpp (+15-1)
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index 753a8fd74fa696..4121a3709bc3af 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -418,6 +418,8 @@ void BackendConsumer::anchor() { }
 } // namespace clang
 
 bool ClangDiagnosticHandler::handleDiagnostics(const DiagnosticInfo &DI) {
+  if (DI.getSeverity() == DS_Error)
+    HasErrors = true;
   BackendCon->DiagnosticHandlerImpl(DI);
   return true;
 }
diff --git a/clang/test/CodeGen/invalid_global_asm.c b/clang/test/CodeGen/invalid_global_asm.c
index 5b7e8b43d752d8..d5645f7fc92bf4 100644
--- a/clang/test/CodeGen/invalid_global_asm.c
+++ b/clang/test/CodeGen/invalid_global_asm.c
@@ -1,5 +1,10 @@
 // REQUIRES: arm-registered-target
 // RUN: not %clang_cc1 -emit-obj -triple armv6-unknown-unknown -o %t %s 2>&1 | FileCheck %s
+// RUN: not %clang_cc1 -emit-obj -triple armv6-unknown-unknown -flto -o %t %s 2>&1 | FileCheck %s
+
+/// Test the diagnostic behavior considering the whole system including the driver.
+// RUN: not %clang --target=armv6-unknown-unknown -c -flto=thin -o %t %s 2>&1 | FileCheck %s
 #pragma clang diagnostic ignored "-Wmissing-noreturn"
 __asm__(".Lfoo: movw r2, #:lower16:.Lbar - .Lfoo");
 // CHECK: <inline asm>:1:8: error: instruction requires: armv6t2
+// CHECK-NOT: error:
diff --git a/llvm/include/llvm/IR/DiagnosticHandler.h b/llvm/include/llvm/IR/DiagnosticHandler.h
index 55e5e5975808d9..db7d7444f75f05 100644
--- a/llvm/include/llvm/IR/DiagnosticHandler.h
+++ b/llvm/include/llvm/IR/DiagnosticHandler.h
@@ -23,6 +23,7 @@ class DiagnosticInfo;
 /// which remarks are enabled.
 struct DiagnosticHandler {
   void *DiagnosticContext = nullptr;
+  bool HasErrors = false;
   DiagnosticHandler(void *DiagContext = nullptr)
       : DiagnosticContext(DiagContext) {}
   virtual ~DiagnosticHandler() = default;
diff --git a/llvm/lib/Object/ModuleSymbolTable.cpp b/llvm/lib/Object/ModuleSymbolTable.cpp
index ab073e18cb4668..07f76688fa43e7 100644
--- a/llvm/lib/Object/ModuleSymbolTable.cpp
+++ b/llvm/lib/Object/ModuleSymbolTable.cpp
@@ -16,6 +16,7 @@
 #include "RecordStreamer.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalAlias.h"
 #include "llvm/IR/GlobalValue.h"
@@ -68,6 +69,11 @@ void ModuleSymbolTable::addModule(Module *M) {
 static void
 initializeRecordStreamer(const Module &M,
                          function_ref<void(RecordStreamer &)> Init) {
+  // This function may be called twice, once for ModuleSummaryIndexAnalysis and
+  // the other when writing the IR symbol table. If parsing inline assembly has
+  // caused errors in the first run, suppress the second run.
+  if (M.getContext().getDiagHandlerPtr()->HasErrors)
+    return;
   StringRef InlineAsm = M.getModuleInlineAsm();
   if (InlineAsm.empty())
     return;
@@ -95,7 +101,8 @@ initializeRecordStreamer(const Module &M,
   if (!MCII)
     return;
 
-  std::unique_ptr<MemoryBuffer> Buffer(MemoryBuffer::getMemBuffer(InlineAsm));
+  std::unique_ptr<MemoryBuffer> Buffer(
+      MemoryBuffer::getMemBuffer(InlineAsm, "<inline asm>"));
   SourceMgr SrcMgr;
   SrcMgr.AddNewSourceBuffer(std::move(Buffer), SMLoc());
 
@@ -115,6 +122,13 @@ initializeRecordStreamer(const Module &M,
   if (!TAP)
     return;
 
+  MCCtx.setDiagnosticHandler([&](const SMDiagnostic &SMD, bool IsInlineAsm,
+                                 const SourceMgr &SrcMgr,
+                                 std::vector<const MDNode *> &LocInfos) {
+    M.getContext().diagnose(
+        DiagnosticInfoSrcMgr(SMD, M.getName(), IsInlineAsm, /*LocCookie=*/0));
+  });
+
   // Module-level inline asm is assumed to use At&t syntax (see
   // AsmPrinter::doInitialization()).
   Parser->setAssemblerDialect(InlineAsm::AD_ATT);

@yuanfang-chen
Copy link
Collaborator

MachO/lto-module-asm-err.ll needs update?

…ssembly

Non-LTO compiles set the buffer name to "<inline asm>"
(`AsmPrinter::addInlineAsmDiagBuffer`) and pass diagnostics to
`ClangDiagnosticHandler` (through the `MCContext` handler in
`MachineModuleInfoWrapperPass::doInitialization`) to ensure that
the exit code is 1 in the presence of errors. In contrast, LTO compiles
spuriously succeed even if error messages are printed.

```
% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0
```

`CollectAsmSymbols` parses inline assembly and is transitively called by
both `ModuleSummaryIndexAnalysis::run` and `WriteBitcodeToFile`, leading
to duplicate diagnostics.

This patch updates `CollectAsmSymbols` to be similar to non-LTO
compiles.
```
% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1
```

The `HasErrors` check does not prevent duplicate warnings but assembler
warnings are very uncommon.
@MaskRay
Copy link
Member Author

MaskRay commented Dec 18, 2023

MachO/lto-module-asm-err.ll needs update?

Thanks for the reminder. Updated in 09f2f27

Copy link
Collaborator

@yuanfang-chen yuanfang-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MaskRay MaskRay merged commit 96aca7c into llvm:main Dec 18, 2023
@MaskRay MaskRay deleted the lto-asm-diagnostics branch December 18, 2023 17:47
@@ -418,6 +418,8 @@ void BackendConsumer::anchor() { }
} // namespace clang

bool ClangDiagnosticHandler::handleDiagnostics(const DiagnosticInfo &DI) {
if (DI.getSeverity() == DS_Error)
HasErrors = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like the way the "HasErrors" bit works... can we refactor so every diagnostic handler doesn't need its own code to set the HasErrors bits? The way this is written, it looks like every LLVM-based compiler needs some code for this, not just clang.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #75889 to improve this case a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category lld:MachO lld llvm:binary-utilities llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants