Skip to content

[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info #143452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 11, 2025

Conversation

jmorse
Copy link
Member

@jmorse jmorse commented Jun 9, 2025

By chance, two things have prevented the autoupgrade path being exercised much so far:

  • LLParser setting the debug-info mode to "old" on seeing intrinsics,
  • The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info block.

In practice, this appears to mean this code path hasn't seen the various invalid inputs that can come its way. This commit does a number of things:

  • Tolerates the various illegal inputs that can be written with debug-intrinsics, and that must be tolerated until the Verifier runs,
  • Printing illegal/null DbgRecord fields must succeed,
  • Verifier errors need to localise the function/block where the error is,
  • Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes I found. There are also two edge cases:

  • Some of the unit tests switch back and forth between intrinsic and record modes at will; I've deleted coverage and some assertions to tolerate this as intrinsic support is now Gone (TM),
  • In sroa-extract-bits.ll, the order of debug records flips. This is because the autoupgrader upgrades in the opposite order to the basic block conversion routines... which doesn't change the record order, but does change the use list order in Metadata! This should (TM) have no consequence to the correctness of LLVM, but will change the order of various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but they're all intertwined, sorry.

By chance, two things have prevented the autoupgrade path being exercised
much so far:
 * LLParser setting the debug-info mode to "old" on seeing intrinsics,
 * The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
   block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of things:
 * Tolerates the various illegal inputs that can be written with
   debug-intrinsics, and that must be tolerated until the Verifier runs,
 * Printing illegal/null DbgRecord fields must succeed,
 * Verifier errors need to localise the function/block where the error is,
 * Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes I
found. There are also two edge cases:
 * Some of the unit tests switch back and forth between intrinsic and
   record modes at will; I've deleted coverage and some assertions to
   tolerate this as intrinsic support is now Gone (TM),
 * In sroa-extract-bits.ll, the order of debug records flips. This is
   because the autoupgrader upgrades in the opposite order to the basic
   block conversion routines... which doesn't change the record order, but
   _does_ change the use list order in Metadata! This should (TM) have no
   consequence to the correctness of LLVM, but will change the order of
   various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
@llvmbot
Copy link
Member

llvmbot commented Jun 9, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-debuginfo

Author: Jeremy Morse (jmorse)

Changes

By chance, two things have prevented the autoupgrade path being exercised much so far:

  • LLParser setting the debug-info mode to "old" on seeing intrinsics,
  • The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info block.

In practice, this appears to mean this code path hasn't seen the various invalid inputs that can come its way. This commit does a number of things:

  • Tolerates the various illegal inputs that can be written with debug-intrinsics, and that must be tolerated until the Verifier runs,
  • Printing illegal/null DbgRecord fields must succeed,
  • Verifier errors need to localise the function/block where the error is,
  • Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes I found. There are also two edge cases:

  • Some of the unit tests switch back and forth between intrinsic and record modes at will; I've deleted coverage and some assertions to tolerate this as intrinsic support is now Gone (TM),
  • In sroa-extract-bits.ll, the order of debug records flips. This is because the autoupgrader upgrades in the opposite order to the basic block conversion routines... which doesn't change the record order, but does change the use list order in Metadata! This should (TM) have no consequence to the correctness of LLVM, but will change the order of various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but they're all intertwined, sorry.


Patch is 46.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143452.diff

21 Files Affected:

  • (modified) llvm/lib/AsmParser/LLParser.cpp (-2)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+26-10)
  • (modified) llvm/lib/IR/AutoUpgrade.cpp (+21-8)
  • (modified) llvm/lib/IR/BasicBlock.cpp (-4)
  • (modified) llvm/lib/IR/Verifier.cpp (+12-12)
  • (modified) llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll (+5-1)
  • (modified) llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll (+9-9)
  • (modified) llvm/test/DebugInfo/Generic/sroa-extract-bits.ll (+14-14)
  • (modified) llvm/test/Transforms/IROutliner/outlining-debug-statements.ll (+2-1)
  • (modified) llvm/test/Transforms/ObjCARC/code-motion.ll (+8-5)
  • (added) llvm/test/Verifier/RemoveDI/invalid-dbg-declare-operands.ll (+80)
  • (added) llvm/test/Verifier/dbg-declare-invalid-debug-loc.ll (+113)
  • (modified) llvm/test/Verifier/diexpression-entry-value-llvm-ir.ll (+3-3)
  • (modified) llvm/test/Verifier/llvm.dbg.declare-address.ll (+2-2)
  • (modified) llvm/test/Verifier/llvm.dbg.declare-expression.ll (+2-2)
  • (modified) llvm/test/Verifier/llvm.dbg.declare-variable.ll (+2-2)
  • (modified) llvm/test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll (+8-8)
  • (modified) llvm/test/Verifier/llvm.dbg.value-expression.ll (+2-2)
  • (modified) llvm/test/Verifier/llvm.dbg.value-value.ll (+2-2)
  • (modified) llvm/test/Verifier/llvm.dbg.value-variable.ll (+2-2)
  • (modified) llvm/unittests/IR/DebugInfoTest.cpp (-13)
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index b933d240c4d27..5c007dcf00224 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -8336,8 +8336,6 @@ bool LLParser::parseCall(Instruction *&Inst, PerFunctionState &PFS,
       return error(CallLoc, "llvm.dbg intrinsic should not appear in a module "
                             "using non-intrinsic debug info");
     }
-    if (!SeenOldDbgInfoFormat)
-      M->setNewDbgInfoFormatFlag(false);
     SeenOldDbgInfoFormat = true;
   }
   CI->setAttributes(PAL);
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 7223dd845d18d..598a8e141b1dd 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -1204,6 +1204,10 @@ void SlotTracker::processFunctionMetadata(const Function &F) {
 }
 
 void SlotTracker::processDbgRecordMetadata(const DbgRecord &DR) {
+  // Tolerate null metadata pointers: it's a completely illegal debug record,
+  // but we can have faulty metadata from debug-intrinsic days being
+  // autoupgraded into debug records. This gets caught by the verifier, which
+  // then will print the faulty IR, hitting this code path.
   if (const DbgVariableRecord *DVR = dyn_cast<const DbgVariableRecord>(&DR)) {
     // Process metadata used by DbgRecords; we only specifically care about the
     // DILocalVariable, DILocation, and DIAssignID fields, as the Value and
@@ -1211,9 +1215,11 @@ void SlotTracker::processDbgRecordMetadata(const DbgRecord &DR) {
     // Note: The above doesn't apply for empty-metadata operands.
     if (auto *Empty = dyn_cast<MDNode>(DVR->getRawLocation()))
       CreateMetadataSlot(Empty);
-    CreateMetadataSlot(DVR->getRawVariable());
+    if (DVR->getRawVariable())
+      CreateMetadataSlot(DVR->getRawVariable());
     if (DVR->isDbgAssign()) {
-      CreateMetadataSlot(cast<MDNode>(DVR->getRawAssignID()));
+      if (auto *AssignID = DVR->getRawAssignID())
+        CreateMetadataSlot(cast<MDNode>(AssignID));
       if (auto *Empty = dyn_cast<MDNode>(DVR->getRawAddress()))
         CreateMetadataSlot(Empty);
     }
@@ -1222,7 +1228,8 @@ void SlotTracker::processDbgRecordMetadata(const DbgRecord &DR) {
   } else {
     llvm_unreachable("unsupported DbgRecord kind");
   }
-  CreateMetadataSlot(DR.getDebugLoc().getAsMDNode());
+  if (DR.getDebugLoc())
+    CreateMetadataSlot(DR.getDebugLoc().getAsMDNode());
 }
 
 void SlotTracker::processInstructionMetadata(const Instruction &I) {
@@ -4867,22 +4874,31 @@ void AssemblyWriter::printDbgVariableRecord(const DbgVariableRecord &DVR) {
     llvm_unreachable(
         "Tried to print a DbgVariableRecord with an invalid LocationType!");
   }
+
+  auto PrintOrNull = [&](Metadata *M) {
+    if (!M) {
+      Out << "(null)";
+    } else {
+      WriteAsOperandInternal(Out, M, WriterCtx, true);
+    }
+  };
+
   Out << "(";
-  WriteAsOperandInternal(Out, DVR.getRawLocation(), WriterCtx, true);
+  PrintOrNull(DVR.getRawLocation());
   Out << ", ";
-  WriteAsOperandInternal(Out, DVR.getRawVariable(), WriterCtx, true);
+  PrintOrNull(DVR.getRawVariable());
   Out << ", ";
-  WriteAsOperandInternal(Out, DVR.getRawExpression(), WriterCtx, true);
+  PrintOrNull(DVR.getRawExpression());
   Out << ", ";
   if (DVR.isDbgAssign()) {
-    WriteAsOperandInternal(Out, DVR.getRawAssignID(), WriterCtx, true);
+    PrintOrNull(DVR.getRawAssignID());
     Out << ", ";
-    WriteAsOperandInternal(Out, DVR.getRawAddress(), WriterCtx, true);
+    PrintOrNull(DVR.getRawAddress());
     Out << ", ";
-    WriteAsOperandInternal(Out, DVR.getRawAddressExpression(), WriterCtx, true);
+    PrintOrNull(DVR.getRawAddressExpression());
     Out << ", ";
   }
-  WriteAsOperandInternal(Out, DVR.getDebugLoc().getAsMDNode(), WriterCtx, true);
+  PrintOrNull(DVR.getDebugLoc().getAsMDNode());
   Out << ")";
 }
 
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 814c00c669cb3..58875e59c727d 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1155,8 +1155,7 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
   case 'd':
     if (Name.consume_front("dbg.")) {
       // Mark debug intrinsics for upgrade to new debug format.
-      if (CanUpgradeDebugIntrinsicsToRecords &&
-          F->getParent()->IsNewDbgInfoFormat) {
+      if (CanUpgradeDebugIntrinsicsToRecords) {
         if (Name == "addr" || Name == "value" || Name == "assign" ||
             Name == "declare" || Name == "label") {
           // There's no function to replace these with.
@@ -4398,11 +4397,25 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
 /// Helper to unwrap intrinsic call MetadataAsValue operands.
 template <typename MDType>
 static MDType *unwrapMAVOp(CallBase *CI, unsigned Op) {
-  if (MetadataAsValue *MAV = dyn_cast<MetadataAsValue>(CI->getArgOperand(Op)))
-    return dyn_cast<MDType>(MAV->getMetadata());
+  if (Op < CI->arg_size()) {
+    if (MetadataAsValue *MAV = dyn_cast<MetadataAsValue>(CI->getArgOperand(Op)))
+      // Use a reinterpret cast rather than a safe default-to-null cast: the
+      // autoupgrade process happens before the verifier, and thus there might
+      // be some nonsense metadata in there.
+      return reinterpret_cast<MDType*>(MAV->getMetadata());
+  }
   return nullptr;
 }
 
+static const DILocation *getDebugLocSafe(const Instruction *I) {
+  MDNode *MD = I->getDebugLoc().getAsMDNode();
+  // Use a C-style cast here rather than cast<DILocation>. The autoupgrader
+  // runs before the verifier, so the Metadata could refer to anything. Allow
+  // the verifier to detect and produce an error message, which will be much
+  // more ergonomic to the user.
+  return (const DILocation*)MD;
+}
+
 /// Convert debug intrinsic calls to non-instruction debug records.
 /// \p Name - Final part of the intrinsic name, e.g. 'value' in llvm.dbg.value.
 /// \p CI - The debug intrinsic call.
@@ -4415,11 +4428,11 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
         unwrapMAVOp<Metadata>(CI, 0), unwrapMAVOp<DILocalVariable>(CI, 1),
         unwrapMAVOp<DIExpression>(CI, 2), unwrapMAVOp<DIAssignID>(CI, 3),
         unwrapMAVOp<Metadata>(CI, 4), unwrapMAVOp<DIExpression>(CI, 5),
-        CI->getDebugLoc());
+        getDebugLocSafe(CI));
   } else if (Name == "declare") {
     DR = new DbgVariableRecord(
         unwrapMAVOp<Metadata>(CI, 0), unwrapMAVOp<DILocalVariable>(CI, 1),
-        unwrapMAVOp<DIExpression>(CI, 2), CI->getDebugLoc(),
+        unwrapMAVOp<DIExpression>(CI, 2), getDebugLocSafe(CI),
         DbgVariableRecord::LocationType::Declare);
   } else if (Name == "addr") {
     // Upgrade dbg.addr to dbg.value with DW_OP_deref.
@@ -4427,7 +4440,7 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
     Expr = DIExpression::append(Expr, dwarf::DW_OP_deref);
     DR = new DbgVariableRecord(unwrapMAVOp<Metadata>(CI, 0),
                                unwrapMAVOp<DILocalVariable>(CI, 1), Expr,
-                               CI->getDebugLoc());
+                               getDebugLocSafe(CI));
   } else if (Name == "value") {
     // An old version of dbg.value had an extra offset argument.
     unsigned VarOp = 1;
@@ -4442,7 +4455,7 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
     }
     DR = new DbgVariableRecord(
         unwrapMAVOp<Metadata>(CI, 0), unwrapMAVOp<DILocalVariable>(CI, VarOp),
-        unwrapMAVOp<DIExpression>(CI, ExprOp), CI->getDebugLoc());
+        unwrapMAVOp<DIExpression>(CI, ExprOp), getDebugLocSafe(CI));
   }
   assert(DR && "Unhandled intrinsic kind in upgrade to DbgRecord");
   CI->getParent()->insertDbgRecordBefore(DR, CI->getIterator());
diff --git a/llvm/lib/IR/BasicBlock.cpp b/llvm/lib/IR/BasicBlock.cpp
index ed11ea06398f1..f716e9970b841 100644
--- a/llvm/lib/IR/BasicBlock.cpp
+++ b/llvm/lib/IR/BasicBlock.cpp
@@ -32,8 +32,6 @@ using namespace llvm;
 STATISTIC(NumInstrRenumberings, "Number of renumberings across all blocks");
 
 DbgMarker *BasicBlock::createMarker(Instruction *I) {
-  assert(IsNewDbgInfoFormat &&
-         "Tried to create a marker in a non new debug-info block!");
   if (I->DebugMarker)
     return I->DebugMarker;
   DbgMarker *Marker = new DbgMarker();
@@ -43,8 +41,6 @@ DbgMarker *BasicBlock::createMarker(Instruction *I) {
 }
 
 DbgMarker *BasicBlock::createMarker(InstListType::iterator It) {
-  assert(IsNewDbgInfoFormat &&
-         "Tried to create a marker in a non new debug-info block!");
   if (It != end())
     return createMarker(&*It);
   DbgMarker *DM = getTrailingDbgRecords();
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 2d03a7a261b5b..17e76902fc0b5 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -6710,7 +6710,7 @@ void Verifier::visit(DbgVariableRecord &DVR) {
   CheckDI(DVR.getType() == DbgVariableRecord::LocationType::Value ||
               DVR.getType() == DbgVariableRecord::LocationType::Declare ||
               DVR.getType() == DbgVariableRecord::LocationType::Assign,
-          "invalid #dbg record type", &DVR, DVR.getType());
+          "invalid #dbg record type", &DVR, DVR.getType(), BB, F);
 
   // The location for a DbgVariableRecord must be either a ValueAsMetadata,
   // DIArgList, or an empty MDNode (which is a legacy representation for an
@@ -6718,30 +6718,30 @@ void Verifier::visit(DbgVariableRecord &DVR) {
   auto *MD = DVR.getRawLocation();
   CheckDI(MD && (isa<ValueAsMetadata>(MD) || isa<DIArgList>(MD) ||
                  (isa<MDNode>(MD) && !cast<MDNode>(MD)->getNumOperands())),
-          "invalid #dbg record address/value", &DVR, MD);
+          "invalid #dbg record address/value", &DVR, MD, BB, F);
   if (auto *VAM = dyn_cast<ValueAsMetadata>(MD)) {
     visitValueAsMetadata(*VAM, F);
     if (DVR.isDbgDeclare()) {
       // Allow integers here to support inttoptr salvage.
       Type *Ty = VAM->getValue()->getType();
       CheckDI(Ty->isPointerTy() || Ty->isIntegerTy(),
-              "location of #dbg_declare must be a pointer or int", &DVR, MD);
+              "location of #dbg_declare must be a pointer or int", &DVR, MD, BB, F);
     }
   } else if (auto *AL = dyn_cast<DIArgList>(MD)) {
     visitDIArgList(*AL, F);
   }
 
   CheckDI(isa_and_nonnull<DILocalVariable>(DVR.getRawVariable()),
-          "invalid #dbg record variable", &DVR, DVR.getRawVariable());
+          "invalid #dbg record variable", &DVR, DVR.getRawVariable(), BB, F);
   visitMDNode(*DVR.getRawVariable(), AreDebugLocsAllowed::No);
 
   CheckDI(isa_and_nonnull<DIExpression>(DVR.getRawExpression()),
-          "invalid #dbg record expression", &DVR, DVR.getRawExpression());
+          "invalid #dbg record expression", &DVR, DVR.getRawExpression(), BB, F);
   visitMDNode(*DVR.getExpression(), AreDebugLocsAllowed::No);
 
   if (DVR.isDbgAssign()) {
     CheckDI(isa_and_nonnull<DIAssignID>(DVR.getRawAssignID()),
-            "invalid #dbg_assign DIAssignID", &DVR, DVR.getRawAssignID());
+            "invalid #dbg_assign DIAssignID", &DVR, DVR.getRawAssignID(), BB, F);
     visitMDNode(*cast<DIAssignID>(DVR.getRawAssignID()),
                 AreDebugLocsAllowed::No);
 
@@ -6752,29 +6752,29 @@ void Verifier::visit(DbgVariableRecord &DVR) {
     CheckDI(
         isa<ValueAsMetadata>(RawAddr) ||
             (isa<MDNode>(RawAddr) && !cast<MDNode>(RawAddr)->getNumOperands()),
-        "invalid #dbg_assign address", &DVR, DVR.getRawAddress());
+        "invalid #dbg_assign address", &DVR, DVR.getRawAddress(), BB, F);
     if (auto *VAM = dyn_cast<ValueAsMetadata>(RawAddr))
       visitValueAsMetadata(*VAM, F);
 
     CheckDI(isa_and_nonnull<DIExpression>(DVR.getRawAddressExpression()),
             "invalid #dbg_assign address expression", &DVR,
-            DVR.getRawAddressExpression());
+            DVR.getRawAddressExpression(), BB, F);
     visitMDNode(*DVR.getAddressExpression(), AreDebugLocsAllowed::No);
 
     // All of the linked instructions should be in the same function as DVR.
     for (Instruction *I : at::getAssignmentInsts(&DVR))
       CheckDI(DVR.getFunction() == I->getFunction(),
-              "inst not in same function as #dbg_assign", I, &DVR);
+              "inst not in same function as #dbg_assign", I, &DVR, BB, F);
   }
 
   // This check is redundant with one in visitLocalVariable().
   DILocalVariable *Var = DVR.getVariable();
   CheckDI(isType(Var->getRawType()), "invalid type ref", Var,
-          Var->getRawType());
+          Var->getRawType(), BB, F);
 
   auto *DLNode = DVR.getDebugLoc().getAsMDNode();
   CheckDI(isa_and_nonnull<DILocation>(DLNode), "invalid #dbg record DILocation",
-          &DVR, DLNode);
+          &DVR, DLNode, BB, F);
   DILocation *Loc = DVR.getDebugLoc();
 
   // The scopes for variables and !dbg attachments must agree.
@@ -6786,7 +6786,7 @@ void Verifier::visit(DbgVariableRecord &DVR) {
   CheckDI(VarSP == LocSP,
           "mismatched subprogram between #dbg record variable and DILocation",
           &DVR, BB, F, Var, Var->getScope()->getSubprogram(), Loc,
-          Loc->getScope()->getSubprogram());
+          Loc->getScope()->getSubprogram(), BB, F);
 
   verifyFnArgs(DVR);
 }
diff --git a/llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll b/llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
index 2b089d2639375..c8b235757afba 100644
--- a/llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
+++ b/llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
@@ -12,8 +12,12 @@ entry:
       metadata ptr undef,
       metadata !DILocalVariable(scope: !1),
       metadata !DIExpression())
-; AS: llvm.dbg.value intrinsic requires a !dbg attachment
+; AS: invalid #dbg record DILocation
+; AS: #dbg_value(ptr undef, !{{[0-9]+}}, !DIExpression(), (null))
+; AS: label %entry
+; AS: ptr @foo
 ; AS: warning: ignoring invalid debug info in <stdin>
+
 ret void
 }
 
diff --git a/llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll b/llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll
index 0a4b7c255dc71..d1f1e1ce768dc 100644
--- a/llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll
+++ b/llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll
@@ -8,7 +8,7 @@
 
 define dso_local void @fun2() !dbg !15 {
   ;; DIAssignID copied here from @fun() where it is used by intrinsics.
-  ; CHECK: dbg.assign not in same function as inst
+  ; CHECK: DVRAssign not in same function as inst
   %x = alloca i32, align 4, !DIAssignID !14
   ret void
 }
@@ -17,24 +17,24 @@ define dso_local void @fun() !dbg !7 {
 entry:
   %a = alloca i32, align 4, !DIAssignID !14
   ;; Here something other than a dbg.assign intrinsic is using a DIAssignID.
-  ; CHECK: !DIAssignID should only be used by llvm.dbg.assign intrinsics
+  ; CHECK: !DIAssignID should only be used by Assign DVRs
   call void @llvm.dbg.value(metadata !14, metadata !10, metadata !DIExpression()), !dbg !13
 
   ;; Each following dbg.assign has an argument of the incorrect type.
-  ; CHECK: invalid llvm.dbg.assign intrinsic address/value
+  ; CHECK: invalid #dbg record address/value
   call void @llvm.dbg.assign(metadata !3, metadata !10, metadata !DIExpression(), metadata !14, metadata ptr undef, metadata !DIExpression()), !dbg !13
-  ; CHECK: invalid llvm.dbg.assign intrinsic variable
+  ; CHECK: invalid #dbg record variable
   call void @llvm.dbg.assign(metadata i32 0, metadata !2, metadata !DIExpression(), metadata !14, metadata ptr undef, metadata !DIExpression()), !dbg !13
-  ; CHECK: invalid llvm.dbg.assign intrinsic expression
+  ; CHECK: invalid #dbg record expression
   call void @llvm.dbg.assign(metadata !14, metadata !10, metadata !2, metadata !14, metadata ptr undef, metadata !DIExpression()), !dbg !13
-  ; CHECK: invalid llvm.dbg.assign intrinsic DIAssignID
+  ; CHECK: invalid #dbg_assign DIAssignID
   call void @llvm.dbg.assign(metadata !14, metadata !10, metadata !DIExpression(), metadata !2, metadata ptr undef, metadata !DIExpression()), !dbg !13
-  ; CHECK: invalid llvm.dbg.assign intrinsic address
+  ; CHECK: invalid #dbg_assign address
   call void @llvm.dbg.assign(metadata !14, metadata !10, metadata !DIExpression(), metadata !14, metadata !3, metadata !DIExpression()), !dbg !13
   ;; Empty metadata debug operands are allowed.
-  ; CHECK-NOT: invalid llvm.dbg.assign
+  ; CHECK-NOT: invalid #dbg record
   call void @llvm.dbg.assign(metadata !14, metadata !10, metadata !DIExpression(), metadata !14, metadata !2, metadata !DIExpression()), !dbg !13
-  ; CHECK: invalid llvm.dbg.assign intrinsic address expression
+  ; CHECK: invalid #dbg_assign address expression
   call void @llvm.dbg.assign(metadata !14, metadata !10, metadata !DIExpression(), metadata !14, metadata ptr undef, metadata !2), !dbg !13
   ret void
 }
diff --git a/llvm/test/DebugInfo/Generic/sroa-extract-bits.ll b/llvm/test/DebugInfo/Generic/sroa-extract-bits.ll
index f47e495db6617..6db453605cb57 100644
--- a/llvm/test/DebugInfo/Generic/sroa-extract-bits.ll
+++ b/llvm/test/DebugInfo/Generic/sroa-extract-bits.ll
@@ -13,8 +13,8 @@ define i8 @test1(i32 %arg) {
 ; CHECK-NEXT:      #dbg_value(i8 [[PTR_SROA_0_0_EXTRACT_TRUNC]], [[META2:![0-9]+]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7:![0-9]+]])
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i32 [[ARG]], 8
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[PTR_SROA_2_0_EXTRACT_SHIFT]] to i24
-; CHECK-NEXT:      #dbg_value(i24 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META8:![0-9]+]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 8, 16), [[META7]])
-; CHECK-NEXT:      #dbg_value(i24 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META9:![0-9]+]], !DIExpression(DW_OP_LLVM_extract_bits_sext, 0, 8), [[META7]])
+; CHECK-NEXT:      #dbg_value(i24 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META8:![0-9]+]], !DIExpression(DW_OP_LLVM_extract_bits_sext, 0, 8), [[META7]])
+; CHECK-NEXT:      #dbg_value(i24 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META10:![0-9]+]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 8, 16), [[META7]])
 ; CHECK-NEXT:    ret i8 [[PTR_SROA_0_0_EXTRACT_TRUNC]]
 ;
 entry:
@@ -36,11 +36,11 @@ define i8 @test2(i32 %arg1, i8 %arg2) {
 ; CHECK-NEXT:      #dbg_value(i8 [[PTR_SROA_0_0_EXTRACT_TRUNC]], [[META2]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i32 [[ARG1]], 8
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[PTR_SROA_2_0_EXTRACT_SHIFT]] to i16
-; CHECK-NEXT:      #dbg_value(i16 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META9]], !DIExpression(DW_OP_LLVM_extract_bits_sext, 0, 16), [[META7]])
+; CHECK-NEXT:      #dbg_value(i16 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META8]], !DIExpression(DW_OP_LLVM_extract_bits_sext, 0, 16), [[META7]])
 ; CHECK-NEXT:    [[PTR_SROA_21_0_EXTRACT_SHIFT:%.*]] = lshr i32 [[ARG1]], 24
 ; CHECK-NEXT:    [[PTR_SROA_21_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[PTR_SROA_21_0_EXTRACT_SHIFT]] to i8
-; CHECK-NEXT:      #dbg_value(i8 [[PTR_SROA_21_0_EXTRACT_TRUNC]], [[META8]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
-; CHECK-NEXT:      #dbg_value(i8 [[ARG2]], [[META8]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
+; CHECK-NEXT:      #dbg_value(i8 [[PTR_SROA_21_0_EXTRACT_TRUNC]], [[META10]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
+; CHECK-NEXT:      #dbg_value(i8 [[ARG2]], [[META10]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
 ; CHECK-NEXT:    ret i8 [[PTR_SROA_0_0_EXTRACT_TRUNC]]
 ;
 entry:
@@ -84,7 +84,7 @@ define i16 @test4(i32 %arg) {
 ; CHECK-NEXT:      #dbg_value(i16 [[PTR_SROA_0_0_EXTRACT_TRUNC]], [[META2]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 0, 8), [[META7]])
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i32 [[ARG]], 16
 ; CHECK-NEXT:    [[PTR_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[PTR_SROA_2_0_EXTRACT_SHIFT]] to i16
-; CHECK-NEXT:      #dbg_value(i16 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META8]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 8, 8), [[META7]])
+; CHECK-NEXT:      #dbg_value(i16 [[PTR_SROA_2_0_EXTRACT_TRUNC]], [[META10]], !DIExpression(DW_OP_LLVM_extract_bits_zext, 8, 8), [[META7]])
 ; CHECK-NEXT:    ret i16 [[PTR_SROA_0_0_EXTRACT_TRUNC]]
 ;
 entry:
@@ -107,8 +107,8 @@ define i8 @test5(i32 %arg) {
 ; CHECK-NEXT:      #dbg_value(i8 [[PTR_SROA_0_0_EXTRACT_TRUNC]], [[META11:![0-9]+]], !DIExpression...
[truncated]

Copy link

github-actions bot commented Jun 9, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

github-actions bot commented Jun 9, 2025

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Verifier/RemoveDI/invalid-dbg-declare-operands.ll llvm/test/Verifier/dbg-declare-invalid-debug-loc.ll llvm/lib/AsmParser/LLParser.cpp llvm/lib/IR/AsmWriter.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/IR/BasicBlock.cpp llvm/lib/IR/Verifier.cpp llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll llvm/test/DebugInfo/Generic/assignment-tracking/parse-and-verify/verify.ll llvm/test/DebugInfo/Generic/sroa-extract-bits.ll llvm/test/Transforms/IROutliner/outlining-debug-statements.ll llvm/test/Transforms/ObjCARC/code-motion.ll llvm/test/Verifier/diexpression-entry-value-llvm-ir.ll llvm/test/Verifier/llvm.dbg.declare-address.ll llvm/test/Verifier/llvm.dbg.declare-expression.ll llvm/test/Verifier/llvm.dbg.declare-variable.ll llvm/test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll llvm/test/Verifier/llvm.dbg.value-expression.ll llvm/test/Verifier/llvm.dbg.value-value.ll llvm/test/Verifier/llvm.dbg.value-variable.ll llvm/unittests/IR/DebugInfoTest.cpp

The following files introduce new uses of undef:

  • llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

Copy link
Contributor

@OCHyams OCHyams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the toggling on and off and various controls caught up with us somewhere, unfortunate. This SGTM but I have some questions and nits inline

Out << "(null)";
} else {
WriteAsOperandInternal(Out, M, WriterCtx, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: don't need braces on the if. Also clang-format bot looks upset

Comment on lines 4402 to 4405
// Use a reinterpret cast rather than a safe default-to-null cast: the
// autoupgrade process happens before the verifier, and thus there might
// be some nonsense metadata in there.
return reinterpret_cast<MDType*>(MAV->getMetadata());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this comment could be simplified? Something like "Use reinterpret_cast rather than dyn_cast because the autoupgrade process happens before the verifier, meaning we may see unexpected operands here". YMMV. Also I think the style guide says to put braces around multi-line ifs (even if it's a comment making it multi-line).

Comment on lines 44 to 46
attributes #0 = { alwaysinline nounwind sspstrong "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind sspstrong "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need all the attributes?

store i32 5, ptr %arr, align 4, !dbg !12
store i32 4, ptr %sum, align 4, !dbg !13
%0 = load i32, ptr %sum, align 4, !dbg !14
ret i32 %0, !dbg !15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this test could be simplified - do these functions need any bodies at all, or can we just have the dbg.declare? do we need two functions?

tail call void @llvm.dbg.value(metadata i32 undef, i64 0, metadata !64, metadata !DIExpression()), !dbg !71
%1 = trunc i64 %0 to i32
ret i32 %1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the function above adding anything to the test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked resolved, but AFAICT the test hasn't changed, was it by mistake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I stripped cruft from one test, but actually two tests wanted the cruft removing, I'm now sending up a further revision.

@@ -43,8 +41,6 @@ DbgMarker *BasicBlock::createMarker(Instruction *I) {
}

DbgMarker *BasicBlock::createMarker(InstListType::iterator It) {
assert(IsNewDbgInfoFormat &&
"Tried to create a marker in a non new debug-info block!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the IsNewDbgInfoFormat deletions (here and in the unittest) from a different patch in the series?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did actually hit this while developing this patch; LLParser was setting the flag to false, then Autoupgrade, upon upgrading, would create a marker and hit this assertion. Deleting the assertion (as the flag should be true everywhere anyway) was part of that process.

(Technically this could be taken out of this patch now as there's nothing setting the flag to false, anywhere, but it's got to be deleted at some point).

// runs before the verifier, so the Metadata could refer to anything. Allow
// the verifier to detect and produce an error message, which will be much
// more ergonomic to the user.
return (const DILocation*)MD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure but could this be UB if MD is in fact not a DILocation? Or is that only if it's read through the ptr... I guess it's still not ideal to introduce the chance of UB...

Would it be more correct to return nullptr in that case, and allow nullptrs in DbgRecords which are checked against in the verifier (I don't like it, but not sure this is right either?). I guess the same comment applies to the reinterpret_cast above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's the dereference that's the problem; while my instinct is to go "It'll be fiiiinnnneeee", it turns out we've been here before, and the createUnresolvedDbgVariableRecord constructor works exactly for this purpose. I've reshuffled this code to cast to MDNodes and pass into that constructor.

What we lose out on is some error-case ergonomics: in some of the verifier tests, we now no longer have a spurious empty-string metadata printed in the middle of a DbgRecord and instead have (null). That's because the autoupgrade process drops the string reference because it's Metadata* rather than an MDNode, so it returns null. Theoretically this harms the ability to localise where a faulty debug record is in the input. However, for MDNodes we'll still pass the illegal node in and print it out, I've added a little coverage to llvm.dbg.declare-variable.ll.

I don't think this is likely to seriously harm diagnosing what's wrong with an LLVM-IR file; there are also block and function references printed to localise where the faulty record is.

Copy link
Member Author

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various refinements, see comments inline

// runs before the verifier, so the Metadata could refer to anything. Allow
// the verifier to detect and produce an error message, which will be much
// more ergonomic to the user.
return (const DILocation*)MD;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's the dereference that's the problem; while my instinct is to go "It'll be fiiiinnnneeee", it turns out we've been here before, and the createUnresolvedDbgVariableRecord constructor works exactly for this purpose. I've reshuffled this code to cast to MDNodes and pass into that constructor.

What we lose out on is some error-case ergonomics: in some of the verifier tests, we now no longer have a spurious empty-string metadata printed in the middle of a DbgRecord and instead have (null). That's because the autoupgrade process drops the string reference because it's Metadata* rather than an MDNode, so it returns null. Theoretically this harms the ability to localise where a faulty debug record is in the input. However, for MDNodes we'll still pass the illegal node in and print it out, I've added a little coverage to llvm.dbg.declare-variable.ll.

I don't think this is likely to seriously harm diagnosing what's wrong with an LLVM-IR file; there are also block and function references printed to localise where the faulty record is.

@@ -43,8 +41,6 @@ DbgMarker *BasicBlock::createMarker(Instruction *I) {
}

DbgMarker *BasicBlock::createMarker(InstListType::iterator It) {
assert(IsNewDbgInfoFormat &&
"Tried to create a marker in a non new debug-info block!");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did actually hit this while developing this patch; LLParser was setting the flag to false, then Autoupgrade, upon upgrading, would create a marker and hit this assertion. Deleting the assertion (as the flag should be true everywhere anyway) was part of that process.

(Technically this could be taken out of this patch now as there's nothing setting the flag to false, anywhere, but it's got to be deleted at some point).

Comment on lines 4402 to 4407
if (MetadataAsValue *MAV =
dyn_cast<MetadataAsValue>(CI->getArgOperand(Op))) {
Metadata *MD = MAV->getMetadata();
if (isa<MDNode>(MD))
return reinterpret_cast<MDNode *>(MD);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (MetadataAsValue *MAV =
dyn_cast<MetadataAsValue>(CI->getArgOperand(Op))) {
Metadata *MD = MAV->getMetadata();
if (isa<MDNode>(MD))
return reinterpret_cast<MDNode *>(MD);
}
if (MetadataAsValue *MAV =
dyn_cast<MetadataAsValue>(CI->getArgOperand(Op))) {
return dyn_cast<MDNode>(MAV->getMetadata());

Do we still need the reinterpret_cast now given you're doing an isa check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using dyn_cast overall.

tail call void @llvm.dbg.value(metadata i32 undef, i64 0, metadata !64, metadata !DIExpression()), !dbg !71
%1 = trunc i64 %0 to i32
ret i32 %1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked resolved, but AFAICT the test hasn't changed, was it by mistake?

Copy link
Contributor

@OCHyams OCHyams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmorse jmorse merged commit 3d7aa96 into llvm:main Jun 11, 2025
6 of 7 checks passed
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…vm#143452)

By chance, two things have prevented the autoupgrade path being
exercised much so far:
 * LLParser setting the debug-info mode to "old" on seeing intrinsics,
* The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:
* Tolerates the various illegal inputs that can be written with
debug-intrinsics, and that must be tolerated until the Verifier runs,
 * Printing illegal/null DbgRecord fields must succeed,
* Verifier errors need to localise the function/block where the error
is,
 * Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes
I found. There are also two edge cases:
* Some of the unit tests switch back and forth between intrinsic and
record modes at will; I've deleted coverage and some assertions to
tolerate this as intrinsic support is now Gone (TM),
* In sroa-extract-bits.ll, the order of debug records flips. This is
because the autoupgrader upgrades in the opposite order to the basic
block conversion routines... which doesn't change the record order, but
_does_ change the use list order in Metadata! This should (TM) have no
consequence to the correctness of LLVM, but will change the order of
various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…vm#143452)

By chance, two things have prevented the autoupgrade path being
exercised much so far:
 * LLParser setting the debug-info mode to "old" on seeing intrinsics,
* The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:
* Tolerates the various illegal inputs that can be written with
debug-intrinsics, and that must be tolerated until the Verifier runs,
 * Printing illegal/null DbgRecord fields must succeed,
* Verifier errors need to localise the function/block where the error
is,
 * Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes
I found. There are also two edge cases:
* Some of the unit tests switch back and forth between intrinsic and
record modes at will; I've deleted coverage and some assertions to
tolerate this as intrinsic support is now Gone (TM),
* In sroa-extract-bits.ll, the order of debug records flips. This is
because the autoupgrader upgrades in the opposite order to the basic
block conversion routines... which doesn't change the record order, but
_does_ change the use list order in Metadata! This should (TM) have no
consequence to the correctness of LLVM, but will change the order of
various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants