[RemoveDIs][DebugInfo][IR] Add parsing for non-intrinsic debug values #79818

SLTozer · 2024-01-29T12:47:22Z

This patch adds support for parsing the proposed non-instruction debug info ("RemoveDIs") from textual IR. This patch does not cover bitcode parsing or documentation, and printing and verifier support are added in prior PRs (#79281, #79810); this patch is presented as a standalone patch for the sake of review, but it is intended to land after the aforementioned patches and will not compile without being based on them[0]. This patch also contains a basic affirmative test ("can we parse the new format correctly"), but not the negative tests (testing for parser errors) - these will be added soon, but shouldn't hold up review of the implementation details of the patch.

An important detail of this patch is the fact that although we can now parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we will always convert back to the old format at the end of parsing - this is done for two reasons: firstly to ensure that every tool is able to process IR printed in the new format, regardless of whether that tool has had RemoveDIs support added, and secondly to maintain the effect of the existing flags: for the tools where support for the new format has been added, we will run LLVM passes in the new format iff --try-experimental-debuginfo-iterators=true, and we will print in the new format iff --write-experimental-debuginfo-iterators=true; the format of the textual IR should have no effect on either of these features.

For reference, the format of the RemoveDIs debug info is as follows:

Old: call void @llvm.dbg.value(metadata i32 %a, metadata !12, metadata !DIExpression()), !dbg !14
New:   #dbg_value { i32 %a, !12, !DIExpression(), !14 }

Old: call void @llvm.dbg.declare(metadata ptr %b, metadata !13, metadata !DIExpression()), !dbg !16
New:   #dbg_declare { ptr %b, !13, !DIExpression(), !16 }

Old: call void @llvm.dbg.assign(metadata i32 %add, metadata !13, metadata !DIExpression(), metadata !15, metadata ptr %b, metadata !DIExpression()), !dbg !18
New:   #dbg_assign { i32 %add, !13, !DIExpression(), !15, ptr %b, !DIExpression(), !18 }

For the sake of review, a few notes on the design and implementation of this patch (some of these will be added to the docs in a subsequent patch):

In order to support parsing this format, this patch adds # as a token to the lexer, and keywords for each debug record variant.
It is an error for a module to have a mixed format, so we cannot contain calls or declarations for the existing debug intrinsics in the same module as a debug record.
Although we print debug records with distinct indentation, it is not an error to parse a module with different indentation, just as with ordinary instructions.
Because the set of debug records are much more limited than the set of instructions and there is a large overlap between their arguments, the parser handles all types of debug records in a single function with hardcoded parameter expectations, rather than parsing a generic list of parameters and relying on the verifier to catch incorrect parameter lists. This will likely continue even when we add other types of records, e.g. llvm.dbg.label.

[0] If you want to test the parser patch locally, there is a branch with all the prerequisite patches applied at: https://github.com/SLTozer/llvm-project/tree/ddd-ir-parse

llvmbot · 2024-01-29T12:47:41Z

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Stephen Tozer (SLTozer)

Changes

This patch adds support for parsing the proposed non-instruction debug info ("RemoveDIs") from textual IR. This patch does not cover bitcode parsing or documentation, and printing and verifier support are added in prior PRs (#79281, #79810); this patch is presented as a standalone patch for the sake of review, but it is intended to land after the aforementioned patches and will not compile without being based on them[0]. This patch also contains a basic affirmative test ("can we parse the new format correctly"), but not the negative tests (testing for parser errors) - these will be added soon, but shouldn't hold up review of the implementation details of the patch.

An important detail of this patch is the fact that although we can now parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we will always convert back to the old format at the end of parsing - this is done for two reasons: firstly to ensure that every tool is able to process IR printed in the new format, regardless of whether that tool has had RemoveDIs support added, and secondly to maintain the effect of the existing flags: for the tools where support for the new format has been added, we will run LLVM passes in the new format iff --try-experimental-debuginfo-iterators=true, and we will print in the new format iff --write-experimental-debuginfo-iterators=true; the format of the textual IR should have no effect on either of these features.

For reference, the format of the RemoveDIs debug info is as follows:

Old: call void @<!-- -->llvm.dbg.value(metadata i32 %a, metadata !12, metadata !DIExpression()), !dbg !14
New:   #dbg_value { i32 %a, !12, !DIExpression(), !14 }

Old: call void @<!-- -->llvm.dbg.declare(metadata ptr %b, metadata !13, metadata !DIExpression()), !dbg !16
New:   #dbg_declare { ptr %b, !13, !DIExpression(), !16 }

Old: call void @<!-- -->llvm.dbg.assign(metadata i32 %add, metadata !13, metadata !DIExpression(), metadata !15, metadata ptr %b, metadata !DIExpression()), !dbg !18
New:   #dbg_assign { i32 %add, !13, !DIExpression(), !15, ptr %b, !DIExpression(), !18 }

For the sake of review, a few notes on the design and implementation of this patch (some of these will be added to the docs in a subsequent patch):

In order to support parsing this format, this patch adds # as a token to the lexer, and keywords for each debug record variant.
It is an error for a module to have a mixed format, so we cannot contain calls or declarations for the existing debug intrinsics in the same module as a debug record.
Although we print debug records with distinct indentation, it is not an error to parse a module with different indentation, just as with ordinary instructions.
Because the set of debug records are much more limited than the set of instructions and there is a large overlap between their arguments, the parser handles all types of debug records in a single function with hardcoded parameter expectations, rather than parsing a generic list of parameters and relying on the verifier to catch incorrect parameter lists. This will likely continue even when we add other types of records, e.g. llvm.dbg.label.

[0] If you want to test the parser patch locally, there is a branch with all the prerequisite patches applied at: https://github.com/SLTozer/llvm-project/tree/ddd-ir-parse

Full diff: https://github.com/llvm/llvm-project/pull/79818.diff

7 Files Affected:

(modified) llvm/include/llvm/AsmParser/LLParser.h (+4)
(modified) llvm/include/llvm/AsmParser/LLToken.h (+3)
(modified) llvm/include/llvm/IR/DebugProgramInstruction.h (+24)
(modified) llvm/lib/AsmParser/LLLexer.cpp (+17-1)
(modified) llvm/lib/AsmParser/LLParser.cpp (+143-1)
(modified) llvm/lib/IR/DebugProgramInstruction.cpp (+17)
(added) llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll (+87)

diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index cf358c384f5203..6e3facbfcea573 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -177,6 +177,9 @@ namespace llvm {
     /// UpgradeDebuginfo so it can generate broken bitcode.
     bool UpgradeDebugInfo;
 
+    bool SeenNewDbgInfoFormat = false;
+    bool SeenOldDbgInfoFormat = false;
+
     std::string SourceFileName;
 
   public:
@@ -585,6 +588,7 @@ namespace llvm {
     bool parseMDNodeTail(MDNode *&N);
     bool parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts);
     bool parseMetadataAttachment(unsigned &Kind, MDNode *&MD);
+    bool parseDebugProgramValue(DPValue *&DPV, PerFunctionState &PFS);
     bool parseInstructionMetadata(Instruction &Inst);
     bool parseGlobalObjectMetadataAttachment(GlobalObject &GO);
     bool parseOptionalFunctionMetadata(Function &F);
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index 147cf56c821aa1..4ce97041ca309e 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -36,6 +36,7 @@ enum Kind {
   exclaim, // !
   bar,     // |
   colon,   // :
+  hash,    // #
 
   kw_vscale,
   kw_x,
@@ -482,6 +483,8 @@ enum Kind {
   // Type valued tokens (TyVal).
   Type,
 
+  DbgRecordType,
+
   APFloat, // APFloatVal
   APSInt   // APSInt
 };
diff --git a/llvm/include/llvm/IR/DebugProgramInstruction.h b/llvm/include/llvm/IR/DebugProgramInstruction.h
index 737417fb9b9a54..284bfdd18db94f 100644
--- a/llvm/include/llvm/IR/DebugProgramInstruction.h
+++ b/llvm/include/llvm/IR/DebugProgramInstruction.h
@@ -129,6 +129,30 @@ class DPValue : public ilist_node<DPValue>, private DebugValueUser {
           DIAssignID *AssignID, Metadata *Address,
           DIExpression *AddressExpression, const DILocation *DI);
 
+private:
+  /// Private constructor for creating new instances during parsing only. Only
+  /// called through `createUnresolvedDPValue` below, which makes clear that
+  /// this is used for parsing only, and will later return a subclass depending
+  /// on which Type is passed.
+  DPValue(LocationType Type, Metadata *Val, MDNode *Variable,
+          DIExpression *Expression, MDNode *AssignID, Metadata *Address,
+          DIExpression *AddressExpression, MDNode *DI);
+
+public:
+  /// Used to create DPValues during parsing, where some metadata references may
+  /// still be unresolved. Although for some fields a generic `Metadata*`
+  /// argument is accepted for forward type-references, the verifier and
+  /// accessors will reject incorrect types later on. The function is used for
+  /// all types of DPValues for simplicity while parsing, but asserts if any
+  /// necessary fields are empty or unused fields are not empty, i.e. if the
+  /// #dbg_assign fields are used for a non-dbg-assign type.
+  static DPValue *createUnresolvedDPValue(LocationType Type, Metadata *Val,
+                                          MDNode *Variable,
+                                          DIExpression *Expression,
+                                          MDNode *AssignID, Metadata *Address,
+                                          DIExpression *AddressExpression,
+                                          MDNode *DI);
+
   static DPValue *createDPVAssign(Value *Val, DILocalVariable *Variable,
                                   DIExpression *Expression,
                                   DIAssignID *AssignID, Value *Address,
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index c8da3efbb68aff..4daf8a484da63e 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -440,7 +440,9 @@ lltok::Kind LLLexer::LexCaret() {
 ///    AttrGrpID ::= #[0-9]+
 lltok::Kind LLLexer::LexHash() {
   // Handle AttrGrpID: #[0-9]+
-  return LexUIntID(lltok::AttrGrpID);
+  if (isdigit(static_cast<unsigned char>(CurPtr[0])))
+    return LexUIntID(lltok::AttrGrpID);
+  return lltok::hash;
 }
 
 /// Lex a label, integer type, keyword, or hexadecimal integer constant.
@@ -922,6 +924,20 @@ lltok::Kind LLLexer::LexIdentifier() {
 
 #undef DWKEYWORD
 
+// Keywords for debug record types.
+#define DBGRECORDTYPEKEYWORD(STR)                                              \
+  do {                                                                         \
+    if (Keyword == "dbg_" #STR) {                                              \
+      StrVal = #STR;                                                           \
+      return lltok::DbgRecordType;                                             \
+    }                                                                          \
+  } while (false)
+
+  DBGRECORDTYPEKEYWORD(value);
+  DBGRECORDTYPEKEYWORD(declare);
+  DBGRECORDTYPEKEYWORD(assign);
+#undef DBGRECORDTYPEKEYWORD
+
   if (Keyword.starts_with("DIFlag")) {
     StrVal.assign(Keyword.begin(), Keyword.end());
     return lltok::DIFlag;
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index d6c5993797de11..b22d21f94a26ab 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -62,6 +62,8 @@ static cl::opt<bool> AllowIncompleteIR(
         "Allow incomplete IR on a best effort basis (references to unknown "
         "metadata will be dropped)"));
 
+extern llvm::cl::opt<bool> UseNewDbgInfoFormat;
+
 static std::string getTypeString(Type *T) {
   std::string Result;
   raw_string_ostream Tmp(Result);
@@ -6012,6 +6014,17 @@ bool LLParser::parseTypeAndBasicBlock(BasicBlock *&BB, LocTy &Loc,
   return false;
 }
 
+bool isOldDbgFormatIntrinsic(StringRef Name) {
+  // Exit early for the common (non-debug-intrinsic) case.
+  // We can make this the only check when we begin supporting all "llvm.dbg"
+  // intrinsics in the new debug info format.
+  if (!Name.starts_with("llvm.dbg."))
+    return false;
+  Intrinsic::ID FnID = Function::lookupIntrinsicID(Name);
+  return FnID == Intrinsic::dbg_declare || FnID == Intrinsic::dbg_value ||
+         FnID == Intrinsic::dbg_assign;
+}
+
 /// FunctionHeader
 ///   ::= OptionalLinkage OptionalPreemptionSpecifier OptionalVisibility
 ///       OptionalCallingConv OptRetAttrs OptUnnamedAddr Type GlobalName
@@ -6194,6 +6207,13 @@ bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine,
     }
   }
 
+  if (isOldDbgFormatIntrinsic(FunctionName)) {
+    if (SeenNewDbgInfoFormat)
+      return error(NameLoc, "llvm.dbg intrinsic should not appear in a module "
+                            "using non-intrinsic debug info");
+    SeenOldDbgInfoFormat = true;
+  }
+
   Fn = Function::Create(FT, GlobalValue::ExternalLinkage, AddrSpace,
                         FunctionName, M);
 
@@ -6359,9 +6379,29 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {
 
   std::string NameStr;
 
-  // parse the instructions in this block until we get a terminator.
+  // parse the instructions and debug values in this block until we get a
+  // terminator.
   Instruction *Inst;
+  DPValue *DPV;
+  SmallVector<std::unique_ptr<DPValue>> TrailingDPValues;
   do {
+    // Handle debug records first - there should always be an instruction
+    // following the debug records, i.e. they cannot appear after the block
+    // terminator.
+    while (Lex.getKind() == lltok::hash) {
+      if (SeenOldDbgInfoFormat)
+        return error(Lex.getLoc(), "debug record should not appear in a module "
+                                   "containing debug info intrinsics");
+      SeenNewDbgInfoFormat = true;
+      Lex.Lex();
+      if (!BB->getModule()->IsNewDbgInfoFormat)
+        BB->getModule()->convertToNewDbgValues();
+
+      if (parseDebugProgramValue(DPV, PFS))
+        return true;
+      TrailingDPValues.emplace_back(DPV);
+    }
+
     // This instruction may have three possibilities for a name: a) none
     // specified, b) name specified "%foo =", c) number specified: "%4 =".
     LocTy NameLoc = Lex.getLoc();
@@ -6406,11 +6446,103 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {
     // Set the name on the instruction.
     if (PFS.setInstName(NameID, NameStr, NameLoc, Inst))
       return true;
+
+    // Attach any preceding debug values to this instruction.
+    for (std::unique_ptr<DPValue> &DPV : TrailingDPValues) {
+      BB->insertDPValueBefore(DPV.release(), Inst->getIterator());
+    }
+    TrailingDPValues.clear();
   } while (!Inst->isTerminator());
 
+  assert(TrailingDPValues.empty() &&
+         "All debug values should have been attached to an instruction.");
+
   return false;
 }
 
+/// parseDebugProgramValue
+///   ::= #dbg_Type { (ValueAsMetadata|DIArgList|MDNode), MetadataID,
+///   DIExpression, DILocation }
+bool LLParser::parseDebugProgramValue(DPValue *&DPV, PerFunctionState &PFS) {
+  using LocType = DPValue::LocationType;
+  LocTy DPVLoc = Lex.getLoc();
+  if (Lex.getKind() != lltok::DbgRecordType) {
+    return error(DPVLoc, "expected debug record type here");
+  }
+  auto Type = StringSwitch<LocType>(Lex.getStrVal())
+                  .Case("declare", LocType::Declare)
+                  .Case("value", LocType::Value)
+                  .Case("assign", LocType::Assign)
+                  .Default(LocType::End);
+  if (Type == LocType::End)
+    return error(DPVLoc, "expected valid #dbg record here");
+  Lex.Lex();
+  if (parseToken(lltok::lbrace, "Expected '{' here"))
+    return true;
+
+  // Parse Value field...
+  Metadata *ValLocMD;
+  if (parseMetadata(ValLocMD, &PFS))
+    return true;
+  if (parseToken(lltok::comma, "Expected ',' here"))
+    return true;
+
+  // Parse Variable field...
+  MDNode *Variable;
+  if (parseMDNode(Variable))
+    return true;
+  if (parseToken(lltok::comma, "Expected ',' here"))
+    return true;
+
+  // Parse Expression field...
+  LocTy ExprLoc = Lex.getLoc();
+  Metadata *Expression;
+  if (parseMetadata(Expression, &PFS))
+    return true;
+  if (!isa<DIExpression>(Expression))
+    return error(ExprLoc, "expected valid DIExpression here");
+  if (parseToken(lltok::comma, "Expected ',' here"))
+    return true;
+
+  // Parse additional fields for #dbg_assign.
+  MDNode *AssignID = nullptr;
+  Metadata *AddressLocation = nullptr;
+  Metadata *AddressExpression = nullptr;
+  if (Type == LocType::Assign) {
+    // Parse DIAssignID...
+    if (parseMDNode(AssignID))
+      return true;
+    if (parseToken(lltok::comma, "Expected ',' here"))
+      return true;
+
+    // Parse address ValueAsMetadata...
+    if (parseMetadata(AddressLocation, &PFS))
+      return true;
+    if (parseToken(lltok::comma, "Expected ',' here"))
+      return true;
+
+    // Parse address DIExpression...
+    LocTy AddressExprLoc = Lex.getLoc();
+    if (parseMetadata(AddressExpression, &PFS))
+      return true;
+    if (!isa<DIExpression>(Expression))
+      return error(AddressExprLoc, "expected valid DIExpression here");
+    if (parseToken(lltok::comma, "Expected ',' here"))
+      return true;
+  }
+
+  /// Parse DILocation...
+  MDNode *DebugLoc;
+  if (parseMDNode(DebugLoc))
+    return true;
+
+  if (parseToken(lltok::rbrace, "Expected '}' here"))
+    return true;
+  DPV = DPValue::createUnresolvedDPValue(
+      Type, ValLocMD, Variable, cast<DIExpression>(Expression), AssignID,
+      AddressLocation, cast_or_null<DIExpression>(AddressExpression), DebugLoc);
+  return false;
+}
 //===----------------------------------------------------------------------===//
 // Instruction Parsing.
 //===----------------------------------------------------------------------===//
@@ -7638,6 +7770,16 @@ bool LLParser::parseCall(Instruction *&Inst, PerFunctionState &PFS,
     }
     CI->setFastMathFlags(FMF);
   }
+
+  if (CalleeID.Kind == ValID::t_GlobalName &&
+      isOldDbgFormatIntrinsic(CalleeID.StrVal)) {
+    if (SeenNewDbgInfoFormat) {
+      CI->deleteValue();
+      return error(CallLoc, "llvm.dbg intrinsic should not appear in a module "
+                            "using non-intrinsic debug info");
+    }
+    SeenOldDbgInfoFormat = true;
+  }
   CI->setAttributes(PAL);
   ForwardRefAttrGroups[CI] = FwdRefAttrGrps;
   Inst = CI;
diff --git a/llvm/lib/IR/DebugProgramInstruction.cpp b/llvm/lib/IR/DebugProgramInstruction.cpp
index fd234685d5fd4b..560ace9a84d70c 100644
--- a/llvm/lib/IR/DebugProgramInstruction.cpp
+++ b/llvm/lib/IR/DebugProgramInstruction.cpp
@@ -59,6 +59,23 @@ DPValue::DPValue(Metadata *Value, DILocalVariable *Variable,
 
 void DPValue::deleteInstr() { delete this; }
 
+DPValue::DPValue(DPValue::LocationType Type, Metadata *Val, MDNode *Variable,
+                 DIExpression *Expression, MDNode *AssignID, Metadata *Address,
+                 DIExpression *AddressExpression, MDNode *DI)
+    : DebugValueUser({Val, Address, AssignID}), Variable(Variable),
+      Expression(Expression), DbgLoc(DI), AddressExpression(AddressExpression),
+      Type(Type) {}
+
+DPValue *DPValue::createUnresolvedDPValue(DPValue::LocationType Type,
+                                          Metadata *Val, MDNode *Variable,
+                                          DIExpression *Expression,
+                                          MDNode *AssignID, Metadata *Address,
+                                          DIExpression *AddressExpression,
+                                          MDNode *DI) {
+  return new DPValue(Type, Val, Variable, Expression, AssignID, Address,
+                     AddressExpression, DI);
+}
+
 DPValue *DPValue::createDPValue(Value *Location, DILocalVariable *DV,
                                 DIExpression *Expr, const DILocation *DI) {
   return new DPValue(ValueAsMetadata::get(Location), DV, Expr, DI,
diff --git a/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll b/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll
new file mode 100644
index 00000000000000..f8c0024977bc22
--- /dev/null
+++ b/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll
@@ -0,0 +1,87 @@
+;; Test that we can write in the old debug info format.
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=false < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+
+;; Test that we can write in the new debug info format...
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=true < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+;; ...and then read the new format and write the old format.
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=true < %s \
+; RUN:   | opt --passes=verify -S --write-experimental-debuginfo=false \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,OLDDBG  --implicit-check-not=llvm.dbg
+
+;; Test also that the new flag is independent of the flag that enables use of
+;; these non-instruction debug info during LLVM passes.
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=false < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=true < %s \
+; RUN:   | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+; CHECK: @f(i32 %[[VAL_A:[0-9a-zA-Z]+]])
+; CHECK-NEXT: entry:
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata i32 %[[VAL_A]], metadata ![[VAR_A:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_1:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_value { i32 %[[VAL_A]], ![[VAR_A:[0-9]+]], !DIExpression(), ![[LOC_1:[0-9]+]] }
+; CHECK-NEXT: {{^}}  %[[VAL_B:[0-9a-zA-Z]+]] = alloca
+; OLDDBG-NEXT: call void @llvm.dbg.declare(metadata ptr %[[VAL_B]], metadata ![[VAR_B:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_2:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_declare { ptr %[[VAL_B]], ![[VAR_B:[0-9]+]], !DIExpression(), ![[LOC_2:[0-9]+]] }
+; CHECK-NEXT: {{^}}  %[[VAL_ADD:[0-9a-zA-Z]+]] = add i32 %[[VAL_A]], 5
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), metadata ![[VAR_A]], metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg ![[LOC_3:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_value { !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), ![[VAR_A]], !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus), ![[LOC_3:[0-9]+]] }
+; CHECK-NEXT: {{^}}  store i32 %[[VAL_ADD]]{{.+}}, !DIAssignID ![[ASSIGNID:[0-9]+]]
+; OLDDBG-NEXT: call void @llvm.dbg.assign(metadata i32 %[[VAL_ADD]], metadata ![[VAR_B]], metadata !DIExpression(), metadata ![[ASSIGNID]], metadata ptr %[[VAL_B]], metadata !DIExpression()), !dbg ![[LOC_4:[0-9]+]]
+; NEWDBG-NEXT: {{^}}    #dbg_assign { i32 %[[VAL_ADD]], ![[VAR_B]], !DIExpression(), ![[ASSIGNID]], ptr %[[VAL_B]], !DIExpression(), ![[LOC_4:[0-9]+]] }
+; CHECK-NEXT: {{^}}  ret i32
+
+; OLDDBG-DAG: declare void @llvm.dbg.value
+; OLDDBG-DAG: declare void @llvm.dbg.declare
+; OLDDBG-DAG: declare void @llvm.dbg.assign
+
+; CHECK-DAG: llvm.dbg.cu
+; CHECK-DAG: ![[VAR_A]] = !DILocalVariable(name: "a"
+; CHECK-DAG: ![[VAR_B]] = !DILocalVariable(name: "b"
+; CHECK-DAG: ![[LOC_1]] = !DILocation(line: 3, column: 15
+; CHECK-DAG: ![[LOC_2]] = !DILocation(line: 3, column: 20
+; CHECK-DAG: ![[LOC_3]] = !DILocation(line: 3, column: 25
+; CHECK-DAG: ![[LOC_4]] = !DILocation(line: 3, column: 30
+
+define dso_local i32 @f(i32 %a) !dbg !7 {
+entry:
+  call void @llvm.dbg.value(metadata i32 %a, metadata !20, metadata !DIExpression()), !dbg !30
+  %b = alloca i32, !dbg !30, !DIAssignID !40
+  call void @llvm.dbg.declare(metadata ptr %b, metadata !21, metadata !DIExpression()), !dbg !31
+  %add = add i32 %a, 5, !dbg !31
+  call void @llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %add), metadata !20, metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg !32
+  store i32 %add, ptr %b, !dbg !32, !DIAssignID !40
+  call void @llvm.dbg.assign(metadata i32 %add, metadata !21, metadata !DIExpression(), metadata !40, metadata ptr %b, metadata !DIExpression()), !dbg !33
+  ret i32 %add, !dbg !33
+
+}
+
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+declare void @llvm.dbg.declare(metadata, metadata, metadata)
+declare void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 18.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
+!1 = !DIFile(filename: "print.c", directory: "/tmp")
+!2 = !{}
+!3 = !{i32 2, !"Dwarf Version", i32 5}
+!4 = !{i32 2, !"Debug Info Version", i32 3}
+!5 = !{i32 1, !"wchar_size", i32 4}
+!6 = !{!"clang version 18.0.0"}
+!7 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 3, type: !8, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !13)
+!8 = !DISubroutineType(types: !9)
+!9 = !{!12, !12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !{!20, !21}
+!20 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line: 3, type: !12)
+!21 = !DILocalVariable(name: "b", scope: !7, file: !1, line: 3, type: !12)
+!30 = !DILocation(line: 3, column: 15, scope: !7)
+!31 = !DILocation(line: 3, column: 20, scope: !7)
+!32 = !DILocation(line: 3, column: 25, scope: !7)
+!33 = !DILocation(line: 3, column: 30, scope: !7)
+!40 = distinct !DIAssignID()
\ No newline at end of file

OCHyams

Code and test seem reasonable (with a few small nits scattered inline). Probably worth adding some negative (bad-input) tests, and another pair of eyes on this wouldn't hurt.

OCHyams · 2024-01-29T14:44:53Z

llvm/lib/AsmParser/LLParser.cpp

@@ -6359,9 +6379,29 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {

  std::string NameStr;

-  // parse the instructions in this block until we get a terminator.
+  // parse the instructions and debug values in this block until we get a


nit: since you're touching this comment anyway can you capitalise parse?

OCHyams · 2024-01-29T14:47:28Z

llvm/lib/AsmParser/LLParser.cpp

@@ -6406,11 +6446,103 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {
    // Set the name on the instruction.
    if (PFS.setInstName(NameID, NameStr, NameLoc, Inst))
      return true;
+
+    // Attach any preceding debug values to this instruction.
+    for (std::unique_ptr<DPValue> &DPV : TrailingDPValues) {


nit: unneeded braces

OCHyams · 2024-01-29T14:50:32Z

llvm/lib/AsmParser/LLParser.cpp

+///   ::= #dbg_Type { (ValueAsMetadata|DIArgList|MDNode), MetadataID,
+///   DIExpression, DILocation }


nit: comment needs updating for dbg.assigns

OCHyams · 2024-01-29T14:51:33Z

llvm/lib/AsmParser/LLParser.cpp

+  if (parseToken(lltok::lbrace, "Expected '{' here"))
+    return true;
+
+  // Parse Value field...


nit: full stop rather than ellipsis please, unless this is a common style in this file

OCHyams · 2024-01-29T14:52:32Z

llvm/lib/AsmParser/LLParser.cpp

+  LocTy DPVLoc = Lex.getLoc();
+  if (Lex.getKind() != lltok::DbgRecordType) {
+    return error(DPVLoc, "expected debug record type here");
+  }


nit: braces

OCHyams · 2024-01-29T14:53:48Z

llvm/lib/AsmParser/LLParser.cpp

  Instruction *Inst;
+  DPValue *DPV;


Can you sink the DPV variable down to the use in the loop?

adrian-prantl · 2024-01-29T23:05:00Z

Let me know if this has been discussed elsewhere, but why is necessary to add a new # token to the grammar?

adrian-prantl · 2024-01-29T23:51:58Z

Whoops, sorry! No Idea why the "Close with comment button" is larger than the "Comment" button!

SLTozer · 2024-01-30T09:44:09Z

Let me know if this has been discussed elsewhere, but why is necessary to add a new # token to the grammar?

The purpose of the leading # is to create a syntactic distinction from instructions. Without using a leading symbol to specify that a non-instruction line is beginning, the only discriminating feature would be whitespace - which is not clear enough by itself, and can't be used by the parser.
The reason why # specifically is used here is simply that it's free real estate - most other symbols that LLVM uses followed by letters are already classified as specific tokens (i.e. !foo, @foo, $foo, etc), and we'd ideally avoid adding new symbols (such as ~ or &) or overloading existing symbols if alternatives are available.

jmorse · 2024-02-05T16:29:21Z

On top of what Stephen said, using a new token to make debug-info separate encourages everyone to consider debug-info as a first class element of LLVM. The new representation isn't either Metadata or Value based, it's the glue between the two.

SLTozer · 2024-02-16T14:14:52Z

I've had to put aside testing for this patch for a short while, but I'll soon be uploading about a dozen small parsing tests for the various parsing errors that can occur in this patch; meanwhile, are there are any more questions or requested changes for the functional component for this patch, or is it good to land once the tests are available and approved?

SLTozer · 2024-02-19T14:10:30Z

Updates added: besides adding some more tests for the negative cases as promised, and some slight fixes in the code itself, I've also updated this patch to use parentheses for debug records following the discussion on the printing patch.

danilaml · 2024-02-19T16:09:31Z

llvm/lib/AsmParser/LLParser.cpp

+// here.
+bool finalizeDebugInfoFormat(Module *M) {
+  if (M)
+    M->setIsNewDbgInfoFormat(false);


Why can't it use UseNewDbgInfoFormat value (and why isn't it used, if it's declared)?

Just due to an awkward combination of factors at the moment: not all LLVM features internally support the new debug info format yet, so we use UseNewDbgInfoFormat immediately before running optimization passes (the only part that fully supports the new format right now) and convert back afterwards. Right now UseNewDbgInfoFormat defaults to true, so if we use it here to conditionally use the new format then we'll break the tools that use the parser but don't support the new format. We might be able to change that so that UseNewDbgInfoFormat is only true when we use a tool that fully supports it, but that probably involves a separate patch.

In short, the declaration is the mistake here - I'll remove it.

jmorse

This is looking good, I've got some moderate nits inline.

I've got a slight twitching feeling about the addition of the debug-records as keywords (well, Identifiers) as it seems like quite a large commitment. However: we can always change the manner in which it's parsed, I think the format is good.

I'm not completely certain about making the mixing of debug-info formats a parsing error: wouldn't it be more natural to defer it until the Verifier? I get that that's impossible right now due to the conversions, but we could check for the existence of the three intrinsics by querying the Module in finalizeDebugInfoFormat and make that an error. That'd be less invasive than filtering the names of all call instruction targets. (Downside: no localised errors, but this doesn't strike me as being something hard to pin down).

Having an error on even the declaration of intrinsics, not just the use of them, seems like quite a firm requirement. I'm not sure if that'll bite us, but we can downgrade the check if that proves to be a problem.

Some sadly menial nits: IMO the test names shouldn't contain "removedi" as that's referring to a transitional project name, in several years time no-one will recognise that name, IMO they should be named something like "dbg-record-invalid-$num" or something? IMHO it's also worth generating a positive-input that successfully parses so that we can point people at it and indicate the range of different things that can be parsed.

(It feels like this is all pretty close!)

jmorse · 2024-02-22T13:19:38Z

llvm/include/llvm/AsmParser/LLToken.h

@@ -482,6 +483,8 @@ enum Kind {
  // Type valued tokens (TyVal).
  Type,

+  DbgRecordType,


How about a few words here indicating these are (presumably) entire-line records rather than something that can appear inline, or otherwise indicate it's significance?

I think actually these need to be moved to the "String valued tokens" above, since they follow the same model (a DbgRecordType token is just the text "dbg_foo").

jmorse · 2024-02-22T13:22:10Z

llvm/lib/AsmParser/LLLexer.cpp

@@ -440,7 +440,9 @@ lltok::Kind LLLexer::LexCaret() {
 ///    AttrGrpID ::= #[0-9]+


Should we be updating these BNF descriptions too? (probably just a case of adding '#'?

jmorse · 2024-02-22T13:39:08Z

llvm/lib/AsmParser/LLParser.cpp

+}
+
+/// parseDebugProgramValue
+///   ::= #dbg_Type '{' (ValueAsMetadata|DIArgList|MDNode) ',' MetadataID ','


We're now using parens rather than brackets.

For consistency, we should avoid using DIArgList etc here as that delves into the class hierarchy. The parsing is described in terms of BNF types, so I think we just say it's a Metadata?

I think it's not entirely described in terms of actual token types - we refer to "Instructions" in other descriptions, for example, which aren't tokens but parseable objects, i.e. anything that we can call parseFoo for - but we can follow that description more closely than I've done here, since we do for example call parseMetadata for the "value" arguments.

jmorse · 2024-02-22T14:01:04Z

llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll

@@ -0,0 +1,87 @@
+;; Test that we can write in the old debug info format.


Implicit check-not's for "#dbg" too?

SLTozer · 2024-02-22T15:37:30Z

I've got a slight twitching feeling about the addition of the debug-records as keywords (well, Identifiers) as it seems like quite a large commitment. However: we can always change the manner in which it's parsed, I think the format is good.

I get similar twitches, but at the same time it makes sense - the model here is the same, imo, as how we use a bunch of keywords - the names dbg_foo shouldn't clash with anything that currently exists, and they represent a small group of strings that, while it might be added to over compiler releases, is fixed for any given version of the compiler.

I'm not completely certain about making the mixing of debug-info formats a parsing error: wouldn't it be more natural to defer it until the Verifier?

There's been a lot of uncertainty for me over where to spread the checks between the parser and the verifier; one of the advantages of placing the checks in the parser is that we can give the user the actual location of a violating instance, while in the verifier not only can we not give a line+col position, but the original intrinsic may have been morphed by an upgrade (old IR files may contain dbg.values with extra arguments, for example), making it non-trivial to figure out where the offending line is, or what the problem is at all if you're unfamiliar with the new/old formats. On the other hand, it is aggressive to put these checks in the parser, so I'm open to changing it.

jmorse · 2024-02-26T10:50:09Z

For the mixing-old-and-new issue, AFAIUI the population of people hand-writing debug-info IR tests is small, and we usually copy existing tests to write them. It's also not a syntactic problem to report as parsing will succeed, but the Module will be ill formed, it's more akin to a Verifier error but it makes the most sense to detect it in the parser (to avoid answering questions about what mixture of old/new debug-info should be permitted).

IMO: it'd be better to print out an informative error at the end of parsing, something like "#dbg_* records were encountered during parsing, but llvm.dbg.* intrinsics were seen in functions [list first ten users of the relevant declaration?]. Cannot load a module with both flavours of debug-info". That avoids the invasive code changes but gives a good diagnosis of what's gone wrong.

SLTozer · 2024-02-26T11:56:51Z

It's also not a syntactic problem to report as parsing will succeed, but the Module will be ill formed, it's more akin to a Verifier error but it makes the most sense to detect it in the parser

SGTM, (this was the first way I implemented it anyway!)

github-actions · 2024-03-04T15:40:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

OCHyams

Latest code + test changes LGTM

@jmorse might want to check he's happy with the overall approach now though

OCHyams · 2024-03-07T09:38:17Z

llvm/test/Verifier/RemoveDI/blockbyref.ll

@@ -0,0 +1,18 @@
+; RUN: llvm-as -disable-output <%s 2>&1| FileCheck %s


nit: don't need <? and while you're there can you add a space between 1 and |?

EDIT: this is in a few of the other tests too. Were these copied and modified from existing tests? Feel free to ignore this nit

OCHyams · 2024-03-07T09:44:31Z

llvm/test/Verifier/RemoveDI/llvm.dbg.intrinsic-dbg-attachment.ll

+      !DILocalVariable(scope: !1),
+      !DIExpression(),
+      !{})
+; CHECK-LABEL: invalid #dbg record location


what's invalid about the "location" and which of the 3 locations (position, src-loc, var-loc) is this referring to?

The location overload strikes again... I'll update the verifier check to state more clearly that it's a DILocation.

SGTM thanks

jmorse · 2024-03-07T11:02:20Z

@jmorse might want to check he's happy with the overall approach now though

I'm fine with it, although we should be ready to adjust it if there's turbulence.

SLTozer added debuginfo llvm:ir labels Jan 29, 2024

SLTozer requested review from rnk, echristo, jmorse, adrian-prantl, OCHyams, dwblaikie and CarlosAlbertoEnciso January 29, 2024 12:47

SLTozer self-assigned this Jan 29, 2024

OCHyams reviewed Jan 29, 2024

View reviewed changes

adrian-prantl closed this Jan 29, 2024

adrian-prantl reopened this Jan 29, 2024

SLTozer mentioned this pull request Feb 8, 2024

[RemoveDIs] Add documentation for IR debug records #81156

Merged

danilaml reviewed Feb 19, 2024

View reviewed changes

jmorse reviewed Feb 22, 2024

View reviewed changes

SLTozer and others added 4 commits March 4, 2024 14:48

Add parsing for non-intrinsic debug values

df8fa60

Address review nits

617b529

Add tests, use brackets, minor fixes

c6e5aff

Rename tests, adjust lexer/token descriptions

ac969d3

SLTozer added 5 commits March 4, 2024 14:48

Parser fixes for DPLabel

2e929cb

Parser fixes

f1903e0

Unused variables

3bee366

Add tests (requires parsing)

ea2a924

Delete tests for DIExpression validity

b56190d

SLTozer force-pushed the ddd-ir-parse-standalone branch from 4b2e84d to b56190d Compare March 4, 2024 15:37

SLTozer added 3 commits March 4, 2024 17:00

clang-format

5702fea

Don't error-out when we see an intrinsic decl

758ef7c

Remove hanging comment

d87a9aa

OCHyams reviewed Mar 7, 2024

View reviewed changes

Update 'bad location' check to be more specific

6e90f7a

OCHyams approved these changes Mar 7, 2024

View reviewed changes

SLTozer merged commit 464d9d9 into llvm:main Mar 7, 2024

		/// ::= #dbg_Type { (ValueAsMetadata\|DIArgList\|MDNode), MetadataID,
		/// DIExpression, DILocation }

		@@ -440,7 +440,9 @@ lltok::Kind LLLexer::LexCaret() {
		/// AttrGrpID ::= #[0-9]+

		@@ -0,0 +1,87 @@
		;; Test that we can write in the old debug info format.

		@@ -0,0 +1,18 @@
		; RUN: llvm-as -disable-output <%s 2>&1\| FileCheck %s

[RemoveDIs][DebugInfo][IR] Add parsing for non-intrinsic debug values #79818

[RemoveDIs][DebugInfo][IR] Add parsing for non-intrinsic debug values #79818

Uh oh!

Conversation

SLTozer commented Jan 29, 2024

Uh oh!

llvmbot commented Jan 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

OCHyams left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adrian-prantl commented Jan 29, 2024

Uh oh!

adrian-prantl commented Jan 29, 2024

Uh oh!

SLTozer commented Jan 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmorse commented Feb 5, 2024

Uh oh!

SLTozer commented Feb 16, 2024

Uh oh!

SLTozer commented Feb 19, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jmorse left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer commented Feb 22, 2024

Uh oh!

jmorse commented Feb 26, 2024

Uh oh!

SLTozer commented Feb 26, 2024

Uh oh!

github-actions bot commented Mar 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

OCHyams left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jmorse commented Mar 7, 2024

Uh oh!

llvmbot commented Jan 29, 2024 •

edited

Loading

SLTozer commented Jan 30, 2024 •

edited

Loading

github-actions bot commented Mar 4, 2024 •

edited

Loading