-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RemoveDIs][DebugInfo][IR] Add parsing for non-intrinsic debug values #79818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-llvm-ir Author: Stephen Tozer (SLTozer) ChangesThis patch adds support for parsing the proposed non-instruction debug info ("RemoveDIs") from textual IR. This patch does not cover bitcode parsing or documentation, and printing and verifier support are added in prior PRs (#79281, #79810); this patch is presented as a standalone patch for the sake of review, but it is intended to land after the aforementioned patches and will not compile without being based on them[0]. This patch also contains a basic affirmative test ("can we parse the new format correctly"), but not the negative tests (testing for parser errors) - these will be added soon, but shouldn't hold up review of the implementation details of the patch. An important detail of this patch is the fact that although we can now parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we will always convert back to the old format at the end of parsing - this is done for two reasons: firstly to ensure that every tool is able to process IR printed in the new format, regardless of whether that tool has had RemoveDIs support added, and secondly to maintain the effect of the existing flags: for the tools where support for the new format has been added, we will run LLVM passes in the new format iff For reference, the format of the RemoveDIs debug info is as follows:
For the sake of review, a few notes on the design and implementation of this patch (some of these will be added to the docs in a subsequent patch):
[0] If you want to test the parser patch locally, there is a branch with all the prerequisite patches applied at: https://github.com/SLTozer/llvm-project/tree/ddd-ir-parse Full diff: https://github.com/llvm/llvm-project/pull/79818.diff 7 Files Affected:
diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index cf358c384f5203..6e3facbfcea573 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -177,6 +177,9 @@ namespace llvm {
/// UpgradeDebuginfo so it can generate broken bitcode.
bool UpgradeDebugInfo;
+ bool SeenNewDbgInfoFormat = false;
+ bool SeenOldDbgInfoFormat = false;
+
std::string SourceFileName;
public:
@@ -585,6 +588,7 @@ namespace llvm {
bool parseMDNodeTail(MDNode *&N);
bool parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts);
bool parseMetadataAttachment(unsigned &Kind, MDNode *&MD);
+ bool parseDebugProgramValue(DPValue *&DPV, PerFunctionState &PFS);
bool parseInstructionMetadata(Instruction &Inst);
bool parseGlobalObjectMetadataAttachment(GlobalObject &GO);
bool parseOptionalFunctionMetadata(Function &F);
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index 147cf56c821aa1..4ce97041ca309e 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -36,6 +36,7 @@ enum Kind {
exclaim, // !
bar, // |
colon, // :
+ hash, // #
kw_vscale,
kw_x,
@@ -482,6 +483,8 @@ enum Kind {
// Type valued tokens (TyVal).
Type,
+ DbgRecordType,
+
APFloat, // APFloatVal
APSInt // APSInt
};
diff --git a/llvm/include/llvm/IR/DebugProgramInstruction.h b/llvm/include/llvm/IR/DebugProgramInstruction.h
index 737417fb9b9a54..284bfdd18db94f 100644
--- a/llvm/include/llvm/IR/DebugProgramInstruction.h
+++ b/llvm/include/llvm/IR/DebugProgramInstruction.h
@@ -129,6 +129,30 @@ class DPValue : public ilist_node<DPValue>, private DebugValueUser {
DIAssignID *AssignID, Metadata *Address,
DIExpression *AddressExpression, const DILocation *DI);
+private:
+ /// Private constructor for creating new instances during parsing only. Only
+ /// called through `createUnresolvedDPValue` below, which makes clear that
+ /// this is used for parsing only, and will later return a subclass depending
+ /// on which Type is passed.
+ DPValue(LocationType Type, Metadata *Val, MDNode *Variable,
+ DIExpression *Expression, MDNode *AssignID, Metadata *Address,
+ DIExpression *AddressExpression, MDNode *DI);
+
+public:
+ /// Used to create DPValues during parsing, where some metadata references may
+ /// still be unresolved. Although for some fields a generic `Metadata*`
+ /// argument is accepted for forward type-references, the verifier and
+ /// accessors will reject incorrect types later on. The function is used for
+ /// all types of DPValues for simplicity while parsing, but asserts if any
+ /// necessary fields are empty or unused fields are not empty, i.e. if the
+ /// #dbg_assign fields are used for a non-dbg-assign type.
+ static DPValue *createUnresolvedDPValue(LocationType Type, Metadata *Val,
+ MDNode *Variable,
+ DIExpression *Expression,
+ MDNode *AssignID, Metadata *Address,
+ DIExpression *AddressExpression,
+ MDNode *DI);
+
static DPValue *createDPVAssign(Value *Val, DILocalVariable *Variable,
DIExpression *Expression,
DIAssignID *AssignID, Value *Address,
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index c8da3efbb68aff..4daf8a484da63e 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -440,7 +440,9 @@ lltok::Kind LLLexer::LexCaret() {
/// AttrGrpID ::= #[0-9]+
lltok::Kind LLLexer::LexHash() {
// Handle AttrGrpID: #[0-9]+
- return LexUIntID(lltok::AttrGrpID);
+ if (isdigit(static_cast<unsigned char>(CurPtr[0])))
+ return LexUIntID(lltok::AttrGrpID);
+ return lltok::hash;
}
/// Lex a label, integer type, keyword, or hexadecimal integer constant.
@@ -922,6 +924,20 @@ lltok::Kind LLLexer::LexIdentifier() {
#undef DWKEYWORD
+// Keywords for debug record types.
+#define DBGRECORDTYPEKEYWORD(STR) \
+ do { \
+ if (Keyword == "dbg_" #STR) { \
+ StrVal = #STR; \
+ return lltok::DbgRecordType; \
+ } \
+ } while (false)
+
+ DBGRECORDTYPEKEYWORD(value);
+ DBGRECORDTYPEKEYWORD(declare);
+ DBGRECORDTYPEKEYWORD(assign);
+#undef DBGRECORDTYPEKEYWORD
+
if (Keyword.starts_with("DIFlag")) {
StrVal.assign(Keyword.begin(), Keyword.end());
return lltok::DIFlag;
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index d6c5993797de11..b22d21f94a26ab 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -62,6 +62,8 @@ static cl::opt<bool> AllowIncompleteIR(
"Allow incomplete IR on a best effort basis (references to unknown "
"metadata will be dropped)"));
+extern llvm::cl::opt<bool> UseNewDbgInfoFormat;
+
static std::string getTypeString(Type *T) {
std::string Result;
raw_string_ostream Tmp(Result);
@@ -6012,6 +6014,17 @@ bool LLParser::parseTypeAndBasicBlock(BasicBlock *&BB, LocTy &Loc,
return false;
}
+bool isOldDbgFormatIntrinsic(StringRef Name) {
+ // Exit early for the common (non-debug-intrinsic) case.
+ // We can make this the only check when we begin supporting all "llvm.dbg"
+ // intrinsics in the new debug info format.
+ if (!Name.starts_with("llvm.dbg."))
+ return false;
+ Intrinsic::ID FnID = Function::lookupIntrinsicID(Name);
+ return FnID == Intrinsic::dbg_declare || FnID == Intrinsic::dbg_value ||
+ FnID == Intrinsic::dbg_assign;
+}
+
/// FunctionHeader
/// ::= OptionalLinkage OptionalPreemptionSpecifier OptionalVisibility
/// OptionalCallingConv OptRetAttrs OptUnnamedAddr Type GlobalName
@@ -6194,6 +6207,13 @@ bool LLParser::parseFunctionHeader(Function *&Fn, bool IsDefine,
}
}
+ if (isOldDbgFormatIntrinsic(FunctionName)) {
+ if (SeenNewDbgInfoFormat)
+ return error(NameLoc, "llvm.dbg intrinsic should not appear in a module "
+ "using non-intrinsic debug info");
+ SeenOldDbgInfoFormat = true;
+ }
+
Fn = Function::Create(FT, GlobalValue::ExternalLinkage, AddrSpace,
FunctionName, M);
@@ -6359,9 +6379,29 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {
std::string NameStr;
- // parse the instructions in this block until we get a terminator.
+ // parse the instructions and debug values in this block until we get a
+ // terminator.
Instruction *Inst;
+ DPValue *DPV;
+ SmallVector<std::unique_ptr<DPValue>> TrailingDPValues;
do {
+ // Handle debug records first - there should always be an instruction
+ // following the debug records, i.e. they cannot appear after the block
+ // terminator.
+ while (Lex.getKind() == lltok::hash) {
+ if (SeenOldDbgInfoFormat)
+ return error(Lex.getLoc(), "debug record should not appear in a module "
+ "containing debug info intrinsics");
+ SeenNewDbgInfoFormat = true;
+ Lex.Lex();
+ if (!BB->getModule()->IsNewDbgInfoFormat)
+ BB->getModule()->convertToNewDbgValues();
+
+ if (parseDebugProgramValue(DPV, PFS))
+ return true;
+ TrailingDPValues.emplace_back(DPV);
+ }
+
// This instruction may have three possibilities for a name: a) none
// specified, b) name specified "%foo =", c) number specified: "%4 =".
LocTy NameLoc = Lex.getLoc();
@@ -6406,11 +6446,103 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) {
// Set the name on the instruction.
if (PFS.setInstName(NameID, NameStr, NameLoc, Inst))
return true;
+
+ // Attach any preceding debug values to this instruction.
+ for (std::unique_ptr<DPValue> &DPV : TrailingDPValues) {
+ BB->insertDPValueBefore(DPV.release(), Inst->getIterator());
+ }
+ TrailingDPValues.clear();
} while (!Inst->isTerminator());
+ assert(TrailingDPValues.empty() &&
+ "All debug values should have been attached to an instruction.");
+
return false;
}
+/// parseDebugProgramValue
+/// ::= #dbg_Type { (ValueAsMetadata|DIArgList|MDNode), MetadataID,
+/// DIExpression, DILocation }
+bool LLParser::parseDebugProgramValue(DPValue *&DPV, PerFunctionState &PFS) {
+ using LocType = DPValue::LocationType;
+ LocTy DPVLoc = Lex.getLoc();
+ if (Lex.getKind() != lltok::DbgRecordType) {
+ return error(DPVLoc, "expected debug record type here");
+ }
+ auto Type = StringSwitch<LocType>(Lex.getStrVal())
+ .Case("declare", LocType::Declare)
+ .Case("value", LocType::Value)
+ .Case("assign", LocType::Assign)
+ .Default(LocType::End);
+ if (Type == LocType::End)
+ return error(DPVLoc, "expected valid #dbg record here");
+ Lex.Lex();
+ if (parseToken(lltok::lbrace, "Expected '{' here"))
+ return true;
+
+ // Parse Value field...
+ Metadata *ValLocMD;
+ if (parseMetadata(ValLocMD, &PFS))
+ return true;
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+
+ // Parse Variable field...
+ MDNode *Variable;
+ if (parseMDNode(Variable))
+ return true;
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+
+ // Parse Expression field...
+ LocTy ExprLoc = Lex.getLoc();
+ Metadata *Expression;
+ if (parseMetadata(Expression, &PFS))
+ return true;
+ if (!isa<DIExpression>(Expression))
+ return error(ExprLoc, "expected valid DIExpression here");
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+
+ // Parse additional fields for #dbg_assign.
+ MDNode *AssignID = nullptr;
+ Metadata *AddressLocation = nullptr;
+ Metadata *AddressExpression = nullptr;
+ if (Type == LocType::Assign) {
+ // Parse DIAssignID...
+ if (parseMDNode(AssignID))
+ return true;
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+
+ // Parse address ValueAsMetadata...
+ if (parseMetadata(AddressLocation, &PFS))
+ return true;
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+
+ // Parse address DIExpression...
+ LocTy AddressExprLoc = Lex.getLoc();
+ if (parseMetadata(AddressExpression, &PFS))
+ return true;
+ if (!isa<DIExpression>(Expression))
+ return error(AddressExprLoc, "expected valid DIExpression here");
+ if (parseToken(lltok::comma, "Expected ',' here"))
+ return true;
+ }
+
+ /// Parse DILocation...
+ MDNode *DebugLoc;
+ if (parseMDNode(DebugLoc))
+ return true;
+
+ if (parseToken(lltok::rbrace, "Expected '}' here"))
+ return true;
+ DPV = DPValue::createUnresolvedDPValue(
+ Type, ValLocMD, Variable, cast<DIExpression>(Expression), AssignID,
+ AddressLocation, cast_or_null<DIExpression>(AddressExpression), DebugLoc);
+ return false;
+}
//===----------------------------------------------------------------------===//
// Instruction Parsing.
//===----------------------------------------------------------------------===//
@@ -7638,6 +7770,16 @@ bool LLParser::parseCall(Instruction *&Inst, PerFunctionState &PFS,
}
CI->setFastMathFlags(FMF);
}
+
+ if (CalleeID.Kind == ValID::t_GlobalName &&
+ isOldDbgFormatIntrinsic(CalleeID.StrVal)) {
+ if (SeenNewDbgInfoFormat) {
+ CI->deleteValue();
+ return error(CallLoc, "llvm.dbg intrinsic should not appear in a module "
+ "using non-intrinsic debug info");
+ }
+ SeenOldDbgInfoFormat = true;
+ }
CI->setAttributes(PAL);
ForwardRefAttrGroups[CI] = FwdRefAttrGrps;
Inst = CI;
diff --git a/llvm/lib/IR/DebugProgramInstruction.cpp b/llvm/lib/IR/DebugProgramInstruction.cpp
index fd234685d5fd4b..560ace9a84d70c 100644
--- a/llvm/lib/IR/DebugProgramInstruction.cpp
+++ b/llvm/lib/IR/DebugProgramInstruction.cpp
@@ -59,6 +59,23 @@ DPValue::DPValue(Metadata *Value, DILocalVariable *Variable,
void DPValue::deleteInstr() { delete this; }
+DPValue::DPValue(DPValue::LocationType Type, Metadata *Val, MDNode *Variable,
+ DIExpression *Expression, MDNode *AssignID, Metadata *Address,
+ DIExpression *AddressExpression, MDNode *DI)
+ : DebugValueUser({Val, Address, AssignID}), Variable(Variable),
+ Expression(Expression), DbgLoc(DI), AddressExpression(AddressExpression),
+ Type(Type) {}
+
+DPValue *DPValue::createUnresolvedDPValue(DPValue::LocationType Type,
+ Metadata *Val, MDNode *Variable,
+ DIExpression *Expression,
+ MDNode *AssignID, Metadata *Address,
+ DIExpression *AddressExpression,
+ MDNode *DI) {
+ return new DPValue(Type, Val, Variable, Expression, AssignID, Address,
+ AddressExpression, DI);
+}
+
DPValue *DPValue::createDPValue(Value *Location, DILocalVariable *DV,
DIExpression *Expr, const DILocation *DI) {
return new DPValue(ValueAsMetadata::get(Location), DV, Expr, DI,
diff --git a/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll b/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll
new file mode 100644
index 00000000000000..f8c0024977bc22
--- /dev/null
+++ b/llvm/test/DebugInfo/roundtrip-non-instruction-debug-info.ll
@@ -0,0 +1,87 @@
+;; Test that we can write in the old debug info format.
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=false < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+
+;; Test that we can write in the new debug info format...
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=true < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+;; ...and then read the new format and write the old format.
+; RUN: opt --passes=verify -S --write-experimental-debuginfo=true < %s \
+; RUN: | opt --passes=verify -S --write-experimental-debuginfo=false \
+; RUN: | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+
+;; Test also that the new flag is independent of the flag that enables use of
+;; these non-instruction debug info during LLVM passes.
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=false < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,OLDDBG --implicit-check-not=llvm.dbg
+; RUN: opt --passes=verify -S --try-experimental-debuginfo-iterators --write-experimental-debuginfo=true < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,NEWDBG --implicit-check-not=llvm.dbg
+
+; CHECK: @f(i32 %[[VAL_A:[0-9a-zA-Z]+]])
+; CHECK-NEXT: entry:
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata i32 %[[VAL_A]], metadata ![[VAR_A:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_1:[0-9]+]]
+; NEWDBG-NEXT: {{^}} #dbg_value { i32 %[[VAL_A]], ![[VAR_A:[0-9]+]], !DIExpression(), ![[LOC_1:[0-9]+]] }
+; CHECK-NEXT: {{^}} %[[VAL_B:[0-9a-zA-Z]+]] = alloca
+; OLDDBG-NEXT: call void @llvm.dbg.declare(metadata ptr %[[VAL_B]], metadata ![[VAR_B:[0-9]+]], metadata !DIExpression()), !dbg ![[LOC_2:[0-9]+]]
+; NEWDBG-NEXT: {{^}} #dbg_declare { ptr %[[VAL_B]], ![[VAR_B:[0-9]+]], !DIExpression(), ![[LOC_2:[0-9]+]] }
+; CHECK-NEXT: {{^}} %[[VAL_ADD:[0-9a-zA-Z]+]] = add i32 %[[VAL_A]], 5
+; OLDDBG-NEXT: call void @llvm.dbg.value(metadata !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), metadata ![[VAR_A]], metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg ![[LOC_3:[0-9]+]]
+; NEWDBG-NEXT: {{^}} #dbg_value { !DIArgList(i32 %[[VAL_A]], i32 %[[VAL_ADD]]), ![[VAR_A]], !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus), ![[LOC_3:[0-9]+]] }
+; CHECK-NEXT: {{^}} store i32 %[[VAL_ADD]]{{.+}}, !DIAssignID ![[ASSIGNID:[0-9]+]]
+; OLDDBG-NEXT: call void @llvm.dbg.assign(metadata i32 %[[VAL_ADD]], metadata ![[VAR_B]], metadata !DIExpression(), metadata ![[ASSIGNID]], metadata ptr %[[VAL_B]], metadata !DIExpression()), !dbg ![[LOC_4:[0-9]+]]
+; NEWDBG-NEXT: {{^}} #dbg_assign { i32 %[[VAL_ADD]], ![[VAR_B]], !DIExpression(), ![[ASSIGNID]], ptr %[[VAL_B]], !DIExpression(), ![[LOC_4:[0-9]+]] }
+; CHECK-NEXT: {{^}} ret i32
+
+; OLDDBG-DAG: declare void @llvm.dbg.value
+; OLDDBG-DAG: declare void @llvm.dbg.declare
+; OLDDBG-DAG: declare void @llvm.dbg.assign
+
+; CHECK-DAG: llvm.dbg.cu
+; CHECK-DAG: ![[VAR_A]] = !DILocalVariable(name: "a"
+; CHECK-DAG: ![[VAR_B]] = !DILocalVariable(name: "b"
+; CHECK-DAG: ![[LOC_1]] = !DILocation(line: 3, column: 15
+; CHECK-DAG: ![[LOC_2]] = !DILocation(line: 3, column: 20
+; CHECK-DAG: ![[LOC_3]] = !DILocation(line: 3, column: 25
+; CHECK-DAG: ![[LOC_4]] = !DILocation(line: 3, column: 30
+
+define dso_local i32 @f(i32 %a) !dbg !7 {
+entry:
+ call void @llvm.dbg.value(metadata i32 %a, metadata !20, metadata !DIExpression()), !dbg !30
+ %b = alloca i32, !dbg !30, !DIAssignID !40
+ call void @llvm.dbg.declare(metadata ptr %b, metadata !21, metadata !DIExpression()), !dbg !31
+ %add = add i32 %a, 5, !dbg !31
+ call void @llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %add), metadata !20, metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus)), !dbg !32
+ store i32 %add, ptr %b, !dbg !32, !DIAssignID !40
+ call void @llvm.dbg.assign(metadata i32 %add, metadata !21, metadata !DIExpression(), metadata !40, metadata ptr %b, metadata !DIExpression()), !dbg !33
+ ret i32 %add, !dbg !33
+
+}
+
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+declare void @llvm.dbg.declare(metadata, metadata, metadata)
+declare void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 18.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
+!1 = !DIFile(filename: "print.c", directory: "/tmp")
+!2 = !{}
+!3 = !{i32 2, !"Dwarf Version", i32 5}
+!4 = !{i32 2, !"Debug Info Version", i32 3}
+!5 = !{i32 1, !"wchar_size", i32 4}
+!6 = !{!"clang version 18.0.0"}
+!7 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 3, type: !8, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !13)
+!8 = !DISubroutineType(types: !9)
+!9 = !{!12, !12}
+!12 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!13 = !{!20, !21}
+!20 = !DILocalVariable(name: "a", arg: 1, scope: !7, file: !1, line: 3, type: !12)
+!21 = !DILocalVariable(name: "b", scope: !7, file: !1, line: 3, type: !12)
+!30 = !DILocation(line: 3, column: 15, scope: !7)
+!31 = !DILocation(line: 3, column: 20, scope: !7)
+!32 = !DILocation(line: 3, column: 25, scope: !7)
+!33 = !DILocation(line: 3, column: 30, scope: !7)
+!40 = distinct !DIAssignID()
\ No newline at end of file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code and test seem reasonable (with a few small nits scattered inline). Probably worth adding some negative (bad-input) tests, and another pair of eyes on this wouldn't hurt.
llvm/lib/AsmParser/LLParser.cpp
Outdated
@@ -6359,9 +6379,29 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) { | |||
|
|||
std::string NameStr; | |||
|
|||
// parse the instructions in this block until we get a terminator. | |||
// parse the instructions and debug values in this block until we get a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since you're touching this comment anyway can you capitalise parse
?
llvm/lib/AsmParser/LLParser.cpp
Outdated
@@ -6406,11 +6446,103 @@ bool LLParser::parseBasicBlock(PerFunctionState &PFS) { | |||
// Set the name on the instruction. | |||
if (PFS.setInstName(NameID, NameStr, NameLoc, Inst)) | |||
return true; | |||
|
|||
// Attach any preceding debug values to this instruction. | |||
for (std::unique_ptr<DPValue> &DPV : TrailingDPValues) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unneeded braces
llvm/lib/AsmParser/LLParser.cpp
Outdated
/// ::= #dbg_Type { (ValueAsMetadata|DIArgList|MDNode), MetadataID, | ||
/// DIExpression, DILocation } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment needs updating for dbg.assigns
llvm/lib/AsmParser/LLParser.cpp
Outdated
if (parseToken(lltok::lbrace, "Expected '{' here")) | ||
return true; | ||
|
||
// Parse Value field... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: full stop rather than ellipsis please, unless this is a common style in this file
LocTy DPVLoc = Lex.getLoc(); | ||
if (Lex.getKind() != lltok::DbgRecordType) { | ||
return error(DPVLoc, "expected debug record type here"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: braces
llvm/lib/AsmParser/LLParser.cpp
Outdated
Instruction *Inst; | ||
DPValue *DPV; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you sink the DPV
variable down to the use in the loop?
Let me know if this has been discussed elsewhere, but why is necessary to add a new |
Whoops, sorry! No Idea why the "Close with comment button" is larger than the "Comment" button! |
The purpose of the leading |
On top of what Stephen said, using a new token to make debug-info separate encourages everyone to consider debug-info as a first class element of LLVM. The new representation isn't either Metadata or Value based, it's the glue between the two. |
I've had to put aside testing for this patch for a short while, but I'll soon be uploading about a dozen small parsing tests for the various parsing errors that can occur in this patch; meanwhile, are there are any more questions or requested changes for the functional component for this patch, or is it good to land once the tests are available and approved? |
Updates added: besides adding some more tests for the negative cases as promised, and some slight fixes in the code itself, I've also updated this patch to use parentheses for debug records following the discussion on the printing patch. |
// here. | ||
bool finalizeDebugInfoFormat(Module *M) { | ||
if (M) | ||
M->setIsNewDbgInfoFormat(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't it use UseNewDbgInfoFormat
value (and why isn't it used, if it's declared)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just due to an awkward combination of factors at the moment: not all LLVM features internally support the new debug info format yet, so we use UseNewDbgInfoFormat
immediately before running optimization passes (the only part that fully supports the new format right now) and convert back afterwards. Right now UseNewDbgInfoFormat
defaults to true, so if we use it here to conditionally use the new format then we'll break the tools that use the parser but don't support the new format. We might be able to change that so that UseNewDbgInfoFormat
is only true when we use a tool that fully supports it, but that probably involves a separate patch.
In short, the declaration is the mistake here - I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, I've got some moderate nits inline.
I've got a slight twitching feeling about the addition of the debug-records as keywords (well, Identifiers) as it seems like quite a large commitment. However: we can always change the manner in which it's parsed, I think the format is good.
I'm not completely certain about making the mixing of debug-info formats a parsing error: wouldn't it be more natural to defer it until the Verifier? I get that that's impossible right now due to the conversions, but we could check for the existence of the three intrinsics by querying the Module
in finalizeDebugInfoFormat
and make that an error. That'd be less invasive than filtering the names of all call instruction targets. (Downside: no localised errors, but this doesn't strike me as being something hard to pin down).
Having an error on even the declaration of intrinsics, not just the use of them, seems like quite a firm requirement. I'm not sure if that'll bite us, but we can downgrade the check if that proves to be a problem.
Some sadly menial nits: IMO the test names shouldn't contain "removedi" as that's referring to a transitional project name, in several years time no-one will recognise that name, IMO they should be named something like "dbg-record-invalid-$num" or something? IMHO it's also worth generating a positive-input that successfully parses so that we can point people at it and indicate the range of different things that can be parsed.
(It feels like this is all pretty close!)
@@ -482,6 +483,8 @@ enum Kind { | |||
// Type valued tokens (TyVal). | |||
Type, | |||
|
|||
DbgRecordType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a few words here indicating these are (presumably) entire-line records rather than something that can appear inline, or otherwise indicate it's significance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think actually these need to be moved to the "String valued tokens" above, since they follow the same model (a DbgRecordType
token is just the text "dbg_foo").
@@ -440,7 +440,9 @@ lltok::Kind LLLexer::LexCaret() { | |||
/// AttrGrpID ::= #[0-9]+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be updating these BNF descriptions too? (probably just a case of adding '#'?
llvm/lib/AsmParser/LLParser.cpp
Outdated
} | ||
|
||
/// parseDebugProgramValue | ||
/// ::= #dbg_Type '{' (ValueAsMetadata|DIArgList|MDNode) ',' MetadataID ',' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're now using parens rather than brackets.
For consistency, we should avoid using DIArgList etc here as that delves into the class hierarchy. The parsing is described in terms of BNF types, so I think we just say it's a Metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not entirely described in terms of actual token types - we refer to "Instructions" in other descriptions, for example, which aren't tokens but parseable objects, i.e. anything that we can call parseFoo
for - but we can follow that description more closely than I've done here, since we do for example call parseMetadata
for the "value" arguments.
@@ -0,0 +1,87 @@ | |||
;; Test that we can write in the old debug info format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit check-not's for "#dbg" too?
I get similar twitches, but at the same time it makes sense - the model here is the same, imo, as how we use a bunch of keywords - the names
There's been a lot of uncertainty for me over where to spread the checks between the parser and the verifier; one of the advantages of placing the checks in the parser is that we can give the user the actual location of a violating instance, while in the verifier not only can we not give a line+col position, but the original intrinsic may have been morphed by an upgrade (old IR files may contain dbg.values with extra arguments, for example), making it non-trivial to figure out where the offending line is, or what the problem is at all if you're unfamiliar with the new/old formats. On the other hand, it is aggressive to put these checks in the parser, so I'm open to changing it. |
For the mixing-old-and-new issue, AFAIUI the population of people hand-writing debug-info IR tests is small, and we usually copy existing tests to write them. It's also not a syntactic problem to report as parsing will succeed, but the Module will be ill formed, it's more akin to a Verifier error but it makes the most sense to detect it in the parser (to avoid answering questions about what mixture of old/new debug-info should be permitted). IMO: it'd be better to print out an informative error at the end of parsing, something like "#dbg_* records were encountered during parsing, but llvm.dbg.* intrinsics were seen in functions [list first ten users of the relevant declaration?]. Cannot load a module with both flavours of debug-info". That avoids the invasive code changes but gives a good diagnosis of what's gone wrong. |
SGTM, (this was the first way I implemented it anyway!) |
4b2e84d
to
b56190d
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest code + test changes LGTM
@jmorse might want to check he's happy with the overall approach now though
@@ -0,0 +1,18 @@ | |||
; RUN: llvm-as -disable-output <%s 2>&1| FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't need <
? and while you're there can you add a space between 1
and |
?
EDIT: this is in a few of the other tests too. Were these copied and modified from existing tests? Feel free to ignore this nit
!DILocalVariable(scope: !1), | ||
!DIExpression(), | ||
!{}) | ||
; CHECK-LABEL: invalid #dbg record location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's invalid about the "location" and which of the 3 locations (position, src-loc, var-loc) is this referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The location overload strikes again... I'll update the verifier check to state more clearly that it's a DILocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM thanks
I'm fine with it, although we should be ready to adjust it if there's turbulence. |
This patch adds support for parsing the proposed non-instruction debug info ("RemoveDIs") from textual IR. This patch does not cover bitcode parsing or documentation, and printing and verifier support are added in prior PRs (#79281, #79810); this patch is presented as a standalone patch for the sake of review, but it is intended to land after the aforementioned patches and will not compile without being based on them[0]. This patch also contains a basic affirmative test ("can we parse the new format correctly"), but not the negative tests (testing for parser errors) - these will be added soon, but shouldn't hold up review of the implementation details of the patch.
An important detail of this patch is the fact that although we can now parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we will always convert back to the old format at the end of parsing - this is done for two reasons: firstly to ensure that every tool is able to process IR printed in the new format, regardless of whether that tool has had RemoveDIs support added, and secondly to maintain the effect of the existing flags: for the tools where support for the new format has been added, we will run LLVM passes in the new format iff
--try-experimental-debuginfo-iterators=true
, and we will print in the new format iff--write-experimental-debuginfo-iterators=true
; the format of the textual IR should have no effect on either of these features.For reference, the format of the RemoveDIs debug info is as follows:
For the sake of review, a few notes on the design and implementation of this patch (some of these will be added to the docs in a subsequent patch):
#
as a token to the lexer, and keywords for each debug record variant.llvm.dbg.label
.[0] If you want to test the parser patch locally, there is a branch with all the prerequisite patches applied at: https://github.com/SLTozer/llvm-project/tree/ddd-ir-parse