-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer. This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken. The DWARF specification version 3 mentions in the Data Representation section, page 127: > If one of the DW_FORM_data<n> forms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>. Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer. Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether. This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation. One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.
@llvm/pr-subscribers-debuginfo Author: Victor Campos (vhscampos) ChangesBefore this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer. This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken. The DWARF specification version 3 mentions in the Data Representation section, page 127: > If one of the DW_FORM_data<n> forms is used to represent a signed or Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer. Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether. This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation. One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values. Full diff: https://github.com/llvm/llvm-project/pull/87994.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index c40beeeb925e0e..a21c7d80cbf25f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -30,6 +30,7 @@
#include "llvm/Target/TargetLoweringObjectFile.h"
#include <cassert>
#include <cstdint>
+#include <limits>
#include <string>
#include <utility>
@@ -1644,7 +1645,8 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
addUInt(MemberDie, dwarf::DW_AT_byte_size, std::nullopt, FieldSize / 8);
addUInt(MemberDie, dwarf::DW_AT_bit_size, std::nullopt, Size);
- uint64_t Offset = DT->getOffsetInBits();
+ assert(DT->getOffsetInBits() <= std::numeric_limits<int64_t>::max());
+ int64_t Offset = DT->getOffsetInBits();
// We can't use DT->getAlignInBits() here: AlignInBits for member type
// is non-zero if and only if alignment was forced (e.g. _Alignas()),
// which can't be done with bitfields. Thus we use FieldSize here.
@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
if (Asm->getDataLayout().isLittleEndian())
Offset = FieldSize - (Offset + Size);
- addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
+ if (Offset < 0)
+ addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,
+ Offset);
+ else
+ addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt,
+ (uint64_t)Offset);
OffsetInBytes = FieldOffset >> 3;
} else {
addUInt(MemberDie, dwarf::DW_AT_data_bit_offset, std::nullopt, Offset);
diff --git a/llvm/test/DebugInfo/ARM/bitfield.ll b/llvm/test/DebugInfo/ARM/bitfield.ll
index 5bd06b785b159c..672c61db6f4912 100644
--- a/llvm/test/DebugInfo/ARM/bitfield.ll
+++ b/llvm/test/DebugInfo/ARM/bitfield.ll
@@ -12,7 +12,7 @@
; CHECK: DW_AT_name {{.*}} "reserved"
; CHECK: DW_AT_byte_size {{.*}} (0x04)
; CHECK: DW_AT_bit_size {{.*}} (0x1c)
-; CHECK: DW_AT_bit_offset {{.*}} (0xfffffffffffffff8)
+; CHECK: DW_AT_bit_offset {{.*}} (-8)
; CHECK: DW_AT_data_member_location {{.*}} (DW_OP_plus_uconst 0x0)
%struct.anon = type { i8, [5 x i8] }
diff --git a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
index e2097d7f49b48f..62ffa0a4001f18 100644
--- a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
@@ -16,7 +16,7 @@
; CHECK-NEXT: .b8 1 // DW_AT_byte_size
; CHECK-NEXT: .b8 6 // DW_AT_bit_size
; Negative offset must be encoded as an unsigned integer.
-; CHECK-NEXT: .b64 0xffffffffffffffff // DW_AT_bit_offset
+; CHECK-NEXT: .b8 127 // DW_AT_bit_offset
; CHECK-NEXT: .b8 2 // DW_AT_data_member_location
%struct.anon = type { i16 }
diff --git a/llvm/test/DebugInfo/X86/packed_bitfields.ll b/llvm/test/DebugInfo/X86/packed_bitfields.ll
index 0e541f09d22703..614fa59c367848 100644
--- a/llvm/test/DebugInfo/X86/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/X86/packed_bitfields.ll
@@ -15,7 +15,7 @@
; CHECK-NOT: DW_TAG_member
; CHECK: DW_AT_byte_size {{.*}} (0x01)
; CHECK-NEXT: DW_AT_bit_size {{.*}} (0x06)
-; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff)
+; CHECK-NEXT: DW_AT_bit_offset {{.*}} (-1)
; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}})
; ModuleID = 'repro.c'
|
When is a negative bit offset used? |
This is apparent in packed structs with bitfields: struct {
char : 3;
char a : 6;
} __attribute__((__packed__)) b; gcc produces the following debug information with
.debug_info:
In contrast, clang generates this:
.debug_info:
Note how clang uses DW_FORM_data8 and a value of 0xffffffffffffffff |
LGTM for NVPTX test change. |
@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) { | |||
if (Asm->getDataLayout().isLittleEndian()) | |||
Offset = FieldSize - (Offset + Size); | |||
|
|||
addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset); | |||
if (Offset < 0) | |||
addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a version check to make sure it's used in DWARFv3, but not earlier? (or perhaps we don't support DWARFv2? Not sure - but the useDWARF2Bitfields
above gives me pause)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing the specifications for DWARF 2 and DWARF 3, I can see that the section about bit fields is semantically the same. In the DWARF 2 spec, it's on page 42.
Also my understanding is that useDWARF2Bitfields
includes v3 as well. Presumably because bit field's semantics haven't changed between the two versions.
As an extra check, I confirmed that gcc produces the same debug info for v2 and v3 in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shrug I'm still confused why we emit negative values here & not sure how much to support DWARFv3 these days... , but if it's consistent with GCC & doesn't seem too invasive, I can't fault it too much.
Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.
This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data as unsigned, then negative offsets would be completely broken.
The DWARF specification version 3 mentions in the Data Representation section, page 127:
Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.
Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.
This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.
One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.