Skip to content

[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2024

Conversation

vhscampos
Copy link
Member

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data as unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation section, page 127:

If one of the DW_FORM_data forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.

Before this patch, the value of DW_AT_bit_offset, used for bitfields
before DWARF version 4, was always emitted as an unsigned integer using
the form DW_FORM_data<n>. If the value was originally a signed integer,
for instance in the case of negative offsets, it was up to debug
information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be
read as signed or unsigned was put onto the debug info consumers: the
DWARF specification doesn't define DW_AT_bit_offset's underlying type.
If a debugger decided to interpret this attribute in the form data<n> as
unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation
section, page 127:

> If one of the DW_FORM_data<n> forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context
necessary to determine which interpretation is intended. Producers are
therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for
signed and unsigned integers respectively, rather than DW_FORM_data<n>.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly
signed. This is an indication to consumers that the offset must be
parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the
potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of
DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any
potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the
sake of parity with gcc, it is safe to restrict the change to negative
values.
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2024

@llvm/pr-subscribers-debuginfo

Author: Victor Campos (vhscampos)

Changes

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation section, page 127:

> If one of the DW_FORM_data<n> forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.


Full diff: https://github.com/llvm/llvm-project/pull/87994.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+9-2)
  • (modified) llvm/test/DebugInfo/ARM/bitfield.ll (+1-1)
  • (modified) llvm/test/DebugInfo/NVPTX/packed_bitfields.ll (+1-1)
  • (modified) llvm/test/DebugInfo/X86/packed_bitfields.ll (+1-1)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index c40beeeb925e0e..a21c7d80cbf25f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -30,6 +30,7 @@
 #include "llvm/Target/TargetLoweringObjectFile.h"
 #include <cassert>
 #include <cstdint>
+#include <limits>
 #include <string>
 #include <utility>
 
@@ -1644,7 +1645,8 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         addUInt(MemberDie, dwarf::DW_AT_byte_size, std::nullopt, FieldSize / 8);
       addUInt(MemberDie, dwarf::DW_AT_bit_size, std::nullopt, Size);
 
-      uint64_t Offset = DT->getOffsetInBits();
+      assert(DT->getOffsetInBits() <= std::numeric_limits<int64_t>::max());
+      int64_t Offset = DT->getOffsetInBits();
       // We can't use DT->getAlignInBits() here: AlignInBits for member type
       // is non-zero if and only if alignment was forced (e.g. _Alignas()),
       // which can't be done with bitfields. Thus we use FieldSize here.
@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         if (Asm->getDataLayout().isLittleEndian())
           Offset = FieldSize - (Offset + Size);
 
-        addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
+        if (Offset < 0)
+          addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,
+                  Offset);
+        else
+          addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt,
+                  (uint64_t)Offset);
         OffsetInBytes = FieldOffset >> 3;
       } else {
         addUInt(MemberDie, dwarf::DW_AT_data_bit_offset, std::nullopt, Offset);
diff --git a/llvm/test/DebugInfo/ARM/bitfield.ll b/llvm/test/DebugInfo/ARM/bitfield.ll
index 5bd06b785b159c..672c61db6f4912 100644
--- a/llvm/test/DebugInfo/ARM/bitfield.ll
+++ b/llvm/test/DebugInfo/ARM/bitfield.ll
@@ -12,7 +12,7 @@
 ; CHECK:          DW_AT_name {{.*}} "reserved"
 ; CHECK:          DW_AT_byte_size  {{.*}} (0x04)
 ; CHECK:          DW_AT_bit_size   {{.*}} (0x1c)
-; CHECK:          DW_AT_bit_offset {{.*}} (0xfffffffffffffff8)
+; CHECK:          DW_AT_bit_offset {{.*}} (-8)
 ; CHECK:          DW_AT_data_member_location {{.*}} (DW_OP_plus_uconst 0x0)
 
 %struct.anon = type { i8, [5 x i8] }
diff --git a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
index e2097d7f49b48f..62ffa0a4001f18 100644
--- a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
@@ -16,7 +16,7 @@
 ; CHECK-NEXT: .b8 1    // DW_AT_byte_size
 ; CHECK-NEXT: .b8 6    // DW_AT_bit_size
 ; Negative offset must be encoded as an unsigned integer.
-; CHECK-NEXT: .b64 0xffffffffffffffff // DW_AT_bit_offset
+; CHECK-NEXT: .b8 127  // DW_AT_bit_offset
 ; CHECK-NEXT: .b8 2    // DW_AT_data_member_location
 
 %struct.anon = type { i16 }
diff --git a/llvm/test/DebugInfo/X86/packed_bitfields.ll b/llvm/test/DebugInfo/X86/packed_bitfields.ll
index 0e541f09d22703..614fa59c367848 100644
--- a/llvm/test/DebugInfo/X86/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/X86/packed_bitfields.ll
@@ -15,7 +15,7 @@
 ; CHECK-NOT: DW_TAG_member
 ; CHECK:      DW_AT_byte_size  {{.*}} (0x01)
 ; CHECK-NEXT: DW_AT_bit_size   {{.*}} (0x06)
-; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff)
+; CHECK-NEXT: DW_AT_bit_offset {{.*}} (-1)
 ; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}})
 
 ; ModuleID = 'repro.c'

@dwblaikie
Copy link
Collaborator

When is a negative bit offset used?

@vhscampos
Copy link
Member Author

vhscampos commented Apr 8, 2024

This is apparent in packed structs with bitfields:

struct {
  char : 3;
  char a : 6;
} __attribute__((__packed__)) b;

gcc produces the following debug information with -gdwarf-3:
.debug_abbrev:

  3      DW_TAG_member    [no children]
    DW_AT_name         DW_FORM_string
    DW_AT_decl_file    DW_FORM_data1
    DW_AT_decl_line    DW_FORM_data1
    DW_AT_decl_column  DW_FORM_data1
    DW_AT_type         DW_FORM_ref4
    DW_AT_byte_size    DW_FORM_data1
    DW_AT_bit_size     DW_FORM_data1
    DW_AT_bit_offset   DW_FORM_sdata
    DW_AT_data_member_location DW_FORM_data1

.debug_info:

 <2><26>: Abbrev Number: 3 (DW_TAG_member)
    <27>   DW_AT_name        : a
    <29>   DW_AT_decl_file   : 1
    <2a>   DW_AT_decl_line   : 3
    <2b>   DW_AT_decl_column : 8
    <2c>   DW_AT_type        : <0x35>
    <30>   DW_AT_byte_size   : 1
    <31>   DW_AT_bit_size    : 6
    <32>   DW_AT_bit_offset  : -1
    <33>   DW_AT_data_member_location: 0

In contrast, clang generates this:
.debug_abbrev:

   4      DW_TAG_member    [no children]
    DW_AT_name         DW_FORM_strp
    DW_AT_type         DW_FORM_ref4
    DW_AT_decl_file    DW_FORM_data1
    DW_AT_decl_line    DW_FORM_data1
    DW_AT_byte_size    DW_FORM_data1
    DW_AT_bit_size     DW_FORM_data1
    DW_AT_bit_offset   DW_FORM_data8
    DW_AT_data_member_location DW_FORM_udata

.debug_info:

 <2><38>: Abbrev Number: 4 (DW_TAG_member)
    <39>   DW_AT_name        : (indirect string, offset: 0x34): a
    <3d>   DW_AT_type        : <0x4f>
    <41>   DW_AT_decl_file   : 1
    <42>   DW_AT_decl_line   : 3
    <43>   DW_AT_byte_size   : 1
    <44>   DW_AT_bit_size    : 6
    <45>   DW_AT_bit_offset  : 0xffffffffffffffff
    <4d>   DW_AT_data_member_location: 0

Note how clang uses DW_FORM_data8 and a value of 0xffffffffffffffff

@Artem-B
Copy link
Member

Artem-B commented Apr 8, 2024

LGTM for NVPTX test change.

@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
if (Asm->getDataLayout().isLittleEndian())
Offset = FieldSize - (Offset + Size);

addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
if (Offset < 0)
addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a version check to make sure it's used in DWARFv3, but not earlier? (or perhaps we don't support DWARFv2? Not sure - but the useDWARF2Bitfields above gives me pause)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing the specifications for DWARF 2 and DWARF 3, I can see that the section about bit fields is semantically the same. In the DWARF 2 spec, it's on page 42.

Also my understanding is that useDWARF2Bitfields includes v3 as well. Presumably because bit field's semantics haven't changed between the two versions.

As an extra check, I confirmed that gcc produces the same debug info for v2 and v3 in this case.

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shrug I'm still confused why we emit negative values here & not sure how much to support DWARFv3 these days... , but if it's consistent with GCC & doesn't seem too invasive, I can't fault it too much.

@vhscampos vhscampos merged commit 119aecb into llvm:main May 13, 2024
@vhscampos vhscampos deleted the sdata branch May 13, 2024 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants