[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

vhscampos · 2024-04-08T14:08:05Z

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data as unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation section, page 127:

If one of the DW_FORM_data forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer. This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken. The DWARF specification version 3 mentions in the Data Representation section, page 127: > If one of the DW_FORM_data<n> forms is used to represent a signed or unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>. Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer. Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether. This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation. One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.

llvmbot · 2024-04-08T14:08:35Z

@llvm/pr-subscribers-debuginfo

Author: Victor Campos (vhscampos)

Changes

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation section, page 127:

> If one of the DW_FORM_data<n> forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.

Full diff: https://github.com/llvm/llvm-project/pull/87994.diff

4 Files Affected:

(modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+9-2)
(modified) llvm/test/DebugInfo/ARM/bitfield.ll (+1-1)
(modified) llvm/test/DebugInfo/NVPTX/packed_bitfields.ll (+1-1)
(modified) llvm/test/DebugInfo/X86/packed_bitfields.ll (+1-1)

diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index c40beeeb925e0e..a21c7d80cbf25f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -30,6 +30,7 @@
 #include "llvm/Target/TargetLoweringObjectFile.h"
 #include <cassert>
 #include <cstdint>
+#include <limits>
 #include <string>
 #include <utility>
 
@@ -1644,7 +1645,8 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         addUInt(MemberDie, dwarf::DW_AT_byte_size, std::nullopt, FieldSize / 8);
       addUInt(MemberDie, dwarf::DW_AT_bit_size, std::nullopt, Size);
 
-      uint64_t Offset = DT->getOffsetInBits();
+      assert(DT->getOffsetInBits() <= std::numeric_limits<int64_t>::max());
+      int64_t Offset = DT->getOffsetInBits();
       // We can't use DT->getAlignInBits() here: AlignInBits for member type
       // is non-zero if and only if alignment was forced (e.g. _Alignas()),
       // which can't be done with bitfields. Thus we use FieldSize here.
@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         if (Asm->getDataLayout().isLittleEndian())
           Offset = FieldSize - (Offset + Size);
 
-        addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
+        if (Offset < 0)
+          addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,
+                  Offset);
+        else
+          addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt,
+                  (uint64_t)Offset);
         OffsetInBytes = FieldOffset >> 3;
       } else {
         addUInt(MemberDie, dwarf::DW_AT_data_bit_offset, std::nullopt, Offset);
diff --git a/llvm/test/DebugInfo/ARM/bitfield.ll b/llvm/test/DebugInfo/ARM/bitfield.ll
index 5bd06b785b159c..672c61db6f4912 100644
--- a/llvm/test/DebugInfo/ARM/bitfield.ll
+++ b/llvm/test/DebugInfo/ARM/bitfield.ll
@@ -12,7 +12,7 @@
 ; CHECK:          DW_AT_name {{.*}} "reserved"
 ; CHECK:          DW_AT_byte_size  {{.*}} (0x04)
 ; CHECK:          DW_AT_bit_size   {{.*}} (0x1c)
-; CHECK:          DW_AT_bit_offset {{.*}} (0xfffffffffffffff8)
+; CHECK:          DW_AT_bit_offset {{.*}} (-8)
 ; CHECK:          DW_AT_data_member_location {{.*}} (DW_OP_plus_uconst 0x0)
 
 %struct.anon = type { i8, [5 x i8] }
diff --git a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
index e2097d7f49b48f..62ffa0a4001f18 100644
--- a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
@@ -16,7 +16,7 @@
 ; CHECK-NEXT: .b8 1    // DW_AT_byte_size
 ; CHECK-NEXT: .b8 6    // DW_AT_bit_size
 ; Negative offset must be encoded as an unsigned integer.
-; CHECK-NEXT: .b64 0xffffffffffffffff // DW_AT_bit_offset
+; CHECK-NEXT: .b8 127  // DW_AT_bit_offset
 ; CHECK-NEXT: .b8 2    // DW_AT_data_member_location
 
 %struct.anon = type { i16 }
diff --git a/llvm/test/DebugInfo/X86/packed_bitfields.ll b/llvm/test/DebugInfo/X86/packed_bitfields.ll
index 0e541f09d22703..614fa59c367848 100644
--- a/llvm/test/DebugInfo/X86/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/X86/packed_bitfields.ll
@@ -15,7 +15,7 @@
 ; CHECK-NOT: DW_TAG_member
 ; CHECK:      DW_AT_byte_size  {{.*}} (0x01)
 ; CHECK-NEXT: DW_AT_bit_size   {{.*}} (0x06)
-; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff)
+; CHECK-NEXT: DW_AT_bit_offset {{.*}} (-1)
 ; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}})
 
 ; ModuleID = 'repro.c'

dwblaikie · 2024-04-08T14:47:28Z

When is a negative bit offset used?

vhscampos · 2024-04-08T15:32:07Z

This is apparent in packed structs with bitfields:

struct {
  char : 3;
  char a : 6;
} __attribute__((__packed__)) b;

gcc produces the following debug information with -gdwarf-3:
.debug_abbrev:

  3      DW_TAG_member    [no children]
    DW_AT_name         DW_FORM_string
    DW_AT_decl_file    DW_FORM_data1
    DW_AT_decl_line    DW_FORM_data1
    DW_AT_decl_column  DW_FORM_data1
    DW_AT_type         DW_FORM_ref4
    DW_AT_byte_size    DW_FORM_data1
    DW_AT_bit_size     DW_FORM_data1
    DW_AT_bit_offset   DW_FORM_sdata
    DW_AT_data_member_location DW_FORM_data1

.debug_info:

 <2><26>: Abbrev Number: 3 (DW_TAG_member)
    <27>   DW_AT_name        : a
    <29>   DW_AT_decl_file   : 1
    <2a>   DW_AT_decl_line   : 3
    <2b>   DW_AT_decl_column : 8
    <2c>   DW_AT_type        : <0x35>
    <30>   DW_AT_byte_size   : 1
    <31>   DW_AT_bit_size    : 6
    <32>   DW_AT_bit_offset  : -1
    <33>   DW_AT_data_member_location: 0

In contrast, clang generates this:
.debug_abbrev:

   4      DW_TAG_member    [no children]
    DW_AT_name         DW_FORM_strp
    DW_AT_type         DW_FORM_ref4
    DW_AT_decl_file    DW_FORM_data1
    DW_AT_decl_line    DW_FORM_data1
    DW_AT_byte_size    DW_FORM_data1
    DW_AT_bit_size     DW_FORM_data1
    DW_AT_bit_offset   DW_FORM_data8
    DW_AT_data_member_location DW_FORM_udata

.debug_info:

 <2><38>: Abbrev Number: 4 (DW_TAG_member)
    <39>   DW_AT_name        : (indirect string, offset: 0x34): a
    <3d>   DW_AT_type        : <0x4f>
    <41>   DW_AT_decl_file   : 1
    <42>   DW_AT_decl_line   : 3
    <43>   DW_AT_byte_size   : 1
    <44>   DW_AT_bit_size    : 6
    <45>   DW_AT_bit_offset  : 0xffffffffffffffff
    <4d>   DW_AT_data_member_location: 0

Note how clang uses DW_FORM_data8 and a value of 0xffffffffffffffff

llvm/test/DebugInfo/NVPTX/packed_bitfields.ll

Artem-B · 2024-04-08T17:54:47Z

LGTM for NVPTX test change.

dwblaikie · 2024-04-12T20:30:10Z

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp

@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
        if (Asm->getDataLayout().isLittleEndian())
          Offset = FieldSize - (Offset + Size);

-        addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
+        if (Offset < 0)
+          addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,


Does this need a version check to make sure it's used in DWARFv3, but not earlier? (or perhaps we don't support DWARFv2? Not sure - but the useDWARF2Bitfields above gives me pause)

Comparing the specifications for DWARF 2 and DWARF 3, I can see that the section about bit fields is semantically the same. In the DWARF 2 spec, it's on page 42.

Also my understanding is that useDWARF2Bitfields includes v3 as well. Presumably because bit field's semantics haven't changed between the two versions.

As an extra check, I confirmed that gcc produces the same debug info for v2 and v3 in this case.

dwblaikie

shrug I'm still confused why we emit negative values here & not sure how much to support DWARFv3 these days... , but if it's consistent with GCC & doesn't seem too invasive, I can't fault it too much.

llvmbot added the debuginfo label Apr 8, 2024

vhscampos requested review from Artem-B, DavidSpickett, adrian-prantl and ostannard April 8, 2024 14:14

Artem-B reviewed Apr 8, 2024

View reviewed changes

llvm/test/DebugInfo/NVPTX/packed_bitfields.ll Show resolved Hide resolved

dwblaikie reviewed Apr 12, 2024

View reviewed changes

dwblaikie approved these changes Apr 22, 2024

View reviewed changes

vhscampos merged commit 119aecb into llvm:main May 13, 2024

vhscampos deleted the sdata branch May 13, 2024 10:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

Uh oh!

vhscampos commented Apr 8, 2024

Uh oh!

llvmbot commented Apr 8, 2024

Uh oh!

dwblaikie commented Apr 8, 2024

Uh oh!

vhscampos commented Apr 8, 2024 •

edited

Loading

Uh oh!

Uh oh!

Artem-B commented Apr 8, 2024

Uh oh!

dwblaikie Apr 12, 2024

Uh oh!

vhscampos Apr 18, 2024

Uh oh!

dwblaikie left a comment

Uh oh!

Uh oh!

[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

[DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form #87994

Uh oh!

Conversation

vhscampos commented Apr 8, 2024

Uh oh!

llvmbot commented Apr 8, 2024

Uh oh!

dwblaikie commented Apr 8, 2024

Uh oh!

vhscampos commented Apr 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Artem-B commented Apr 8, 2024

Uh oh!

dwblaikie Apr 12, 2024

Choose a reason for hiding this comment

Uh oh!

vhscampos Apr 18, 2024

Choose a reason for hiding this comment

Uh oh!

dwblaikie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vhscampos commented Apr 8, 2024 •

edited

Loading