[lldb] Support custom LLVM formatting for variables #91868

kastiglione · 2024-05-11T20:50:30Z

Re-apply #81196, with a fix that handles the absence of llvm formatting: 3ba650e

Adds support for applying LLVM formatting to variables.

The reason for this is to support cases such as the following. Let's say you have two separate bytes that you want to print as a combined hex value. Consider the following summary string:

```
${var.byte1%x}${var.byte2%x}
```

The output of this will be: `0x120x34`. That is, a `0x` prefix is unconditionally applied to each byte. This is unlike printf formatting, where you must include the `0x` yourself. Currently, there's no way to omit the prefix with summary strings; instead you'll need a summary provider in Python or C++.

This change introduces formatting support using LLVM's formatter system. This allows users to achieve the desired custom formatting using:

```
${var.byte1:x-}${var.byte2:x-}
```

Here, each variable is suffixed with `:x-`. This is passed to the LLVM formatter as `{0:x-}`. For integer values, `x` declares the output as hex, and `-` declares that no `0x` prefix is to be used. Further, one could write:

```
${var.byte1:x-2}${var.byte2:x-2}
```

Where the added `2` results in these bytes being written with a minimum of 2 digits.

An alternative considered was to add a new format specifier that would print hex values without the `0x` prefix. That approach was not taken because, in addition to forcing a `0x` prefix, hex values are also forced to use leading zeros. This approach lets the user have full control over formatting.
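To make the difference in output concrete, here is a minimal Python sketch that approximates the two behaviors described above for non-negative integers. It does not call lldb or LLVM; the function names `lldb_builtin_hex` and `llvm_hex_no_prefix` are hypothetical and exist only for illustration.

```python
def lldb_builtin_hex(value: int, byte_size: int = 1) -> str:
    """Approximates lldb's builtin %x as described in this PR:
    a 0x prefix is always added and the value is zero padded
    to the size of the type. Hypothetical helper, not an lldb API."""
    return "0x" + format(value, "x").zfill(byte_size * 2)


def llvm_hex_no_prefix(value: int, min_digits: int = 0) -> str:
    """Approximates the {0:x-N} style: lowercase hex, no prefix,
    at least min_digits digits. Hypothetical helper, and only
    meant for non-negative values."""
    return format(value, "x").zfill(min_digits)


# ${var.byte1%x}${var.byte2%x} with byte1 = 0x12, byte2 = 0x34
print(lldb_builtin_hex(0x12) + lldb_builtin_hex(0x34))      # 0x120x34

# ${var.byte1:x-}${var.byte2:x-}
print(llvm_hex_no_prefix(0x12) + llvm_hex_no_prefix(0x34))  # 1234

# ${var.byte1:x-2} pads a small value to two digits
print(llvm_hex_no_prefix(0x1, 2))                           # 01
```

The last line shows why the minimum digit count matters when composing bytes: without it, a byte of `0x01` would contribute a single digit to the combined value.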

llvmbot · 2024-05-11T20:51:02Z

@llvm/pr-subscribers-lldb

Author: Dave Lee (kastiglione)

Changes

[lldb] Support custom LLVM formatting for variables (#81196)
[lldb] Handle non-existent llvm_format

Full diff: https://github.com/llvm/llvm-project/pull/91868.diff

5 Files Affected:

(modified) lldb/docs/use/variable.rst (+9)
(modified) lldb/source/Core/FormatEntity.cpp (+63-10)
(added) lldb/test/API/functionalities/data-formatter/custom-printf-summary/Makefile (+2)
(added) lldb/test/API/functionalities/data-formatter/custom-printf-summary/TestCustomSummaryLLVMFormat.py (+20)
(added) lldb/test/API/functionalities/data-formatter/custom-printf-summary/main.c (+13)

diff --git a/lldb/docs/use/variable.rst b/lldb/docs/use/variable.rst
index 8eaed6405315b..e9175b25336ba 100644
--- a/lldb/docs/use/variable.rst
+++ b/lldb/docs/use/variable.rst
@@ -460,6 +460,15 @@ summary strings, regardless of the format they have applied to their types. To
 do that, you can use %format inside an expression path, as in ${var.x->x%u},
 which would display the value of x as an unsigned integer.
 
+Additionally, custom output can be achieved by using an LLVM format string,
+commencing with the ``:`` marker. To illustrate, compare ``${var.byte%x}`` and
+``${var.byte:x-}``. The former uses lldb's builtin hex formatting (``x``),
+which unconditionally inserts a ``0x`` prefix, and also zero pads the value to
+match the size of the type. The latter uses ``llvm::formatv`` formatting
+(``:x-``), and will print only the hex value, with no ``0x`` prefix, and no
+padding. This raw control is useful when composing multiple pieces into a
+larger whole.
+
 You can also use some other special format markers, not available for formats
 themselves, but which carry a special meaning when used in this context:
 
diff --git a/lldb/source/Core/FormatEntity.cpp b/lldb/source/Core/FormatEntity.cpp
index ba62e26252591..af316e1044d2a 100644
--- a/lldb/source/Core/FormatEntity.cpp
+++ b/lldb/source/Core/FormatEntity.cpp
@@ -57,6 +57,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/Regex.h"
 #include "llvm/TargetParser/Triple.h"
 
 #include <cctype>
@@ -658,6 +659,38 @@ static char ConvertValueObjectStyleToChar(
   return '\0';
 }
 
+static llvm::Regex LLVMFormatPattern{"x[-+]?\\d*|n|d", llvm::Regex::IgnoreCase};
+
+static bool DumpValueWithLLVMFormat(Stream &s, llvm::StringRef options,
+                                    ValueObject &valobj) {
+  std::string formatted;
+  std::string llvm_format = ("{0:" + options + "}").str();
+
+  // Options supported by format_provider<T> for integral arithmetic types.
+  // See table in FormatProviders.h.
+
+  auto type_info = valobj.GetTypeInfo();
+  if (type_info & eTypeIsInteger && LLVMFormatPattern.match(options)) {
+    if (type_info & eTypeIsSigned) {
+      bool success = false;
+      int64_t integer = valobj.GetValueAsSigned(0, &success);
+      if (success)
+        formatted = llvm::formatv(llvm_format.data(), integer);
+    } else {
+      bool success = false;
+      uint64_t integer = valobj.GetValueAsUnsigned(0, &success);
+      if (success)
+        formatted = llvm::formatv(llvm_format.data(), integer);
+    }
+  }
+
+  if (formatted.empty())
+    return false;
+
+  s.Write(formatted.data(), formatted.size());
+  return true;
+}
+
 static bool DumpValue(Stream &s, const SymbolContext *sc,
                       const ExecutionContext *exe_ctx,
                       const FormatEntity::Entry &entry, ValueObject *valobj) {
@@ -728,9 +761,12 @@ static bool DumpValue(Stream &s, const SymbolContext *sc,
     return RunScriptFormatKeyword(s, sc, exe_ctx, valobj, entry.string.c_str());
   }
 
-  llvm::StringRef subpath(entry.string);
+  auto split = llvm::StringRef(entry.string).split(':');
+  auto subpath = split.first;
+  auto llvm_format = split.second;
+
   // simplest case ${var}, just print valobj's value
-  if (entry.string.empty()) {
+  if (subpath.empty()) {
     if (entry.printf_format.empty() && entry.fmt == eFormatDefault &&
         entry.number == ValueObject::eValueObjectRepresentationStyleValue)
       was_plain_var = true;
@@ -739,7 +775,7 @@ static bool DumpValue(Stream &s, const SymbolContext *sc,
     target = valobj;
   } else // this is ${var.something} or multiple .something nested
   {
-    if (entry.string[0] == '[')
+    if (subpath[0] == '[')
       was_var_indexed = true;
     ScanBracketedRange(subpath, close_bracket_index,
                        var_name_final_if_array_range, index_lower,
@@ -747,14 +783,11 @@ static bool DumpValue(Stream &s, const SymbolContext *sc,
 
     Status error;
 
-    const std::string &expr_path = entry.string;
-
-    LLDB_LOGF(log, "[Debugger::FormatPrompt] symbol to expand: %s",
-              expr_path.c_str());
+    LLDB_LOG(log, "[Debugger::FormatPrompt] symbol to expand: {0}", subpath);
 
     target =
         valobj
-            ->GetValueForExpressionPath(expr_path.c_str(), &reason_to_stop,
+            ->GetValueForExpressionPath(subpath, &reason_to_stop,
                                         &final_value_type, options, &what_next)
             .get();
 
@@ -883,8 +916,18 @@ static bool DumpValue(Stream &s, const SymbolContext *sc,
   }
 
   if (!is_array_range) {
-    LLDB_LOGF(log,
-              "[Debugger::FormatPrompt] dumping ordinary printable output");
+    if (!llvm_format.empty()) {
+      if (DumpValueWithLLVMFormat(s, llvm_format, *target)) {
+        LLDB_LOGF(log, "dumping using llvm format");
+        return true;
+      } else {
+        LLDB_LOG(
+            log,
+            "empty output using llvm format '{0}' - with type info flags {1}",
+            entry.printf_format, target->GetTypeInfo());
+      }
+    }
+    LLDB_LOGF(log, "dumping ordinary printable output");
     return target->DumpPrintableRepresentation(s, val_obj_display,
                                                custom_format);
   } else {
@@ -2227,6 +2270,16 @@ static Status ParseInternal(llvm::StringRef &format, Entry &parent_entry,
           if (error.Fail())
             return error;
 
+          llvm::StringRef entry_string(entry.string);
+          if (entry_string.contains(':')) {
+            auto [_, llvm_format] = entry_string.split(':');
+            if (!llvm_format.empty() && !LLVMFormatPattern.match(llvm_format)) {
+              error.SetErrorStringWithFormat("invalid llvm format: '%s'",
+                                             llvm_format.data());
+              return error;
+            }
+          }
+
           if (verify_is_thread_id) {
             if (entry.type != Entry::Type::ThreadID &&
                 entry.type != Entry::Type::ThreadProtocolID) {
diff --git a/lldb/test/API/functionalities/data-formatter/custom-printf-summary/Makefile b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/Makefile
new file mode 100644
index 0000000000000..c9319d6e6888a
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/Makefile
@@ -0,0 +1,2 @@
+C_SOURCES := main.c
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/data-formatter/custom-printf-summary/TestCustomSummaryLLVMFormat.py b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/TestCustomSummaryLLVMFormat.py
new file mode 100644
index 0000000000000..d6906a49463ba
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/TestCustomSummaryLLVMFormat.py
@@ -0,0 +1,20 @@
+import lldb
+from lldbsuite.test.lldbtest import *
+import lldbsuite.test.lldbutil as lldbutil
+
+
+class TestCase(TestBase):
+    def test_raw_bytes(self):
+        self.build()
+        lldbutil.run_to_source_breakpoint(self, "break here", lldb.SBFileSpec("main.c"))
+        self.runCmd("type summary add -s '${var.ubyte:x-2}${var.sbyte:x-2}!' Bytes")
+        self.expect("v bytes", substrs=[" = 3001!"])
+
+    def test_bad_format(self):
+        self.build()
+        lldbutil.run_to_source_breakpoint(self, "break here", lldb.SBFileSpec("main.c"))
+        self.expect(
+            "type summary add -s '${var.ubyte:y}!' Bytes",
+            error=True,
+            substrs=["invalid llvm format"],
+        )
diff --git a/lldb/test/API/functionalities/data-formatter/custom-printf-summary/main.c b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/main.c
new file mode 100644
index 0000000000000..4164aff7dbf6f
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/custom-printf-summary/main.c
@@ -0,0 +1,13 @@
+#include <stdint.h>
+#include <stdio.h>
+
+struct Bytes {
+  uint8_t ubyte;
+  int8_t sbyte;
+};
+
+int main() {
+  struct Bytes bytes = {0x30, 0x01};
+  (void)bytes;
+  printf("break here\n");
+}
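
The split-and-validate step that the diff adds to `ParseInternal` and `DumpValue` can be sketched in Python as follows. This is an illustration under stated assumptions, not the lldb implementation: the names `split_entry` and `validate_entry` are hypothetical, the example entry strings (such as `.ubyte:x-2`) are assumed shapes, and `re.search` is used on the assumption that the pattern is matched unanchored.

```python
import re

# Same pattern text as LLVMFormatPattern in the diff, case-insensitive.
LLVM_FORMAT_PATTERN = re.compile(r"x[-+]?\d*|n|d", re.IGNORECASE)


def split_entry(entry_string: str):
    """Split on the first ':' into (subpath, llvm_format).
    With no ':', llvm_format is the empty string."""
    subpath, _, llvm_format = entry_string.partition(":")
    return subpath, llvm_format


def validate_entry(entry_string: str):
    """Reject a non-empty llvm format that the pattern does not match.
    Assumption: unanchored search, so only the presence of a valid
    option somewhere in the format is checked."""
    subpath, llvm_format = split_entry(entry_string)
    if llvm_format and not LLVM_FORMAT_PATTERN.search(llvm_format):
        raise ValueError("invalid llvm format: '{}'".format(llvm_format))
    return subpath, llvm_format


print(validate_entry(".ubyte:x-2"))  # ('.ubyte', 'x-2')
print(validate_entry(".ubyte"))      # ('.ubyte', '')
```

Calling `validate_entry(".ubyte:y")` raises an error containing `invalid llvm format`, which corresponds to what `test_bad_format` checks. An entry with no `:` yields an empty format, which is the case the follow-up fix (3ba650e) handles.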

lldb/source/Core/FormatEntity.cpp

adrian-prantl · 2024-05-14T00:46:36Z

> Re-apply #81196, with a fix.

It's usually helpful to indicate what/where the fix is to make it easier to re-review.

kastiglione · 2024-05-15T20:13:45Z

@adrian-prantl I've updated the description:

> Re-apply #81196, with a fix that handles the absence of llvm formatting: 3ba650e

kastiglione added 2 commits May 11, 2024 13:46

[lldb] Handle non-existent llvm_format

3ba650e

kastiglione requested a review from JDevlieghere as a code owner May 11, 2024 20:50

llvmbot added the lldb label May 11, 2024

kastiglione changed the title ~~lldb Support custom LLVM formatting for variables~~ [lldb] Support custom LLVM formatting for variables May 11, 2024

kastiglione requested a review from adrian-prantl May 11, 2024 20:52

adrian-prantl approved these changes May 14, 2024

kastiglione added 2 commits May 15, 2024 14:08

Wrap & operation in parens

f209929

Move comment along with variable

c255923

kastiglione merged commit 8530b1c into llvm:main May 15, 2024
5 checks passed

kastiglione deleted the lldb-Support-custom-LLVM-formatting-for-variables branch May 15, 2024 21:44
