[lldb] Improve summary string handling of dollar chars #98190

kastiglione · 2024-07-09T17:08:47Z

This improves the handling of $ (dollar) characters in summary strings in the following ways:

When a $ is not followed by an open paren ({), it should be treated as a literal character and preserved in the output. Previously, the dollar would be consumed by the parser and not shown in the output.
When a $ is the last character of a format string, this change eliminates the infinite loop lldb would enter into.

rdar://131392446

llvmbot · 2024-07-09T17:09:19Z

@llvm/pr-subscribers-lldb

Author: Dave Lee (kastiglione)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/98190.diff

4 Files Affected:

(modified) lldb/source/Core/FormatEntity.cpp (+126-132)
(added) lldb/test/API/functionalities/data-formatter/special-chars/Makefile (+2)
(added) lldb/test/API/functionalities/data-formatter/special-chars/TestSummaryStringSpecialChars.py (+19)
(added) lldb/test/API/functionalities/data-formatter/special-chars/main.c (+9)

diff --git a/lldb/source/Core/FormatEntity.cpp b/lldb/source/Core/FormatEntity.cpp
index fe95858f35c9f..4e1f37099148b 100644
--- a/lldb/source/Core/FormatEntity.cpp
+++ b/lldb/source/Core/FormatEntity.cpp
@@ -2170,154 +2170,148 @@ static Status ParseInternal(llvm::StringRef &format, Entry &parent_entry,
     } break;
 
     case '$':
-      if (format.size() == 1) {
-        // '$' at the end of a format string, just print the '$'
+      format = format.drop_front(); // Skip the '$'
+      if (format.empty() || format.front() != '{') {
+        // Print '$' when not followed by '{'.
         parent_entry.AppendText("$");
       } else {
-        format = format.drop_front(); // Skip the '$'
-
-        if (format[0] == '{') {
-          format = format.drop_front(); // Skip the '{'
-
-          llvm::StringRef variable, variable_format;
-          error = FormatEntity::ExtractVariableInfo(format, variable,
-                                                    variable_format);
-          if (error.Fail())
-            return error;
-          bool verify_is_thread_id = false;
-          Entry entry;
-          if (!variable_format.empty()) {
-            entry.printf_format = variable_format.str();
-
-            // If the format contains a '%' we are going to assume this is a
-            // printf style format. So if you want to format your thread ID
-            // using "0x%llx" you can use: ${thread.id%0x%llx}
-            //
-            // If there is no '%' in the format, then it is assumed to be a
-            // LLDB format name, or one of the extended formats specified in
-            // the switch statement below.
-
-            if (entry.printf_format.find('%') == std::string::npos) {
-              bool clear_printf = false;
-
-              if (entry.printf_format.size() == 1) {
-                switch (entry.printf_format[0]) {
-                case '@': // if this is an @ sign, print ObjC description
-                  entry.number = ValueObject::
-                      eValueObjectRepresentationStyleLanguageSpecific;
-                  clear_printf = true;
-                  break;
-                case 'V': // if this is a V, print the value using the default
-                          // format
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleValue;
-                  clear_printf = true;
-                  break;
-                case 'L': // if this is an L, print the location of the value
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleLocation;
-                  clear_printf = true;
-                  break;
-                case 'S': // if this is an S, print the summary after all
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleSummary;
-                  clear_printf = true;
-                  break;
-                case '#': // if this is a '#', print the number of children
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleChildrenCount;
-                  clear_printf = true;
-                  break;
-                case 'T': // if this is a 'T', print the type
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleType;
-                  clear_printf = true;
-                  break;
-                case 'N': // if this is a 'N', print the name
-                  entry.number =
-                      ValueObject::eValueObjectRepresentationStyleName;
-                  clear_printf = true;
-                  break;
-                case '>': // if this is a '>', print the expression path
-                  entry.number = ValueObject::
-                      eValueObjectRepresentationStyleExpressionPath;
-                  clear_printf = true;
-                  break;
-                }
+        format = format.drop_front(); // Skip the '{'
+
+        llvm::StringRef variable, variable_format;
+        error = FormatEntity::ExtractVariableInfo(format, variable,
+                                                  variable_format);
+        if (error.Fail())
+          return error;
+        bool verify_is_thread_id = false;
+        Entry entry;
+        if (!variable_format.empty()) {
+          entry.printf_format = variable_format.str();
+
+          // If the format contains a '%' we are going to assume this is a
+          // printf style format. So if you want to format your thread ID
+          // using "0x%llx" you can use: ${thread.id%0x%llx}
+          //
+          // If there is no '%' in the format, then it is assumed to be a
+          // LLDB format name, or one of the extended formats specified in
+          // the switch statement below.
+
+          if (entry.printf_format.find('%') == std::string::npos) {
+            bool clear_printf = false;
+
+            if (entry.printf_format.size() == 1) {
+              switch (entry.printf_format[0]) {
+              case '@': // if this is an @ sign, print ObjC description
+                entry.number = ValueObject::
+                    eValueObjectRepresentationStyleLanguageSpecific;
+                clear_printf = true;
+                break;
+              case 'V': // if this is a V, print the value using the default
+                        // format
+                entry.number =
+                    ValueObject::eValueObjectRepresentationStyleValue;
+                clear_printf = true;
+                break;
+              case 'L': // if this is an L, print the location of the value
+                entry.number =
+                    ValueObject::eValueObjectRepresentationStyleLocation;
+                clear_printf = true;
+                break;
+              case 'S': // if this is an S, print the summary after all
+                entry.number =
+                    ValueObject::eValueObjectRepresentationStyleSummary;
+                clear_printf = true;
+                break;
+              case '#': // if this is a '#', print the number of children
+                entry.number =
+                    ValueObject::eValueObjectRepresentationStyleChildrenCount;
+                clear_printf = true;
+                break;
+              case 'T': // if this is a 'T', print the type
+                entry.number = ValueObject::eValueObjectRepresentationStyleType;
+                clear_printf = true;
+                break;
+              case 'N': // if this is a 'N', print the name
+                entry.number = ValueObject::eValueObjectRepresentationStyleName;
+                clear_printf = true;
+                break;
+              case '>': // if this is a '>', print the expression path
+                entry.number =
+                    ValueObject::eValueObjectRepresentationStyleExpressionPath;
+                clear_printf = true;
+                break;
               }
+            }
 
-              if (entry.number == 0) {
-                if (FormatManager::GetFormatFromCString(
-                        entry.printf_format.c_str(), entry.fmt)) {
-                  clear_printf = true;
-                } else if (entry.printf_format == "tid") {
-                  verify_is_thread_id = true;
-                } else {
-                  error.SetErrorStringWithFormat("invalid format: '%s'",
-                                                 entry.printf_format.c_str());
-                  return error;
-                }
+            if (entry.number == 0) {
+              if (FormatManager::GetFormatFromCString(
+                      entry.printf_format.c_str(), entry.fmt)) {
+                clear_printf = true;
+              } else if (entry.printf_format == "tid") {
+                verify_is_thread_id = true;
+              } else {
+                error.SetErrorStringWithFormat("invalid format: '%s'",
+                                               entry.printf_format.c_str());
+                return error;
               }
-
-              // Our format string turned out to not be a printf style format
-              // so lets clear the string
-              if (clear_printf)
-                entry.printf_format.clear();
             }
-          }
 
-          // Check for dereferences
-          if (variable[0] == '*') {
-            entry.deref = true;
-            variable = variable.drop_front();
+            // Our format string turned out to not be a printf style format
+            // so lets clear the string
+            if (clear_printf)
+              entry.printf_format.clear();
           }
+        }
 
-          error = ParseEntry(variable, &g_root, entry);
-          if (error.Fail())
-            return error;
+        // Check for dereferences
+        if (variable[0] == '*') {
+          entry.deref = true;
+          variable = variable.drop_front();
+        }
 
-          llvm::StringRef entry_string(entry.string);
-          if (entry_string.contains(':')) {
-            auto [_, llvm_format] = entry_string.split(':');
-            if (!llvm_format.empty() && !LLVMFormatPattern.match(llvm_format)) {
-              error.SetErrorStringWithFormat("invalid llvm format: '%s'",
-                                             llvm_format.data());
-              return error;
-            }
+        error = ParseEntry(variable, &g_root, entry);
+        if (error.Fail())
+          return error;
+
+        llvm::StringRef entry_string(entry.string);
+        if (entry_string.contains(':')) {
+          auto [_, llvm_format] = entry_string.split(':');
+          if (!llvm_format.empty() && !LLVMFormatPattern.match(llvm_format)) {
+            error.SetErrorStringWithFormat("invalid llvm format: '%s'",
+                                           llvm_format.data());
+            return error;
           }
+        }
 
-          if (verify_is_thread_id) {
-            if (entry.type != Entry::Type::ThreadID &&
-                entry.type != Entry::Type::ThreadProtocolID) {
-              error.SetErrorString("the 'tid' format can only be used on "
-                                   "${thread.id} and ${thread.protocol_id}");
-            }
+        if (verify_is_thread_id) {
+          if (entry.type != Entry::Type::ThreadID &&
+              entry.type != Entry::Type::ThreadProtocolID) {
+            error.SetErrorString("the 'tid' format can only be used on "
+                                 "${thread.id} and ${thread.protocol_id}");
           }
+        }
 
-          switch (entry.type) {
-          case Entry::Type::Variable:
-          case Entry::Type::VariableSynthetic:
-            if (entry.number == 0) {
-              if (entry.string.empty())
-                entry.number =
-                    ValueObject::eValueObjectRepresentationStyleValue;
-              else
-                entry.number =
-                    ValueObject::eValueObjectRepresentationStyleSummary;
-            }
-            break;
-          default:
-            // Make sure someone didn't try to dereference anything but ${var}
-            // or ${svar}
-            if (entry.deref) {
-              error.SetErrorStringWithFormat(
-                  "${%s} can't be dereferenced, only ${var} and ${svar} can.",
-                  variable.str().c_str());
-              return error;
-            }
+        switch (entry.type) {
+        case Entry::Type::Variable:
+        case Entry::Type::VariableSynthetic:
+          if (entry.number == 0) {
+            if (entry.string.empty())
+              entry.number = ValueObject::eValueObjectRepresentationStyleValue;
+            else
+              entry.number =
+                  ValueObject::eValueObjectRepresentationStyleSummary;
+          }
+          break;
+        default:
+          // Make sure someone didn't try to dereference anything but ${var}
+          // or ${svar}
+          if (entry.deref) {
+            error.SetErrorStringWithFormat(
+                "${%s} can't be dereferenced, only ${var} and ${svar} can.",
+                variable.str().c_str());
+            return error;
           }
-          parent_entry.AppendEntry(std::move(entry));
         }
+        parent_entry.AppendEntry(std::move(entry));
       }
       break;
     }
diff --git a/lldb/test/API/functionalities/data-formatter/special-chars/Makefile b/lldb/test/API/functionalities/data-formatter/special-chars/Makefile
new file mode 100644
index 0000000000000..197a5cc71fd75
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/special-chars/Makefile
@@ -0,0 +1,2 @@
+C_SOURCES = main.c
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/data-formatter/special-chars/TestSummaryStringSpecialChars.py b/lldb/test/API/functionalities/data-formatter/special-chars/TestSummaryStringSpecialChars.py
new file mode 100644
index 0000000000000..98de031ac3b36
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/special-chars/TestSummaryStringSpecialChars.py
@@ -0,0 +1,19 @@
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class TestCase(TestBase):
+
+    def test_summary_string_with_bare_dollar_char(self):
+        self.build()
+        lldbutil.run_to_source_breakpoint(self, "break here", lldb.SBFileSpec("main.c"))
+        self.runCmd("type summary add --summary-string '$ $CASH $' --no-value Dollars")
+        self.expect("v cash", startstr="(Dollars) cash = $ $CASH $")
+
+    def test_summary_string_with_bare_dollar_char_before_var(self):
+        self.build()
+        lldbutil.run_to_source_breakpoint(self, "break here", lldb.SBFileSpec("main.c"))
+        self.runCmd("type summary add --summary-string '$${var}' --no-value Dollars")
+        self.expect("v cash", startstr="(Dollars) cash = $99")
diff --git a/lldb/test/API/functionalities/data-formatter/special-chars/main.c b/lldb/test/API/functionalities/data-formatter/special-chars/main.c
new file mode 100644
index 0000000000000..36e6ea91da1fb
--- /dev/null
+++ b/lldb/test/API/functionalities/data-formatter/special-chars/main.c
@@ -0,0 +1,9 @@
+#include <stdio.h>
+
+typedef int Dollars;
+
+int main() {
+  Dollars cash = 99;
+  printf("break here: %d\n", cash);
+  return 0;
+}

github-actions · 2024-07-09T17:12:53Z

✅ With the latest revision this PR passed the Python code formatter.

kastiglione · 2024-07-09T18:19:40Z

Note that this is essentially a few lines of concrete change, most of the change is whitespace only. Use ?w=1 to filter out whitespace only changes: https://github.com/llvm/llvm-project/pull/98190/files?w=1

jimingham

LGTM

This improves the handling of `$` (dollar) characters in summary strings in the following ways: 1. When a `$` is not followed by an open paren (`{`), it should be treated as a literal character and preserved in the output. Previously, the dollar would be consumed by the parser and not shown in the output. 2. When a `$` is the last character of a format string, this change eliminates the infinite loop lldb would enter into. rdar://131392446 (cherry picked from commit 10f3f06)

This improves the handling of `$` (dollar) characters in summary strings in the following ways: 1. When a `$` is not followed by an open paren (`{`), it should be treated as a literal character and preserved in the output. Previously, the dollar would be consumed by the parser and not shown in the output. 2. When a `$` is the last character of a format string, this change eliminates the infinite loop lldb would enter into. rdar://131392446

[lldb] Improve summary string handling of dollar chars

24f2c8e

kastiglione requested a review from JDevlieghere as a code owner July 9, 2024 17:08

llvmbot added the lldb label Jul 9, 2024

kastiglione requested a review from jimingham July 9, 2024 17:10

python formatting

14c0671

jimingham approved these changes Jul 9, 2024

View reviewed changes

kastiglione merged commit 10f3f06 into llvm:main Jul 9, 2024
4 of 5 checks passed

kastiglione deleted the lldb-Improve-summary-string-handling-of-dollar-chars branch July 9, 2024 20:35

kastiglione mentioned this pull request Jul 9, 2024

[lldb] Improve summary string handling of dollar chars swiftlang/llvm-project#8961

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Improve summary string handling of dollar chars #98190

[lldb] Improve summary string handling of dollar chars #98190

Uh oh!

kastiglione commented Jul 9, 2024 •

edited

Loading

Uh oh!

llvmbot commented Jul 9, 2024

Uh oh!

github-actions bot commented Jul 9, 2024 •

edited

Loading

Uh oh!

kastiglione commented Jul 9, 2024

Uh oh!

jimingham left a comment

Uh oh!

Uh oh!

Uh oh!

[lldb] Improve summary string handling of dollar chars #98190

[lldb] Improve summary string handling of dollar chars #98190

Uh oh!

Conversation

kastiglione commented Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 9, 2024

Uh oh!

github-actions bot commented Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kastiglione commented Jul 9, 2024

Uh oh!

jimingham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kastiglione commented Jul 9, 2024 •

edited

Loading

github-actions bot commented Jul 9, 2024 •

edited

Loading