[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang… #95738

cmtice · 2024-06-17T05:46:35Z

…uage (DIL).

The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator.

This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.

…uage (DIL). The Data Inspection Language (DIL), described in https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator. This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.

llvmbot · 2024-06-17T05:47:06Z

@llvm/pr-subscribers-lldb

Author: None (cmtice)

Changes

…uage (DIL).

The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator.

This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.

Patch is 50.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95738.diff

4 Files Affected:

(added) lldb/docs/dil-expr-lang.ebnf (+165)
(added) lldb/include/lldb/Core/DILAST.h (+690)
(modified) lldb/source/Core/CMakeLists.txt (+1)
(added) lldb/source/Core/DILAST.cpp (+568)

diff --git a/lldb/docs/dil-expr-lang.ebnf b/lldb/docs/dil-expr-lang.ebnf
new file mode 100644
index 0000000000000..40c678c25cda5
--- /dev/null
+++ b/lldb/docs/dil-expr-lang.ebnf
@@ -0,0 +1,165 @@
+(* LLDB Debug Expressions, a subset of C++ *)
+(* Insired by https://www.nongnu.org/hcb *)
+
+expression = assignment_expression ;
+
+assignment_expression = conditional_expression
+                        logical_or_expression assignment_operator assignment_expression ;
+
+assignment_operator = "="
+                    | "*="
+                    | "/="
+                    | "%="
+                    | "+="
+                    | "-="
+                    | ">>="
+                    | "<<="
+                    | "&="
+                    | "^="
+                    | "|=" ;
+
+conditional_expression = logical_or_expression
+                       | logical_or_expression "?" expression ":" assignment_expression ;
+
+logical_or_expression = logical_and_expression {"||" logical_and_expression} ;
+
+logical_and_expression = inclusive_or_expression {"&&" inclusive_or_expression} ;
+
+inclusive_or_expression = exclusive_or_expression {"|" exclusive_or_expression} ;
+
+exclusive_or_expression = and_expression {"^" and_expression} ;
+
+and_expression = equality_expression {"&" equality_expression} ;
+
+equality_expression = relational_expression {"==" relational_expression}
+                    | relational_expression {"!=" relational_expression} ;
+
+relational_expression = shift_expression {"<" shift_expression}
+                      | shift_expression {">" shift_expression}
+                      | shift_expression {"<=" shift_expression}
+                      | shift_expression {">=" shift_expression} ;
+
+shift_expression = additive_expression {"<<" additive_expression}
+                 | additive_expression {">>" additive_expression} ;
+
+additive_expression = multiplicative_expression {"+" multiplicative_expression}
+                    | multiplicative_expression {"-" multiplicative_expression} ;
+
+multiplicative_expression = cast_expression {"*" cast_expression}
+                          | cast_expression {"/" cast_expression}
+                          | cast_expression {"%" cast_expression} ;
+
+cast_expression = unary_expression
+                | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression
+                 | "++" cast_expression
+                 | "--" cast_expression
+                 | unary_operator cast_expression
+                 | "sizeof" unary_expression
+                 | "sizeof" "(" type_id ")" ;
+
+unary_operator = "*" | "&" | "+" | "-" | "!" | "~" ;
+
+postfix_expression = primary_expression
+                   | postfix_expression "[" expression "]"
+                   | postfix_expression "." id_expression
+                   | postfix_expression "->" id_expression
+                   | postfix_expression "++"
+                   | postfix_expression "--"
+                   | static_cast "<" type_id ">" "(" expression ")" ;
+                   | dynamic_cast "<" type_id ">" "(" expression ")" ;
+                   | reinterpret_cast "<" type_id ">" "(" expression ")" ;
+
+primary_expression = numeric_literal
+                   | boolean_literal
+                   | pointer_literal
+                   | id_expression
+                   | "this"
+                   | "(" expression ")"
+                   | builtin_func ;
+
+type_id = type_specifier_seq [abstract_declarator] ;
+
+type_specifier_seq = type_specifier [type_specifier_seq] ;
+
+type_specifier = simple_type_specifier
+               | cv_qualifier ;
+
+simple_type_specifier = ["::"] [nested_name_specifier] type_name
+                      | "char"
+                      | "char16_t"
+                      | "char32_t"
+                      | "wchar_t"
+                      | "bool"
+                      | "short"
+                      | "int"
+                      | "long"
+                      | "signed"
+                      | "unsigned"
+                      | "float"
+                      | "double"
+                      | "void" ;
+
+nested_name_specifier = type_name "::"
+                      | namespace_name '::'
+                      | nested_name_specifier identifier "::"
+                      | nested_name_specifier simple_template_id "::"
+
+type_name = class_name
+          | enum_name
+          | typedef_name
+          | simple_template_id ;
+
+class_name = identifier ;
+
+enum_name = identifier ;
+
+typedef_name = identifier ;
+
+simple_template_id = template_name "<" [template_argument_list] ">" ;
+
+template_name = identifier ;
+
+template_argument_list = template_argument
+                       | template_argument_list "," template_argument ;
+
+template_argument = type_id
+                  | numeric_literal
+                  | id_expression ;
+
+namespace_name = identifier ;
+
+cv_qualifier = "const" | "volatile" ;
+
+cv_qualifier_seq = cv_qualifier [cv_qualifier_seq] ;
+
+abstract_declarator = ptr_operator [abstract_declarator] ;
+
+ptr_operator = "*" [cv_qualifier_seq]
+             | "&" ;
+
+id_expression = unqualified_id
+              | qualified_id ;
+
+unqualified_id = identifier ;
+
+qualified_id = ["::"] [nested_name_specifier] unqualified_id
+             | ["::"] identifier ;
+
+identifier = ? clang::tok::identifier ? ;
+
+numeric_literal = ? clang::tok::numeric_constant ? ;
+
+boolean_literal = "true" | "false" ;
+
+pointer_literal = "nullptr" ;
+
+builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;
+
+builtin_func_name = "__log2" ;
+
+builtin_func_argument_list = builtin_func_argument
+                           | builtin_func_argument_list "," builtin_func_argument
+
+builtin_func_argument = expression ;
diff --git a/lldb/include/lldb/Core/DILAST.h b/lldb/include/lldb/Core/DILAST.h
new file mode 100644
index 0000000000000..acb9da088eda8
--- /dev/null
+++ b/lldb/include/lldb/Core/DILAST.h
@@ -0,0 +1,690 @@
+//===-- DILAST.h ------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_DIL_AST_H_
+#define LLDB_DIL_AST_H_
+
+#include <memory>
+#include <optional>
+#include <string>
+#include <variant>
+#include <vector>
+
+#include "lldb/Core/ValueObject.h"
+#include "lldb/Symbol/Type.h"
+#include "lldb/Symbol/TypeList.h"
+#include "lldb/Target/LanguageRuntime.h"
+#include "lldb/Utility/ConstString.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/TokenKinds.h"
+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+
+namespace lldb_private {
+
+/// Struct to hold information about member fields. Used by the parser for the
+/// Data Inspection Language (DIL).
+struct DILMemberInfo {
+  std::optional<std::string> name;
+  CompilerType type;
+  bool is_bitfield;
+  uint32_t bitfield_size_in_bits;
+  bool is_synthetic;
+  bool is_dynamic;
+  lldb::ValueObjectSP val_obj_sp;
+
+  explicit operator bool() const { return type.IsValid(); }
+};
+
+/// This determines if the type is a shared, unique or weak pointer, either
+/// from stdlibc++ or libc+++.
+bool IsSmartPtrType(CompilerType type);
+
+/// Finds the member field with the given name and type, stores the child index
+/// corresponding to the field in the idx vector and returns a DILMemberInfo
+/// struct with appropriate information about the field.
+DILMemberInfo GetFieldWithNameIndexPath(lldb::ValueObjectSP lhs_val_sp,
+                                        CompilerType type,
+                                        const std::string &name,
+                                        std::vector<uint32_t> *idx,
+                                        CompilerType empty_type,
+                                        bool use_synthetic, bool is_dynamic);
+
+std::tuple<DILMemberInfo, std::vector<uint32_t>>
+GetMemberInfo(lldb::ValueObjectSP lhs_val_sp, CompilerType type,
+              const std::string &name, bool use_synthetic);
+
+/// Get the appropriate ValueObjectSP, consulting the use_dynamic and
+/// use_synthetic options passed, acquiring the process & target locks if
+/// appropriate.
+lldb::ValueObjectSP
+DILGetSPWithLock(lldb::ValueObjectSP valobj_sp,
+                 lldb::DynamicValueType use_dynamic = lldb::eNoDynamicValues,
+                 bool use_synthetic = false);
+
+/// The various types DIL AST nodes (used by the DIL parser).
+enum class DILNodeKind {
+  kDILErrorNode,
+  kLiteralNode,
+  kIdentifierNode,
+  kSizeOfNode,
+  kBuiltinFunctionCallNode,
+  kCStyleCastNode,
+  kCxxStaticCastNode,
+  kCxxReinterpretCastNode,
+  kMemberOfNode,
+  kArraySubscriptNode,
+  kBinaryOpNode,
+  kUnaryOpNode,
+  kTernaryOpNode,
+  kSmartPtrToPtrDecay
+};
+
+/// The C-Style casts allowed by DIL.
+enum class CStyleCastKind {
+  kArithmetic,
+  kEnumeration,
+  kPointer,
+  kNullptr,
+  kReference,
+};
+
+/// The Cxx static casts allowed by DIL.
+enum class CxxStaticCastKind {
+  kNoOp,
+  kArithmetic,
+  kEnumeration,
+  kPointer,
+  kNullptr,
+  kBaseToDerived,
+  kDerivedToBase,
+};
+
+/// The binary operators recognized by DIL.
+enum class BinaryOpKind {
+  Mul,       // "*"
+  Div,       // "/"
+  Rem,       // "%"
+  Add,       // "+"
+  Sub,       // "-"
+  Shl,       // "<<"
+  Shr,       // ">>"
+  LT,        // "<"
+  GT,        // ">"
+  LE,        // "<="
+  GE,        // ">="
+  EQ,        // "=="
+  NE,        // "!="
+  And,       // "&"
+  Xor,       // "^"
+  Or,        // "|"
+  LAnd,      // "&&"
+  LOr,       // "||"
+  Assign,    // "="
+  MulAssign, // "*="
+  DivAssign, // "/="
+  RemAssign, // "%="
+  AddAssign, // "+="
+  SubAssign, // "-="
+  ShlAssign, // "<<="
+  ShrAssign, // ">>="
+  AndAssign, // "&="
+  XorAssign, // "^="
+  OrAssign,  // "|="
+};
+
+/// The Unary operators recognized by DIL.
+enum class UnaryOpKind {
+  PostInc, // "++"
+  PostDec, // "--"
+  PreInc,  // "++"
+  PreDec,  // "--"
+  AddrOf,  // "&"
+  Deref,   // "*"
+  Plus,    // "+"
+  Minus,   // "-"
+  Not,     // "~"
+  LNot,    // "!"
+};
+
+/// Helper functions for DIL AST node parsing.
+
+/// Translates clang tokens to BinaryOpKind.
+BinaryOpKind
+clang_token_kind_to_binary_op_kind(clang::tok::TokenKind token_kind);
+
+/// Returns bool indicating whether or not the input kind is an assignment.
+bool binary_op_kind_is_comp_assign(BinaryOpKind kind);
+
+/// Given a string representing a type, returns the CompilerType corresponding
+/// to the named type, if it exists.
+CompilerType
+ResolveTypeByName(const std::string &name,
+                  std::shared_ptr<ExecutionContextScope> ctx_scope);
+
+/// Quick lookup to check if a type name already exists in a
+/// name-to-CompilerType map the DIL parser keeps of previously found
+/// name/type pairs.
+bool IsContextVar(const std::string &name);
+
+/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak)
+/// or not. Only applicable for C++, which is why this is here and not part of
+/// the CompilerType class.
+bool IsSmartPtrType(CompilerType type);
+
+/// Class used to store & manipulate information about identifiers.
+class IdentifierInfo {
+private:
+  using MemberPath = std::vector<uint32_t>;
+  using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>;
+
+public:
+  enum class Kind {
+    kValue,
+    kContextArg,
+    kMemberPath,
+    kThisKeyword,
+  };
+
+  static IdentifierInfoPtr FromValue(lldb::ValueObjectSP value_sp) {
+    CompilerType type;
+    lldb::ValueObjectSP value = DILGetSPWithLock(value_sp);
+    if (value)
+      type = value->GetCompilerType();
+    return IdentifierInfoPtr(new IdentifierInfo(Kind::kValue, type, value, {}));
+  }
+
+  static IdentifierInfoPtr FromContextArg(CompilerType type) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(
+        new IdentifierInfo(Kind::kContextArg, type, empty_value, {}));
+  }
+
+  static IdentifierInfoPtr FromMemberPath(CompilerType type, MemberPath path) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(new IdentifierInfo(Kind::kMemberPath, type,
+                                                empty_value, std::move(path)));
+  }
+
+  static IdentifierInfoPtr FromThisKeyword(CompilerType type) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(
+        new IdentifierInfo(Kind::kThisKeyword, type, empty_value, {}));
+  }
+
+  Kind kind() const { return m_kind; }
+  lldb::ValueObjectSP value() const { return m_value; }
+  const MemberPath &path() const { return m_path; }
+
+  CompilerType GetType() { return m_type; }
+  bool IsValid() const { return m_type.IsValid(); }
+
+  IdentifierInfo(Kind kind, CompilerType type, lldb::ValueObjectSP value,
+                 MemberPath path)
+      : m_kind(kind), m_type(type), m_value(std::move(value)),
+        m_path(std::move(path)) {}
+
+private:
+  Kind m_kind;
+  CompilerType m_type;
+  lldb::ValueObjectSP m_value;
+  MemberPath m_path;
+};
+
+/// Given the name of an identifier (variable name, member name, type name,
+/// etc.), find the ValueObject for that name (if it exists) and create and
+/// return an IdentifierInfo object containing all the relevant information
+/// about that object (for DIL parsing and evaluating).
+std::unique_ptr<IdentifierInfo> LookupIdentifier(
+    const std::string &name, std::shared_ptr<ExecutionContextScope> ctx_scope,
+    lldb::DynamicValueType use_dynamic, CompilerType *scope_ptr = nullptr);
+
+/// Forward declaration, for use in DIL AST nodes. Definition is at the very
+/// end of this file.
+class DILVisitor;
+
+/// The rest of the classes in this file, except for the DILVisitor class at the
+/// very end, define all the types of AST nodes used by the DIL parser and
+/// expression evaluator. The DIL parser parses the input string and creates the
+/// AST parse tree from the AST nodes. The resulting AST node tree gets passed
+/// to the DIL expression evaluator, which evaluates the DIL AST nodes and
+/// creates/returns a ValueObjectSP containing the result.
+
+/// Base class for AST nodes used by the Data Inspection Language (DIL) parser.
+/// All of the specialized types of AST nodes inherit from this (virtual) base
+/// class.
+class DILASTNode {
+public:
+  DILASTNode(clang::SourceLocation location) : location_(location) {}
+  virtual ~DILASTNode() {}
+
+  virtual void Accept(DILVisitor *v) const = 0;
+
+  virtual bool is_error() const { return false; };
+  virtual bool is_rvalue() const = 0;
+  virtual bool is_bitfield() const { return false; };
+  virtual bool is_context_var() const { return false; };
+  virtual bool is_literal_zero() const { return false; }
+  virtual uint32_t bitfield_size() const { return 0; }
+  virtual CompilerType result_type() const = 0;
+
+  virtual DILNodeKind what_am_i() const = 0;
+
+  clang::SourceLocation location() const { return location_; }
+
+  // The expression result type, but dereferenced in case it's a reference. This
+  // is for convenience, since for the purposes of the semantic analysis only
+  // the dereferenced type matters.
+  CompilerType result_type_deref() const;
+
+private:
+  clang::SourceLocation location_;
+};
+
+using ParseResult = std::unique_ptr<DILASTNode>;
+
+class DILErrorNode : public DILASTNode {
+public:
+  DILErrorNode(CompilerType empty_type)
+      : DILASTNode(clang::SourceLocation()), m_empty_type(empty_type) {}
+  void Accept(DILVisitor *v) const override;
+  bool is_error() const override { return true; }
+  bool is_rvalue() const override { return false; }
+  CompilerType result_type() const override { return m_empty_type; }
+  CompilerType result_type_real() const { return m_empty_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kDILErrorNode; }
+
+private:
+  CompilerType m_empty_type;
+};
+
+class LiteralNode : public DILASTNode {
+public:
+  template <typename ValueT>
+  LiteralNode(clang::SourceLocation location, CompilerType type, ValueT &&value,
+              bool is_literal_zero)
+      : DILASTNode(location), m_type(type),
+        m_value(std::forward<ValueT>(value)),
+        m_is_literal_zero(is_literal_zero) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  bool is_literal_zero() const override { return m_is_literal_zero; }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kLiteralNode; }
+
+  template <typename ValueT> ValueT value() const {
+    return std::get<ValueT>(m_value);
+  }
+
+  auto value() const { return m_value; }
+
+private:
+  CompilerType m_type;
+  std::variant<llvm::APInt, llvm::APFloat, bool, std::vector<char>> m_value;
+  bool m_is_literal_zero;
+};
+
+class IdentifierNode : public DILASTNode {
+public:
+  IdentifierNode(clang::SourceLocation location, std::string name,
+                 std::unique_ptr<IdentifierInfo> identifier, bool is_rvalue,
+                 bool is_context_var)
+      : DILASTNode(location), m_is_rvalue(is_rvalue),
+        m_is_context_var(is_context_var), m_name(std::move(name)),
+        m_identifier(std::move(identifier)) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return m_is_rvalue; }
+  bool is_context_var() const override { return m_is_context_var; };
+  CompilerType result_type() const override { return m_identifier->GetType(); }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kIdentifierNode;
+  }
+
+  std::string name() const { return m_name; }
+  const IdentifierInfo &info() const { return *m_identifier; }
+
+private:
+  bool m_is_rvalue;
+  bool m_is_context_var;
+  std::string m_name;
+  std::unique_ptr<IdentifierInfo> m_identifier;
+};
+
+class SizeOfNode : public DILASTNode {
+public:
+  SizeOfNode(clang::SourceLocation location, CompilerType type,
+             CompilerType operand)
+      : DILASTNode(location), m_type(type), m_operand(operand) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kSizeOfNode; }
+
+  CompilerType operand() const { return m_operand; }
+
+private:
+  CompilerType m_type;
+  CompilerType m_operand;
+};
+
+class BuiltinFunctionCallNode : public DILASTNode {
+public:
+  BuiltinFunctionCallNode(clang::SourceLocation location,
+                          CompilerType result_type, std::string name,
+                          std::vector<ParseResult> arguments)
+      : DILASTNode(location), m_result_type(result_type),
+        m_name(std::move(name)), m_arguments(std::move(arguments)) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  CompilerType result_type() const override { return m_result_type; }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kBuiltinFunctionCallNode;
+  }
+
+  std::string name() const { return m_name; }
+  const std::vector<ParseResult> &arguments() const { return m_arguments; };
+
+private:
+  CompilerType m_result_type;
+  std::string m_name;
+  std::vector<ParseResult> m_arguments;
+};
+
+class CStyleCastNode : public DILASTNode {
+public:
+  CStyleCastNode(clang::SourceLocation location, CompilerType type,
+                 ParseResult rhs, CStyleCastKind kind)
+      : DILASTNode(location), m_type(type), m_rhs(std::move(rhs)),
+        m_kind(kind) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override {
+    return m_kind != CStyleCastKind::kReference;
+  }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kCStyleCastNode;
+  }
+
+  CompilerType type() const { return m_type; }
+  DILASTNode *rhs() const { return m_rhs.get(); }
+  CStyleCastKind kind() const { return m_kind; }
+
+private:
+  CompilerType m_type;
+  ParseResult m_rhs;
+  CStyleCastKind m_kind;
+};
+
+class CxxStaticCastNode : public DILASTNode {
+public:
+  CxxStaticCastNode(clang::SourceLocation location, CompilerType type,
+                    ParseResult rhs, CxxStaticCastKind ki...
[truncated]

adrian-prantl · 2024-06-17T17:11:57Z

lldb/docs/dil-expr-lang.ebnf

+(* LLDB Debug Expressions, a subset of C++ *)
+(* Insired by https://www.nongnu.org/hcb *)
+
+expression = assignment_expression ;


Nice to actually have the grammar documented!

werat · 2024-06-17T20:28:43Z

lldb/docs/dil-expr-lang.ebnf

+                   | static_cast "<" type_id ">" "(" expression ")" ;
+                   | dynamic_cast "<" type_id ">" "(" expression ")" ;
+                   | reinterpret_cast "<" type_id ">" "(" expression ")" ;


Suggested change

| static_cast "<" type_id ">" "(" expression ")" ;

| dynamic_cast "<" type_id ">" "(" expression ")" ;

| reinterpret_cast "<" type_id ">" "(" expression ")" ;

| static_cast "<" type_id ">" "(" expression ")"

| dynamic_cast "<" type_id ">" "(" expression ")"

| reinterpret_cast "<" type_id ">" "(" expression ")" ;

werat · 2024-06-17T20:28:57Z

lldb/docs/dil-expr-lang.ebnf

+nested_name_specifier = type_name "::"
+                      | namespace_name '::'
+                      | nested_name_specifier identifier "::"
+                      | nested_name_specifier simple_template_id "::"


Suggested change

| nested_name_specifier simple_template_id "::"

| nested_name_specifier simple_template_id "::" ;

werat · 2024-06-17T20:29:12Z

lldb/docs/dil-expr-lang.ebnf

+builtin_func_name = "__log2" ;
+
+builtin_func_argument_list = builtin_func_argument
+                           | builtin_func_argument_list "," builtin_func_argument


Suggested change

| builtin_func_argument_list "," builtin_func_argument

| builtin_func_argument_list "," builtin_func_argument ;

werat · 2024-06-17T20:30:13Z

lldb/docs/dil-expr-lang.ebnf

+                   | static_cast "<" type_id ">" "(" expression ")" ;
+                   | dynamic_cast "<" type_id ">" "(" expression ")" ;
+                   | reinterpret_cast "<" type_id ">" "(" expression ")" ;


Not sure how I made those mistakes over time, but the semicolons are wrong in some places 😅

…uage Fix semicolons in the .ebnf file.

Michael137

Haven't gone through this in detail yet (having an example of how these structures will be used would be helpful), but something I was curious about is whether this DIL is going to be used in the implementation of a new command, or if we're going to extend the frame var language? I didn't see a concrete conclusion on this in the RFC

Michael137 · 2024-06-24T08:00:57Z

lldb/docs/dil-expr-lang.ebnf

+cast_expression = unary_expression
+                | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression


Do we really want to allow state-changing expressions in the frame var language?

It is pretty useful to be able to easily change the program state and afaik it's already possible with expr command. Generally other debugger allow this as well (e.g. Visual Studio), so I think it's a reasonable thing to do. However, for increments specifically there's a deeper reason here (and I would agree that they probably shouldn't be available for general use). See my big comment below for justification.

I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath don't exactly sound like they ought to be modifying the variable in the process of "getting" it).

On the other hand, many users confuse or conflate "frame variable" with "p" (the short-cut for the expression evaluator), the "p" certainly allows updating the variable value...

That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...

Michael137 · 2024-06-24T08:03:49Z

lldb/docs/dil-expr-lang.ebnf

+
+pointer_literal = "nullptr" ;
+
+builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;


What are builtins here? A set of functions that LLDB knows how to execute? How are those chosen?

lldb-eval (DIL's predecessor) supports exactly two builtin functions at the moment: __log2 and cryptic __findnonnull (https://github.com/google/lldb-eval/blob/323d48f0b06d7dcf6ed70cdde17b42285450a32a/lldb-eval/parser.cc#L619). The latter is commonly used in NatVis for complex visualisers, e.g. https://github.com/EQEmu/Server/blob/7918fed81cae8700d73319d4bed5c9f265dbbf3d/libs/perlbind/src/perlbind.natvis#L88

Michael137 · 2024-06-24T08:05:08Z

lldb/docs/dil-expr-lang.ebnf

+
+typedef_name = identifier ;
+
+simple_template_id = template_name "<" [template_argument_list] ">" ;


Curious what the use-case for instantiating templates is in the DIL. I guess referencing static data members of different instantiations?

Also how will this be implemented? I guess the simple-template-names infrastructure could help here?
AFAIU, in the expression evaluator, we would ideally support templates by either using modules or extending DWARF in a way that allows us to construct more faithful ASTs.

Typically this is used in casts like ((Foo<int>)some_ptr)->field. In the world of dynamic typing this is not necessary, however VisualStudio debugger doesn't support dynamic typing like this, so everyone is casting things.

Michael137 · 2024-06-24T08:08:45Z

lldb/docs/dil-expr-lang.ebnf

+                   | postfix_expression "--"
+                   | static_cast "<" type_id ">" "(" expression ")"
+                   | dynamic_cast "<" type_id ">" "(" expression ")"
+                   | reinterpret_cast "<" type_id ">" "(" expression ")" ;


Wouldn't C-style casts suffice? (i'm not sure how closely we want to model this language after C++, but these casts make it seem like we're just supporting C++)

See the reply above, C++-style casts are often used in NatVis (visual studio debugger), which lldb-eval originally aimed to support.

cmtice · 2024-06-25T04:20:04Z

Haven't gone through this in detail yet (having an example of how these structures will be used would be helpful), but something I was curious about is whether this DIL is going to be used in the implementation of a new command, or if we're going to extend the frame var language? I didn't see a concrete conclusion on this in the RFC

The plan is for the DIL implementation to eventually replace the current 'frame variable' command (see Milestone 1 in the implementation plan). Once that is done, it will be expanded to also replace calls to GetValueForExpressionPath, for parsing & handling pieces of LLDB's data formatters.

I couldn't put the whole implementation into a single PR, as it would have been much too big to review (even some of the pieces are going to be pretty big). So I broke it up into: the helper functions (which have already been committed now); the AST (this PR); and then three more PR's, each one coming after the preceding one is approved: the DIL Parser (a recursive-descent parser), which constructs (and returns) the appropriate DIL AST; the DIL Evaluator, which takes a DIL AST, evaluates it in the current context (frame), and returns an appropriate ValueObjectSP; and the code to actually hook this up to the 'frame variable' command. I have the whole thing implemented (complete with test cases). If you really want to dig through the complete code, you can see it at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ (note that I will be cleaning up the Parser & Evaluator code before actually being put into a PR). The current implementation there still has both implementations for 'frame variable' (original and DIL), and with 'original' being the default and 'DIL' being turned on by an option.

labath · 2024-06-25T07:30:51Z

I've been trying to stay out of this, as I don't think I can be objective here, but since I've been summoned, here's what I think. :)

For me, the worst possible outcome of this is an incomplete transition -- having a third (maybe fourth, since I sort of also count dwim-print) expression-like syntax, which is "almost" complete, but for whatever reason (technical or political) can't replace the previous one. With that in mind, I'd try to stage this in a way that minimizes this risk. Building the whole thing in layers from the bottom up is not the best way to achieve that, I think.

If I were doing this, I'd prioritise replacing the current "frame var" implementation as soon as possible. Ideally, the steps would be something like:

implement the minimal necessary functionality to match current "frame var" feature set. (I think that's &, [], * and ->)
flip the switch
remove the old implementation
add other features, one group of related operators at a time

Besides managing that risk (if this fades out for whatever reason, there's less dead code around), I think this would also make it easier to review, as we could have patches like "add operator/ to the DIL", which would focus solely on that, and also come with associated tests. I think it would also make it easier to discuss things like whether we want to add modifying operations (operator++) or obviously language-specific features (dynamic_cast), both of which I think are very good questions.

walter-erquinigo · 2024-06-25T16:16:43Z

I second everything that Pavel says. I think this would be the best approach. The existing frame var implementation is not that "smart", so it should be practical to fully replace it with the new DIL in its first stage.
An additional consideration is that lldb-dap, which is being more and more frequently used, relies on frame var for executing most expressions. expr is just a fallback. Therefore, replacing frame var with DIL right away has the benefit of getting tested right away by all the lldb-dap users.

Michael137 · 2024-06-26T07:41:40Z

I have the whole thing implemented (complete with test cases). If you really want to dig through the complete code, you can see it at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ (note that I will be cleaning up the Parser & Evaluator code before actually being put into a PR).

Thanks! that'll be a useful reference

Besides managing that risk (if this fades out for whatever reason, there's less dead code around), I think this would also make it easier to review, as we could have patches like "add operator/ to the DIL", which would focus solely on that, and also come with associated tests. I think it would also make it easier to discuss things like whether we want to add modifying operations (operator++) or obviously language-specific features (dynamic_cast), both of which I think are very good questions.

Makes sense!

werat · 2024-06-27T20:22:13Z

Providing more context for answers to Michael's questions.

The original motivation for lldb-eval (https://github.com/google/lldb-eval, which is a predecessor for DIL) was to support the evaluation of expressions used in NatVis rules. NatVis visualizers support complex logic, including C++-like casts, restricted function calls and state-changing operations like increment/decrement.

To be able to evaluate those expressions as is, lldb-eval had to support all of the commonly used features, hence the extensive and somewhat selective list of supported features :)

For example, reinterpret_cast<> is commonly used to cast pointers to structs/classes in order to access their fields (example -- https://github.com/godotengine/godot/blob/cae2f853dcd1ecc26ca68de08cec62089dee1f26/platform/windows/godot.natvis#L146). Visual Studio NatVis evaluator doesn't support dynamic typing (e.g. resolving foo->bar, if bar is defined in a subclass), therefore the casts are necessary. In the world of dynamic typing this may be not necessary.

State altering operations like increment/decrement are used in advanced visualizers like <CustomListItems> for control flow -- e.g. you can define a loop variable, then increment it and break out of the loop under some condition. A real life example -- https://github.com/GaijinEntertainment/DagorEngine/blob/8b340d10814f06aa364e75769d406ec7373c5574/prog/_jBuild/_natvis/dagor_types.natvis#L746. It's not visible in the grammar, but lldb-eval has additional knobs for controlling when state changing is allowed. If I recall correctly, this can be done in two cases:

the variable being incremented is a "context variable", e.g. an artificial SBValue passed to the expression as context, which doesn't exist in the program. See some explanation and example here -- https://werat.dev/blog/blazing-fast-expression-evaluation-for-c-in-lldb/#maintaning-state
when side effects are explecitely allowed via options -- https://github.com/google/lldb-eval/blob/323d48f0b06d7dcf6ed70cdde17b42285450a32a/lldb-eval/api.h#L77. In our debugger this was disabled for all visualizer-related evaluations and enabled for one-off interactive evaluations, basically mimicking the behaviour of Visual Studio debugger.

adrian-prantl · 2024-07-01T14:07:30Z

A high-level concern that I have with this is the grammar is full of C++-specific nodes, which is not something I would like to see in living in Core. Have we previously reached any consensus on how we want to handle different languages in DIL?
I would be more comfortable with moving most of this into the Plugins/Language/CPlusPlus directory and having only a generic, language-agnostic driver live in Core.

jimingham · 2024-07-01T17:35:53Z

A high-level concern that I have with this is the grammar is full of C++-specific nodes, which is not something I would like to see in living in Core. Have we previously reached any consensus on how we want to handle different languages in DIL? I would be more comfortable with moving most of this into the Plugins/Language/CPlusPlus directory and having only a generic, language-agnostic driver live in Core.

My .02:

The problem here is that the commands that use the DIL often have no idea what language the path expression they are being asked to parse is going to end up being.

(lldb) target var some_DIL_expression

could find C/ObjC/C++/Rust/Swift variables that match that expression.

The idea for the DIL is to have enough syntax to express sensible data lookup and presentation that would cover the variety of languages we support. That's going to end up looking like the languages we are using it to model, but it shouldn't be driven by them. I think that's better than having the DIL have language specific variants and the user somehow has to know to pick the right one for the context.

…uage Remove all the pieces for adding functionality beyond what the current default "frame variable" implementation does. Update the .ebnf file to reflect the current simplified version of DIL.

cmtice · 2024-07-02T15:53:18Z

As requested, I have updated this PR to only include the pieces needed to reproduce current (default) 'frame variable' behavior with DIL. I have also updated the BNF file accordingly.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

github-actions · 2024-07-02T15:53:29Z

✅ With the latest revision this PR passed the C/C++ code formatter.

cmtice · 2024-07-03T19:30:14Z

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

labath

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

That's cool. Just to confirm, have you looked at replacing target variable as well? It uses the same language as "frame var" under the hood, which but it has a somewhat different starting point (it basically ignores the local scope and looks only at globals), which means it may need some special handling in the new parser.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

This direction makes sense to me, but I think that's all the more reason to be conservative/cautious about adding new features to the language. We can't just put every possible feature of every language into it, as we'd end up with a unmaintainable mess.

labath · 2024-07-08T14:23:11Z

lldb/include/lldb/Core/DILAST.h

+class IdentifierInfo {
+private:
+  using MemberPath = std::vector<uint32_t>;
+  using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>;


elsewhere, we'd use IdentifierInfoUP

labath · 2024-07-09T08:40:16Z

lldb/docs/dil-expr-lang.ebnf

+                   | boolean_literal
+                   | pointer_literal
+                   | id_expression
+                   | "this" ;


Other languages may use a different keyword (self typically) to refer to "this" object (in which case this can refer to a regular variable). Special-casing this in the parser could make that difficult, so it may be better to treat this as a regular identifier, and only resolve it later, once we know which language we're in.

Removed the quoted strings from the grammar for now.

labath · 2024-07-09T08:47:04Z

lldb/docs/dil-expr-lang.ebnf

+boolean_literal = "true" | "false" ;
+
+pointer_literal = "nullptr" ;


true/false/nullptr are also spelled differently in different languages, and I don't believe they're necessary for the basic "frame variable" implementation. Can we skip those for now?

Would it be acceptable for me to replace the quoted strings with something like: true_token, false_token, nullptr_token?

Removed the quoted strings from the grammar for now.

labath · 2024-07-09T08:48:10Z

lldb/include/lldb/Core/DILAST.h

+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_DIL_AST_H_


LLDB_CORE_DILAST_H

labath · 2024-07-09T08:53:49Z

lldb/include/lldb/Core/DILAST.h

+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+
+namespace lldb_private {


Some of the names defined here have embedded namespace names (DILMemberInfo), while others (IdentifierInfo) have deceptively generic name even though they're specific to the DIL. What do you (all) think about putting all of this in a namespace (I suggest dil)?

Done (namespace "DIL")

It should really be a lowercase dil. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)

labath · 2024-07-09T09:41:54Z

lldb/source/Core/DILAST.cpp

+    if (str_ref_name.size() > 2 && str_ref_name[0] == ':' &&
+        str_ref_name[1] == ':')
+      str_ref_name = str_ref_name.drop_front(2);


str_ref_name.consume_front("::");

labath · 2024-07-09T09:43:32Z

lldb/source/Core/DILAST.cpp

+    ConstString tmp_name(str_ref_name);
+    if (tmp_name == name)


Constifying str_ref_name doesn't help anything, since you have to touch every byte of the string just to compute its checksum. if(name.GetStringRef() == str_ref_name) is better here.

labath · 2024-07-09T09:45:24Z

lldb/source/Core/DILAST.cpp

+  for (auto var_sp : possible_matches)
+    if (var_sp->GetName() == name) {
+      exact_match = var_sp;
+      break;
+    }


auto exact_match = llvm::find_if(possible_matches, [&](VariableSP var_sp) { return var_sp->GetName() == name; });

labath · 2024-07-09T09:53:24Z

lldb/source/Core/DILAST.cpp

+    Target *target = ctx_scope->CalculateTarget().get();
+    Process *process = ctx_scope->CalculateProcess().get();
+    if (target && process) {
+      Process::StopLocker stop_locker;


I find the presence of the StopLocker here fairly surprising. What is this trying to achieve? If something like this was necessary (I'm not sure if the current frame var implementation does that), I would expect it be be scoped around the entire expression evaluation process, and not a single operation.

Turns out it's not actually needed here. Have removed it.

labath · 2024-07-09T10:06:27Z

lldb/docs/dil-expr-lang.ebnf

+cast_expression = unary_expression
+                | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression


I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath don't exactly sound like they ought to be modifying the variable in the process of "getting" it).

cmtice · 2024-07-11T04:12:01Z

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

That's cool. Just to confirm, have you looked at replacing target variable as well? It uses the same language as "frame var" under the hood, which but it has a somewhat different starting point (it basically ignores the local scope and looks only at globals), which means it may need some special handling in the new parser.

No, I have not looked at 'target variable' at this time.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

This direction makes sense to me, but I think that's all the more reason to be conservative/cautious about adding new features to the language. We can't just put every possible feature of every language into it, as we'd end up with a unmaintainable mess.

jimingham · 2024-07-11T16:40:14Z

On Jul 10, 2024, at 9:22 PM, cmtice ***@***.***> wrote: @cmtice commented on this pull request. In lldb/include/lldb/Core/DILAST.h <#95738 (comment)>: > +/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak) +/// or not. Only applicable for C++, which is why this is here and not part of +/// the CompilerType class. When we go to dereference a pointer we need to know whether or not it's a smart pointer, because if it's a smart pointer we need to get at the pointer inside before we can dereference it. So we use IsSmartPtrType to determine whether or not to insert this layer of unwrapping (converting the smart ptr into a normal ptr).

I'm not sure it's the job of the DIL to do this sort of unwrapping. Then it would have to support all the ways this might need to be done in every language we might end up supporting. This seems more a job for the data formatters provided by the various languages. That's why Pavel was asking about the special deference Synthetic Child Provider API, I think. That provides a way for various languages to encode this sort of behavior w/o it having to be know in advance by lldb. Jim

…

— Reply to this email directly, view it on GitHub <#95738 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW4J5FKAPSUTJYXSJODZLYB7TAVCNFSM6AAAAABJNJ3CAKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNZQHA4DAOJYGE>. You are receiving this because you are on a team that was mentioned.

cmtice · 2024-07-15T06:14:35Z

On Jul 10, 2024, at 9:22 PM, cmtice @.***> wrote: @cmtice commented on this pull request. In lldb/include/lldb/Core/DILAST.h <#95738 (comment)>: > +/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak) +/// or not. Only applicable for C++, which is why this is here and not part of +/// the CompilerType class. When we go to dereference a pointer we need to know whether or not it's a smart pointer, because if it's a smart pointer we need to get at the pointer inside before we can dereference it. So we use IsSmartPtrType to determine whether or not to insert this layer of unwrapping (converting the smart ptr into a normal ptr).
I'm not sure it's the job of the DIL to do this sort of unwrapping. Then it would have to support all the ways this might need to be done in every language we might end up supporting. This seems more a job for the data formatters provided by the various languages. That's why Pavel was asking about the special deference Synthetic Child Provider API, I think. That provides a way for various languages to encode this sort of behavior w/o it having to be know in advance by lldb. Jim
…
— Reply to this email directly, view it on GitHub <#95738 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUPVW4J5FKAPSUTJYXSJODZLYB7TAVCNFSM6AAAAABJNJ3CAKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNZQHA4DAOJYGE. You are receiving this because you are on a team that was mentioned.

I can use the machinery that's already there in LLDB for dereferencig the smart pointers. But I can't treat smart pointers exactly the same as normal pointers -- among other things a ValueObject whose CompilerType is a smart pointer returns False for IsPointerType(), which causes errors in the code for handling normal pointers. So as far as I can tell I still need my function (declared in DILAST.h) for telling me whether or not a type is a smart pointer type (which is what Pavel's original comment was about).

Address review comments: Remove decls for functions not defined or used in AST files; add DIL namespace, code cleanups, etc.

labath

I still have a lot of comments, but I think we're making progress.

The most important questions are about the parts where we appear to be reimplementing some existing lldb functionality. Basically all of the functionality for looking up names (for members, variables, types, etc.) appears to be more complex than necessary. For each of those things, we already have lldb functions, which one would (perhaps naively) expect to just call and have them do the right thing. I'd like to understand why that isn't the case, and whether we can do something about it (there are certainly bugs in those functions, and if that's the case it'd be better to fix those bugs than to work around them here).

At this point, perhaps the only way to figure that out is to strip them and see what breaks.

labath · 2024-07-29T13:04:53Z

lldb/docs/dil-expr-lang.ebnf

+cast_expression = unary_expression
+                | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression


That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...

labath · 2024-07-29T13:06:09Z

lldb/include/lldb/Core/DILAST.h

+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+
+namespace lldb_private {


It should really be a lowercase dil. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)

labath · 2024-07-29T13:07:11Z

lldb/include/lldb/Core/DILAST.h

+/// use_synthetic options passed, acquiring the process & target locks if
+/// appropriate.


Suggested change

/// use_synthetic options passed, acquiring the process & target locks if

/// appropriate.

/// use_synthetic options passed.

labath · 2024-07-29T13:12:43Z

lldb/include/lldb/Core/DILAST.h

+  eMemberOfNode,
+  eArraySubscriptNode,
+  eUnaryOpNode,
+  eSmartPtrToPtrDecayNode


Can we get rid of this as well (for the time being, at least)? It sounds like one of those things that may require rearchitecting to avoid smart pointer assumptions...

(And even we do end up adding it, I don't think it should be called "smart pointer decay", as, as far as this code is concerned, I think we shouldn't be making a difference between std::unique_ptr and say std::optional. I guess you could stretch the "smart pointer" definition to cover optionals, but that doesn't seem ideal)

labath · 2024-07-29T13:25:25Z

lldb/include/lldb/Core/DILAST.h

+class IdentifierInfo {
+private:
+  using MemberPath = std::vector<uint32_t>;
+  using IdentifierInfoUP = std::unique_ptr<IdentifierInfo>;


This would normally be outside of the class, so it can be used from other code as well. If you're only using it for internal code, then it probably doesn't need to exist (note you can get rid of about half of the uses of this name by using std::make_unique)

labath · 2024-07-29T13:57:54Z

lldb/source/Core/DILAST.cpp

+    }
+  }
+
+  // Find the corrent variable by matching the name.


Suggested change

// Find the corrent variable by matching the name.

// Find the current variable by matching the name.

labath · 2024-07-29T14:00:53Z

lldb/source/Core/DILAST.cpp

+  // List global variable with the same "basename". There can be many matches
+  // from other scopes (namespaces, classes), so we do additional filtering
+  // later.


It's definitely better, but I'd still like to understand what is the purpose of the subsequent filtering (lines 229--235). If we didn't have that, we could just pass 1 as the "limit" to FindGlobalVariables and potentially speed things up a lot.

(Somewhat tangentially, I'm also wondering if this function belongs in the "AST" part of DIL. I think I could see a case for putting it into the parser.)

labath · 2024-07-29T14:01:27Z

lldb/source/Core/DILAST.cpp

+  lldb::ValueObjectSP empty_obj_sp;
+  return empty_obj_sp;


return nullptr or return {} works just fine.

labath · 2024-07-29T14:05:17Z

lldb/source/Core/DILAST.cpp

+  // Internally types don't have global scope qualifier in their names and
+  // LLDB doesn't support queries with it too.
+  llvm::StringRef name_ref(name);
+  bool global_scope = false;
+
+  if (name_ref.starts_with("::")) {
+    name_ref = name_ref.drop_front(2);
+    global_scope = true;
+  }


I'm not sure that's true, or at least "true anymore". I'm pretty sure I can look up types with leading "::" now.

Sometimes it works and sometimes it doesn't. It appears that I can prefix plain global vars with '::', but I can't (e.g.) prefix namespaces with '::'. So current frame var works with '::globalVar', and also on 'ns::globalVar', but fails on '::ns::globalVar'.

I am unclear if there is any work or change you are requesting here, other than commenting on the comment?

For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.

I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.

The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.

labath · 2024-07-29T14:06:35Z

lldb/source/Core/DILAST.cpp

+  // We've found multiple types, try finding the "correct" one.
+  CompilerType full_match;
+  std::vector<CompilerType> partial_matches;
+
+  for (uint32_t i = 0; i < result_type_list.size(); ++i) {
+    CompilerType type = result_type_list[i];
+    llvm::StringRef type_name_ref = type.GetTypeName().GetStringRef();
+    ;
+
+    if (type_name_ref == name_ref)
+      full_match = type;
+    else if (type_name_ref.ends_with(name_ref))
+      partial_matches.push_back(type);
+  }
+
+  // Full match is always correct.
+  if (full_match.IsValid())
+    return full_match;
+
+  if (!global_scope) {
+    // We're looking for type, but there may be multiple candidates and which
+    // one is correct may depend on the currect scope. For now just pick the
+    // most "probable" type (pick a random one). TODO: Try to find a better way
+    // to do this.
+    if (partial_matches.size() > 0)
+      return partial_matches.back();
+  }


And if that's true, then maybe this filtering isn't needed?

…uage Address latest round of review comments.

cmtice · 2024-08-10T05:23:05Z

I think I have addressed all the review comments; please take another look. :-)

labath

Even though you've removed the smart pointer checking function (the most obvious example of language-specific behavior), I'm still worried about whether the overall design of this code will allow you to avoid it.

Looking at the IdentifierInfo class, I see that it can be constructed with only a type (the value is optional). I think this is going to be very hard to make work, since a type does not know if its a smart pointer (only a value does). Have you already figured out a way to make that work?

IndentifierInfo isn't the only suspicious place though. All of the places that are doing something with bitfield offsets are very suspicious to me, as that's something that the Values should already know.

I haven't seen the parser, but I wouldn't be surprised if this has something to do with the vexing parse issues of C languages, where one needs to know if something is a type just to parse the expression correctly. Now, if that's the case, and if really need to know if something is a smart pointer (i.e. whether it is dereferencable) without having access to a value of that type, then we may need to have a bigger discussion about what to do about that.

labath · 2024-08-12T11:37:22Z

lldb/include/lldb/Core/DILAST.h

+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/TokenKinds.h"


What are these doing here? Can this be a language-agnostic implementation if it depends on clang?

Doesn't clang compile all the languages we want to support?

Nope. Swift uses llvm, but not clang. The same goes for rust, and potentially other downstream languages.

ok, then we've run into a very serious problem: the DIL implementation (and lldb-eval, on which it was based) makes extensive use of Clang for parsing & lexing. I'm guessing the best way to handle this is to try to bundle the parsing/lexing stuff into some kind of plugin, with different plugins for the different languages? OR...is it possible that the small language subset that DIL wants to parse/lex/evaluate is roughly the same across all the languages?

I'm really not sure about the best way forward here...

Yeah, that does sound like a problem.

Putting this stuff in a plugin would be possible, but it would be a departure from what we've talked about earlier, since we would essentially end up with a different flavour of DIL for every language.

Saying that clang is the parser for the DIL might also be possible, but that would also warrant more discussion, because we have a (maybe undocumented?) rule of no clang dependencies on lldb core.

A third option might be to roll our own parser. Unlike lldb-eval, I think the DIL does not have the ambition of matching c++ (or any other language) precisely, so maybe we have some leeway to remove/skip/alter the features which make parsing c(++) the mess that it is.

Okay, so could one say that it's usage is similar to the code in CPlusPlusNameParser, which uses clang to tokenize the (function) name, but then does the parsing itself? Or does it go deeper than that?

The thing I'm thinking about is that "having all of the tokens defined" may not be such an advantage if we later add some features from other languages (I don't know, let's say the ?. operator from Swift). For lexing, that may not matter cause you could maybe lex ? and . separately and then join them later, but if it's doing more, then that may be a problem..

Yes, it's similar to what CPlusPlusNameParser does. DIL uses clang to tokenize the expression and parsing is done separately by DIL itself.

Yes, Andy (werat) is correct. I mis-spoke in my previous comment (apologies!). We use the Clang tokenizer & lexer but wrote our own parser.

Since that is the case, is this ok?

I think I'd be fine with that, but I also think it's not my call to make. @jimingham, @adrian-prantl, @bulbazord, what do you think?

I haven't been following the evolution of this PR closely, but from this comment chain it looks like the question is "Can we have clang dependencies in lldbCore?". I won't gatekeep this PR for this reason, but here are my positions:

We should avoid having clang in lldbCore if possible.

I think you should write a lexer for the DIL.

As Pavel already pointed out, plenty of folks downstream want to use LLDB to debug their programs with languages other than C/C++/ObjC/ObjC++. Swift and Rust are great examples and on occasion I see people try to add minor changes to support their languages too. Adding clang as a core dependency makes it more difficult to decouple LLDB from C/C++ specifically. LLDB has the potential to be a more language-agnostic debugger where the core of the engine is flexible and focuses on operating on common abstractions while the Plugins dictate how to interact with the specific language/OS/platform. I think LLDB is still pretty far away from being that, but myself and other contributors have done a lot of work over the last few years to bring us closer to that reality.

As for the DIL, I think we should consider the goal. If the language is meant to use C/C++ syntax to inspect data, I suppose it makes sense to use clang's lexer to tokenize input. CPlusPlusNameParser uses clang because it's trying to parse C++ names. However, using clang now makes it harder to extend to support other languages in the future. What if clang doesn't chop up my input into tokens correctly or the way I want? Pavel mentioned a case where we can work around this, but the parser is just going to be workaround after workaround trying to fit a square peg into a round hole. You've already written the parser, what's a lexer too? :)

labath · 2024-08-12T11:39:17Z

lldb/include/lldb/Core/DILAST.h

+private:
+
+public:


Suggested change

private:

public:

public:

labath · 2024-08-12T11:44:40Z

lldb/include/lldb/Core/DILAST.h

+  const std::vector<uint32_t> &GetPath() const { return m_path; }
+
+  CompilerType GetType() { return m_type; }
+  bool IsValid() const { return m_type.IsValid(); }


Could we remove this function and use an null IdentifierInfo pointer to denote invalidness?

labath · 2024-08-12T11:56:14Z

lldb/source/Core/DILAST.cpp

+  // Internally types don't have global scope qualifier in their names and
+  // LLDB doesn't support queries with it too.
+  llvm::StringRef name_ref(name);
+  bool global_scope = false;
+
+  if (name_ref.starts_with("::")) {
+    name_ref = name_ref.drop_front(2);
+    global_scope = true;
+  }


For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.

I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.

The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.

cmtice · 2024-08-14T15:04:03Z

Even though you've removed the smart pointer checking function (the most obvious example of language-specific behavior), I'm still worried about whether the overall design of this code will allow you to avoid it.

I believe I have fixed all the code to avoid any need to do smart pointer checking -- I went back and looked very closely at what the current frame variable implementation does when handling smart pointers, and tried to do the same thing, which appears to have worked.

Looking at the IdentifierInfo class, I see that it can be constructed with only a type (the value is optional). I think this is going to be very hard to make work, since a type does not know if its a smart pointer (only a value does). Have you already figured out a way to make that work?

The case where it is constructed only with a type is meant to be when handling LLDB convenience variables (the dynamic variables names "$..."); I'm not sure if those can be smart pointers?

IndentifierInfo isn't the only suspicious place though. All of the places that are doing something with bitfield offsets are very suspicious to me, as that's something that the Values should already know.

I haven't seen the parser, but I wouldn't be surprised if this has something to do with the vexing parse issues of C languages, where one needs to know if something is a type just to parse the expression correctly. Now, if that's the case, and if really need to know if something is a smart pointer (i.e. whether it is dereferencable) without having access to a value of that type, then we may need to have a bigger discussion about what to do about that.

I would be happy to share the parser code with you and walk/talk you through it, if that would help?

Address review comments.

cmtice · 2024-09-07T05:07:35Z

I think I have addressed all of the review comments now, except for the dependency on the Clang lexer. If it's ok with the reviewers, I would rather not put that change into this PR, but instead I will make that the next PR that I submit in the DIL (Data Inspection Language) series. However if you are strongly against that, I will add the new lexer to this PR as soon as I finish writing it.

So please review this again. Thank you!

cmtice · 2024-09-16T16:26:33Z

Ping? Anybody?

Michael137 · 2024-09-16T18:40:48Z

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

cmtice · 2024-09-16T21:59:25Z

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).

The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.

As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.

Does that answer all of your questions? Or is there more you would like to know or see?

Michael137 · 2024-09-17T07:56:01Z

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).

The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.

As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.

Does that answer all of your questions? Or is there more you would like to know or see?

Thanks for the breakdown! I was just wondering whether the plan was to merge this before the other PRs are up. I think it'd be preferable to merge them once they all have been approved. Though if others are ok with the current strategy I don't want to block this.

cmtice · 2024-09-17T15:15:51Z

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).
The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.
As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.
Does that answer all of your questions? Or is there more you would like to know or see?

Thanks for the breakdown! I was just wondering whether the plan was to merge this before the other PRs are up. I think it'd be preferable to merge them once they all have been approved. Though if others are ok with the current strategy I don't want to block this.

I was under the impression that the plan was to merge each PR as it was approved (I thought things might be less confusing that way). But I am certainly open to alternatives. If the folks reviewing these DIL PR's would prefer to have all the PRs available for review at the same time, I can put most of the others out for review now as well. Just let me know.

cmtice requested a review from JDevlieghere as a code owner June 17, 2024 05:46

llvmbot added the lldb label Jun 17, 2024

cmtice requested review from jim22k, jimingham, adrian-prantl and werat and removed request for jim22k June 17, 2024 05:47

adrian-prantl reviewed Jun 17, 2024

View reviewed changes

adrian-prantl requested a review from kastiglione June 17, 2024 17:12

werat reviewed Jun 17, 2024

View reviewed changes

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang…

d38cdf0

…uage Fix semicolons in the .ebnf file.

Michael137 reviewed Jun 24, 2024

View reviewed changes

Michael137 requested a review from labath June 24, 2024 08:08

Michael137 reviewed Jun 24, 2024

View reviewed changes

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang…

c230c28

…uage Remove all the pieces for adding functionality beyond what the current default "frame variable" implementation does. Update the .ebnf file to reflect the current simplified version of DIL.

Fix clang format issue.

17a5934

labath reviewed Jul 9, 2024

View reviewed changes

cmtice added 2 commits July 27, 2024 19:26

Merge remote-tracking branch 'origin/main' into dil-ast

1b4cdb7

[LLDB] Add AST node classes & functions for Data Inspection Language

d5f7e77

Address review comments: Remove decls for functions not defined or used in AST files; add DIL namespace, code cleanups, etc.

labath reviewed Jul 29, 2024

View reviewed changes

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang…

33d1345

…uage Address latest round of review comments.

labath reviewed Aug 12, 2024

View reviewed changes

cmtice added 3 commits September 2, 2024 19:59

Merge remote-tracking branch 'origin/main' into dil-ast

e9ea8b2

[LLDB] Add AST node clases, functions, etc. for Data Inspection Language

41ced65

Address review comments.

Fix clang format issues.

4ff808a

	\| nested_name_specifier simple_template_id "::"
	\| nested_name_specifier simple_template_id "::" ;

	\| builtin_func_argument_list "," builtin_func_argument
	\| builtin_func_argument_list "," builtin_func_argument ;


		pointer_literal = "nullptr" ;

		builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;


		typedef_name = identifier ;

		simple_template_id = template_name "<" [template_argument_list] ">" ;

		boolean_literal = "true" \| "false" ;

		pointer_literal = "nullptr" ;

		/// use_synthetic options passed, acquiring the process & target locks if
		/// appropriate.

	/// use_synthetic options passed, acquiring the process & target locks if
	/// appropriate.
	/// use_synthetic options passed.

	// Find the corrent variable by matching the name.
	// Find the current variable by matching the name.

		#include "clang/Basic/SourceLocation.h"
		#include "clang/Basic/TokenKinds.h"

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang… #95738

Are you sure you want to change the base?

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang… #95738

Uh oh!

Conversation

cmtice commented Jun 17, 2024

Uh oh!

llvmbot commented Jun 17, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Michael137 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Michael137 Jun 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Michael137 Jun 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmtice commented Jun 25, 2024

Uh oh!

labath commented Jun 25, 2024

Uh oh!

walter-erquinigo commented Jun 25, 2024

Uh oh!

Michael137 commented Jun 26, 2024

Uh oh!

werat commented Jun 27, 2024

Uh oh!

adrian-prantl commented Jul 1, 2024

Uh oh!

jimingham commented Jul 1, 2024

Uh oh!

cmtice commented Jul 2, 2024

Uh oh!

github-actions bot commented Jul 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmtice commented Jul 3, 2024

Uh oh!

labath left a comment

Choose a reason for hiding this comment

Uh oh!

Michael137 Jun 24, 2024 •

edited

Loading

Michael137 Jun 24, 2024 •

edited

Loading

github-actions bot commented Jul 2, 2024 •

edited

Loading