Skip to content

[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang… #95738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

cmtice
Copy link
Contributor

@cmtice cmtice commented Jun 17, 2024

…uage (DIL).

The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator.

This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.

…uage (DIL).

The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes
its own parser and expression evaluator.

This change defines the AST nodes, classes and functions which are used by the
DIL parser and expression evaluator. It also adds the .ebnf file documenting
the (current) expression language handled by the DIL parser and expression
evaluator.
@cmtice cmtice requested a review from JDevlieghere as a code owner June 17, 2024 05:46
@llvmbot llvmbot added the lldb label Jun 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-lldb

Author: None (cmtice)

Changes

…uage (DIL).

The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator.

This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.


Patch is 50.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95738.diff

4 Files Affected:

  • (added) lldb/docs/dil-expr-lang.ebnf (+165)
  • (added) lldb/include/lldb/Core/DILAST.h (+690)
  • (modified) lldb/source/Core/CMakeLists.txt (+1)
  • (added) lldb/source/Core/DILAST.cpp (+568)
diff --git a/lldb/docs/dil-expr-lang.ebnf b/lldb/docs/dil-expr-lang.ebnf
new file mode 100644
index 0000000000000..40c678c25cda5
--- /dev/null
+++ b/lldb/docs/dil-expr-lang.ebnf
@@ -0,0 +1,165 @@
+(* LLDB Debug Expressions, a subset of C++ *)
+(* Insired by https://www.nongnu.org/hcb *)
+
+expression = assignment_expression ;
+
+assignment_expression = conditional_expression
+                        logical_or_expression assignment_operator assignment_expression ;
+
+assignment_operator = "="
+                    | "*="
+                    | "/="
+                    | "%="
+                    | "+="
+                    | "-="
+                    | ">>="
+                    | "<<="
+                    | "&="
+                    | "^="
+                    | "|=" ;
+
+conditional_expression = logical_or_expression
+                       | logical_or_expression "?" expression ":" assignment_expression ;
+
+logical_or_expression = logical_and_expression {"||" logical_and_expression} ;
+
+logical_and_expression = inclusive_or_expression {"&&" inclusive_or_expression} ;
+
+inclusive_or_expression = exclusive_or_expression {"|" exclusive_or_expression} ;
+
+exclusive_or_expression = and_expression {"^" and_expression} ;
+
+and_expression = equality_expression {"&" equality_expression} ;
+
+equality_expression = relational_expression {"==" relational_expression}
+                    | relational_expression {"!=" relational_expression} ;
+
+relational_expression = shift_expression {"<" shift_expression}
+                      | shift_expression {">" shift_expression}
+                      | shift_expression {"<=" shift_expression}
+                      | shift_expression {">=" shift_expression} ;
+
+shift_expression = additive_expression {"<<" additive_expression}
+                 | additive_expression {">>" additive_expression} ;
+
+additive_expression = multiplicative_expression {"+" multiplicative_expression}
+                    | multiplicative_expression {"-" multiplicative_expression} ;
+
+multiplicative_expression = cast_expression {"*" cast_expression}
+                          | cast_expression {"/" cast_expression}
+                          | cast_expression {"%" cast_expression} ;
+
+cast_expression = unary_expression
+                | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression
+                 | "++" cast_expression
+                 | "--" cast_expression
+                 | unary_operator cast_expression
+                 | "sizeof" unary_expression
+                 | "sizeof" "(" type_id ")" ;
+
+unary_operator = "*" | "&" | "+" | "-" | "!" | "~" ;
+
+postfix_expression = primary_expression
+                   | postfix_expression "[" expression "]"
+                   | postfix_expression "." id_expression
+                   | postfix_expression "->" id_expression
+                   | postfix_expression "++"
+                   | postfix_expression "--"
+                   | static_cast "<" type_id ">" "(" expression ")" ;
+                   | dynamic_cast "<" type_id ">" "(" expression ")" ;
+                   | reinterpret_cast "<" type_id ">" "(" expression ")" ;
+
+primary_expression = numeric_literal
+                   | boolean_literal
+                   | pointer_literal
+                   | id_expression
+                   | "this"
+                   | "(" expression ")"
+                   | builtin_func ;
+
+type_id = type_specifier_seq [abstract_declarator] ;
+
+type_specifier_seq = type_specifier [type_specifier_seq] ;
+
+type_specifier = simple_type_specifier
+               | cv_qualifier ;
+
+simple_type_specifier = ["::"] [nested_name_specifier] type_name
+                      | "char"
+                      | "char16_t"
+                      | "char32_t"
+                      | "wchar_t"
+                      | "bool"
+                      | "short"
+                      | "int"
+                      | "long"
+                      | "signed"
+                      | "unsigned"
+                      | "float"
+                      | "double"
+                      | "void" ;
+
+nested_name_specifier = type_name "::"
+                      | namespace_name '::'
+                      | nested_name_specifier identifier "::"
+                      | nested_name_specifier simple_template_id "::"
+
+type_name = class_name
+          | enum_name
+          | typedef_name
+          | simple_template_id ;
+
+class_name = identifier ;
+
+enum_name = identifier ;
+
+typedef_name = identifier ;
+
+simple_template_id = template_name "<" [template_argument_list] ">" ;
+
+template_name = identifier ;
+
+template_argument_list = template_argument
+                       | template_argument_list "," template_argument ;
+
+template_argument = type_id
+                  | numeric_literal
+                  | id_expression ;
+
+namespace_name = identifier ;
+
+cv_qualifier = "const" | "volatile" ;
+
+cv_qualifier_seq = cv_qualifier [cv_qualifier_seq] ;
+
+abstract_declarator = ptr_operator [abstract_declarator] ;
+
+ptr_operator = "*" [cv_qualifier_seq]
+             | "&" ;
+
+id_expression = unqualified_id
+              | qualified_id ;
+
+unqualified_id = identifier ;
+
+qualified_id = ["::"] [nested_name_specifier] unqualified_id
+             | ["::"] identifier ;
+
+identifier = ? clang::tok::identifier ? ;
+
+numeric_literal = ? clang::tok::numeric_constant ? ;
+
+boolean_literal = "true" | "false" ;
+
+pointer_literal = "nullptr" ;
+
+builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;
+
+builtin_func_name = "__log2" ;
+
+builtin_func_argument_list = builtin_func_argument
+                           | builtin_func_argument_list "," builtin_func_argument
+
+builtin_func_argument = expression ;
diff --git a/lldb/include/lldb/Core/DILAST.h b/lldb/include/lldb/Core/DILAST.h
new file mode 100644
index 0000000000000..acb9da088eda8
--- /dev/null
+++ b/lldb/include/lldb/Core/DILAST.h
@@ -0,0 +1,690 @@
+//===-- DILAST.h ------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_DIL_AST_H_
+#define LLDB_DIL_AST_H_
+
+#include <memory>
+#include <optional>
+#include <string>
+#include <variant>
+#include <vector>
+
+#include "lldb/Core/ValueObject.h"
+#include "lldb/Symbol/Type.h"
+#include "lldb/Symbol/TypeList.h"
+#include "lldb/Target/LanguageRuntime.h"
+#include "lldb/Utility/ConstString.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/TokenKinds.h"
+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+
+namespace lldb_private {
+
+/// Struct to hold information about member fields. Used by the parser for the
+/// Data Inspection Language (DIL).
+struct DILMemberInfo {
+  std::optional<std::string> name;
+  CompilerType type;
+  bool is_bitfield;
+  uint32_t bitfield_size_in_bits;
+  bool is_synthetic;
+  bool is_dynamic;
+  lldb::ValueObjectSP val_obj_sp;
+
+  explicit operator bool() const { return type.IsValid(); }
+};
+
+/// This determines if the type is a shared, unique or weak pointer, either
+/// from stdlibc++ or libc+++.
+bool IsSmartPtrType(CompilerType type);
+
+/// Finds the member field with the given name and type, stores the child index
+/// corresponding to the field in the idx vector and returns a DILMemberInfo
+/// struct with appropriate information about the field.
+DILMemberInfo GetFieldWithNameIndexPath(lldb::ValueObjectSP lhs_val_sp,
+                                        CompilerType type,
+                                        const std::string &name,
+                                        std::vector<uint32_t> *idx,
+                                        CompilerType empty_type,
+                                        bool use_synthetic, bool is_dynamic);
+
+std::tuple<DILMemberInfo, std::vector<uint32_t>>
+GetMemberInfo(lldb::ValueObjectSP lhs_val_sp, CompilerType type,
+              const std::string &name, bool use_synthetic);
+
+/// Get the appropriate ValueObjectSP, consulting the use_dynamic and
+/// use_synthetic options passed, acquiring the process & target locks if
+/// appropriate.
+lldb::ValueObjectSP
+DILGetSPWithLock(lldb::ValueObjectSP valobj_sp,
+                 lldb::DynamicValueType use_dynamic = lldb::eNoDynamicValues,
+                 bool use_synthetic = false);
+
+/// The various types DIL AST nodes (used by the DIL parser).
+enum class DILNodeKind {
+  kDILErrorNode,
+  kLiteralNode,
+  kIdentifierNode,
+  kSizeOfNode,
+  kBuiltinFunctionCallNode,
+  kCStyleCastNode,
+  kCxxStaticCastNode,
+  kCxxReinterpretCastNode,
+  kMemberOfNode,
+  kArraySubscriptNode,
+  kBinaryOpNode,
+  kUnaryOpNode,
+  kTernaryOpNode,
+  kSmartPtrToPtrDecay
+};
+
+/// The C-Style casts allowed by DIL.
+enum class CStyleCastKind {
+  kArithmetic,
+  kEnumeration,
+  kPointer,
+  kNullptr,
+  kReference,
+};
+
+/// The Cxx static casts allowed by DIL.
+enum class CxxStaticCastKind {
+  kNoOp,
+  kArithmetic,
+  kEnumeration,
+  kPointer,
+  kNullptr,
+  kBaseToDerived,
+  kDerivedToBase,
+};
+
+/// The binary operators recognized by DIL.
+enum class BinaryOpKind {
+  Mul,       // "*"
+  Div,       // "/"
+  Rem,       // "%"
+  Add,       // "+"
+  Sub,       // "-"
+  Shl,       // "<<"
+  Shr,       // ">>"
+  LT,        // "<"
+  GT,        // ">"
+  LE,        // "<="
+  GE,        // ">="
+  EQ,        // "=="
+  NE,        // "!="
+  And,       // "&"
+  Xor,       // "^"
+  Or,        // "|"
+  LAnd,      // "&&"
+  LOr,       // "||"
+  Assign,    // "="
+  MulAssign, // "*="
+  DivAssign, // "/="
+  RemAssign, // "%="
+  AddAssign, // "+="
+  SubAssign, // "-="
+  ShlAssign, // "<<="
+  ShrAssign, // ">>="
+  AndAssign, // "&="
+  XorAssign, // "^="
+  OrAssign,  // "|="
+};
+
+/// The Unary operators recognized by DIL.
+enum class UnaryOpKind {
+  PostInc, // "++"
+  PostDec, // "--"
+  PreInc,  // "++"
+  PreDec,  // "--"
+  AddrOf,  // "&"
+  Deref,   // "*"
+  Plus,    // "+"
+  Minus,   // "-"
+  Not,     // "~"
+  LNot,    // "!"
+};
+
+/// Helper functions for DIL AST node parsing.
+
+/// Translates clang tokens to BinaryOpKind.
+BinaryOpKind
+clang_token_kind_to_binary_op_kind(clang::tok::TokenKind token_kind);
+
+/// Returns bool indicating whether or not the input kind is an assignment.
+bool binary_op_kind_is_comp_assign(BinaryOpKind kind);
+
+/// Given a string representing a type, returns the CompilerType corresponding
+/// to the named type, if it exists.
+CompilerType
+ResolveTypeByName(const std::string &name,
+                  std::shared_ptr<ExecutionContextScope> ctx_scope);
+
+/// Quick lookup to check if a type name already exists in a
+/// name-to-CompilerType map the DIL parser keeps of previously found
+/// name/type pairs.
+bool IsContextVar(const std::string &name);
+
+/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak)
+/// or not. Only applicable for C++, which is why this is here and not part of
+/// the CompilerType class.
+bool IsSmartPtrType(CompilerType type);
+
+/// Class used to store & manipulate information about identifiers.
+class IdentifierInfo {
+private:
+  using MemberPath = std::vector<uint32_t>;
+  using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>;
+
+public:
+  enum class Kind {
+    kValue,
+    kContextArg,
+    kMemberPath,
+    kThisKeyword,
+  };
+
+  static IdentifierInfoPtr FromValue(lldb::ValueObjectSP value_sp) {
+    CompilerType type;
+    lldb::ValueObjectSP value = DILGetSPWithLock(value_sp);
+    if (value)
+      type = value->GetCompilerType();
+    return IdentifierInfoPtr(new IdentifierInfo(Kind::kValue, type, value, {}));
+  }
+
+  static IdentifierInfoPtr FromContextArg(CompilerType type) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(
+        new IdentifierInfo(Kind::kContextArg, type, empty_value, {}));
+  }
+
+  static IdentifierInfoPtr FromMemberPath(CompilerType type, MemberPath path) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(new IdentifierInfo(Kind::kMemberPath, type,
+                                                empty_value, std::move(path)));
+  }
+
+  static IdentifierInfoPtr FromThisKeyword(CompilerType type) {
+    lldb::ValueObjectSP empty_value;
+    return IdentifierInfoPtr(
+        new IdentifierInfo(Kind::kThisKeyword, type, empty_value, {}));
+  }
+
+  Kind kind() const { return m_kind; }
+  lldb::ValueObjectSP value() const { return m_value; }
+  const MemberPath &path() const { return m_path; }
+
+  CompilerType GetType() { return m_type; }
+  bool IsValid() const { return m_type.IsValid(); }
+
+  IdentifierInfo(Kind kind, CompilerType type, lldb::ValueObjectSP value,
+                 MemberPath path)
+      : m_kind(kind), m_type(type), m_value(std::move(value)),
+        m_path(std::move(path)) {}
+
+private:
+  Kind m_kind;
+  CompilerType m_type;
+  lldb::ValueObjectSP m_value;
+  MemberPath m_path;
+};
+
+/// Given the name of an identifier (variable name, member name, type name,
+/// etc.), find the ValueObject for that name (if it exists) and create and
+/// return an IdentifierInfo object containing all the relevant information
+/// about that object (for DIL parsing and evaluating).
+std::unique_ptr<IdentifierInfo> LookupIdentifier(
+    const std::string &name, std::shared_ptr<ExecutionContextScope> ctx_scope,
+    lldb::DynamicValueType use_dynamic, CompilerType *scope_ptr = nullptr);
+
+/// Forward declaration, for use in DIL AST nodes. Definition is at the very
+/// end of this file.
+class DILVisitor;
+
+/// The rest of the classes in this file, except for the DILVisitor class at the
+/// very end, define all the types of AST nodes used by the DIL parser and
+/// expression evaluator. The DIL parser parses the input string and creates the
+/// AST parse tree from the AST nodes. The resulting AST node tree gets passed
+/// to the DIL expression evaluator, which evaluates the DIL AST nodes and
+/// creates/returns a ValueObjectSP containing the result.
+
+/// Base class for AST nodes used by the Data Inspection Language (DIL) parser.
+/// All of the specialized types of AST nodes inherit from this (virtual) base
+/// class.
+class DILASTNode {
+public:
+  DILASTNode(clang::SourceLocation location) : location_(location) {}
+  virtual ~DILASTNode() {}
+
+  virtual void Accept(DILVisitor *v) const = 0;
+
+  virtual bool is_error() const { return false; };
+  virtual bool is_rvalue() const = 0;
+  virtual bool is_bitfield() const { return false; };
+  virtual bool is_context_var() const { return false; };
+  virtual bool is_literal_zero() const { return false; }
+  virtual uint32_t bitfield_size() const { return 0; }
+  virtual CompilerType result_type() const = 0;
+
+  virtual DILNodeKind what_am_i() const = 0;
+
+  clang::SourceLocation location() const { return location_; }
+
+  // The expression result type, but dereferenced in case it's a reference. This
+  // is for convenience, since for the purposes of the semantic analysis only
+  // the dereferenced type matters.
+  CompilerType result_type_deref() const;
+
+private:
+  clang::SourceLocation location_;
+};
+
+using ParseResult = std::unique_ptr<DILASTNode>;
+
+class DILErrorNode : public DILASTNode {
+public:
+  DILErrorNode(CompilerType empty_type)
+      : DILASTNode(clang::SourceLocation()), m_empty_type(empty_type) {}
+  void Accept(DILVisitor *v) const override;
+  bool is_error() const override { return true; }
+  bool is_rvalue() const override { return false; }
+  CompilerType result_type() const override { return m_empty_type; }
+  CompilerType result_type_real() const { return m_empty_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kDILErrorNode; }
+
+private:
+  CompilerType m_empty_type;
+};
+
+class LiteralNode : public DILASTNode {
+public:
+  template <typename ValueT>
+  LiteralNode(clang::SourceLocation location, CompilerType type, ValueT &&value,
+              bool is_literal_zero)
+      : DILASTNode(location), m_type(type),
+        m_value(std::forward<ValueT>(value)),
+        m_is_literal_zero(is_literal_zero) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  bool is_literal_zero() const override { return m_is_literal_zero; }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kLiteralNode; }
+
+  template <typename ValueT> ValueT value() const {
+    return std::get<ValueT>(m_value);
+  }
+
+  auto value() const { return m_value; }
+
+private:
+  CompilerType m_type;
+  std::variant<llvm::APInt, llvm::APFloat, bool, std::vector<char>> m_value;
+  bool m_is_literal_zero;
+};
+
+class IdentifierNode : public DILASTNode {
+public:
+  IdentifierNode(clang::SourceLocation location, std::string name,
+                 std::unique_ptr<IdentifierInfo> identifier, bool is_rvalue,
+                 bool is_context_var)
+      : DILASTNode(location), m_is_rvalue(is_rvalue),
+        m_is_context_var(is_context_var), m_name(std::move(name)),
+        m_identifier(std::move(identifier)) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return m_is_rvalue; }
+  bool is_context_var() const override { return m_is_context_var; };
+  CompilerType result_type() const override { return m_identifier->GetType(); }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kIdentifierNode;
+  }
+
+  std::string name() const { return m_name; }
+  const IdentifierInfo &info() const { return *m_identifier; }
+
+private:
+  bool m_is_rvalue;
+  bool m_is_context_var;
+  std::string m_name;
+  std::unique_ptr<IdentifierInfo> m_identifier;
+};
+
+class SizeOfNode : public DILASTNode {
+public:
+  SizeOfNode(clang::SourceLocation location, CompilerType type,
+             CompilerType operand)
+      : DILASTNode(location), m_type(type), m_operand(operand) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override { return DILNodeKind::kSizeOfNode; }
+
+  CompilerType operand() const { return m_operand; }
+
+private:
+  CompilerType m_type;
+  CompilerType m_operand;
+};
+
+class BuiltinFunctionCallNode : public DILASTNode {
+public:
+  BuiltinFunctionCallNode(clang::SourceLocation location,
+                          CompilerType result_type, std::string name,
+                          std::vector<ParseResult> arguments)
+      : DILASTNode(location), m_result_type(result_type),
+        m_name(std::move(name)), m_arguments(std::move(arguments)) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override { return true; }
+  CompilerType result_type() const override { return m_result_type; }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kBuiltinFunctionCallNode;
+  }
+
+  std::string name() const { return m_name; }
+  const std::vector<ParseResult> &arguments() const { return m_arguments; };
+
+private:
+  CompilerType m_result_type;
+  std::string m_name;
+  std::vector<ParseResult> m_arguments;
+};
+
+class CStyleCastNode : public DILASTNode {
+public:
+  CStyleCastNode(clang::SourceLocation location, CompilerType type,
+                 ParseResult rhs, CStyleCastKind kind)
+      : DILASTNode(location), m_type(type), m_rhs(std::move(rhs)),
+        m_kind(kind) {}
+
+  void Accept(DILVisitor *v) const override;
+  bool is_rvalue() const override {
+    return m_kind != CStyleCastKind::kReference;
+  }
+  CompilerType result_type() const override { return m_type; }
+  DILNodeKind what_am_i() const override {
+    return DILNodeKind::kCStyleCastNode;
+  }
+
+  CompilerType type() const { return m_type; }
+  DILASTNode *rhs() const { return m_rhs.get(); }
+  CStyleCastKind kind() const { return m_kind; }
+
+private:
+  CompilerType m_type;
+  ParseResult m_rhs;
+  CStyleCastKind m_kind;
+};
+
+class CxxStaticCastNode : public DILASTNode {
+public:
+  CxxStaticCastNode(clang::SourceLocation location, CompilerType type,
+                    ParseResult rhs, CxxStaticCastKind ki...
[truncated]

@cmtice cmtice requested review from jim22k, jimingham, adrian-prantl and werat and removed request for jim22k June 17, 2024 05:47
(* LLDB Debug Expressions, a subset of C++ *)
(* Insired by https://www.nongnu.org/hcb *)

expression = assignment_expression ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to actually have the grammar documented!

Comment on lines 70 to 72
| static_cast "<" type_id ">" "(" expression ")" ;
| dynamic_cast "<" type_id ">" "(" expression ")" ;
| reinterpret_cast "<" type_id ">" "(" expression ")" ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| static_cast "<" type_id ">" "(" expression ")" ;
| dynamic_cast "<" type_id ">" "(" expression ")" ;
| reinterpret_cast "<" type_id ">" "(" expression ")" ;
| static_cast "<" type_id ">" "(" expression ")"
| dynamic_cast "<" type_id ">" "(" expression ")"
| reinterpret_cast "<" type_id ">" "(" expression ")" ;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nested_name_specifier = type_name "::"
| namespace_name '::'
| nested_name_specifier identifier "::"
| nested_name_specifier simple_template_id "::"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| nested_name_specifier simple_template_id "::"
| nested_name_specifier simple_template_id "::" ;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

builtin_func_name = "__log2" ;

builtin_func_argument_list = builtin_func_argument
| builtin_func_argument_list "," builtin_func_argument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| builtin_func_argument_list "," builtin_func_argument
| builtin_func_argument_list "," builtin_func_argument ;

Comment on lines 70 to 72
| static_cast "<" type_id ">" "(" expression ")" ;
| dynamic_cast "<" type_id ">" "(" expression ")" ;
| reinterpret_cast "<" type_id ">" "(" expression ")" ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I made those mistakes over time, but the semicolons are wrong in some places 😅

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't gone through this in detail yet (having an example of how these structures will be used would be helpful), but something I was curious about is whether this DIL is going to be used in the implementation of a new command, or if we're going to extend the frame var language? I didn't see a concrete conclusion on this in the RFC

cast_expression = unary_expression
| "(" type_id ")" cast_expression ;

unary_expression = postfix_expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to allow state-changing expressions in the frame var language?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty useful to be able to easily change the program state and afaik it's already possible with expr command. Generally other debugger allow this as well (e.g. Visual Studio), so I think it's a reasonable thing to do. However, for increments specifically there's a deeper reason here (and I would agree that they probably shouldn't be available for general use). See my big comment below for justification.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath don't exactly sound like they ought to be modifying the variable in the process of "getting" it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, many users confuse or conflate "frame variable" with "p" (the short-cut for the expression evaluator), the "p" certainly allows updating the variable value...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...


pointer_literal = "nullptr" ;

builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are builtins here? A set of functions that LLDB knows how to execute? How are those chosen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lldb-eval (DIL's predecessor) supports exactly two builtin functions at the moment: __log2 and cryptic __findnonnull (https://github.com/google/lldb-eval/blob/323d48f0b06d7dcf6ed70cdde17b42285450a32a/lldb-eval/parser.cc#L619). The latter is commonly used in NatVis for complex visualisers, e.g. https://github.com/EQEmu/Server/blob/7918fed81cae8700d73319d4bed5c9f265dbbf3d/libs/perlbind/src/perlbind.natvis#L88


typedef_name = identifier ;

simple_template_id = template_name "<" [template_argument_list] ">" ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what the use-case for instantiating templates is in the DIL. I guess referencing static data members of different instantiations?

Copy link
Member

@Michael137 Michael137 Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how will this be implemented? I guess the simple-template-names infrastructure could help here?
AFAIU, in the expression evaluator, we would ideally support templates by either using modules or extending DWARF in a way that allows us to construct more faithful ASTs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically this is used in casts like ((Foo<int>)some_ptr)->field. In the world of dynamic typing this is not necessary, however VisualStudio debugger doesn't support dynamic typing like this, so everyone is casting things.

@Michael137 Michael137 requested a review from labath June 24, 2024 08:08
| postfix_expression "--"
| static_cast "<" type_id ">" "(" expression ")"
| dynamic_cast "<" type_id ">" "(" expression ")"
| reinterpret_cast "<" type_id ">" "(" expression ")" ;
Copy link
Member

@Michael137 Michael137 Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't C-style casts suffice? (i'm not sure how closely we want to model this language after C++, but these casts make it seem like we're just supporting C++)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the reply above, C++-style casts are often used in NatVis (visual studio debugger), which lldb-eval originally aimed to support.

@cmtice
Copy link
Contributor Author

cmtice commented Jun 25, 2024

Haven't gone through this in detail yet (having an example of how these structures will be used would be helpful), but something I was curious about is whether this DIL is going to be used in the implementation of a new command, or if we're going to extend the frame var language? I didn't see a concrete conclusion on this in the RFC

The plan is for the DIL implementation to eventually replace the current 'frame variable' command (see Milestone 1 in the implementation plan). Once that is done, it will be expanded to also replace calls to GetValueForExpressionPath, for parsing & handling pieces of LLDB's data formatters.

I couldn't put the whole implementation into a single PR, as it would have been much too big to review (even some of the pieces are going to be pretty big). So I broke it up into: the helper functions (which have already been committed now); the AST (this PR); and then three more PR's, each one coming after the preceding one is approved: the DIL Parser (a recursive-descent parser), which constructs (and returns) the appropriate DIL AST; the DIL Evaluator, which takes a DIL AST, evaluates it in the current context (frame), and returns an appropriate ValueObjectSP; and the code to actually hook this up to the 'frame variable' command. I have the whole thing implemented (complete with test cases). If you really want to dig through the complete code, you can see it at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ (note that I will be cleaning up the Parser & Evaluator code before actually being put into a PR). The current implementation there still has both implementations for 'frame variable' (original and DIL), and with 'original' being the default and 'DIL' being turned on by an option.

@labath
Copy link
Collaborator

labath commented Jun 25, 2024

I've been trying to stay out of this, as I don't think I can be objective here, but since I've been summoned, here's what I think. :)

For me, the worst possible outcome of this is an incomplete transition -- having a third (maybe fourth, since I sort of also count dwim-print) expression-like syntax, which is "almost" complete, but for whatever reason (technical or political) can't replace the previous one. With that in mind, I'd try to stage this in a way that minimizes this risk. Building the whole thing in layers from the bottom up is not the best way to achieve that, I think.

If I were doing this, I'd prioritise replacing the current "frame var" implementation as soon as possible. Ideally, the steps would be something like:

  1. implement the minimal necessary functionality to match current "frame var" feature set. (I think that's &, [], * and ->)
  2. flip the switch
  3. remove the old implementation
  4. add other features, one group of related operators at a time

Besides managing that risk (if this fades out for whatever reason, there's less dead code around), I think this would also make it easier to review, as we could have patches like "add operator/ to the DIL", which would focus solely on that, and also come with associated tests. I think it would also make it easier to discuss things like whether we want to add modifying operations (operator++) or obviously language-specific features (dynamic_cast), both of which I think are very good questions.

@walter-erquinigo
Copy link
Member

I second everything that Pavel says. I think this would be the best approach. The existing frame var implementation is not that "smart", so it should be practical to fully replace it with the new DIL in its first stage.
An additional consideration is that lldb-dap, which is being more and more frequently used, relies on frame var for executing most expressions. expr is just a fallback. Therefore, replacing frame var with DIL right away has the benefit of getting tested right away by all the lldb-dap users.

@Michael137
Copy link
Member

I have the whole thing implemented (complete with test cases). If you really want to dig through the complete code, you can see it at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ (note that I will be cleaning up the Parser & Evaluator code before actually being put into a PR).

Thanks! that'll be a useful reference

Besides managing that risk (if this fades out for whatever reason, there's less dead code around), I think this would also make it easier to review, as we could have patches like "add operator/ to the DIL", which would focus solely on that, and also come with associated tests. I think it would also make it easier to discuss things like whether we want to add modifying operations (operator++) or obviously language-specific features (dynamic_cast), both of which I think are very good questions.

Makes sense!

@werat
Copy link
Member

werat commented Jun 27, 2024

Providing more context for answers to Michael's questions.

The original motivation for lldb-eval (https://github.com/google/lldb-eval, which is a predecessor for DIL) was to support the evaluation of expressions used in NatVis rules. NatVis visualizers support complex logic, including C++-like casts, restricted function calls and state-changing operations like increment/decrement.

To be able to evaluate those expressions as is, lldb-eval had to support all of the commonly used features, hence the extensive and somewhat selective list of supported features :)

For example, reinterpret_cast<> is commonly used to cast pointers to structs/classes in order to access their fields (example -- https://github.com/godotengine/godot/blob/cae2f853dcd1ecc26ca68de08cec62089dee1f26/platform/windows/godot.natvis#L146). Visual Studio NatVis evaluator doesn't support dynamic typing (e.g. resolving foo->bar, if bar is defined in a subclass), therefore the casts are necessary. In the world of dynamic typing this may be not necessary.

State altering operations like increment/decrement are used in advanced visualizers like <CustomListItems> for control flow -- e.g. you can define a loop variable, then increment it and break out of the loop under some condition. A real life example -- https://github.com/GaijinEntertainment/DagorEngine/blob/8b340d10814f06aa364e75769d406ec7373c5574/prog/_jBuild/_natvis/dagor_types.natvis#L746. It's not visible in the grammar, but lldb-eval has additional knobs for controlling when state changing is allowed. If I recall correctly, this can be done in two cases:

  1. the variable being incremented is a "context variable", e.g. an artificial SBValue passed to the expression as context, which doesn't exist in the program. See some explanation and example here -- https://werat.dev/blog/blazing-fast-expression-evaluation-for-c-in-lldb/#maintaning-state
  2. when side effects are explecitely allowed via options -- https://github.com/google/lldb-eval/blob/323d48f0b06d7dcf6ed70cdde17b42285450a32a/lldb-eval/api.h#L77. In our debugger this was disabled for all visualizer-related evaluations and enabled for one-off interactive evaluations, basically mimicking the behaviour of Visual Studio debugger.

@adrian-prantl
Copy link
Collaborator

A high-level concern that I have with this is the grammar is full of C++-specific nodes, which is not something I would like to see in living in Core. Have we previously reached any consensus on how we want to handle different languages in DIL?
I would be more comfortable with moving most of this into the Plugins/Language/CPlusPlus directory and having only a generic, language-agnostic driver live in Core.

@jimingham
Copy link
Collaborator

A high-level concern that I have with this is the grammar is full of C++-specific nodes, which is not something I would like to see in living in Core. Have we previously reached any consensus on how we want to handle different languages in DIL? I would be more comfortable with moving most of this into the Plugins/Language/CPlusPlus directory and having only a generic, language-agnostic driver live in Core.

My .02:

The problem here is that the commands that use the DIL often have no idea what language the path expression they are being asked to parse is going to end up being.

(lldb) target var some_DIL_expression

could find C/ObjC/C++/Rust/Swift variables that match that expression.

The idea for the DIL is to have enough syntax to express sensible data lookup and presentation that would cover the variety of languages we support. That's going to end up looking like the languages we are using it to model, but it shouldn't be driven by them. I think that's better than having the DIL have language specific variants and the user somehow has to know to pick the right one for the context.

…uage

Remove all the pieces for adding functionality beyond what the current
default "frame variable" implementation does. Update the .ebnf file to
reflect the current simplified version of DIL.
@cmtice
Copy link
Contributor Author

cmtice commented Jul 2, 2024

As requested, I have updated this PR to only include the pieces needed to reproduce current (default) 'frame variable' behavior with DIL. I have also updated the BNF file accordingly.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

Copy link

github-actions bot commented Jul 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@cmtice
Copy link
Contributor Author

cmtice commented Jul 3, 2024

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

That's cool. Just to confirm, have you looked at replacing target variable as well? It uses the same language as "frame var" under the hood, which but it has a somewhat different starting point (it basically ignores the local scope and looks only at globals), which means it may need some special handling in the new parser.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

This direction makes sense to me, but I think that's all the more reason to be conservative/cautious about adding new features to the language. We can't just put every possible feature of every language into it, as we'd end up with a unmaintainable mess.

class IdentifierInfo {
private:
using MemberPath = std::vector<uint32_t>;
using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere, we'd use IdentifierInfoUP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

| boolean_literal
| pointer_literal
| id_expression
| "this" ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other languages may use a different keyword (self typically) to refer to "this" object (in which case this can refer to a regular variable). Special-casing this in the parser could make that difficult, so it may be better to treat this as a regular identifier, and only resolve it later, once we know which language we're in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the quoted strings from the grammar for now.

Comment on lines 41 to 43
boolean_literal = "true" | "false" ;

pointer_literal = "nullptr" ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true/false/nullptr are also spelled differently in different languages, and I don't believe they're necessary for the basic "frame variable" implementation. Can we skip those for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be acceptable for me to replace the quoted strings with something like: true_token, false_token, nullptr_token?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the quoted strings from the grammar for now.

//
//===----------------------------------------------------------------------===//

#ifndef LLDB_DIL_AST_H_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLDB_CORE_DILAST_H

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"

namespace lldb_private {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the names defined here have embedded namespace names (DILMemberInfo), while others (IdentifierInfo) have deceptively generic name even though they're specific to the DIL. What do you (all) think about putting all of this in a namespace (I suggest dil)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (namespace "DIL")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should really be a lowercase dil. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 244 to 246
if (str_ref_name.size() > 2 && str_ref_name[0] == ':' &&
str_ref_name[1] == ':')
str_ref_name = str_ref_name.drop_front(2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str_ref_name.consume_front("::");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 248 to 249
ConstString tmp_name(str_ref_name);
if (tmp_name == name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constifying str_ref_name doesn't help anything, since you have to touch every byte of the string just to compute its checksum. if(name.GetStringRef() == str_ref_name) is better here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 256 to 260
for (auto var_sp : possible_matches)
if (var_sp->GetName() == name) {
exact_match = var_sp;
break;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto exact_match = llvm::find_if(possible_matches, [&](VariableSP var_sp) { return var_sp->GetName() == name; });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Target *target = ctx_scope->CalculateTarget().get();
Process *process = ctx_scope->CalculateProcess().get();
if (target && process) {
Process::StopLocker stop_locker;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the presence of the StopLocker here fairly surprising. What is this trying to achieve? If something like this was necessary (I'm not sure if the current frame var implementation does that), I would expect it be be scoped around the entire expression evaluation process, and not a single operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it's not actually needed here. Have removed it.

cast_expression = unary_expression
| "(" type_id ")" cast_expression ;

unary_expression = postfix_expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath don't exactly sound like they ought to be modifying the variable in the process of "getting" it).

@cmtice
Copy link
Contributor Author

cmtice commented Jul 11, 2024

BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.

That's cool. Just to confirm, have you looked at replacing target variable as well? It uses the same language as "frame var" under the hood, which but it has a somewhat different starting point (it basically ignores the local scope and looks only at globals), which means it may need some special handling in the new parser.

No, I have not looked at 'target variable' at this time.

I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.

This direction makes sense to me, but I think that's all the more reason to be conservative/cautious about adding new features to the language. We can't just put every possible feature of every language into it, as we'd end up with a unmaintainable mess.

@jimingham
Copy link
Collaborator

jimingham commented Jul 11, 2024 via email

@cmtice
Copy link
Contributor Author

cmtice commented Jul 15, 2024

On Jul 10, 2024, at 9:22 PM, cmtice @.***> wrote: @cmtice commented on this pull request. In lldb/include/lldb/Core/DILAST.h <#95738 (comment)>: > +/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak) +/// or not. Only applicable for C++, which is why this is here and not part of +/// the CompilerType class. When we go to dereference a pointer we need to know whether or not it's a smart pointer, because if it's a smart pointer we need to get at the pointer inside before we can dereference it. So we use IsSmartPtrType to determine whether or not to insert this layer of unwrapping (converting the smart ptr into a normal ptr).
I'm not sure it's the job of the DIL to do this sort of unwrapping. Then it would have to support all the ways this might need to be done in every language we might end up supporting. This seems more a job for the data formatters provided by the various languages. That's why Pavel was asking about the special deference Synthetic Child Provider API, I think. That provides a way for various languages to encode this sort of behavior w/o it having to be know in advance by lldb. Jim

— Reply to this email directly, view it on GitHub <#95738 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUPVW4J5FKAPSUTJYXSJODZLYB7TAVCNFSM6AAAAABJNJ3CAKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNZQHA4DAOJYGE. You are receiving this because you are on a team that was mentioned.

I can use the machinery that's already there in LLDB for dereferencig the smart pointers. But I can't treat smart pointers exactly the same as normal pointers -- among other things a ValueObject whose CompilerType is a smart pointer returns False for IsPointerType(), which causes errors in the code for handling normal pointers. So as far as I can tell I still need my function (declared in DILAST.h) for telling me whether or not a type is a smart pointer type (which is what Pavel's original comment was about).

cmtice added 2 commits July 27, 2024 19:26
Address review comments:
Remove decls for functions not defined or used in AST files; add
DIL namespace, code cleanups, etc.
Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have a lot of comments, but I think we're making progress.

The most important questions are about the parts where we appear to be reimplementing some existing lldb functionality. Basically all of the functionality for looking up names (for members, variables, types, etc.) appears to be more complex than necessary. For each of those things, we already have lldb functions, which one would (perhaps naively) expect to just call and have them do the right thing. I'd like to understand why that isn't the case, and whether we can do something about it (there are certainly bugs in those functions, and if that's the case it'd be better to fix those bugs than to work around them here).

At this point, perhaps the only way to figure that out is to strip them and see what breaks.

cast_expression = unary_expression
| "(" type_id ")" cast_expression ;

unary_expression = postfix_expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...

#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"

namespace lldb_private {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should really be a lowercase dil. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)

Comment on lines 45 to 46
/// use_synthetic options passed, acquiring the process & target locks if
/// appropriate.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// use_synthetic options passed, acquiring the process & target locks if
/// appropriate.
/// use_synthetic options passed.

eMemberOfNode,
eArraySubscriptNode,
eUnaryOpNode,
eSmartPtrToPtrDecayNode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this as well (for the time being, at least)? It sounds like one of those things that may require rearchitecting to avoid smart pointer assumptions...

(And even we do end up adding it, I don't think it should be called "smart pointer decay", as, as far as this code is concerned, I think we shouldn't be making a difference between std::unique_ptr and say std::optional. I guess you could stretch the "smart pointer" definition to cover optionals, but that doesn't seem ideal)

class IdentifierInfo {
private:
using MemberPath = std::vector<uint32_t>;
using IdentifierInfoUP = std::unique_ptr<IdentifierInfo>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would normally be outside of the class, so it can be used from other code as well. If you're only using it for internal code, then it probably doesn't need to exist (note you can get rid of about half of the uses of this name by using std::make_unique)

}
}

// Find the corrent variable by matching the name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find the corrent variable by matching the name.
// Find the current variable by matching the name.

Comment on lines +89 to +91
// List global variable with the same "basename". There can be many matches
// from other scopes (namespaces, classes), so we do additional filtering
// later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely better, but I'd still like to understand what is the purpose of the subsequent filtering (lines 229--235). If we didn't have that, we could just pass 1 as the "limit" to FindGlobalVariables and potentially speed things up a lot.

(Somewhat tangentially, I'm also wondering if this function belongs in the "AST" part of DIL. I think I could see a case for putting it into the parser.)

Comment on lines 236 to 237
lldb::ValueObjectSP empty_obj_sp;
return empty_obj_sp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nullptr or return {} works just fine.

Comment on lines 263 to 271
// Internally types don't have global scope qualifier in their names and
// LLDB doesn't support queries with it too.
llvm::StringRef name_ref(name);
bool global_scope = false;

if (name_ref.starts_with("::")) {
name_ref = name_ref.drop_front(2);
global_scope = true;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's true, or at least "true anymore". I'm pretty sure I can look up types with leading "::" now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it works and sometimes it doesn't. It appears that I can prefix plain global vars with '::', but I can't (e.g.) prefix namespaces with '::'. So current frame var works with '::globalVar', and also on 'ns::globalVar', but fails on '::ns::globalVar'.

I am unclear if there is any work or change you are requesting here, other than commenting on the comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.

I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.

The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.

Comment on lines 304 to 330
// We've found multiple types, try finding the "correct" one.
CompilerType full_match;
std::vector<CompilerType> partial_matches;

for (uint32_t i = 0; i < result_type_list.size(); ++i) {
CompilerType type = result_type_list[i];
llvm::StringRef type_name_ref = type.GetTypeName().GetStringRef();
;

if (type_name_ref == name_ref)
full_match = type;
else if (type_name_ref.ends_with(name_ref))
partial_matches.push_back(type);
}

// Full match is always correct.
if (full_match.IsValid())
return full_match;

if (!global_scope) {
// We're looking for type, but there may be multiple candidates and which
// one is correct may depend on the currect scope. For now just pick the
// most "probable" type (pick a random one). TODO: Try to find a better way
// to do this.
if (partial_matches.size() > 0)
return partial_matches.back();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if that's true, then maybe this filtering isn't needed?

…uage

Address latest round of review comments.
@cmtice
Copy link
Contributor Author

cmtice commented Aug 10, 2024

I think I have addressed all the review comments; please take another look. :-)

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though you've removed the smart pointer checking function (the most obvious example of language-specific behavior), I'm still worried about whether the overall design of this code will allow you to avoid it.

Looking at the IdentifierInfo class, I see that it can be constructed with only a type (the value is optional). I think this is going to be very hard to make work, since a type does not know if its a smart pointer (only a value does). Have you already figured out a way to make that work?

IndentifierInfo isn't the only suspicious place though. All of the places that are doing something with bitfield offsets are very suspicious to me, as that's something that the Values should already know.

I haven't seen the parser, but I wouldn't be surprised if this has something to do with the vexing parse issues of C languages, where one needs to know if something is a type just to parse the expression correctly. Now, if that's the case, and if really need to know if something is a smart pointer (i.e. whether it is dereferencable) without having access to a value of that type, then we may need to have a bigger discussion about what to do about that.

Comment on lines +23 to +24
#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/TokenKinds.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these doing here? Can this be a language-agnostic implementation if it depends on clang?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't clang compile all the languages we want to support?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Swift uses llvm, but not clang. The same goes for rust, and potentially other downstream languages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then we've run into a very serious problem: the DIL implementation (and lldb-eval, on which it was based) makes extensive use of Clang for parsing & lexing. I'm guessing the best way to handle this is to try to bundle the parsing/lexing stuff into some kind of plugin, with different plugins for the different languages? OR...is it possible that the small language subset that DIL wants to parse/lex/evaluate is roughly the same across all the languages?

I'm really not sure about the best way forward here...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that does sound like a problem.

Putting this stuff in a plugin would be possible, but it would be a departure from what we've talked about earlier, since we would essentially end up with a different flavour of DIL for every language.

Saying that clang is the parser for the DIL might also be possible, but that would also warrant more discussion, because we have a (maybe undocumented?) rule of no clang dependencies on lldb core.

A third option might be to roll our own parser. Unlike lldb-eval, I think the DIL does not have the ambition of matching c++ (or any other language) precisely, so maybe we have some leeway to remove/skip/alter the features which make parsing c(++) the mess that it is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so could one say that it's usage is similar to the code in CPlusPlusNameParser, which uses clang to tokenize the (function) name, but then does the parsing itself? Or does it go deeper than that?

The thing I'm thinking about is that "having all of the tokens defined" may not be such an advantage if we later add some features from other languages (I don't know, let's say the ?. operator from Swift). For lexing, that may not matter cause you could maybe lex ? and . separately and then join them later, but if it's doing more, then that may be a problem..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's similar to what CPlusPlusNameParser does. DIL uses clang to tokenize the expression and parsing is done separately by DIL itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Andy (werat) is correct. I mis-spoke in my previous comment (apologies!). We use the Clang tokenizer & lexer but wrote our own parser.

Since that is the case, is this ok?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd be fine with that, but I also think it's not my call to make. @jimingham, @adrian-prantl, @bulbazord, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been following the evolution of this PR closely, but from this comment chain it looks like the question is "Can we have clang dependencies in lldbCore?". I won't gatekeep this PR for this reason, but here are my positions:

  1. We should avoid having clang in lldbCore if possible.
  2. I think you should write a lexer for the DIL.

As Pavel already pointed out, plenty of folks downstream want to use LLDB to debug their programs with languages other than C/C++/ObjC/ObjC++. Swift and Rust are great examples and on occasion I see people try to add minor changes to support their languages too. Adding clang as a core dependency makes it more difficult to decouple LLDB from C/C++ specifically. LLDB has the potential to be a more language-agnostic debugger where the core of the engine is flexible and focuses on operating on common abstractions while the Plugins dictate how to interact with the specific language/OS/platform. I think LLDB is still pretty far away from being that, but myself and other contributors have done a lot of work over the last few years to bring us closer to that reality.

As for the DIL, I think we should consider the goal. If the language is meant to use C/C++ syntax to inspect data, I suppose it makes sense to use clang's lexer to tokenize input. CPlusPlusNameParser uses clang because it's trying to parse C++ names. However, using clang now makes it harder to extend to support other languages in the future. What if clang doesn't chop up my input into tokens correctly or the way I want? Pavel mentioned a case where we can work around this, but the parser is just going to be workaround after workaround trying to fit a square peg into a round hole. You've already written the parser, what's a lexer too? :)

Comment on lines 84 to 86
private:

public:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private:
public:
public:

const std::vector<uint32_t> &GetPath() const { return m_path; }

CompilerType GetType() { return m_type; }
bool IsValid() const { return m_type.IsValid(); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove this function and use an null IdentifierInfo pointer to denote invalidness?

Comment on lines 263 to 271
// Internally types don't have global scope qualifier in their names and
// LLDB doesn't support queries with it too.
llvm::StringRef name_ref(name);
bool global_scope = false;

if (name_ref.starts_with("::")) {
name_ref = name_ref.drop_front(2);
global_scope = true;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.

I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.

The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.

@cmtice
Copy link
Contributor Author

cmtice commented Aug 14, 2024

Even though you've removed the smart pointer checking function (the most obvious example of language-specific behavior), I'm still worried about whether the overall design of this code will allow you to avoid it.

I believe I have fixed all the code to avoid any need to do smart pointer checking -- I went back and looked very closely at what the current frame variable implementation does when handling smart pointers, and tried to do the same thing, which appears to have worked.

Looking at the IdentifierInfo class, I see that it can be constructed with only a type (the value is optional). I think this is going to be very hard to make work, since a type does not know if its a smart pointer (only a value does). Have you already figured out a way to make that work?

The case where it is constructed only with a type is meant to be when handling LLDB convenience variables (the dynamic variables names "$..."); I'm not sure if those can be smart pointers?

IndentifierInfo isn't the only suspicious place though. All of the places that are doing something with bitfield offsets are very suspicious to me, as that's something that the Values should already know.

I haven't seen the parser, but I wouldn't be surprised if this has something to do with the vexing parse issues of C languages, where one needs to know if something is a type just to parse the expression correctly. Now, if that's the case, and if really need to know if something is a smart pointer (i.e. whether it is dereferencable) without having access to a value of that type, then we may need to have a bigger discussion about what to do about that.

I would be happy to share the parser code with you and walk/talk you through it, if that would help?

@cmtice
Copy link
Contributor Author

cmtice commented Sep 7, 2024

I think I have addressed all of the review comments now, except for the dependency on the Clang lexer. If it's ok with the reviewers, I would rather not put that change into this PR, but instead I will make that the next PR that I submit in the DIL (Data Inspection Language) series. However if you are strongly against that, I will add the new lexer to this PR as soon as I finish writing it.

So please review this again. Thank you!

@cmtice
Copy link
Contributor Author

cmtice commented Sep 16, 2024

Ping? Anybody?

@Michael137
Copy link
Member

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

@cmtice
Copy link
Contributor Author

cmtice commented Sep 16, 2024

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).

The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.

As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.

Does that answer all of your questions? Or is there more you would like to know or see?

@Michael137
Copy link
Member

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).

The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.

As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.

Does that answer all of your questions? Or is there more you would like to know or see?

Thanks for the breakdown! I was just wondering whether the plan was to merge this before the other PRs are up. I think it'd be preferable to merge them once they all have been approved. Though if others are ok with the current strategy I don't want to block this.

@cmtice
Copy link
Contributor Author

cmtice commented Sep 17, 2024

Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used

There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle).
The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code.
As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR.
Does that answer all of your questions? Or is there more you would like to know or see?

Thanks for the breakdown! I was just wondering whether the plan was to merge this before the other PRs are up. I think it'd be preferable to merge them once they all have been approved. Though if others are ok with the current strategy I don't want to block this.

I was under the impression that the plan was to merge each PR as it was approved (I thought things might be less confusing that way). But I am certainly open to alternatives. If the folks reviewing these DIL PR's would prefer to have all the PRs available for review at the same time, I can put most of the others out for review now as well. Just let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants