-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LLDB] Add AST node classes, functions, etc. for Data Inspection Lang… #95738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…uage (DIL). The Data Inspection Language (DIL), described in https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator. This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.
@llvm/pr-subscribers-lldb Author: None (cmtice) Changes…uage (DIL). The Data Inspection Language (DIL), described in This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator. Patch is 50.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95738.diff 4 Files Affected:
diff --git a/lldb/docs/dil-expr-lang.ebnf b/lldb/docs/dil-expr-lang.ebnf
new file mode 100644
index 0000000000000..40c678c25cda5
--- /dev/null
+++ b/lldb/docs/dil-expr-lang.ebnf
@@ -0,0 +1,165 @@
+(* LLDB Debug Expressions, a subset of C++ *)
+(* Insired by https://www.nongnu.org/hcb *)
+
+expression = assignment_expression ;
+
+assignment_expression = conditional_expression
+ logical_or_expression assignment_operator assignment_expression ;
+
+assignment_operator = "="
+ | "*="
+ | "/="
+ | "%="
+ | "+="
+ | "-="
+ | ">>="
+ | "<<="
+ | "&="
+ | "^="
+ | "|=" ;
+
+conditional_expression = logical_or_expression
+ | logical_or_expression "?" expression ":" assignment_expression ;
+
+logical_or_expression = logical_and_expression {"||" logical_and_expression} ;
+
+logical_and_expression = inclusive_or_expression {"&&" inclusive_or_expression} ;
+
+inclusive_or_expression = exclusive_or_expression {"|" exclusive_or_expression} ;
+
+exclusive_or_expression = and_expression {"^" and_expression} ;
+
+and_expression = equality_expression {"&" equality_expression} ;
+
+equality_expression = relational_expression {"==" relational_expression}
+ | relational_expression {"!=" relational_expression} ;
+
+relational_expression = shift_expression {"<" shift_expression}
+ | shift_expression {">" shift_expression}
+ | shift_expression {"<=" shift_expression}
+ | shift_expression {">=" shift_expression} ;
+
+shift_expression = additive_expression {"<<" additive_expression}
+ | additive_expression {">>" additive_expression} ;
+
+additive_expression = multiplicative_expression {"+" multiplicative_expression}
+ | multiplicative_expression {"-" multiplicative_expression} ;
+
+multiplicative_expression = cast_expression {"*" cast_expression}
+ | cast_expression {"/" cast_expression}
+ | cast_expression {"%" cast_expression} ;
+
+cast_expression = unary_expression
+ | "(" type_id ")" cast_expression ;
+
+unary_expression = postfix_expression
+ | "++" cast_expression
+ | "--" cast_expression
+ | unary_operator cast_expression
+ | "sizeof" unary_expression
+ | "sizeof" "(" type_id ")" ;
+
+unary_operator = "*" | "&" | "+" | "-" | "!" | "~" ;
+
+postfix_expression = primary_expression
+ | postfix_expression "[" expression "]"
+ | postfix_expression "." id_expression
+ | postfix_expression "->" id_expression
+ | postfix_expression "++"
+ | postfix_expression "--"
+ | static_cast "<" type_id ">" "(" expression ")" ;
+ | dynamic_cast "<" type_id ">" "(" expression ")" ;
+ | reinterpret_cast "<" type_id ">" "(" expression ")" ;
+
+primary_expression = numeric_literal
+ | boolean_literal
+ | pointer_literal
+ | id_expression
+ | "this"
+ | "(" expression ")"
+ | builtin_func ;
+
+type_id = type_specifier_seq [abstract_declarator] ;
+
+type_specifier_seq = type_specifier [type_specifier_seq] ;
+
+type_specifier = simple_type_specifier
+ | cv_qualifier ;
+
+simple_type_specifier = ["::"] [nested_name_specifier] type_name
+ | "char"
+ | "char16_t"
+ | "char32_t"
+ | "wchar_t"
+ | "bool"
+ | "short"
+ | "int"
+ | "long"
+ | "signed"
+ | "unsigned"
+ | "float"
+ | "double"
+ | "void" ;
+
+nested_name_specifier = type_name "::"
+ | namespace_name '::'
+ | nested_name_specifier identifier "::"
+ | nested_name_specifier simple_template_id "::"
+
+type_name = class_name
+ | enum_name
+ | typedef_name
+ | simple_template_id ;
+
+class_name = identifier ;
+
+enum_name = identifier ;
+
+typedef_name = identifier ;
+
+simple_template_id = template_name "<" [template_argument_list] ">" ;
+
+template_name = identifier ;
+
+template_argument_list = template_argument
+ | template_argument_list "," template_argument ;
+
+template_argument = type_id
+ | numeric_literal
+ | id_expression ;
+
+namespace_name = identifier ;
+
+cv_qualifier = "const" | "volatile" ;
+
+cv_qualifier_seq = cv_qualifier [cv_qualifier_seq] ;
+
+abstract_declarator = ptr_operator [abstract_declarator] ;
+
+ptr_operator = "*" [cv_qualifier_seq]
+ | "&" ;
+
+id_expression = unqualified_id
+ | qualified_id ;
+
+unqualified_id = identifier ;
+
+qualified_id = ["::"] [nested_name_specifier] unqualified_id
+ | ["::"] identifier ;
+
+identifier = ? clang::tok::identifier ? ;
+
+numeric_literal = ? clang::tok::numeric_constant ? ;
+
+boolean_literal = "true" | "false" ;
+
+pointer_literal = "nullptr" ;
+
+builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ;
+
+builtin_func_name = "__log2" ;
+
+builtin_func_argument_list = builtin_func_argument
+ | builtin_func_argument_list "," builtin_func_argument
+
+builtin_func_argument = expression ;
diff --git a/lldb/include/lldb/Core/DILAST.h b/lldb/include/lldb/Core/DILAST.h
new file mode 100644
index 0000000000000..acb9da088eda8
--- /dev/null
+++ b/lldb/include/lldb/Core/DILAST.h
@@ -0,0 +1,690 @@
+//===-- DILAST.h ------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_DIL_AST_H_
+#define LLDB_DIL_AST_H_
+
+#include <memory>
+#include <optional>
+#include <string>
+#include <variant>
+#include <vector>
+
+#include "lldb/Core/ValueObject.h"
+#include "lldb/Symbol/Type.h"
+#include "lldb/Symbol/TypeList.h"
+#include "lldb/Target/LanguageRuntime.h"
+#include "lldb/Utility/ConstString.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Basic/TokenKinds.h"
+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/APInt.h"
+
+namespace lldb_private {
+
+/// Struct to hold information about member fields. Used by the parser for the
+/// Data Inspection Language (DIL).
+struct DILMemberInfo {
+ std::optional<std::string> name;
+ CompilerType type;
+ bool is_bitfield;
+ uint32_t bitfield_size_in_bits;
+ bool is_synthetic;
+ bool is_dynamic;
+ lldb::ValueObjectSP val_obj_sp;
+
+ explicit operator bool() const { return type.IsValid(); }
+};
+
+/// This determines if the type is a shared, unique or weak pointer, either
+/// from stdlibc++ or libc+++.
+bool IsSmartPtrType(CompilerType type);
+
+/// Finds the member field with the given name and type, stores the child index
+/// corresponding to the field in the idx vector and returns a DILMemberInfo
+/// struct with appropriate information about the field.
+DILMemberInfo GetFieldWithNameIndexPath(lldb::ValueObjectSP lhs_val_sp,
+ CompilerType type,
+ const std::string &name,
+ std::vector<uint32_t> *idx,
+ CompilerType empty_type,
+ bool use_synthetic, bool is_dynamic);
+
+std::tuple<DILMemberInfo, std::vector<uint32_t>>
+GetMemberInfo(lldb::ValueObjectSP lhs_val_sp, CompilerType type,
+ const std::string &name, bool use_synthetic);
+
+/// Get the appropriate ValueObjectSP, consulting the use_dynamic and
+/// use_synthetic options passed, acquiring the process & target locks if
+/// appropriate.
+lldb::ValueObjectSP
+DILGetSPWithLock(lldb::ValueObjectSP valobj_sp,
+ lldb::DynamicValueType use_dynamic = lldb::eNoDynamicValues,
+ bool use_synthetic = false);
+
+/// The various types DIL AST nodes (used by the DIL parser).
+enum class DILNodeKind {
+ kDILErrorNode,
+ kLiteralNode,
+ kIdentifierNode,
+ kSizeOfNode,
+ kBuiltinFunctionCallNode,
+ kCStyleCastNode,
+ kCxxStaticCastNode,
+ kCxxReinterpretCastNode,
+ kMemberOfNode,
+ kArraySubscriptNode,
+ kBinaryOpNode,
+ kUnaryOpNode,
+ kTernaryOpNode,
+ kSmartPtrToPtrDecay
+};
+
+/// The C-Style casts allowed by DIL.
+enum class CStyleCastKind {
+ kArithmetic,
+ kEnumeration,
+ kPointer,
+ kNullptr,
+ kReference,
+};
+
+/// The Cxx static casts allowed by DIL.
+enum class CxxStaticCastKind {
+ kNoOp,
+ kArithmetic,
+ kEnumeration,
+ kPointer,
+ kNullptr,
+ kBaseToDerived,
+ kDerivedToBase,
+};
+
+/// The binary operators recognized by DIL.
+enum class BinaryOpKind {
+ Mul, // "*"
+ Div, // "/"
+ Rem, // "%"
+ Add, // "+"
+ Sub, // "-"
+ Shl, // "<<"
+ Shr, // ">>"
+ LT, // "<"
+ GT, // ">"
+ LE, // "<="
+ GE, // ">="
+ EQ, // "=="
+ NE, // "!="
+ And, // "&"
+ Xor, // "^"
+ Or, // "|"
+ LAnd, // "&&"
+ LOr, // "||"
+ Assign, // "="
+ MulAssign, // "*="
+ DivAssign, // "/="
+ RemAssign, // "%="
+ AddAssign, // "+="
+ SubAssign, // "-="
+ ShlAssign, // "<<="
+ ShrAssign, // ">>="
+ AndAssign, // "&="
+ XorAssign, // "^="
+ OrAssign, // "|="
+};
+
+/// The Unary operators recognized by DIL.
+enum class UnaryOpKind {
+ PostInc, // "++"
+ PostDec, // "--"
+ PreInc, // "++"
+ PreDec, // "--"
+ AddrOf, // "&"
+ Deref, // "*"
+ Plus, // "+"
+ Minus, // "-"
+ Not, // "~"
+ LNot, // "!"
+};
+
+/// Helper functions for DIL AST node parsing.
+
+/// Translates clang tokens to BinaryOpKind.
+BinaryOpKind
+clang_token_kind_to_binary_op_kind(clang::tok::TokenKind token_kind);
+
+/// Returns bool indicating whether or not the input kind is an assignment.
+bool binary_op_kind_is_comp_assign(BinaryOpKind kind);
+
+/// Given a string representing a type, returns the CompilerType corresponding
+/// to the named type, if it exists.
+CompilerType
+ResolveTypeByName(const std::string &name,
+ std::shared_ptr<ExecutionContextScope> ctx_scope);
+
+/// Quick lookup to check if a type name already exists in a
+/// name-to-CompilerType map the DIL parser keeps of previously found
+/// name/type pairs.
+bool IsContextVar(const std::string &name);
+
+/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak)
+/// or not. Only applicable for C++, which is why this is here and not part of
+/// the CompilerType class.
+bool IsSmartPtrType(CompilerType type);
+
+/// Class used to store & manipulate information about identifiers.
+class IdentifierInfo {
+private:
+ using MemberPath = std::vector<uint32_t>;
+ using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>;
+
+public:
+ enum class Kind {
+ kValue,
+ kContextArg,
+ kMemberPath,
+ kThisKeyword,
+ };
+
+ static IdentifierInfoPtr FromValue(lldb::ValueObjectSP value_sp) {
+ CompilerType type;
+ lldb::ValueObjectSP value = DILGetSPWithLock(value_sp);
+ if (value)
+ type = value->GetCompilerType();
+ return IdentifierInfoPtr(new IdentifierInfo(Kind::kValue, type, value, {}));
+ }
+
+ static IdentifierInfoPtr FromContextArg(CompilerType type) {
+ lldb::ValueObjectSP empty_value;
+ return IdentifierInfoPtr(
+ new IdentifierInfo(Kind::kContextArg, type, empty_value, {}));
+ }
+
+ static IdentifierInfoPtr FromMemberPath(CompilerType type, MemberPath path) {
+ lldb::ValueObjectSP empty_value;
+ return IdentifierInfoPtr(new IdentifierInfo(Kind::kMemberPath, type,
+ empty_value, std::move(path)));
+ }
+
+ static IdentifierInfoPtr FromThisKeyword(CompilerType type) {
+ lldb::ValueObjectSP empty_value;
+ return IdentifierInfoPtr(
+ new IdentifierInfo(Kind::kThisKeyword, type, empty_value, {}));
+ }
+
+ Kind kind() const { return m_kind; }
+ lldb::ValueObjectSP value() const { return m_value; }
+ const MemberPath &path() const { return m_path; }
+
+ CompilerType GetType() { return m_type; }
+ bool IsValid() const { return m_type.IsValid(); }
+
+ IdentifierInfo(Kind kind, CompilerType type, lldb::ValueObjectSP value,
+ MemberPath path)
+ : m_kind(kind), m_type(type), m_value(std::move(value)),
+ m_path(std::move(path)) {}
+
+private:
+ Kind m_kind;
+ CompilerType m_type;
+ lldb::ValueObjectSP m_value;
+ MemberPath m_path;
+};
+
+/// Given the name of an identifier (variable name, member name, type name,
+/// etc.), find the ValueObject for that name (if it exists) and create and
+/// return an IdentifierInfo object containing all the relevant information
+/// about that object (for DIL parsing and evaluating).
+std::unique_ptr<IdentifierInfo> LookupIdentifier(
+ const std::string &name, std::shared_ptr<ExecutionContextScope> ctx_scope,
+ lldb::DynamicValueType use_dynamic, CompilerType *scope_ptr = nullptr);
+
+/// Forward declaration, for use in DIL AST nodes. Definition is at the very
+/// end of this file.
+class DILVisitor;
+
+/// The rest of the classes in this file, except for the DILVisitor class at the
+/// very end, define all the types of AST nodes used by the DIL parser and
+/// expression evaluator. The DIL parser parses the input string and creates the
+/// AST parse tree from the AST nodes. The resulting AST node tree gets passed
+/// to the DIL expression evaluator, which evaluates the DIL AST nodes and
+/// creates/returns a ValueObjectSP containing the result.
+
+/// Base class for AST nodes used by the Data Inspection Language (DIL) parser.
+/// All of the specialized types of AST nodes inherit from this (virtual) base
+/// class.
+class DILASTNode {
+public:
+ DILASTNode(clang::SourceLocation location) : location_(location) {}
+ virtual ~DILASTNode() {}
+
+ virtual void Accept(DILVisitor *v) const = 0;
+
+ virtual bool is_error() const { return false; };
+ virtual bool is_rvalue() const = 0;
+ virtual bool is_bitfield() const { return false; };
+ virtual bool is_context_var() const { return false; };
+ virtual bool is_literal_zero() const { return false; }
+ virtual uint32_t bitfield_size() const { return 0; }
+ virtual CompilerType result_type() const = 0;
+
+ virtual DILNodeKind what_am_i() const = 0;
+
+ clang::SourceLocation location() const { return location_; }
+
+ // The expression result type, but dereferenced in case it's a reference. This
+ // is for convenience, since for the purposes of the semantic analysis only
+ // the dereferenced type matters.
+ CompilerType result_type_deref() const;
+
+private:
+ clang::SourceLocation location_;
+};
+
+using ParseResult = std::unique_ptr<DILASTNode>;
+
+class DILErrorNode : public DILASTNode {
+public:
+ DILErrorNode(CompilerType empty_type)
+ : DILASTNode(clang::SourceLocation()), m_empty_type(empty_type) {}
+ void Accept(DILVisitor *v) const override;
+ bool is_error() const override { return true; }
+ bool is_rvalue() const override { return false; }
+ CompilerType result_type() const override { return m_empty_type; }
+ CompilerType result_type_real() const { return m_empty_type; }
+ DILNodeKind what_am_i() const override { return DILNodeKind::kDILErrorNode; }
+
+private:
+ CompilerType m_empty_type;
+};
+
+class LiteralNode : public DILASTNode {
+public:
+ template <typename ValueT>
+ LiteralNode(clang::SourceLocation location, CompilerType type, ValueT &&value,
+ bool is_literal_zero)
+ : DILASTNode(location), m_type(type),
+ m_value(std::forward<ValueT>(value)),
+ m_is_literal_zero(is_literal_zero) {}
+
+ void Accept(DILVisitor *v) const override;
+ bool is_rvalue() const override { return true; }
+ bool is_literal_zero() const override { return m_is_literal_zero; }
+ CompilerType result_type() const override { return m_type; }
+ DILNodeKind what_am_i() const override { return DILNodeKind::kLiteralNode; }
+
+ template <typename ValueT> ValueT value() const {
+ return std::get<ValueT>(m_value);
+ }
+
+ auto value() const { return m_value; }
+
+private:
+ CompilerType m_type;
+ std::variant<llvm::APInt, llvm::APFloat, bool, std::vector<char>> m_value;
+ bool m_is_literal_zero;
+};
+
+class IdentifierNode : public DILASTNode {
+public:
+ IdentifierNode(clang::SourceLocation location, std::string name,
+ std::unique_ptr<IdentifierInfo> identifier, bool is_rvalue,
+ bool is_context_var)
+ : DILASTNode(location), m_is_rvalue(is_rvalue),
+ m_is_context_var(is_context_var), m_name(std::move(name)),
+ m_identifier(std::move(identifier)) {}
+
+ void Accept(DILVisitor *v) const override;
+ bool is_rvalue() const override { return m_is_rvalue; }
+ bool is_context_var() const override { return m_is_context_var; };
+ CompilerType result_type() const override { return m_identifier->GetType(); }
+ DILNodeKind what_am_i() const override {
+ return DILNodeKind::kIdentifierNode;
+ }
+
+ std::string name() const { return m_name; }
+ const IdentifierInfo &info() const { return *m_identifier; }
+
+private:
+ bool m_is_rvalue;
+ bool m_is_context_var;
+ std::string m_name;
+ std::unique_ptr<IdentifierInfo> m_identifier;
+};
+
+class SizeOfNode : public DILASTNode {
+public:
+ SizeOfNode(clang::SourceLocation location, CompilerType type,
+ CompilerType operand)
+ : DILASTNode(location), m_type(type), m_operand(operand) {}
+
+ void Accept(DILVisitor *v) const override;
+ bool is_rvalue() const override { return true; }
+ CompilerType result_type() const override { return m_type; }
+ DILNodeKind what_am_i() const override { return DILNodeKind::kSizeOfNode; }
+
+ CompilerType operand() const { return m_operand; }
+
+private:
+ CompilerType m_type;
+ CompilerType m_operand;
+};
+
+class BuiltinFunctionCallNode : public DILASTNode {
+public:
+ BuiltinFunctionCallNode(clang::SourceLocation location,
+ CompilerType result_type, std::string name,
+ std::vector<ParseResult> arguments)
+ : DILASTNode(location), m_result_type(result_type),
+ m_name(std::move(name)), m_arguments(std::move(arguments)) {}
+
+ void Accept(DILVisitor *v) const override;
+ bool is_rvalue() const override { return true; }
+ CompilerType result_type() const override { return m_result_type; }
+ DILNodeKind what_am_i() const override {
+ return DILNodeKind::kBuiltinFunctionCallNode;
+ }
+
+ std::string name() const { return m_name; }
+ const std::vector<ParseResult> &arguments() const { return m_arguments; };
+
+private:
+ CompilerType m_result_type;
+ std::string m_name;
+ std::vector<ParseResult> m_arguments;
+};
+
+class CStyleCastNode : public DILASTNode {
+public:
+ CStyleCastNode(clang::SourceLocation location, CompilerType type,
+ ParseResult rhs, CStyleCastKind kind)
+ : DILASTNode(location), m_type(type), m_rhs(std::move(rhs)),
+ m_kind(kind) {}
+
+ void Accept(DILVisitor *v) const override;
+ bool is_rvalue() const override {
+ return m_kind != CStyleCastKind::kReference;
+ }
+ CompilerType result_type() const override { return m_type; }
+ DILNodeKind what_am_i() const override {
+ return DILNodeKind::kCStyleCastNode;
+ }
+
+ CompilerType type() const { return m_type; }
+ DILASTNode *rhs() const { return m_rhs.get(); }
+ CStyleCastKind kind() const { return m_kind; }
+
+private:
+ CompilerType m_type;
+ ParseResult m_rhs;
+ CStyleCastKind m_kind;
+};
+
+class CxxStaticCastNode : public DILASTNode {
+public:
+ CxxStaticCastNode(clang::SourceLocation location, CompilerType type,
+ ParseResult rhs, CxxStaticCastKind ki...
[truncated]
|
lldb/docs/dil-expr-lang.ebnf
Outdated
(* LLDB Debug Expressions, a subset of C++ *) | ||
(* Insired by https://www.nongnu.org/hcb *) | ||
|
||
expression = assignment_expression ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to actually have the grammar documented!
lldb/docs/dil-expr-lang.ebnf
Outdated
| static_cast "<" type_id ">" "(" expression ")" ; | ||
| dynamic_cast "<" type_id ">" "(" expression ")" ; | ||
| reinterpret_cast "<" type_id ">" "(" expression ")" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static_cast "<" type_id ">" "(" expression ")" ; | |
| dynamic_cast "<" type_id ">" "(" expression ")" ; | |
| reinterpret_cast "<" type_id ">" "(" expression ")" ; | |
| static_cast "<" type_id ">" "(" expression ")" | |
| dynamic_cast "<" type_id ">" "(" expression ")" | |
| reinterpret_cast "<" type_id ">" "(" expression ")" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/docs/dil-expr-lang.ebnf
Outdated
nested_name_specifier = type_name "::" | ||
| namespace_name '::' | ||
| nested_name_specifier identifier "::" | ||
| nested_name_specifier simple_template_id "::" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| nested_name_specifier simple_template_id "::" | |
| nested_name_specifier simple_template_id "::" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/docs/dil-expr-lang.ebnf
Outdated
builtin_func_name = "__log2" ; | ||
|
||
builtin_func_argument_list = builtin_func_argument | ||
| builtin_func_argument_list "," builtin_func_argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| builtin_func_argument_list "," builtin_func_argument | |
| builtin_func_argument_list "," builtin_func_argument ; |
lldb/docs/dil-expr-lang.ebnf
Outdated
| static_cast "<" type_id ">" "(" expression ")" ; | ||
| dynamic_cast "<" type_id ">" "(" expression ")" ; | ||
| reinterpret_cast "<" type_id ">" "(" expression ")" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I made those mistakes over time, but the semicolons are wrong in some places 😅
…uage Fix semicolons in the .ebnf file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't gone through this in detail yet (having an example of how these structures will be used would be helpful), but something I was curious about is whether this DIL
is going to be used in the implementation of a new command, or if we're going to extend the frame var
language? I didn't see a concrete conclusion on this in the RFC
cast_expression = unary_expression | ||
| "(" type_id ")" cast_expression ; | ||
|
||
unary_expression = postfix_expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to allow state-changing expressions in the frame var
language?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is pretty useful to be able to easily change the program state and afaik it's already possible with expr
command. Generally other debugger allow this as well (e.g. Visual Studio), so I think it's a reasonable thing to do. However, for increments specifically there's a deeper reason here (and I would agree that they probably shouldn't be available for general use). See my big comment below for justification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath
don't exactly sound like they ought to be modifying the variable in the process of "getting" it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, many users confuse or conflate "frame variable" with "p" (the short-cut for the expression evaluator), the "p" certainly allows updating the variable value...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...
lldb/docs/dil-expr-lang.ebnf
Outdated
|
||
pointer_literal = "nullptr" ; | ||
|
||
builtin_func = builtin_func_name "(" [builtin_func_argument_list] ")" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are builtins here? A set of functions that LLDB knows how to execute? How are those chosen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lldb-eval
(DIL's predecessor) supports exactly two builtin functions at the moment: __log2
and cryptic __findnonnull
(https://github.com/google/lldb-eval/blob/323d48f0b06d7dcf6ed70cdde17b42285450a32a/lldb-eval/parser.cc#L619). The latter is commonly used in NatVis for complex visualisers, e.g. https://github.com/EQEmu/Server/blob/7918fed81cae8700d73319d4bed5c9f265dbbf3d/libs/perlbind/src/perlbind.natvis#L88
lldb/docs/dil-expr-lang.ebnf
Outdated
|
||
typedef_name = identifier ; | ||
|
||
simple_template_id = template_name "<" [template_argument_list] ">" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what the use-case for instantiating templates is in the DIL. I guess referencing static data members of different instantiations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also how will this be implemented? I guess the simple-template-names infrastructure could help here?
AFAIU, in the expression evaluator, we would ideally support templates by either using modules or extending DWARF in a way that allows us to construct more faithful ASTs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically this is used in casts like ((Foo<int>)some_ptr)->field
. In the world of dynamic typing this is not necessary, however VisualStudio debugger doesn't support dynamic typing like this, so everyone is casting things.
lldb/docs/dil-expr-lang.ebnf
Outdated
| postfix_expression "--" | ||
| static_cast "<" type_id ">" "(" expression ")" | ||
| dynamic_cast "<" type_id ">" "(" expression ")" | ||
| reinterpret_cast "<" type_id ">" "(" expression ")" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't C-style
casts suffice? (i'm not sure how closely we want to model this language after C++, but these casts make it seem like we're just supporting C++)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the reply above, C++-style casts are often used in NatVis (visual studio debugger), which lldb-eval
originally aimed to support.
The plan is for the DIL implementation to eventually replace the current 'frame variable' command (see Milestone 1 in the implementation plan). Once that is done, it will be expanded to also replace calls to GetValueForExpressionPath, for parsing & handling pieces of LLDB's data formatters. I couldn't put the whole implementation into a single PR, as it would have been much too big to review (even some of the pieces are going to be pretty big). So I broke it up into: the helper functions (which have already been committed now); the AST (this PR); and then three more PR's, each one coming after the preceding one is approved: the DIL Parser (a recursive-descent parser), which constructs (and returns) the appropriate DIL AST; the DIL Evaluator, which takes a DIL AST, evaluates it in the current context (frame), and returns an appropriate ValueObjectSP; and the code to actually hook this up to the 'frame variable' command. I have the whole thing implemented (complete with test cases). If you really want to dig through the complete code, you can see it at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ (note that I will be cleaning up the Parser & Evaluator code before actually being put into a PR). The current implementation there still has both implementations for 'frame variable' (original and DIL), and with 'original' being the default and 'DIL' being turned on by an option. |
I've been trying to stay out of this, as I don't think I can be objective here, but since I've been summoned, here's what I think. :) For me, the worst possible outcome of this is an incomplete transition -- having a third (maybe fourth, since I sort of also count dwim-print) expression-like syntax, which is "almost" complete, but for whatever reason (technical or political) can't replace the previous one. With that in mind, I'd try to stage this in a way that minimizes this risk. Building the whole thing in layers from the bottom up is not the best way to achieve that, I think. If I were doing this, I'd prioritise replacing the current "frame var" implementation as soon as possible. Ideally, the steps would be something like:
Besides managing that risk (if this fades out for whatever reason, there's less dead code around), I think this would also make it easier to review, as we could have patches like "add operator/ to the DIL", which would focus solely on that, and also come with associated tests. I think it would also make it easier to discuss things like whether we want to add modifying operations (operator++) or obviously language-specific features (dynamic_cast), both of which I think are very good questions. |
I second everything that Pavel says. I think this would be the best approach. The existing frame var implementation is not that "smart", so it should be practical to fully replace it with the new DIL in its first stage. |
Thanks! that'll be a useful reference
Makes sense! |
Providing more context for answers to Michael's questions. The original motivation for To be able to evaluate those expressions as is, lldb-eval had to support all of the commonly used features, hence the extensive and somewhat selective list of supported features :) For example, State altering operations like increment/decrement are used in advanced visualizers like
|
A high-level concern that I have with this is the grammar is full of C++-specific nodes, which is not something I would like to see in living in |
My .02: The problem here is that the commands that use the DIL often have no idea what language the path expression they are being asked to parse is going to end up being.
could find C/ObjC/C++/Rust/Swift variables that match that expression. The idea for the DIL is to have enough syntax to express sensible data lookup and presentation that would cover the variety of languages we support. That's going to end up looking like the languages we are using it to model, but it shouldn't be driven by them. I think that's better than having the DIL have language specific variants and the user somehow has to know to pick the right one for the context. |
…uage Remove all the pieces for adding functionality beyond what the current default "frame variable" implementation does. Update the .ebnf file to reflect the current simplified version of DIL.
As requested, I have updated this PR to only include the pieces needed to reproduce current (default) 'frame variable' behavior with DIL. I have also updated the BNF file accordingly. I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I have verified that this stripped down version passes all the frame variable tests in LLDB.
That's cool. Just to confirm, have you looked at replacing target variable
as well? It uses the same language as "frame var" under the hood, which but it has a somewhat different starting point (it basically ignores the local scope and looks only at globals), which means it may need some special handling in the new parser.
I agree with Jim re the DIL language: We should only have a single language definition, and it can be a superset of the languages it supports. So there may be parts of it that belong to particular languages, but that does not mean it supports those languages exclusively.
This direction makes sense to me, but I think that's all the more reason to be conservative/cautious about adding new features to the language. We can't just put every possible feature of every language into it, as we'd end up with a unmaintainable mess.
lldb/include/lldb/Core/DILAST.h
Outdated
class IdentifierInfo { | ||
private: | ||
using MemberPath = std::vector<uint32_t>; | ||
using IdentifierInfoPtr = std::unique_ptr<IdentifierInfo>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elsewhere, we'd use IdentifierInfoUP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/docs/dil-expr-lang.ebnf
Outdated
| boolean_literal | ||
| pointer_literal | ||
| id_expression | ||
| "this" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other languages may use a different keyword (self
typically) to refer to "this" object (in which case this
can refer to a regular variable). Special-casing this
in the parser could make that difficult, so it may be better to treat this
as a regular identifier, and only resolve it later, once we know which language we're in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the quoted strings from the grammar for now.
lldb/docs/dil-expr-lang.ebnf
Outdated
boolean_literal = "true" | "false" ; | ||
|
||
pointer_literal = "nullptr" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true/false/nullptr are also spelled differently in different languages, and I don't believe they're necessary for the basic "frame variable" implementation. Can we skip those for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be acceptable for me to replace the quoted strings with something like: true_token, false_token, nullptr_token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the quoted strings from the grammar for now.
lldb/include/lldb/Core/DILAST.h
Outdated
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef LLDB_DIL_AST_H_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLDB_CORE_DILAST_H
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#include "llvm/ADT/APFloat.h" | ||
#include "llvm/ADT/APInt.h" | ||
|
||
namespace lldb_private { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the names defined here have embedded namespace names (DILMemberInfo), while others (IdentifierInfo) have deceptively generic name even though they're specific to the DIL. What do you (all) think about putting all of this in a namespace (I suggest dil
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (namespace "DIL")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should really be a lowercase dil
. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/source/Core/DILAST.cpp
Outdated
if (str_ref_name.size() > 2 && str_ref_name[0] == ':' && | ||
str_ref_name[1] == ':') | ||
str_ref_name = str_ref_name.drop_front(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str_ref_name.consume_front("::");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/source/Core/DILAST.cpp
Outdated
ConstString tmp_name(str_ref_name); | ||
if (tmp_name == name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constifying str_ref_name
doesn't help anything, since you have to touch every byte of the string just to compute its checksum. if(name.GetStringRef() == str_ref_name)
is better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/source/Core/DILAST.cpp
Outdated
for (auto var_sp : possible_matches) | ||
if (var_sp->GetName() == name) { | ||
exact_match = var_sp; | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto exact_match = llvm::find_if(possible_matches, [&](VariableSP var_sp) { return var_sp->GetName() == name; });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/source/Core/DILAST.cpp
Outdated
Target *target = ctx_scope->CalculateTarget().get(); | ||
Process *process = ctx_scope->CalculateProcess().get(); | ||
if (target && process) { | ||
Process::StopLocker stop_locker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the presence of the StopLocker here fairly surprising. What is this trying to achieve? If something like this was necessary (I'm not sure if the current frame var implementation does that), I would expect it be be scoped around the entire expression evaluation process, and not a single operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out it's not actually needed here. Have removed it.
cast_expression = unary_expression | ||
| "(" type_id ")" cast_expression ; | ||
|
||
unary_expression = postfix_expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to have a convenient way to change a variable value without invoking the full expression parser (or going into python), but I think there's a discussion to be made about how to expose that functionality. For example, the current envisioned entry points into this code (the "frame variable" CLI cmd, and SBFrame.GetValueForVariablePath
don't exactly sound like they ought to be modifying the variable in the process of "getting" it).
No, I have not looked at 'target variable' at this time.
|
On Jul 10, 2024, at 9:22 PM, cmtice ***@***.***> wrote:
@cmtice commented on this pull request.
In lldb/include/lldb/Core/DILAST.h <#95738 (comment)>:
> +/// Checks to see if the CompilerType is a Smart Pointer (shared, unique, weak)
+/// or not. Only applicable for C++, which is why this is here and not part of
+/// the CompilerType class.
When we go to dereference a pointer we need to know whether or not it's a smart pointer, because if it's a smart pointer we need to get at the pointer inside before we can dereference it. So we use IsSmartPtrType to determine whether or not to insert this layer of unwrapping (converting the smart ptr into a normal ptr).
I'm not sure it's the job of the DIL to do this sort of unwrapping. Then it would have to support all the ways this might need to be done in every language we might end up supporting. This seems more a job for the data formatters provided by the various languages. That's why Pavel was asking about the special deference Synthetic Child Provider API, I think. That provides a way for various languages to encode this sort of behavior w/o it having to be know in advance by lldb.
Jim
… —
Reply to this email directly, view it on GitHub <#95738 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW4J5FKAPSUTJYXSJODZLYB7TAVCNFSM6AAAAABJNJ3CAKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNZQHA4DAOJYGE>.
You are receiving this because you are on a team that was mentioned.
|
I can use the machinery that's already there in LLDB for dereferencig the smart pointers. But I can't treat smart pointers exactly the same as normal pointers -- among other things a ValueObject whose CompilerType is a smart pointer returns False for IsPointerType(), which causes errors in the code for handling normal pointers. So as far as I can tell I still need my function (declared in DILAST.h) for telling me whether or not a type is a smart pointer type (which is what Pavel's original comment was about). |
Address review comments: Remove decls for functions not defined or used in AST files; add DIL namespace, code cleanups, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a lot of comments, but I think we're making progress.
The most important questions are about the parts where we appear to be reimplementing some existing lldb functionality. Basically all of the functionality for looking up names (for members, variables, types, etc.) appears to be more complex than necessary. For each of those things, we already have lldb functions, which one would (perhaps naively) expect to just call and have them do the right thing. I'd like to understand why that isn't the case, and whether we can do something about it (there are certainly bugs in those functions, and if that's the case it'd be better to fix those bugs than to work around them here).
At this point, perhaps the only way to figure that out is to strip them and see what breaks.
cast_expression = unary_expression | ||
| "(" type_id ")" cast_expression ; | ||
|
||
unary_expression = postfix_expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be one way to resolve this -- have p/dwim-print activate some "extended" mode of DIL which allows for variable modification. But let's save that for later...
#include "llvm/ADT/APFloat.h" | ||
#include "llvm/ADT/APInt.h" | ||
|
||
namespace lldb_private { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should really be a lowercase dil
. All of our namespaces are like that. (That's one naming scheme consistent across all of llvm)
lldb/include/lldb/Core/DILAST.h
Outdated
/// use_synthetic options passed, acquiring the process & target locks if | ||
/// appropriate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// use_synthetic options passed, acquiring the process & target locks if | |
/// appropriate. | |
/// use_synthetic options passed. |
lldb/include/lldb/Core/DILAST.h
Outdated
eMemberOfNode, | ||
eArraySubscriptNode, | ||
eUnaryOpNode, | ||
eSmartPtrToPtrDecayNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of this as well (for the time being, at least)? It sounds like one of those things that may require rearchitecting to avoid smart pointer assumptions...
(And even we do end up adding it, I don't think it should be called "smart pointer decay", as, as far as this code is concerned, I think we shouldn't be making a difference between std::unique_ptr and say std::optional. I guess you could stretch the "smart pointer" definition to cover optionals, but that doesn't seem ideal)
lldb/include/lldb/Core/DILAST.h
Outdated
class IdentifierInfo { | ||
private: | ||
using MemberPath = std::vector<uint32_t>; | ||
using IdentifierInfoUP = std::unique_ptr<IdentifierInfo>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would normally be outside of the class, so it can be used from other code as well. If you're only using it for internal code, then it probably doesn't need to exist (note you can get rid of about half of the uses of this name by using std::make_unique)
lldb/source/Core/DILAST.cpp
Outdated
} | ||
} | ||
|
||
// Find the corrent variable by matching the name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Find the corrent variable by matching the name. | |
// Find the current variable by matching the name. |
// List global variable with the same "basename". There can be many matches | ||
// from other scopes (namespaces, classes), so we do additional filtering | ||
// later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely better, but I'd still like to understand what is the purpose of the subsequent filtering (lines 229--235). If we didn't have that, we could just pass 1 as the "limit" to FindGlobalVariables and potentially speed things up a lot.
(Somewhat tangentially, I'm also wondering if this function belongs in the "AST" part of DIL. I think I could see a case for putting it into the parser.)
lldb/source/Core/DILAST.cpp
Outdated
lldb::ValueObjectSP empty_obj_sp; | ||
return empty_obj_sp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nullptr
or return {}
works just fine.
lldb/source/Core/DILAST.cpp
Outdated
// Internally types don't have global scope qualifier in their names and | ||
// LLDB doesn't support queries with it too. | ||
llvm::StringRef name_ref(name); | ||
bool global_scope = false; | ||
|
||
if (name_ref.starts_with("::")) { | ||
name_ref = name_ref.drop_front(2); | ||
global_scope = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's true, or at least "true anymore". I'm pretty sure I can look up types with leading "::" now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes it works and sometimes it doesn't. It appears that I can prefix plain global vars with '::', but I can't (e.g.) prefix namespaces with '::'. So current frame var works with '::globalVar', and also on 'ns::globalVar', but fails on '::ns::globalVar'.
I am unclear if there is any work or change you are requesting here, other than commenting on the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.
I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.
The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.
lldb/source/Core/DILAST.cpp
Outdated
// We've found multiple types, try finding the "correct" one. | ||
CompilerType full_match; | ||
std::vector<CompilerType> partial_matches; | ||
|
||
for (uint32_t i = 0; i < result_type_list.size(); ++i) { | ||
CompilerType type = result_type_list[i]; | ||
llvm::StringRef type_name_ref = type.GetTypeName().GetStringRef(); | ||
; | ||
|
||
if (type_name_ref == name_ref) | ||
full_match = type; | ||
else if (type_name_ref.ends_with(name_ref)) | ||
partial_matches.push_back(type); | ||
} | ||
|
||
// Full match is always correct. | ||
if (full_match.IsValid()) | ||
return full_match; | ||
|
||
if (!global_scope) { | ||
// We're looking for type, but there may be multiple candidates and which | ||
// one is correct may depend on the currect scope. For now just pick the | ||
// most "probable" type (pick a random one). TODO: Try to find a better way | ||
// to do this. | ||
if (partial_matches.size() > 0) | ||
return partial_matches.back(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if that's true, then maybe this filtering isn't needed?
…uage Address latest round of review comments.
I think I have addressed all the review comments; please take another look. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though you've removed the smart pointer checking function (the most obvious example of language-specific behavior), I'm still worried about whether the overall design of this code will allow you to avoid it.
Looking at the IdentifierInfo
class, I see that it can be constructed with only a type (the value is optional). I think this is going to be very hard to make work, since a type does not know if its a smart pointer (only a value does). Have you already figured out a way to make that work?
IndentifierInfo isn't the only suspicious place though. All of the places that are doing something with bitfield offsets are very suspicious to me, as that's something that the Values should already know.
I haven't seen the parser, but I wouldn't be surprised if this has something to do with the vexing parse issues of C languages, where one needs to know if something is a type just to parse the expression correctly. Now, if that's the case, and if really need to know if something is a smart pointer (i.e. whether it is dereferencable) without having access to a value of that type, then we may need to have a bigger discussion about what to do about that.
#include "clang/Basic/SourceLocation.h" | ||
#include "clang/Basic/TokenKinds.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these doing here? Can this be a language-agnostic implementation if it depends on clang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't clang compile all the languages we want to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Swift uses llvm, but not clang. The same goes for rust, and potentially other downstream languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then we've run into a very serious problem: the DIL implementation (and lldb-eval, on which it was based) makes extensive use of Clang for parsing & lexing. I'm guessing the best way to handle this is to try to bundle the parsing/lexing stuff into some kind of plugin, with different plugins for the different languages? OR...is it possible that the small language subset that DIL wants to parse/lex/evaluate is roughly the same across all the languages?
I'm really not sure about the best way forward here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that does sound like a problem.
Putting this stuff in a plugin would be possible, but it would be a departure from what we've talked about earlier, since we would essentially end up with a different flavour of DIL for every language.
Saying that clang is the parser for the DIL might also be possible, but that would also warrant more discussion, because we have a (maybe undocumented?) rule of no clang dependencies on lldb core.
A third option might be to roll our own parser. Unlike lldb-eval, I think the DIL does not have the ambition of matching c++ (or any other language) precisely, so maybe we have some leeway to remove/skip/alter the features which make parsing c(++) the mess that it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so could one say that it's usage is similar to the code in CPlusPlusNameParser, which uses clang to tokenize the (function) name, but then does the parsing itself? Or does it go deeper than that?
The thing I'm thinking about is that "having all of the tokens defined" may not be such an advantage if we later add some features from other languages (I don't know, let's say the ?.
operator from Swift). For lexing, that may not matter cause you could maybe lex ?
and .
separately and then join them later, but if it's doing more, then that may be a problem..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's similar to what CPlusPlusNameParser does. DIL uses clang to tokenize the expression and parsing is done separately by DIL itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Andy (werat) is correct. I mis-spoke in my previous comment (apologies!). We use the Clang tokenizer & lexer but wrote our own parser.
Since that is the case, is this ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd be fine with that, but I also think it's not my call to make. @jimingham, @adrian-prantl, @bulbazord, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been following the evolution of this PR closely, but from this comment chain it looks like the question is "Can we have clang dependencies in lldbCore?". I won't gatekeep this PR for this reason, but here are my positions:
- We should avoid having clang in lldbCore if possible.
- I think you should write a lexer for the DIL.
As Pavel already pointed out, plenty of folks downstream want to use LLDB to debug their programs with languages other than C/C++/ObjC/ObjC++. Swift and Rust are great examples and on occasion I see people try to add minor changes to support their languages too. Adding clang as a core dependency makes it more difficult to decouple LLDB from C/C++ specifically. LLDB has the potential to be a more language-agnostic debugger where the core of the engine is flexible and focuses on operating on common abstractions while the Plugins dictate how to interact with the specific language/OS/platform. I think LLDB is still pretty far away from being that, but myself and other contributors have done a lot of work over the last few years to bring us closer to that reality.
As for the DIL, I think we should consider the goal. If the language is meant to use C/C++ syntax to inspect data, I suppose it makes sense to use clang's lexer to tokenize input. CPlusPlusNameParser uses clang because it's trying to parse C++ names. However, using clang now makes it harder to extend to support other languages in the future. What if clang doesn't chop up my input into tokens correctly or the way I want? Pavel mentioned a case where we can work around this, but the parser is just going to be workaround after workaround trying to fit a square peg into a round hole. You've already written the parser, what's a lexer too? :)
lldb/include/lldb/Core/DILAST.h
Outdated
private: | ||
|
||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private: | |
public: | |
public: |
lldb/include/lldb/Core/DILAST.h
Outdated
const std::vector<uint32_t> &GetPath() const { return m_path; } | ||
|
||
CompilerType GetType() { return m_type; } | ||
bool IsValid() const { return m_type.IsValid(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this function and use an null IdentifierInfo pointer to denote invalidness?
lldb/source/Core/DILAST.cpp
Outdated
// Internally types don't have global scope qualifier in their names and | ||
// LLDB doesn't support queries with it too. | ||
llvm::StringRef name_ref(name); | ||
bool global_scope = false; | ||
|
||
if (name_ref.starts_with("::")) { | ||
name_ref = name_ref.drop_front(2); | ||
global_scope = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For types (this comment was in the type lookup function), I would like to drop this code, and I'm fairly certain we won't lose any functionality.
I can totally believe that variable lookup works like you've described, but even so, I think that this is a bug that should be fixed elsewhere (somewhere in the core variable lookup code) rather than worked around here.
The fix shouldn't be a part of this PR, and it probably does even have to come before it. We can probably delay that work until after we've switched the frame var implementation -- since the current variable lookup implementation does not have this workaround (if has does, I'd like to see it), we won't be regressing anything by not implementing it here.
I believe I have fixed all the code to avoid any need to do smart pointer checking -- I went back and looked very closely at what the current frame variable implementation does when handling smart pointers, and tried to do the same thing, which appears to have worked.
The case where it is constructed only with a type is meant to be when handling LLDB convenience variables (the dynamic variables names "$..."); I'm not sure if those can be smart pointers?
I would be happy to share the parser code with you and walk/talk you through it, if that would help? |
I think I have addressed all of the review comments now, except for the dependency on the Clang lexer. If it's ok with the reviewers, I would rather not put that change into this PR, but instead I will make that the next PR that I submit in the DIL (Data Inspection Language) series. However if you are strongly against that, I will add the new lexer to this PR as soon as I finish writing it. So please review this again. Thank you! |
Ping? Anybody? |
Is the plan to get the DIL data structures merged before the rest of the patch series is up for review? I think it'd be great to see how the infrastructure introduced here will be used |
There are 3 major pieces to the DIL implementation (soon to be 4, once I've got the lexer written): The AST bit; the parser, which is a recursive descent parser using the .ebnf grammar; and the interpreter/evaluator. The parser performs parsing and type-checking on the input expression, building up the AST as it goes. Once the AST is built, it gets passed to the interpreter/evaluator, which evaluates the AST (performing more correctness checks along the way), returning an LLDB ValueObjectSP at the end. I have both 'stripped down' and 'full' versions of all of these pieces. The 'stripped down' pieces match the current 'frame variable' functionality, and the 'full' versions implement extensions to that functionality (allowing it to handle all the expressions that lldb-eval could handle). The plan is to submit a series of PRs as follows: 1). the AST parts (this PR). 2). the not-yet-written lexer (with any necessary changes to the already submitted AST parts); 3). the parser; 4) the evaluator; 5); the 'glue', i.e. updates to things like StackFrame.cpp & CommandObjectFrame.cpp, to hook up the new DIL implementation (optionally), as well as the test cases. Currently, the stripped down parser is ~2700 lines of code, and the stripped down evaluator is ~1000 lines of code. As I mentioned in an earlier comment (on June 24), you can see the full (not stripped down) implementation at https://github.com/cmtice/llvm-project/tree/DIL-work-new/ , although I have not (yet) updated that with the updates that have gone into this PR. Does that answer all of your questions? Or is there more you would like to know or see? |
Thanks for the breakdown! I was just wondering whether the plan was to merge this before the other PRs are up. I think it'd be preferable to merge them once they all have been approved. Though if others are ok with the current strategy I don't want to block this. |
I was under the impression that the plan was to merge each PR as it was approved (I thought things might be less confusing that way). But I am certainly open to alternatives. If the folks reviewing these DIL PR's would prefer to have all the PRs available for review at the same time, I can put most of the others out for review now as well. Just let me know. |
…uage (DIL).
The Data Inspection Language (DIL), described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893 includes its own parser and expression evaluator.
This change defines the AST nodes, classes and functions which are used by the DIL parser and expression evaluator. It also adds the .ebnf file documenting the (current) expression language handled by the DIL parser and expression evaluator.