Skip to content

[clang] Implement statement printer for EmbedExpr #117770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

[clang] Implement statement printer for EmbedExpr #117770

wants to merge 2 commits into from

Conversation

circl-lastname
Copy link

Fixes #107869

Due to the other FIXMEs in #embed code, this doesn't yet handle offsets, prefixes, and if_empty, however I feel it's a good start, and preferable to crashing.

Screenshot_20241126_192631

I've used StringRef to store the filename inside EmbedExpr, and std::string to store the filename inside the annot_embed token. Hopefully this prevents the memory leak fixed in #95802, but this assumes the token array exists while the AST does, if this is a false assumption I can try changing how the storage works.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2024

@llvm/pr-subscribers-clang

Author: circl (circl-lastname)

Changes

Fixes #107869

Due to the other FIXMEs in #embed code, this doesn't yet handle offsets, prefixes, and if_empty, however I feel it's a good start, and preferable to crashing.

Screenshot_20241126_192631

I've used StringRef to store the filename inside EmbedExpr, and std::string to store the filename inside the annot_embed token. Hopefully this prevents the memory leak fixed in #95802, but this assumes the token array exists while the AST does, if this is a false assumption I can try changing how the storage works.


Full diff: https://github.com/llvm/llvm-project/pull/117770.diff

9 Files Affected:

  • (modified) clang/include/clang/AST/Expr.h (+7-2)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+4-1)
  • (modified) clang/include/clang/Sema/Sema.h (+2-2)
  • (modified) clang/lib/AST/Expr.cpp (+4-4)
  • (modified) clang/lib/AST/StmtPrinter.cpp (+13-1)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+6-3)
  • (modified) clang/lib/Parse/ParseInit.cpp (+2-1)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+3-2)
  • (modified) clang/lib/Sema/SemaInit.cpp (+3-3)
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 708c8656decbe0..95d341957c59a2 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -4917,19 +4917,24 @@ class EmbedExpr final : public Expr {
   SourceLocation EmbedKeywordLoc;
   IntegerLiteral *FakeChildNode = nullptr;
   const ASTContext *Ctx = nullptr;
+  StringRef Filename;
+  bool IsAngled;
   EmbedDataStorage *Data;
   unsigned Begin = 0;
   unsigned NumOfElements;
 
 public:
-  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, EmbedDataStorage *Data,
-            unsigned Begin, unsigned NumOfElements);
+  EmbedExpr(const ASTContext &Ctx, SourceLocation Loc, StringRef Filename,
+            bool IsAngled, EmbedDataStorage *Data, unsigned Begin,
+            unsigned NumOfElements);
   explicit EmbedExpr(EmptyShell Empty) : Expr(SourceLocExprClass, Empty) {}
 
   SourceLocation getLocation() const { return EmbedKeywordLoc; }
   SourceLocation getBeginLoc() const { return EmbedKeywordLoc; }
   SourceLocation getEndLoc() const { return EmbedKeywordLoc; }
 
+  StringRef getFilename() const { return Filename; }
+  bool getIsAngled() const { return IsAngled; }
   StringLiteral *getDataStringLiteral() const { return Data->BinaryData; }
   EmbedDataStorage *getData() const { return Data; }
 
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 3312d4ed1d798d..0a4ca35224497c 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -2761,7 +2761,8 @@ class Preprocessor {
   // Binary data inclusion
   void HandleEmbedDirective(SourceLocation HashLoc, Token &Tok,
                             const FileEntry *LookupFromFile = nullptr);
-  void HandleEmbedDirectiveImpl(SourceLocation HashLoc,
+  void HandleEmbedDirectiveImpl(SourceLocation HashLoc, StringRef Filename,
+                                bool IsAngled,
                                 const LexEmbedParametersResult &Params,
                                 StringRef BinaryContents);
 
@@ -3066,6 +3067,8 @@ class EmptylineHandler {
 /// Helper class to shuttle information about #embed directives from the
 /// preprocessor to the parser through an annotation token.
 struct EmbedAnnotationData {
+  std::string Filename;
+  bool IsAngled;
   StringRef BinaryData;
 };
 
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 24abd5d95dd844..435bc4d4807bc4 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -7086,8 +7086,8 @@ class Sema final : public SemaBase {
                                 SourceLocation RPLoc);
 
   // #embed
-  ExprResult ActOnEmbedExpr(SourceLocation EmbedKeywordLoc,
-                            StringLiteral *BinaryData);
+  ExprResult ActOnEmbedExpr(SourceLocation EmbedKeywordLoc, StringRef Filename,
+                            bool IsAngled, StringLiteral *BinaryData);
 
   // Build a potentially resolved SourceLocExpr.
   ExprResult BuildSourceLocExpr(SourceLocIdentKind Kind, QualType ResultTy,
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index a4fb4d5a1f2ec4..eeea6da07d304f 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -2383,11 +2383,11 @@ APValue SourceLocExpr::EvaluateInContext(const ASTContext &Ctx,
 }
 
 EmbedExpr::EmbedExpr(const ASTContext &Ctx, SourceLocation Loc,
-                     EmbedDataStorage *Data, unsigned Begin,
-                     unsigned NumOfElements)
+                     StringRef Filename, bool IsAngled, EmbedDataStorage *Data,
+                     unsigned Begin, unsigned NumOfElements)
     : Expr(EmbedExprClass, Ctx.IntTy, VK_PRValue, OK_Ordinary),
-      EmbedKeywordLoc(Loc), Ctx(&Ctx), Data(Data), Begin(Begin),
-      NumOfElements(NumOfElements) {
+      EmbedKeywordLoc(Loc), Ctx(&Ctx), Filename(Filename), IsAngled(IsAngled),
+      Data(Data), Begin(Begin), NumOfElements(NumOfElements) {
   setDependence(ExprDependence::None);
   FakeChildNode = IntegerLiteral::Create(
       Ctx, llvm::APInt::getZero(Ctx.getTypeSize(getType())), getType(), Loc);
diff --git a/clang/lib/AST/StmtPrinter.cpp b/clang/lib/AST/StmtPrinter.cpp
index c8677d11b64e8d..da6b4cb5f66955 100644
--- a/clang/lib/AST/StmtPrinter.cpp
+++ b/clang/lib/AST/StmtPrinter.cpp
@@ -1202,7 +1202,19 @@ void StmtPrinter::VisitSourceLocExpr(SourceLocExpr *Node) {
 }
 
 void StmtPrinter::VisitEmbedExpr(EmbedExpr *Node) {
-  llvm::report_fatal_error("Not implemented");
+  // FIXME: This doesn't handle offsets, prefixes, and if_empty yet
+  OS << NL << "#embed";
+  if (Node->getIsAngled()) {
+    OS << "<";
+  } else {
+    OS << "\"";
+  }
+  OS << Node->getFilename();
+  if (Node->getIsAngled()) {
+    OS << ">" << NL;
+  } else {
+    OS << "\"" << NL;
+  }
 }
 
 void StmtPrinter::VisitConstantExpr(ConstantExpr *Node) {
diff --git a/clang/lib/Lex/PPDirectives.cpp b/clang/lib/Lex/PPDirectives.cpp
index a23ad40884f249..62eda8401636c4 100644
--- a/clang/lib/Lex/PPDirectives.cpp
+++ b/clang/lib/Lex/PPDirectives.cpp
@@ -3871,8 +3871,8 @@ Preprocessor::LexEmbedParameters(Token &CurTok, bool ForHasEmbed) {
 }
 
 void Preprocessor::HandleEmbedDirectiveImpl(
-    SourceLocation HashLoc, const LexEmbedParametersResult &Params,
-    StringRef BinaryContents) {
+    SourceLocation HashLoc, StringRef Filename, bool IsAngled,
+    const LexEmbedParametersResult &Params, StringRef BinaryContents) {
   if (BinaryContents.empty()) {
     // If we have no binary contents, the only thing we need to emit are the
     // if_empty tokens, if any.
@@ -3902,6 +3902,8 @@ void Preprocessor::HandleEmbedDirectiveImpl(
   }
 
   EmbedAnnotationData *Data = new (BP) EmbedAnnotationData;
+  Data->Filename = Filename;
+  Data->IsAngled = IsAngled;
   Data->BinaryData = BinaryContents;
 
   Toks[CurIdx].startToken();
@@ -4006,5 +4008,6 @@ void Preprocessor::HandleEmbedDirective(SourceLocation HashLoc, Token &EmbedTok,
   if (Callbacks)
     Callbacks->EmbedDirective(HashLoc, Filename, isAngled, MaybeFileRef,
                               *Params);
-  HandleEmbedDirectiveImpl(HashLoc, *Params, BinaryContents);
+  HandleEmbedDirectiveImpl(HashLoc, Filename, isAngled, *Params,
+                           BinaryContents);
 }
diff --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Parse/ParseInit.cpp
index 63b1d7bd9db53e..06ced273a65e92 100644
--- a/clang/lib/Parse/ParseInit.cpp
+++ b/clang/lib/Parse/ParseInit.cpp
@@ -451,7 +451,8 @@ ExprResult Parser::createEmbedExpr() {
 
     StringLiteral *BinaryDataArg = CreateStringLiteralFromStringRef(
         Data->BinaryData, Context.UnsignedCharTy);
-    Res = Actions.ActOnEmbedExpr(StartLoc, BinaryDataArg);
+    Res = Actions.ActOnEmbedExpr(StartLoc, Data->Filename, Data->IsAngled,
+                                 BinaryDataArg);
   }
   return Res;
 }
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index c9d7444d5865a5..f5deacf709d606 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -16770,12 +16770,13 @@ ExprResult Sema::BuildSourceLocExpr(SourceLocIdentKind Kind, QualType ResultTy,
 }
 
 ExprResult Sema::ActOnEmbedExpr(SourceLocation EmbedKeywordLoc,
+                                StringRef Filename, bool IsAngled,
                                 StringLiteral *BinaryData) {
   EmbedDataStorage *Data = new (Context) EmbedDataStorage;
   Data->BinaryData = BinaryData;
   return new (Context)
-      EmbedExpr(Context, EmbedKeywordLoc, Data, /*NumOfElements=*/0,
-                Data->getDataElementCount());
+      EmbedExpr(Context, EmbedKeywordLoc, Filename, IsAngled, Data,
+                /*NumOfElements=*/0, Data->getDataElementCount());
 }
 
 static bool maybeDiagnoseAssignmentToFunction(Sema &S, QualType DstType,
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 7c03a12e812809..609ca34a8c1d20 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -525,9 +525,9 @@ class InitListChecker {
       }
     }
 
-    Result = new (SemaRef.Context)
-        EmbedExpr(SemaRef.Context, Embed->getLocation(), Embed->getData(),
-                  CurEmbedIndex, ElsCount);
+    Result = new (SemaRef.Context) EmbedExpr(
+        SemaRef.Context, Embed->getLocation(), Embed->getFilename(),
+        Embed->getIsAngled(), Embed->getData(), CurEmbedIndex, ElsCount);
     CurEmbedIndex += ElsCount;
     if (CurEmbedIndex >= Embed->getDataElementCount()) {
       CurEmbed = nullptr;

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want a changelog

Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix!
I wonder if the new change be tested somehow?
Also, we probably need to update Serialization code with handling for the new fields of EmbedExpr.

Comment on lines 3904 to 3907
EmbedAnnotationData *Data = new (BP) EmbedAnnotationData;
Data->Filename = Filename;
Data->IsAngled = IsAngled;
Data->BinaryData = BinaryContents;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggle to see how the current implementation actually prevents the memory leak. EmbedAnnotationData is allocated first as if the std::string inside was empty, then the string is assigned with an array of chars which will likely extend string's size, giving no chance to deallocate proper size later. I wonder if keeping a StringRef in EmbedAnnotationData is possible instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there such a thing as an OwningStringRef then? I am not sure how to guarantee the string exists, and then is deallocated along with the tokens. I thought std::string would be a ptr to the dynamically sized string data, but I haven't used the STL much

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None that I can think of... I don't think it would've helped though since I don't think the bump allocator calls the destructors.
I think we need to find a good place to store filename strings that come from embed directives. @AaronBallman any chance you have a good ideas about where can we store the filename string to manage the memory in a good way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue with the deserializer, not sure where to store the string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it's something like a flexible array member and the EmbedAnnotationData is allocated to the right size the first time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where can we store the filename string to manage the memory in a good way?

Can’t we just copy the string data into the ASTContext’s allocator and store a StringRef to it, or am I missing something here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you can do that during the lexing stage

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way this may need to work is to find out where we're getting the file name from and seeing how long-lived that string is. It's passed here as a StringRef, and it would nice if we can store it as a StringRef in the EmbedAnnotationData rather than as a std::string. My guess is that the name comes from the FileManager and is as long-lived as the SourceManager which is owned by ASTContext, but we'd need to verify that.

@circl-lastname
Copy link
Author

We probably want a changelog

I put it in the C23 features section, if that's ok

@Endilll Endilll removed their request for review January 15, 2025 18:46
@circl-lastname circl-lastname closed this by deleting the head repository Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clangd] crash when using #embed
6 participants