Skip to content

[clang] Inject tokens containing #embed back into token stream #97274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions clang/include/clang/Basic/TokenKinds.def
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,9 @@ TOK(raw_identifier) // Used only in raw lexing mode.
// C99 6.4.4.2: Floating Constants
TOK(numeric_constant) // 0x123

// Directly holds numerical value. Used to process C23 #embed.
TOK(binary_data)

// C99 6.4.4: Character Constants
TOK(char_constant) // 'a'
TOK(wide_char_constant) // L'b'
Expand Down
2 changes: 1 addition & 1 deletion clang/include/clang/Basic/TokenKinds.h
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ inline bool isLiteral(TokenKind K) {
return K == tok::numeric_constant || K == tok::char_constant ||
K == tok::wide_char_constant || K == tok::utf8_char_constant ||
K == tok::utf16_char_constant || K == tok::utf32_char_constant ||
isStringLiteral(K) || K == tok::header_name;
isStringLiteral(K) || K == tok::header_name || K == tok::binary_data;
}

/// Return true if this is any of tok::annot_* kinds.
Expand Down
7 changes: 4 additions & 3 deletions clang/include/clang/Lex/Preprocessor.h
Original file line number Diff line number Diff line change
Expand Up @@ -2123,17 +2123,18 @@ class Preprocessor {
char
getSpellingOfSingleCharacterNumericConstant(const Token &Tok,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the documentation and name of this function is incorrect now that the underlying functionality has changed - would you mind updating it to something more appropriate @Fznamznon? It used to return the character itself:

  /// Given a Token \p Tok that is a numeric constant with length 1,
  /// return the character.

But now returns the value of the character.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment and the interface in 295e4f4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @AaronBallman

Copy link
Contributor

@bnbarham bnbarham Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AaronBallman! Would it be worth also changing Spelling to Value or something like that?

bool *Invalid = nullptr) const {
assert(Tok.is(tok::numeric_constant) &&
assert((Tok.is(tok::numeric_constant) || Tok.is(tok::binary_data)) &&
Tok.getLength() == 1 && "Called on unsupported token");
assert(!Tok.needsCleaning() && "Token can't need cleaning with length 1");

// If the token is carrying a literal data pointer, just use it.
if (const char *D = Tok.getLiteralData())
return *D;
return (Tok.getKind() == tok::binary_data) ? *D : *D - '0';

assert(Tok.is(tok::numeric_constant) && "binary data with no data");
// Otherwise, fall back on getCharacterData, which is slower, but always
// works.
return *SourceMgr.getCharacterData(Tok.getLocation(), Invalid);
return *SourceMgr.getCharacterData(Tok.getLocation(), Invalid) - '0';
}

/// Retrieve the name of the immediate macro expansion.
Expand Down
3 changes: 1 addition & 2 deletions clang/include/clang/Parse/Parser.h
Original file line number Diff line number Diff line change
Expand Up @@ -2123,7 +2123,7 @@ class Parser : public CodeCompletionHandler {
};
ExprResult ParseInitializerWithPotentialDesignator(DesignatorCompletionInfo);
ExprResult createEmbedExpr();
void ExpandEmbedDirective(SmallVectorImpl<Expr *> &Exprs);
void injectEmbedTokens();

//===--------------------------------------------------------------------===//
// clang Expressions
Expand Down Expand Up @@ -3826,7 +3826,6 @@ class Parser : public CodeCompletionHandler {
AnnotateTemplateIdTokenAsType(CXXScopeSpec &SS,
ImplicitTypenameContext AllowImplicitTypename,
bool IsClassName = false);
void ExpandEmbedIntoTemplateArgList(TemplateArgList &TemplateArgs);
bool ParseTemplateArgumentList(TemplateArgList &TemplateArgs,
TemplateTy Template, SourceLocation OpenLoc);
ParsedTemplateArgument ParseTemplateTemplateArgument();
Expand Down
53 changes: 25 additions & 28 deletions clang/lib/Parse/ParseExpr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1018,6 +1018,7 @@ ExprResult Parser::ParseCastExpression(CastParseKind ParseKind,

// primary-expression
case tok::numeric_constant:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adapt the couple other places where we parse a numeric constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate which other places you meant? I briefly examined all mentions of tok::numeric_constant and ActOnNumericConstant in lib/Parse and modifying any of these doesn't seem to be reasonable for #embed support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @cor3ntin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cases

[[clang::availability(
   #embed "smth.txt"
)]] void f();

struct S {
   virtual void f() =
#embed "zero.bin"
    ;
};

However, it's a bit ambiguous whether these should be supported at all in the proposed C++ wording for embed. I sent a mail to the C++ committee

It's probably fine to ignore that for now, the patch is a great improvement as it is

case tok::binary_data:
// constant: integer-constant
// constant: floating-constant

Expand Down Expand Up @@ -1067,18 +1068,9 @@ ExprResult Parser::ParseCastExpression(CastParseKind ParseKind,
}

case tok::annot_embed: {
// We've met #embed in a context where a single value is expected. Take last
// element from #embed data as if it were a comma expression.
EmbedAnnotationData *Data =
reinterpret_cast<EmbedAnnotationData *>(Tok.getAnnotationValue());
SourceLocation StartLoc = ConsumeAnnotationToken();
ASTContext &Context = Actions.getASTContext();
Res = IntegerLiteral::Create(Context,
llvm::APInt(CHAR_BIT, Data->BinaryData.back()),
Context.UnsignedCharTy, StartLoc);
if (Data->BinaryData.size() > 1)
Diag(StartLoc, diag::warn_unused_comma_left_operand);
break;
injectEmbedTokens();
return ParseCastExpression(ParseKind, isAddressOfOperand, isTypeCast,
isVectorLiteral, NotPrimaryExpression);
}

case tok::kw___super:
Expand Down Expand Up @@ -3578,15 +3570,29 @@ ExprResult Parser::ParseFoldExpression(ExprResult LHS,
T.getCloseLocation());
}

void Parser::ExpandEmbedDirective(SmallVectorImpl<Expr *> &Exprs) {
void Parser::injectEmbedTokens() {
EmbedAnnotationData *Data =
reinterpret_cast<EmbedAnnotationData *>(Tok.getAnnotationValue());
SourceLocation StartLoc = ConsumeAnnotationToken();
ASTContext &Context = Actions.getASTContext();
for (auto Byte : Data->BinaryData) {
Exprs.push_back(IntegerLiteral::Create(Context, llvm::APInt(CHAR_BIT, Byte),
Context.UnsignedCharTy, StartLoc));
MutableArrayRef<Token> Toks(PP.getPreprocessorAllocator().Allocate<Token>(
Data->BinaryData.size() * 2 - 1),
Data->BinaryData.size() * 2 - 1);
unsigned I = 0;
for (auto &Byte : Data->BinaryData) {
Toks[I].startToken();
Toks[I].setKind(tok::binary_data);
Toks[I].setLocation(Tok.getLocation());
Toks[I].setLength(1);
Toks[I].setLiteralData(&Byte);
if (I != ((Data->BinaryData.size() - 1) * 2)) {
Toks[I + 1].startToken();
Toks[I + 1].setKind(tok::comma);
Toks[I + 1].setLocation(Tok.getLocation());
}
I += 2;
}
PP.EnterTokenStream(std::move(Toks), /*DisableMacroExpansion=*/true,
/*IsReinject=*/false);
ConsumeAnyToken(/*ConsumeCodeCompletionTok=*/true);
}

/// ParseExpressionList - Used for C/C++ (argument-)expression-list.
Expand Down Expand Up @@ -3624,17 +3630,8 @@ bool Parser::ParseExpressionList(SmallVectorImpl<Expr *> &Exprs,
if (getLangOpts().CPlusPlus11 && Tok.is(tok::l_brace)) {
Diag(Tok, diag::warn_cxx98_compat_generalized_initializer_lists);
Expr = ParseBraceInitializer();
} else if (Tok.is(tok::annot_embed)) {
ExpandEmbedDirective(Exprs);
if (Tok.isNot(tok::comma))
break;
Token Comma = Tok;
ConsumeToken();
checkPotentialAngleBracketDelimiter(Comma);
continue;
} else {
} else
Expr = ParseAssignmentExpression();
}

if (EarlyTypoCorrection)
Expr = Actions.CorrectDelayedTyposInExpr(Expr);
Expand Down
41 changes: 12 additions & 29 deletions clang/lib/Parse/ParseTemplate.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1523,19 +1523,6 @@ ParsedTemplateArgument Parser::ParseTemplateArgument() {
ExprArg.get(), Loc);
}

void Parser::ExpandEmbedIntoTemplateArgList(TemplateArgList &TemplateArgs) {
EmbedAnnotationData *Data =
reinterpret_cast<EmbedAnnotationData *>(Tok.getAnnotationValue());
SourceLocation StartLoc = ConsumeAnnotationToken();
ASTContext &Context = Actions.getASTContext();
for (auto Byte : Data->BinaryData) {
Expr *E = IntegerLiteral::Create(Context, llvm::APInt(CHAR_BIT, Byte),
Context.UnsignedCharTy, StartLoc);
TemplateArgs.push_back(
ParsedTemplateArgument(ParsedTemplateArgument::NonType, E, StartLoc));
}
}

/// ParseTemplateArgumentList - Parse a C++ template-argument-list
/// (C++ [temp.names]). Returns true if there was an error.
///
Expand All @@ -1560,24 +1547,20 @@ bool Parser::ParseTemplateArgumentList(TemplateArgList &TemplateArgs,

do {
PreferredType.enterFunctionArgument(Tok.getLocation(), RunSignatureHelp);
if (Tok.is(tok::annot_embed)) {
ExpandEmbedIntoTemplateArgList(TemplateArgs);
} else {
ParsedTemplateArgument Arg = ParseTemplateArgument();
SourceLocation EllipsisLoc;
if (TryConsumeToken(tok::ellipsis, EllipsisLoc))
Arg = Actions.ActOnPackExpansion(Arg, EllipsisLoc);

if (Arg.isInvalid()) {
if (PP.isCodeCompletionReached() && !CalledSignatureHelp)
RunSignatureHelp();
return true;
}

// Save this template argument.
TemplateArgs.push_back(Arg);
ParsedTemplateArgument Arg = ParseTemplateArgument();
SourceLocation EllipsisLoc;
if (TryConsumeToken(tok::ellipsis, EllipsisLoc))
Arg = Actions.ActOnPackExpansion(Arg, EllipsisLoc);

if (Arg.isInvalid()) {
if (PP.isCodeCompletionReached() && !CalledSignatureHelp)
RunSignatureHelp();
return true;
}

// Save this template argument.
TemplateArgs.push_back(Arg);

// If the next token is a comma, consume it and keep reading
// arguments.
} while (TryConsumeToken(tok::comma));
Expand Down
4 changes: 2 additions & 2 deletions clang/lib/Sema/SemaExpr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3643,9 +3643,9 @@ bool Sema::CheckLoopHintExpr(Expr *E, SourceLocation Loc, bool AllowZero) {
ExprResult Sema::ActOnNumericConstant(const Token &Tok, Scope *UDLScope) {
// Fast path for a single digit (which is quite common). A single digit
// cannot have a trigraph, escaped newline, radix prefix, or suffix.
if (Tok.getLength() == 1) {
if (Tok.getLength() == 1 || Tok.getKind() == tok::binary_data) {
const char Val = PP.getSpellingOfSingleCharacterNumericConstant(Tok);
return ActOnIntegerConstant(Tok.getLocation(), Val-'0');
return ActOnIntegerConstant(Tok.getLocation(), Val);
}

SmallString<128> SpellingBuffer;
Expand Down
3 changes: 2 additions & 1 deletion clang/test/Preprocessor/embed_codegen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,9 @@ a
};

// CHECK: store i32 107, ptr %b, align 4
int b =
int b = (
#embed<jk.txt>
)
;


Expand Down
3 changes: 2 additions & 1 deletion clang/test/Preprocessor/embed_constexpr.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify -Wno-c23-extensions
// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify -fexperimental-new-constant-interpreter -Wno-c23-extensions
// expected-no-diagnostics

constexpr int value(int a, int b) {
return a + b;
Expand Down Expand Up @@ -46,7 +47,7 @@ int array[
static_assert(sizeof(array) / sizeof(int) == 'j');

constexpr int comma_expr = (
#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
#embed <jk.txt>
);
static_assert(comma_expr == 'k');

Expand Down
21 changes: 10 additions & 11 deletions clang/test/Preprocessor/embed_weird.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,17 +27,17 @@ _Static_assert(
_Static_assert(sizeof(
#embed <single_byte.txt>
) ==
sizeof(unsigned char)
sizeof(int)
, ""
);
_Static_assert(sizeof
#embed <single_byte.txt>
, ""
);
_Static_assert(sizeof(
#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
#embed <jk.txt>
) ==
sizeof(unsigned char)
sizeof(int)
, ""
);

Expand Down Expand Up @@ -73,10 +73,10 @@ void do_stuff() {
// Ensure that we don't accidentally allow you to initialize an unsigned char *
// from embedded data; the data is modeled as a string literal internally, but
// is not actually a string literal.
const unsigned char *ptr =
const unsigned char *ptr = (
#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
; // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
); // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'int'}} \
cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'int'}}

// However, there are some cases where this is fine and should work.
const unsigned char *null_ptr_1 =
Expand All @@ -101,11 +101,10 @@ constexpr unsigned char ch =
;
static_assert(ch == 0);

void foobar(float x, char y, char z); // cxx-note {{candidate function not viable: requires 3 arguments, but 1 was provided}}
// c-note@-1 {{declared here}}
void g1() { foobar((float) // cxx-error {{no matching function for call to 'foobar'}}
#embed "numbers.txt" limit(3) // expected-warning {{left operand of comma operator has no effect}}
); // c-error {{too few arguments to function call, expected 3, have 1}}
void foobar(float x, char y, char z);
void g1() { foobar((float)
#embed "numbers.txt" limit(3)
);
}

#if __cplusplus
Expand Down
Loading