Skip to content

Reland #90786 ([BoundsSafety] Allow 'counted_by' attribute on pointers in structs in C) #93121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

delcypher
Copy link
Contributor

@delcypher delcypher commented May 23, 2024

This PR attempts to reland #90786 that was reverted due to a memory leak.

[BoundsSafety] Reland #93121 Allow 'counted_by' attribute on pointers in structs in C (#93121)

Fixes #92687.

Previously the attribute was only allowed on flexible array members.
This patch patch changes this to also allow the attribute on pointer
fields in structs and also allows late parsing of the attribute in some
contexts.

For example this previously wasn't allowed:

struct BufferTypeDeclAttributePosition {
  size_t count;
  char* buffer __counted_by(count); // Now allowed
}

Note the attribute is prevented on pointee types where the size isn't
known at compile time. In particular pointee types that are:

  • Incomplete (e.g. void) and sizeless types
  • Function types (e.g. the pointee of a function pointer)
  • Struct types with a flexible array member

This patch also introduces late parsing of the attribute when used in
the declaration attribute position. For example

struct BufferTypeDeclAttributePosition {
  char* buffer __counted_by(count); // Now allowed
  size_t count;
}

is now allowed but only when passing
-fexperimental-late-parse-attributes. The motivation for using late
parsing here is to avoid breaking the data layout of structs in existing
code that want to use the counted_by attribute. This patch is the
first use of LateAttrParseExperimentalExt in Attr.td that was
introduced in a previous patch.

Note by allowing the attribute on struct member pointers this now allows
the possiblity of writing the attribute in the type attribute position.
For example:

struct BufferTypeAttributePosition {
  size_t count;
  char *__counted_by(count) buffer; // Now allowed
}

However, the attribute in this position is still currently parsed
immediately rather than late parsed. So this will not parse currently:

struct BufferTypeAttributePosition {
  char *__counted_by(count) buffer; // Fails to parse
  size_t count;
}

The intention is to lift this restriction in future patches. It has not
been done in this patch to keep this size of this commit small.

There are also several other follow up changes that will need to be
addressed in future patches:

  • Make late parsing working with anonymous structs (see
    on_pointer_anon_buf in attr-counted-by-late-parsed-struct-ptrs.c).
  • Allow counted_by on more subjects (e.g. parameters, returns types)
    when -fbounds-safety is enabled.
  • Make use of the attribute on pointer types in code gen (e.g. for
    _builtin_dynamic_object_size and UBSan's array-bounds checks).

This work is heavily based on a patch originally written by Yeoul Na.

Differences between #93121 and this patch

  • The memory leak that caused Reland #90786 ([BoundsSafety] Allow 'counted_by' attribute on pointers in structs in C) #93121 to be reverted (see Reverted 0ec3b972e and cef6387e5 #92687) should
    now be fixed. See "The Memory Leak".
  • The fix to pragma-attribute-supported-attributes-list.test
    (originally in cef6387) has been incorporated into this patch.
  • A relaxation of counted_by semantics (originally in 112eadd) has been
    incorporated into this patch.
  • The assert in Parser::DistributeCLateParsedAttrs has been removed
    because that broke downstream code.
  • The switch statement in Parser::ParseLexedCAttribute has been
    removed in favor of using Parser::ParseGNUAttributeArgs which does
    the same thing but is more feature complete.
  • The EnterScope parameter has been plumbed through
    Parser::ParseLexedCAttribute and Parser::ParseLexedCAttributeList.
    It currently doesn't do anything but it will be needed in future
    commits.

The Memory Leak

The problem was that these lines parsed the attributes but then did nothing to free the memory

assert(!getLangOpts().CPlusPlus);
for (auto *LateAttr : LateFieldAttrs)
  ParseLexedCAttribute(*LateAttr);

To fix this this a new Parser::ParseLexedCAttributeList method has been
added (based on Parser::ParseLexedAttributeList) which does the
necessary memory management. The intention is to merge these two
methods together so there is just one implementation in a future patch.

A more principled fixed here would be to fix the ownership of the
LateParsedAttribute objects. In principle LateParsedAttrList should own
its pointers exclusively and be responsible for deallocating them.
Unfortunately this is complicated by LateParsedAttribute objects also
being stored in another data structure (LateParsedDeclarations) as
can be seen below (LA gets stored in two places).

      // Handle attributes with arguments that require late parsing.
      LateParsedAttribute *LA =
          new LateParsedAttribute(this, *AttrName, AttrNameLoc);
      LateAttrs->push_back(LA);

      // Attributes in a class are parsed at the end of the class, along
      // with other late-parsed declarations.
      if (!ClassStack.empty() && !LateAttrs->parseSoon())
        getCurrentClass().LateParsedDeclarations.push_back(LA);

this means the ownership of LateParsedAttribute objects isn't very
clear.

rdar://125400257

@delcypher delcypher added the clang:bounds-safety Issue/PR relating to the experimental -fbounds-safety feature in Clang label May 23, 2024
@delcypher delcypher self-assigned this May 23, 2024
@delcypher delcypher requested a review from Endilll as a code owner May 23, 2024 01:20
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 23, 2024
@llvmbot
Copy link
Member

llvmbot commented May 23, 2024

@llvm/pr-subscribers-clang

Author: Dan Liew (delcypher)

Changes

This PR attempts to reland #90786 that was reverted due to a memory leak.

This PR includes the following commits

  1. The original commit in #90786 (originally 0ec3b97)
  2. A follow up fix to pragma-attribute-supported-attributes-list.test (originally in cef6387)
  3. A relaxation of counted_by semantics (originally in 112eadd)
  4. A fix to the memory leak (in 604e274)

The intention will be squash all the changes into a single commit once we've agreed to the memory leak fix.

The Memory leak

The problem was that these lines parsed the attributes but then did nothing to free the memory

assert(!getLangOpts().CPlusPlus);
for (auto *LateAttr : LateFieldAttrs)
  ParseLexedCAttribute(*LateAttr);

To fix this the existing Parser::ParseLexedAttributeList method has been tweaked for our use case and has been re-used because it does the necessary memory management.

A more principled fixed here would be to fix the ownership of the LateParsedAttribute objects. In principle LateParsedAttrList should own its pointers and be responsible for deallocating them. Unfortunately this is complicated by LateParsedAttribute objects also being stored in another data structure (LateParsedDeclarations).

      // Handle attributes with arguments that require late parsing.
      LateParsedAttribute *LA =
          new LateParsedAttribute(this, *AttrName, AttrNameLoc);
      LateAttrs->push_back(LA);

      // Attributes in a class are parsed at the end of the class, along
      // with other late-parsed declarations.
      if (!ClassStack.empty() && !LateAttrs->parseSoon())
        getCurrentClass().LateParsedDeclarations.push_back(LA);

this means the ownership LateParsedAttribute objects isn't clear :(


Patch is 63.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93121.diff

24 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+20-1)
  • (modified) clang/include/clang/AST/Type.h (+1)
  • (modified) clang/include/clang/Basic/Attr.td (+2-1)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+4)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+21-2)
  • (modified) clang/include/clang/Parse/Parser.h (+6-1)
  • (modified) clang/include/clang/Sema/Sema.h (+2-1)
  • (modified) clang/lib/AST/Type.cpp (+10)
  • (modified) clang/lib/Parse/ParseCXXInlineMethods.cpp (+7-1)
  • (modified) clang/lib/Parse/ParseDecl.cpp (+97-7)
  • (modified) clang/lib/Parse/ParseObjc.cpp (+6-4)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+80-15)
  • (modified) clang/lib/Sema/SemaType.cpp (+3-3)
  • (modified) clang/lib/Sema/TreeTransform.h (+1-1)
  • (added) clang/test/AST/attr-counted-by-late-parsed-struct-ptrs.c (+45)
  • (added) clang/test/AST/attr-counted-by-struct-ptrs.c (+117)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (-1)
  • (added) clang/test/Sema/attr-counted-by-late-parsed-off.c (+26)
  • (added) clang/test/Sema/attr-counted-by-late-parsed-struct-ptrs.c (+254)
  • (added) clang/test/Sema/attr-counted-by-struct-ptrs-sizeless-types.c (+17)
  • (added) clang/test/Sema/attr-counted-by-struct-ptrs.c (+224)
  • (added) clang/test/Sema/attr-counted-by-vla-sizeless-types.c (+11)
  • (added) clang/test/Sema/attr-counted-by-vla.c (+196)
  • (removed) clang/test/Sema/attr-counted-by.c (-112)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 81e9d0423f96a..5d59b3b053600 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -317,7 +317,8 @@ New Compiler Flags
 
 - ``-fexperimental-late-parse-attributes`` enables an experimental feature to
   allow late parsing certain attributes in specific contexts where they would
-  not normally be late parsed.
+  not normally be late parsed. Currently this allows late parsing the
+  `counted_by` attribute in C. See `Attribute Changes in Clang`_.
 
 - ``-fseparate-named-sections`` uses separate unique sections for global
   symbols in named special sections (i.e. symbols annotated with
@@ -406,6 +407,24 @@ Attribute Changes in Clang
 - The ``clspv_libclc_builtin`` attribute has been added to allow clspv
   (`OpenCL-C to Vulkan SPIR-V compiler <https://github.com/google/clspv>`_) to identify functions coming from libclc
   (`OpenCL-C builtin library <https://libclc.llvm.org>`_).
+- The ``counted_by`` attribute is now allowed on pointers that are members of a
+  struct in C.
+
+- The ``counted_by`` attribute can now be late parsed in C when
+  ``-fexperimental-late-parse-attributes`` is passed but only when attribute is
+  used in the declaration attribute position. This allows using the
+  attribute on existing code where it previously impossible to do so without
+  re-ordering struct field declarations would break ABI as shown below.
+
+  .. code-block:: c
+
+     struct BufferTy {
+       /* Refering to `count` requires late parsing */
+       char* buffer __counted_by(count);
+       /* Swapping `buffer` and `count` to avoid late parsing would break ABI */
+       size_t count;
+     };
+
 
 Improvements to Clang's diagnostics
 -----------------------------------
diff --git a/clang/include/clang/AST/Type.h b/clang/include/clang/AST/Type.h
index 9a5c6e8d562c3..263b632df23ce 100644
--- a/clang/include/clang/AST/Type.h
+++ b/clang/include/clang/AST/Type.h
@@ -2515,6 +2515,7 @@ class alignas(TypeAlignment) Type : public ExtQualsTypeCommonBase {
   bool isRecordType() const;
   bool isClassType() const;
   bool isStructureType() const;
+  bool isStructureTypeWithFlexibleArrayMember() const;
   bool isObjCBoxableRecordType() const;
   bool isInterfaceType() const;
   bool isStructureOrClassType() const;
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 7008bea483c87..565708cef28ce 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2256,7 +2256,8 @@ def TypeNullUnspecified : TypeAttr {
 def CountedBy : DeclOrTypeAttr {
   let Spellings = [Clang<"counted_by">];
   let Subjects = SubjectList<[Field], ErrorDiag>;
-  let Args = [ExprArgument<"Count">, IntArgument<"NestedLevel">];
+  let Args = [ExprArgument<"Count">, IntArgument<"NestedLevel", 1>];
+  let LateParsed = LateAttrParseExperimentalExt;
   let ParseArgumentsAsUnevaluated = 1;
   let Documentation = [CountedByDocs];
   let LangOpts = [COnly];
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 4cb4f3d999f7a..4fad4d1a0eca7 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1447,6 +1447,10 @@ def FunctionMultiVersioning
 
 def NoDeref : DiagGroup<"noderef">;
 
+// -fbounds-safety and bounds annotation related warnings
+def BoundsSafetyCountedByEltTyUnknownSize :
+  DiagGroup<"bounds-safety-counted-by-elt-type-unknown-size">;
+
 // A group for cross translation unit static analysis related warnings.
 def CrossTU : DiagGroup<"ctu">;
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index c7dea1d54d063..0e73e0207766e 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -6544,8 +6544,10 @@ def warn_superclass_variable_sized_type_not_at_end : Warning<
 
 def err_flexible_array_count_not_in_same_struct : Error<
   "'counted_by' field %0 isn't within the same struct as the flexible array">;
-def err_counted_by_attr_not_on_flexible_array_member : Error<
-  "'counted_by' only applies to C99 flexible array members">;
+def err_counted_by_attr_not_on_ptr_or_flexible_array_member : Error<
+  "'counted_by' only applies to pointers or C99 flexible array members">;
+def err_counted_by_attr_on_array_not_flexible_array_member : Error<
+  "'counted_by' on arrays only applies to C99 flexible array members">;
 def err_counted_by_attr_refer_to_itself : Error<
   "'counted_by' cannot refer to the flexible array member %0">;
 def err_counted_by_must_be_in_structure : Error<
@@ -6560,6 +6562,23 @@ def err_counted_by_attr_refer_to_union : Error<
   "'counted_by' argument cannot refer to a union member">;
 def note_flexible_array_counted_by_attr_field : Note<
   "field %0 declared here">;
+def err_counted_by_attr_pointee_unknown_size : Error<
+  "'counted_by' %select{cannot|should not}3 be applied to %select{"
+    "a pointer with pointee|" // pointer
+    "an array with element}0" // array
+  " of unknown size because %1 is %select{"
+    "an incomplete type|"  // CountedByInvalidPointeeTypeKind::INCOMPLETE
+    "a sizeless type|"     // CountedByInvalidPointeeTypeKind::SIZELESS
+    "a function type|"     // CountedByInvalidPointeeTypeKind::FUNCTION
+    // CountedByInvalidPointeeTypeKind::FLEXIBLE_ARRAY_MEMBER
+    "a struct type with a flexible array member"
+    "%select{|. This will be an error in a future compiler version}3"
+    ""
+  "}2">;
+
+def warn_counted_by_attr_elt_type_unknown_size :
+  Warning<err_counted_by_attr_pointee_unknown_size.Summary>,
+  InGroup<BoundsSafetyCountedByEltTyUnknownSize>;
 
 let CategoryName = "ARC Semantic Issue" in {
 
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index 3c4ab649e3b4c..f5d990b400550 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1648,6 +1648,8 @@ class Parser : public CodeCompletionHandler {
                                bool EnterScope, bool OnDefinition);
   void ParseLexedAttribute(LateParsedAttribute &LA,
                            bool EnterScope, bool OnDefinition);
+  void ParseLexedCAttribute(LateParsedAttribute &LA,
+                            ParsedAttributes *OutAttrs = nullptr);
   void ParseLexedMethodDeclarations(ParsingClass &Class);
   void ParseLexedMethodDeclaration(LateParsedMethodDeclaration &LM);
   void ParseLexedMethodDefs(ParsingClass &Class);
@@ -2534,7 +2536,8 @@ class Parser : public CodeCompletionHandler {
 
   void ParseStructDeclaration(
       ParsingDeclSpec &DS,
-      llvm::function_ref<void(ParsingFieldDeclarator &)> FieldsCallback);
+      llvm::function_ref<Decl *(ParsingFieldDeclarator &)> FieldsCallback,
+      LateParsedAttrList *LateFieldAttrs = nullptr);
 
   DeclGroupPtrTy ParseTopLevelStmtDecl();
 
@@ -3112,6 +3115,8 @@ class Parser : public CodeCompletionHandler {
                                  SourceLocation ScopeLoc,
                                  ParsedAttr::Form Form);
 
+  void DistributeCLateParsedAttrs(Decl *Dcl, LateParsedAttrList *LateAttrs);
+
   void ParseBoundsAttribute(IdentifierInfo &AttrName,
                             SourceLocation AttrNameLoc, ParsedAttributes &Attrs,
                             IdentifierInfo *ScopeName, SourceLocation ScopeLoc,
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 01ddba5eaf01d..a37e22055d852 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -11381,7 +11381,8 @@ class Sema final : public SemaBase {
   QualType BuildMatrixType(QualType T, Expr *NumRows, Expr *NumColumns,
                            SourceLocation AttrLoc);
 
-  QualType BuildCountAttributedArrayType(QualType WrappedTy, Expr *CountExpr);
+  QualType BuildCountAttributedArrayOrPointerType(QualType WrappedTy,
+                                                  Expr *CountExpr);
 
   QualType BuildAddressSpaceAttr(QualType &T, LangAS ASIdx, Expr *AddrSpace,
                                  SourceLocation AttrLoc);
diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 3b90b8229dd18..04f105c128872 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -632,6 +632,16 @@ bool Type::isStructureType() const {
   return false;
 }
 
+bool Type::isStructureTypeWithFlexibleArrayMember() const {
+  const auto *RT = getAs<RecordType>();
+  if (!RT)
+    return false;
+  const auto *Decl = RT->getDecl();
+  if (!Decl->isStruct())
+    return false;
+  return Decl->hasFlexibleArrayMember();
+}
+
 bool Type::isObjCBoxableRecordType() const {
   if (const auto *RT = getAs<RecordType>())
     return RT->getDecl()->hasAttr<ObjCBoxableAttr>();
diff --git a/clang/lib/Parse/ParseCXXInlineMethods.cpp b/clang/lib/Parse/ParseCXXInlineMethods.cpp
index 943ce0fdde3a3..e1dfd572d21a7 100644
--- a/clang/lib/Parse/ParseCXXInlineMethods.cpp
+++ b/clang/lib/Parse/ParseCXXInlineMethods.cpp
@@ -744,7 +744,13 @@ void Parser::ParseLexedAttributeList(LateParsedAttrList &LAs, Decl *D,
   for (unsigned i = 0, ni = LAs.size(); i < ni; ++i) {
     if (D)
       LAs[i]->addDecl(D);
-    ParseLexedAttribute(*LAs[i], EnterScope, OnDefinition);
+    // FIXME: There has to be a better way to ask "is this C?""
+    if (LangStandard::getLangStandardForKind(getLangOpts().LangStd)
+            .getLanguage() == Language::C) {
+      // TODO: Use `EnterScope`
+      ParseLexedCAttribute(*LAs[i]);
+    } else
+      ParseLexedAttribute(*LAs[i], EnterScope, OnDefinition);
     delete LAs[i];
   }
   LAs.clear();
diff --git a/clang/lib/Parse/ParseDecl.cpp b/clang/lib/Parse/ParseDecl.cpp
index 445d3fd66e387..36e130a3bb6cb 100644
--- a/clang/lib/Parse/ParseDecl.cpp
+++ b/clang/lib/Parse/ParseDecl.cpp
@@ -3306,6 +3306,19 @@ void Parser::ParseAlignmentSpecifier(ParsedAttributes &Attrs,
   }
 }
 
+void Parser::DistributeCLateParsedAttrs(Decl *Dcl,
+                                        LateParsedAttrList *LateAttrs) {
+  assert(Dcl && "Dcl cannot be null");
+
+  if (!LateAttrs)
+    return;
+
+  for (auto *LateAttr : *LateAttrs) {
+    if (LateAttr->Decls.empty())
+      LateAttr->addDecl(Dcl);
+  }
+}
+
 /// Bounds attributes (e.g., counted_by):
 ///   AttrName '(' expression ')'
 void Parser::ParseBoundsAttribute(IdentifierInfo &AttrName,
@@ -4843,13 +4856,14 @@ static void DiagnoseCountAttributedTypeInUnnamedAnon(ParsingDeclSpec &DS,
 ///
 void Parser::ParseStructDeclaration(
     ParsingDeclSpec &DS,
-    llvm::function_ref<void(ParsingFieldDeclarator &)> FieldsCallback) {
+    llvm::function_ref<Decl *(ParsingFieldDeclarator &)> FieldsCallback,
+    LateParsedAttrList *LateFieldAttrs) {
 
   if (Tok.is(tok::kw___extension__)) {
     // __extension__ silences extension warnings in the subexpression.
     ExtensionRAIIObject O(Diags);  // Use RAII to do this.
     ConsumeToken();
-    return ParseStructDeclaration(DS, FieldsCallback);
+    return ParseStructDeclaration(DS, FieldsCallback, LateFieldAttrs);
   }
 
   // Parse leading attributes.
@@ -4914,10 +4928,12 @@ void Parser::ParseStructDeclaration(
     }
 
     // If attributes exist after the declarator, parse them.
-    MaybeParseGNUAttributes(DeclaratorInfo.D);
+    MaybeParseGNUAttributes(DeclaratorInfo.D, LateFieldAttrs);
 
     // We're done with this declarator;  invoke the callback.
-    FieldsCallback(DeclaratorInfo);
+    Decl *Field = FieldsCallback(DeclaratorInfo);
+    if (Field)
+      DistributeCLateParsedAttrs(Field, LateFieldAttrs);
 
     // If we don't have a comma, it is either the end of the list (a ';')
     // or an error, bail out.
@@ -4928,6 +4944,69 @@ void Parser::ParseStructDeclaration(
   }
 }
 
+/// Finish parsing an attribute for which parsing was delayed.
+/// This will be called at the end of parsing a class declaration
+/// for each LateParsedAttribute. We consume the saved tokens and
+/// create an attribute with the arguments filled in. We add this
+/// to the Attribute list for the decl.
+void Parser::ParseLexedCAttribute(LateParsedAttribute &LA,
+                                  ParsedAttributes *OutAttrs) {
+  // Create a fake EOF so that attribute parsing won't go off the end of the
+  // attribute.
+  Token AttrEnd;
+  AttrEnd.startToken();
+  AttrEnd.setKind(tok::eof);
+  AttrEnd.setLocation(Tok.getLocation());
+  AttrEnd.setEofData(LA.Toks.data());
+  LA.Toks.push_back(AttrEnd);
+
+  // Append the current token at the end of the new token stream so that it
+  // doesn't get lost.
+  LA.Toks.push_back(Tok);
+  PP.EnterTokenStream(LA.Toks, /*DisableMacroExpansion=*/true,
+                      /*IsReinject=*/true);
+  // Drop the current token and bring the first cached one. It's the same token
+  // as when we entered this function.
+  ConsumeAnyToken(/*ConsumeCodeCompletionTok=*/true);
+
+  ParsedAttributes Attrs(AttrFactory);
+
+  assert(LA.Decls.size() <= 1 &&
+         "late field attribute expects to have at most one declaration.");
+
+  // Dispatch based on the attribute and parse it
+  const AttributeCommonInfo::Form ParsedForm = ParsedAttr::Form::GNU();
+  IdentifierInfo *ScopeName = nullptr;
+  const ParsedAttr::Kind AttrKind =
+      ParsedAttr::getParsedKind(&LA.AttrName, /*ScopeName=*/ScopeName,
+                                /*SyntaxUsed=*/ParsedForm.getSyntax());
+  switch (AttrKind) {
+  case ParsedAttr::Kind::AT_CountedBy:
+    ParseBoundsAttribute(LA.AttrName, LA.AttrNameLoc, Attrs,
+                         /*ScopeName=*/ScopeName, SourceLocation(),
+                         /*Form=*/ParsedForm);
+    break;
+  default:
+    llvm_unreachable("Unhandled late parsed attribute");
+  }
+
+  for (auto *D : LA.Decls)
+    Actions.ActOnFinishDelayedAttribute(getCurScope(), D, Attrs);
+
+  // Due to a parsing error, we either went over the cached tokens or
+  // there are still cached tokens left, so we skip the leftover tokens.
+  while (Tok.isNot(tok::eof))
+    ConsumeAnyToken();
+
+  // Consume the fake EOF token if it's there
+  if (Tok.is(tok::eof) && Tok.getEofData() == AttrEnd.getEofData())
+    ConsumeAnyToken();
+
+  if (OutAttrs) {
+    OutAttrs->takeAllFrom(Attrs);
+  }
+}
+
 /// ParseStructUnionBody
 ///       struct-contents:
 ///         struct-declaration-list
@@ -4951,6 +5030,11 @@ void Parser::ParseStructUnionBody(SourceLocation RecordLoc,
   ParseScope StructScope(this, Scope::ClassScope|Scope::DeclScope);
   Actions.ActOnTagStartDefinition(getCurScope(), TagDecl);
 
+  // `LateAttrParseExperimentalExtOnly=true` requests that only attributes
+  // marked with `LateAttrParseExperimentalExt` are late parsed.
+  LateParsedAttrList LateFieldAttrs(/*PSoon=*/true,
+                                    /*LateAttrParseExperimentalExtOnly=*/true);
+
   // While we still have something to read, read the declarations in the struct.
   while (!tryParseMisplacedModuleImport() && Tok.isNot(tok::r_brace) &&
          Tok.isNot(tok::eof)) {
@@ -5001,18 +5085,19 @@ void Parser::ParseStructUnionBody(SourceLocation RecordLoc,
     }
 
     if (!Tok.is(tok::at)) {
-      auto CFieldCallback = [&](ParsingFieldDeclarator &FD) {
+      auto CFieldCallback = [&](ParsingFieldDeclarator &FD) -> Decl * {
         // Install the declarator into the current TagDecl.
         Decl *Field =
             Actions.ActOnField(getCurScope(), TagDecl,
                                FD.D.getDeclSpec().getSourceRange().getBegin(),
                                FD.D, FD.BitfieldSize);
         FD.complete(Field);
+        return Field;
       };
 
       // Parse all the comma separated declarators.
       ParsingDeclSpec DS(*this);
-      ParseStructDeclaration(DS, CFieldCallback);
+      ParseStructDeclaration(DS, CFieldCallback, &LateFieldAttrs);
     } else { // Handle @defs
       ConsumeToken();
       if (!Tok.isObjCAtKeyword(tok::objc_defs)) {
@@ -5053,7 +5138,12 @@ void Parser::ParseStructUnionBody(SourceLocation RecordLoc,
 
   ParsedAttributes attrs(AttrFactory);
   // If attributes exist after struct contents, parse them.
-  MaybeParseGNUAttributes(attrs);
+  MaybeParseGNUAttributes(attrs, &LateFieldAttrs);
+
+  // Late parse field attributes if necessary.
+  assert(!getLangOpts().CPlusPlus);
+  ParseLexedAttributeList(LateFieldAttrs, /*Decl=*/nullptr,
+                          /*EnterScope=*/false, /*OnDefinition=*/false);
 
   SmallVector<Decl *, 32> FieldDecls(TagDecl->fields());
 
diff --git a/clang/lib/Parse/ParseObjc.cpp b/clang/lib/Parse/ParseObjc.cpp
index 89f4acbd25e49..6a2088a73c55b 100644
--- a/clang/lib/Parse/ParseObjc.cpp
+++ b/clang/lib/Parse/ParseObjc.cpp
@@ -780,16 +780,16 @@ void Parser::ParseObjCInterfaceDeclList(tok::ObjCKeywordKind contextKey,
       }
 
       bool addedToDeclSpec = false;
-      auto ObjCPropertyCallback = [&](ParsingFieldDeclarator &FD) {
+      auto ObjCPropertyCallback = [&](ParsingFieldDeclarator &FD) -> Decl * {
         if (FD.D.getIdentifier() == nullptr) {
           Diag(AtLoc, diag::err_objc_property_requires_field_name)
               << FD.D.getSourceRange();
-          return;
+          return nullptr;
         }
         if (FD.BitfieldSize) {
           Diag(AtLoc, diag::err_objc_property_bitfield)
               << FD.D.getSourceRange();
-          return;
+          return nullptr;
         }
 
         // Map a nullability property attribute to a context-sensitive keyword
@@ -818,6 +818,7 @@ void Parser::ParseObjCInterfaceDeclList(tok::ObjCKeywordKind contextKey,
             MethodImplKind);
 
         FD.complete(Property);
+        return Property;
       };
 
       // Parse all the comma separated declarators.
@@ -2013,7 +2014,7 @@ void Parser::ParseObjCClassInstanceVariables(ObjCContainerDecl *interfaceDecl,
       continue;
     }
 
-    auto ObjCIvarCallback = [&](ParsingFieldDeclarator &FD) {
+    auto ObjCIvarCallback = [&](ParsingFieldDeclarator &FD) -> Decl * {
       assert(getObjCDeclContext() == interfaceDecl &&
              "Ivar should have interfaceDecl as its decl context");
       // Install the declarator into the interface decl.
@@ -2024,6 +2025,7 @@ void Parser::ParseObjCClassInstanceVariables(ObjCContainerDecl *interfaceDecl,
       if (Field)
         AllIvarDecls.push_back(Field);
       FD.complete(Field);
+      return Field;
     };
 
     // Parse all the comma separated declarators.
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index ca5938083917f..5041fd65286fa 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -8663,31 +8663,95 @@ static const RecordDecl *GetEnclosingNamedOrTopAnonRecord(const FieldDecl *FD) {
   return RD;
 }
 
-static bool
-CheckCountExpr(Sema &S, FieldDecl *FD, Expr *E,
-               llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls) {
+enum class CountedByInvalidPointeeTypeKind {
+  INCOMPLETE,
+  SIZELESS,
+  FUNCTION,
+  FLEXIBLE_ARRAY_MEMBER,
+  VALID,
+};
+
+static bool CheckCountedByAttrOnField(
+    Sema &S, FieldDecl *FD, Expr *E,
+    llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls) {
+  // Check the context the attribute is used in
+
   if (FD->getParent()->isUnion()) {
     S.Diag(FD->getBeginLoc(), diag::err_counted_by_attr_in_union)
         << FD->getSourceRange();
     return true;
   }
 
-  if (!E->getType()->isIntegerType() || E->getType()->isBooleanType()) {
-    S.Diag(E->getBeginLoc(), diag::err_counted_by_attr_argument_not_integer)
-        << E->getSourceRange();
+  const auto FieldTy = FD->getType();
+  if (!FieldTy->isArrayType() && !FieldTy->isPointerType()) {
+    S.Diag(FD->getBeginLoc(),
+           diag::err_counted_by_attr_not_on_ptr_or_flexible_array_member)
+        << FD->getLocation();
     return true;
   }
 
   LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
       LangOptions::StrictFlexArraysLevelKind::IncompleteOnly;
-
-  if (!Decl::isFlexibleArrayMemberLike(S.getASTContext(), FD, FD->getType(),
+  if (FieldTy->isArrayType() &&
+      !Decl::isFlexibleArrayMemberLike(S.getASTContext(), FD, FieldTy,
                                        StrictFlexArraysLevel, true)) {
-    // The "counted_by" attribute must be on a flexible array member.
-    SourceRange SR = FD->getLocation();
-    S.Diag(SR.getBegin(),
-           diag::err_counted_by_attr_not_on_flexible_array_member)
-        << SR;
+    S.Diag(FD->getBeginLoc(),
+           diag::err_c...
[truncated]

@@ -744,7 +744,13 @@ void Parser::ParseLexedAttributeList(LateParsedAttrList &LAs, Decl *D,
for (unsigned i = 0, ni = LAs.size(); i < ni; ++i) {
if (D)
LAs[i]->addDecl(D);
ParseLexedAttribute(*LAs[i], EnterScope, OnDefinition);
// FIXME: There has to be a better way to ask "is this C?""
if (LangStandard::getLangStandardForKind(getLangOpts().LangStd)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronBallman @erichkeane Is there a better way to check that we're building for C?

@delcypher
Copy link
Contributor Author

delcypher commented May 23, 2024

@rapidsna Please check 604e274 b41afe1 2b1794e to check you are happy with the way I'm fixing the memory leak.

@delcypher
Copy link
Contributor Author

Hmm looks like I broke

  Clang :: Sema/attr-capabilities.c
  Clang :: Sema/diagnose_if.c
  Clang :: Sema/warn-thread-safety-analysis.c
  Clang :: SemaObjC/warn-thread-safety-analysis.m

Let's see if I can figure out what I did...

@delcypher
Copy link
Contributor Author

Okay this is interesting...

********************
FAIL: Clang :: SemaObjC/warn-thread-safety-analysis.m (15515 of 19369)
******************** TEST 'Clang :: SemaObjC/warn-thread-safety-analysis.m' FAILED ********************
Exit Code: 134

Command Output (stderr):
--
RUN: at line 1: /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang -cc1 -internal-isystem /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/lib/clang/19/include -nostdsysteminc -fsyntax-only -verify -Wthread-safety -Wthread-safety-beta -Wno-objc-root-class /Volumes/user_data/dev/llvm/llvm.org/main/src/llvm-project/clang/test/SemaObjC/warn-thread-safety-analysis.m
+ /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang -cc1 -internal-isystem /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/lib/clang/19/include -nostdsysteminc -fsyntax-only -verify -Wthread-safety -Wthread-safety-beta -Wno-objc-root-class /Volumes/user_data/dev/llvm/llvm.org/main/src/llvm-project/clang/test/SemaObjC/warn-thread-safety-analysis.m
Unhandled late parsed attribute
UNREACHABLE executed at /Volumes/user_data/dev/llvm/llvm.org/main/src/llvm-project/clang/lib/Parse/ParseDecl.cpp:4990!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang -cc1 -internal-isystem /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/lib/clang/19/include -nostdsysteminc -fsyntax-only -verify -Wthread-safety -Wthread-safety-beta -Wno-objc-root-class /Volumes/user_data/dev/llvm/llvm.org/main/src/llvm-project/clang/test/SemaObjC/warn-thread-safety-analysis.m
1.      /Volumes/user_data/dev/llvm/llvm.org/main/src/llvm-project/clang/test/SemaObjC/warn-thread-safety-analysis.m:7:61: current parser token '('
 #0 0x0000000107d741b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1035d41b8)
 #1 0x0000000107d7479c PrintStackTraceSignalHandler(void*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1035d479c)
 #2 0x0000000107d72454 llvm::sys::RunSignalHandlers() (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1035d2454)
 #3 0x0000000107d758b8 SignalHandler(int) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1035d58b8)
 #4 0x00000001813b9a24 (/usr/lib/system/libsystem_platform.dylib+0x18046da24)
 #5 0x0000000181389cc0 (/usr/lib/system/libsystem_pthread.dylib+0x18043dcc0)
 #6 0x0000000181295a40 (/usr/lib/system/libsystem_c.dylib+0x180349a40)
 #7 0x0000000107c4f0dc llvm::install_out_of_memory_new_handler() (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1034af0dc)
 #8 0x000000010c40fc90 clang::Parser::ParseLexedCAttribute(clang::Parser::LateParsedAttribute&, clang::ParsedAttributes*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107c6fc90)
 #9 0x000000010c3ed5ec clang::Parser::ParseLexedAttributeList(clang::Parser::LateParsedAttrList&, clang::Decl*, bool, bool) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107c4d5ec)
#10 0x000000010c51a05c clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107d7a05c)
#11 0x000000010c400bb8 clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107c60bb8)
#12 0x000000010c5190bc clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107d790bc)
#13 0x000000010c5185f4 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107d785f4)
#14 0x000000010c5177f0 clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107d777f0)
#15 0x000000010c5158cc clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107d758cc)
#16 0x000000010c3e5808 clang::ParseAST(clang::Sema&, bool, bool) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x107c45808)
#17 0x00000001095652b8 clang::ASTFrontendAction::ExecuteAction() (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x104dc52b8)
#18 0x0000000109564a74 clang::FrontendAction::Execute() (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x104dc4a74)
#19 0x0000000109484230 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x104ce4230)
#20 0x0000000109689944 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x104ee9944)
#21 0x00000001047b2dec cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x100012dec)
#22 0x00000001047a48c8 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1000048c8)
#23 0x00000001047a35fc clang_main(int, char**, llvm::ToolContext const&) (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x1000035fc)
#24 0x00000001047dc5a4 main (/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/bin/clang-19+0x10003c5a4)
#25 0x00000001810090e0 

This means clang::Parser::ParseLexedAttributeList (and hence clang::Parser::ParseLexedAttribute is actually used for parsing C files. I had assumed it was for C++ only but that's clearly not the case.

I'm going to redo the memory leak patch to not use clang::Parser::ParseLexedAttributeList just so we can get the code relanded but after that we should investigate using clang::Parser::ParseLexedAttributeList and clang::Parser::ParseLexedAttribute so we can get better code reuse.

@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from 604e274 to 671dd73 Compare May 23, 2024 07:08
@@ -4993,20 +4993,8 @@ void Parser::ParseLexedCAttribute(LateParsedAttribute &LA,
"late field attribute expects to have at most one declaration.");

// Dispatch based on the attribute and parse it
const AttributeCommonInfo::Form ParsedForm = ParsedAttr::Form::GNU();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hnrklssn @rapidsna Heads up I plan to land this change too. I discovered that this this switch statement doesn't need to exist because ParseGNUAttributeArgs effectively already does what we need.

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sema.h changes look good.

@delcypher
Copy link
Contributor Author

Hmm looks like I'll have to tweak the assert. Looks like some unit tests don't specify the language.

FAIL: Clang-Unit :: CodeGen/./ClangCodeGenTests/5/16 (19428 of 19653)
******************** TEST 'Clang-Unit :: CodeGen/./ClangCodeGenTests/5/16' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/tools/clang/unittests/CodeGen/./ClangCodeGenTests-Clang-Unit-78051-5-16.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=16 GTEST_SHARD_INDEX=5 /Volumes/user_data/dev/llvm/llvm.org/main/builds/Debug_xc___sccache/tools/clang/unittests/CodeGen/./ClangCodeGenTests
--

Note: This is test shard 6 of 16.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TBAAMetadataTest
[ RUN      ] TBAAMetadataTest.CTypedefFields2
LLVM ERROR: getLangStandardForKind() on unspecified kind
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  ClangCodeGenTests        0x0000000102c186e4 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 88
1  ClangCodeGenTests        0x0000000102c18cc8 PrintStackTraceSignalHandler(void*) + 28
2  ClangCodeGenTests        0x0000000102c1695c llvm::sys::RunSignalHandlers() + 152
3  ClangCodeGenTests        0x0000000102c19e30 SignalHandler(int) + 276
4  libsystem_platform.dylib 0x00000001813b9a24 _sigtramp + 56
5  libsystem_pthread.dylib  0x0000000181389cc0 pthread_kill + 288
6  libsystem_c.dylib        0x0000000181295a40 abort + 180
7  ClangCodeGenTests        0x0000000102ad60bc llvm::report_fatal_error(llvm::Twine const&, bool) + 364
8  ClangCodeGenTests        0x0000000102ad5f50 llvm::report_fatal_error(llvm::Twine const&, bool) + 0
9  ClangCodeGenTests        0x000000010388c44c clang::LangStandard::getLangStandardForKind(clang::LangStandard::Kind) + 88
10 ClangCodeGenTests        0x00000001048d565c clang::Parser::ParseLexedCAttributeList(clang::Parser::LateParsedAttrList&, bool, clang::ParsedAttributes*) + 144
11 ClangCodeGenTests        0x00000001048d6190 clang::Parser::ParseStructUnionBody(clang::SourceLocation, clang::TypeSpecifierType, clang::RecordDecl*) + 1724
12 ClangCodeGenTests        0x00000001048ff868 clang::Parser::ParseClassSpecifier(clang::tok::TokenKind, clang::SourceLocation, clang::DeclSpec&, clang::Parser::ParsedTemplateInfo&, clang::AccessSpecifier, bool, clang::Parser::DeclSpecContext, clang::ParsedAttributes&) + 8916
13 ClangCodeGenTests        0x00000001048ce33c clang::Parser::ParseDeclarationSpecifiers(clang::DeclSpec&, clang::Parser::ParsedTemplateInfo&, clang::AccessSpecifier, clang::Parser::DeclSpecContext, clang::Parser::LateParsedAttrList*, clang::ImplicitTypenameContext) + 15492
14 ClangCodeGenTests        0x00000001048c5908 clang::Parser::ParseDeclarationSpecifiers(clang::DeclSpec&, clang::Parser::ParsedTemplateInfo&, clang::AccessSpecifier, clang::Parser::DeclSpecContext, clang::Parser::LateParsedAttrList*) + 124
15 ClangCodeGenTests        0x00000001048c54ac clang::Parser::ParseSimpleDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::ParsedAttributes&, bool, clang::Parser::ForRangeInit*, clang::SourceLocation*) + 260
16 ClangCodeGenTests        0x00000001048c5284 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::ParsedAttributes&, clang::SourceLocation*) + 1060
17 ClangCodeGenTests        0x00000001049ddbd0 clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) + 2196
18 ClangCodeGenTests        0x00000001049dc10c clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) + 2024
19 ClangCodeGenTests        0x00000001049db880 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) + 64
20 ClangCodeGenTests        0x00000001048ac1c8 clang::ParseAST(clang::Sema&, bool, bool) + 476
21 ClangCodeGenTests        0x00000001024b1824 llvm::TestCompiler::compile() + 48
22 ClangCodeGenTests        0x00000001024c1024 (anonymous namespace)::TBAAMetadataTest_CTypedefFields2_Test::TestBody() + 236
23 ClangCodeGenTests        0x0000000102c9cf40 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 132
24 ClangCodeGenTests        0x0000000102c6be7c void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 96
25 ClangCodeGenTests        0x0000000102c6bdcc testing::Test::Run() + 192
26 ClangCodeGenTests        0x0000000102c6c894 testing::TestInfo::Run() + 304
27 ClangCodeGenTests        0x0000000102c6d768 testing::TestSuite::Run() + 412
28 ClangCodeGenTests        0x0000000102c77ee4 testing::internal::UnitTestImpl::RunAllTests() + 1024
29 ClangCodeGenTests        0x0000000102ca8248 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 132
30 ClangCodeGenTests        0x0000000102c77a58 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 96
31 ClangCodeGenTests        0x0000000102c7796c testing::UnitTest::Run() + 192
32 ClangCodeGenTests        0x0000000102c597b8 RUN_ALL_TESTS() + 16
33 ClangCodeGenTests        0x0000000102c5979c main + 192
34 dyld                     0x00000001810090e0 start + 2360

@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from a643967 to 3feb18b Compare May 23, 2024 17:50
Copy link
Contributor

@rapidsna rapidsna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

int count;
// Treating this as a warning is a temporary fix for existing attribute adopters. It **SHOULD BE AN ERROR**.
// expected-warning@+1{{'counted_by' should not be applied to an array with element of unknown size because 'struct has_unannotated_VLA' is a struct type with a flexible array member. This will be an error in a future compiler version}}
struct has_unannotated_VLA Arr[] __counted_by(count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwendling Just to be clear. I'm not sure if Linux builds treat warnings as errors by default. If so, they would still need to do things so as not to promote it as an error in order to have the builds succeed.

@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from 3feb18b to bc5b8bc Compare May 23, 2024 20:43
@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from bc5b8bc to f9c8abd Compare May 23, 2024 20:50
delcypher added a commit to delcypher/apple-llvm-project that referenced this pull request May 23, 2024
…ters in structs in C (llvm#93121)

Fixes llvm#92687.

Previously the attribute was only allowed on flexible array members.
This patch patch changes this to also allow the attribute on pointer
fields in structs and also allows late parsing of the attribute in some
contexts.

For example this previously wasn't allowed:

```
struct BufferTypeDeclAttributePosition {
  size_t count;
  char* buffer __counted_by(count); // Now allowed
}
```

Note the attribute is prevented on pointee types where the size isn't
known at compile time. In particular pointee types that are:

* Incomplete (e.g. `void`) and sizeless types
* Function types (e.g. the pointee of a function pointer)
* Struct types with a flexible array member

This patch also introduces late parsing of the attribute when used in
the declaration attribute position. For example

```
struct BufferTypeDeclAttributePosition {
  char* buffer __counted_by(count); // Now allowed
  size_t count;
}
```

is now allowed but **only** when passing
`-fexperimental-late-parse-attributes`. The motivation for using late
parsing here is to avoid breaking the data layout of structs in existing
code that want to use the `counted_by` attribute. This patch is the
first use of `LateAttrParseExperimentalExt` in `Attr.td` that was
introduced in a previous patch.

Note by allowing the attribute on struct member pointers this now allows
the possiblity of writing the attribute in the type attribute position.
For example:

```
struct BufferTypeAttributePosition {
  size_t count;
  char *__counted_by(count) buffer; // Now allowed
}
```

However, the attribute in this position is still currently parsed
immediately rather than late parsed. So this will not parse currently:

```
struct BufferTypeAttributePosition {
  char *__counted_by(count) buffer; // Fails to parse
  size_t count;
}
```

The intention is to lift this restriction in future patches. It has not
been done in this patch to keep this size of this commit small.

There are also several other follow up changes that will need to be
addressed in future patches:

* Make late parsing working with anonymous structs (see
`on_pointer_anon_buf` in `attr-counted-by-late-parsed-struct-ptrs.c`).
* Allow `counted_by` on more subjects (e.g. parameters, returns types)
when `-fbounds-safety` is enabled.
* Make use of the attribute on pointer types in code gen (e.g. for
`_builtin_dynamic_object_size` and UBSan's array-bounds checks).

This work is heavily based on a patch originally written by Yeoul Na.

** Differences between llvm#93121 and this patch **

* The memory leak that caused llvm#93121 to be reverted (see llvm#92687) should
  now be fixed. See "The Memory Leak".
* The fix to `pragma-attribute-supported-attributes-list.test`
  (originally in cef6387) has been incorporated into this patch.
* A relaxation of counted_by semantics (originally in 112eadd) has been
  incorporated into this patch.
* The assert in `Parser::DistributeCLateParsedAttrs` has been removed
  because that broke downstream code.
* The switch statement in `Parser::ParseLexedCAttribute` has been
  removed in favor of using `Parser::ParseGNUAttributeArgs` which does
  the same thing but is more feature complete.
* The `EnterScope` parameter has been plumbed through
  `Parser::ParseLexedCAttribute` and `Parser::ParseLexedCAttributeList`.
  It currently doesn't do anything but it will be needed in future
  commits.

** The Memory Leak **

The problem was that these lines parsed the attributes but then did nothing to free the memory

```
assert(!getLangOpts().CPlusPlus);
for (auto *LateAttr : LateFieldAttrs)
  ParseLexedCAttribute(*LateAttr);
```

To fix this this a new `Parser::ParseLexedCAttributeList` method has been
added (based on `Parser::ParseLexedAttributeList`) which does the
necessary memory management. The intention is to merge these two
methods together so there is just one implementation in a future patch.

A more principled fixed here would be to fix the ownership of the
`LateParsedAttribute` objects. In principle `LateParsedAttrList` should own
its pointers exclusively and be responsible for deallocating them.
Unfortunately this is complicated by `LateParsedAttribute` objects also
being stored in another data structure (`LateParsedDeclarations`) as
can be seen below (`LA` gets stored in two places).

```
      // Handle attributes with arguments that require late parsing.
      LateParsedAttribute *LA =
          new LateParsedAttribute(this, *AttrName, AttrNameLoc);
      LateAttrs->push_back(LA);

      // Attributes in a class are parsed at the end of the class, along
      // with other late-parsed declarations.
      if (!ClassStack.empty() && !LateAttrs->parseSoon())
        getCurrentClass().LateParsedDeclarations.push_back(LA);
```

this means the ownership of LateParsedAttribute objects isn't very
clear.

rdar://125400257
@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from f9c8abd to 9e1a9e4 Compare May 23, 2024 20:53
@delcypher
Copy link
Contributor Author

Filed to #93263 to track the potential refactor.

…ters in structs in C (llvm#93121)

Fixes llvm#92687.

Previously the attribute was only allowed on flexible array members.
This patch patch changes this to also allow the attribute on pointer
fields in structs and also allows late parsing of the attribute in some
contexts.

For example this previously wasn't allowed:

```
struct BufferTypeDeclAttributePosition {
  size_t count;
  char* buffer __counted_by(count); // Now allowed
}
```

Note the attribute is prevented on pointee types where the size isn't
known at compile time. In particular pointee types that are:

* Incomplete (e.g. `void`) and sizeless types
* Function types (e.g. the pointee of a function pointer)
* Struct types with a flexible array member

This patch also introduces late parsing of the attribute when used in
the declaration attribute position. For example

```
struct BufferTypeDeclAttributePosition {
  char* buffer __counted_by(count); // Now allowed
  size_t count;
}
```

is now allowed but **only** when passing
`-fexperimental-late-parse-attributes`. The motivation for using late
parsing here is to avoid breaking the data layout of structs in existing
code that want to use the `counted_by` attribute. This patch is the
first use of `LateAttrParseExperimentalExt` in `Attr.td` that was
introduced in a previous patch.

Note by allowing the attribute on struct member pointers this now allows
the possiblity of writing the attribute in the type attribute position.
For example:

```
struct BufferTypeAttributePosition {
  size_t count;
  char *__counted_by(count) buffer; // Now allowed
}
```

However, the attribute in this position is still currently parsed
immediately rather than late parsed. So this will not parse currently:

```
struct BufferTypeAttributePosition {
  char *__counted_by(count) buffer; // Fails to parse
  size_t count;
}
```

The intention is to lift this restriction in future patches. It has not
been done in this patch to keep this size of this commit small.

There are also several other follow up changes that will need to be
addressed in future patches:

* Make late parsing working with anonymous structs (see
`on_pointer_anon_buf` in `attr-counted-by-late-parsed-struct-ptrs.c`).
* Allow `counted_by` on more subjects (e.g. parameters, returns types)
when `-fbounds-safety` is enabled.
* Make use of the attribute on pointer types in code gen (e.g. for
`_builtin_dynamic_object_size` and UBSan's array-bounds checks).

This work is heavily based on a patch originally written by Yeoul Na.

** Differences between llvm#93121 and this patch **

* The memory leak that caused llvm#93121 to be reverted (see llvm#92687) should
  now be fixed. See "The Memory Leak".
* The fix to `pragma-attribute-supported-attributes-list.test`
  (originally in cef6387) has been incorporated into this patch.
* A relaxation of counted_by semantics (originally in 112eadd) has been
  incorporated into this patch.
* The assert in `Parser::DistributeCLateParsedAttrs` has been removed
  because that broke downstream code.
* The switch statement in `Parser::ParseLexedCAttribute` has been
  removed in favor of using `Parser::ParseGNUAttributeArgs` which does
  the same thing but is more feature complete.
* The `EnterScope` parameter has been plumbed through
  `Parser::ParseLexedCAttribute` and `Parser::ParseLexedCAttributeList`.
  It currently doesn't do anything but it will be needed in future
  commits.

** The Memory Leak **

The problem was that these lines parsed the attributes but then did nothing to free the memory

```
assert(!getLangOpts().CPlusPlus);
for (auto *LateAttr : LateFieldAttrs)
  ParseLexedCAttribute(*LateAttr);
```

To fix this this a new `Parser::ParseLexedCAttributeList` method has been
added (based on `Parser::ParseLexedAttributeList`) which does the
necessary memory management. The intention is to merge these two
methods together so there is just one implementation in a future patch
(llvm#93263).

A more principled fixed here would be to fix the ownership of the
`LateParsedAttribute` objects. In principle `LateParsedAttrList` should own
its pointers exclusively and be responsible for deallocating them.
Unfortunately this is complicated by `LateParsedAttribute` objects also
being stored in another data structure (`LateParsedDeclarations`) as
can be seen below (`LA` gets stored in two places).

```
      // Handle attributes with arguments that require late parsing.
      LateParsedAttribute *LA =
          new LateParsedAttribute(this, *AttrName, AttrNameLoc);
      LateAttrs->push_back(LA);

      // Attributes in a class are parsed at the end of the class, along
      // with other late-parsed declarations.
      if (!ClassStack.empty() && !LateAttrs->parseSoon())
        getCurrentClass().LateParsedDeclarations.push_back(LA);
```

this means the ownership of LateParsedAttribute objects isn't very
clear.

rdar://125400257
@delcypher delcypher force-pushed the users/delcypher/experimental-attr-late-parsing_v2 branch from 9e1a9e4 to c1f0914 Compare May 24, 2024 01:31
@delcypher delcypher merged commit 2918973 into llvm:main May 24, 2024
4 of 6 checks passed
diag::err_counted_by_attr_not_on_flexible_array_member)
<< SR;
S.Diag(FD->getBeginLoc(),
diag::err_counted_by_attr_on_array_not_flexible_array_member)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the motivation for this was. Wouldn't it also be nice to be able apply this to objects that isFlexibleArrayMemberLike for a different StrictFlexArraysLevelKind, such as arrays of size 0 or 1 (or even more), which is often use in old code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmayer This is an artificial limitation to meet the current upstream uses of the attribute. The downstream implementation doesn't restrict __counted_by to just flexible array members. the full implementation of -fbounds-safety can be found in https://github.com/swiftlang/llvm-project/tree/next

Copy link
Member

@hnrklssn hnrklssn Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the question is not why only flexible array members are allowed the attribute, but why only incomplete arrays are. Some old projects use this as syntax as incomplete array:

struct S {
  int len;
  char fam[0];
};

Would converting the above to an incomplete array like this be an issue in those projects? It seems to me that if you're introducing __counted_by, removing any old integer constant isn't much more extra work.

struct S {
  int len;
  char fam[__counted_by(len)];
};

The reason is that a complete array already has a bound, which we use for bounds checking with -fbounds-safety enabled. In fact, an array parameter like this will decay into a int * __counted_by(len) pointer:

void foo(int len, int arr[len]);

If you allow annotating these arrays with __counted_by also, you effectively end up with two conflicting sources of bounds info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the replies.

It is slightly easier (more mechanical) to not change the (size of the) underlying type, because then we do not need to make sure size calculations etc are correct. In theory I can also imagine some API considerations around the actual struct size as well (e.g. some client of the code manually calculates the size).

My thinking was that __counted_by on any FAM-eligible (i.e. last member of a struct) array would make clang consider it a FAM, regardless of its size (and regardless of -fstrict-flex-arrays), because there is a clear user intent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-fstrict-flex-arrays should not be overridden. counted_by should only be legal on actual flexible arrays. The other stuff (trailing array of any size) cannot be deterministically reasoned about. A project using fake flexible arrays must fix their code before using these features, and attempts to mix counted_by with sized arrays should be an error.

Refactoring to use only true flexible arrays is mostly mechanical:
https://people.kernel.org/kees/bounded-flexible-arrays-in-c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow. The linked blogpost does override -fstrict-flex-arrays (although it seems incorrectly, because it doesn't pass the level). Your point is that the only true levels are 0 or 3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, =1 and =2 were introduced by GCC against my recommendation even though they could not justify it beyond theoretical concerns. IMO, our experience purging the Linux kernel of fake flexible arrays supports the "all or nothing" case for dealing with flexible arrays. But, whatever the case, at the end of the day, =3 needs to stick to the "strict" part of the option name. :) Anything else allows for ambiguous bounds calculations, and ambiguity is what breeds bugs and security flaws.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:bounds-safety Issue/PR relating to the experimental -fbounds-safety feature in Clang clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reverted 0ec3b972e and cef6387e5
7 participants