[Bounds Safety][NFC] Add SemaBoundsSafety class and move existing Sema checks there #98954

delcypher
Contributor

This patch adds a `SemaBoundsSafety` class to cleanly separate
`-fbounds-safety` Sema checks from other Sema checks. A
`SemaBoundsSafety` object is available via the `Sema::BoundsSafety()`
method, similar to how other Sema checks have been separated out
(e.g. `SemaSwift`).

The existing `CheckCountedByAttrOnField` function and its related helper
functions and types have been moved from `SemaDeclAttr.cpp` into
`SemaBoundsSafety`. Although the `counted_by(_or_null)` and
`sized_by(_or_null)` attributes have a meaning outside of
`-fbounds-safety`, it seems reasonable to have their Sema logic live
in `SemaBoundsSafety` too, since the intention is that the attributes
will have the same semantics (but not necessarily the same enforcement).

As `-fbounds-safety` is upstreamed, additional Sema checks will be
added to the `SemaBoundsSafety` class.

rdar://131777237
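
For readers unfamiliar with the "Sema part" layout, the accessor pattern described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `MockSema`, `MockSemaBase`, `MockSemaBoundsSafety` and `CheckCountIsNegative` are hypothetical names invented for this example and are not Clang classes or functions. The real `Sema` owns many such parts and carries far more state.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for clang::Sema (declared first so parts can refer to it).
class MockSema;

// Hypothetical stand-in for clang::SemaBase: every part keeps a back-reference
// to the object that owns it.
class MockSemaBase {
public:
  explicit MockSemaBase(MockSema &S) : SemaRef(S) {}
  MockSema &SemaRef;
};

// Hypothetical stand-in for the new part introduced by the patch.
class MockSemaBoundsSafety : public MockSemaBase {
public:
  explicit MockSemaBoundsSafety(MockSema &S) : MockSemaBase(S) {}

  // Placeholder check (not in Clang). It follows the convention used by the
  // checks in the diff: returning true means "an error was found".
  bool CheckCountIsNegative(int Count) const { return Count < 0; }
};

class MockSema {
public:
  // The part is created eagerly and receives a reference to its owner.
  MockSema() : BoundsSafetyPtr(std::make_unique<MockSemaBoundsSafety>(*this)) {}

  // Parts hold a reference to this object, so copying or moving is disabled.
  MockSema(const MockSema &) = delete;
  MockSema &operator=(const MockSema &) = delete;

  // Accessor shaped like Sema::BoundsSafety() in the patch.
  MockSemaBoundsSafety &BoundsSafety() {
    assert(BoundsSafetyPtr);
    return *BoundsSafetyPtr;
  }

private:
  std::unique_ptr<MockSemaBoundsSafety> BoundsSafetyPtr;
};
```

A caller that previously passed the `Sema` object to a free function would instead write something like `S.BoundsSafety().CheckCountIsNegative(N)`, which mirrors the one call-site change this patch makes in `SemaDeclAttr.cpp`.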

@delcypher added the clang:bounds-safety label (Issue/PR relating to the experimental -fbounds-safety feature in Clang) on Jul 15, 2024
@delcypher self-assigned this on Jul 15, 2024
@delcypher requested a review from Endilll as a code owner on July 15, 2024 20:20
@llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels on Jul 15, 2024
@llvmbot
Member

llvmbot commented Jul 15, 2024

@llvm/pr-subscribers-clang

Author: Dan Liew (delcypher)

Full diff: https://github.com/llvm/llvm-project/pull/98954.diff

6 Files Affected:

  • (modified) clang/include/clang/Sema/Sema.h (+7)
  • (added) clang/include/clang/Sema/SemaBoundsSafety.h (+42)
  • (modified) clang/lib/Sema/CMakeLists.txt (+1)
  • (modified) clang/lib/Sema/Sema.cpp (+2)
  • (added) clang/lib/Sema/SemaBoundsSafety.cpp (+198)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+3-176)
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 48dff1b76cc57..81e7fed9f3f4c 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -173,6 +173,7 @@ class QualType;
 class SemaAMDGPU;
 class SemaARM;
 class SemaAVR;
+class SemaBoundsSafety;
 class SemaBPF;
 class SemaCodeCompletion;
 class SemaCUDA;
@@ -1151,6 +1152,11 @@ class Sema final : public SemaBase {
     return *AVRPtr;
   }
 
+  SemaBoundsSafety &BoundsSafety() {
+    assert(BoundsSafetyPtr);
+    return *BoundsSafetyPtr;
+  }
+
   SemaBPF &BPF() {
     assert(BPFPtr);
     return *BPFPtr;
@@ -1294,6 +1300,7 @@ class Sema final : public SemaBase {
   std::unique_ptr<SemaAMDGPU> AMDGPUPtr;
   std::unique_ptr<SemaARM> ARMPtr;
   std::unique_ptr<SemaAVR> AVRPtr;
+  std::unique_ptr<SemaBoundsSafety> BoundsSafetyPtr;
   std::unique_ptr<SemaBPF> BPFPtr;
   std::unique_ptr<SemaCodeCompletion> CodeCompletionPtr;
   std::unique_ptr<SemaCUDA> CUDAPtr;
diff --git a/clang/include/clang/Sema/SemaBoundsSafety.h b/clang/include/clang/Sema/SemaBoundsSafety.h
new file mode 100644
index 0000000000000..22ac14807e66d
--- /dev/null
+++ b/clang/include/clang/Sema/SemaBoundsSafety.h
@@ -0,0 +1,42 @@
+//===---- SemaBoundsSafety.h - Bounds Safety specific routines-*- C++ -*---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file declares semantic analysis functions specific to `-fbounds-safety`
+/// (Bounds Safety) and also its attributes when used without `-fbounds-safety`
+/// (e.g. `counted_by`)
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_SEMA_SEMABOUNDSSAFETY_H
+#define LLVM_CLANG_SEMA_SEMABOUNDSSAFETY_H
+
+#include "clang/Sema/SemaBase.h"
+#include "llvm/ADT/SmallVector.h"
+
+namespace clang {
+class CountAttributedType;
+class Decl;
+class Expr;
+class FieldDecl;
+class NamedDecl;
+class ParsedAttr;
+class TypeCoupledDeclRefInfo;
+
+class SemaBoundsSafety : public SemaBase {
+public:
+  SemaBoundsSafety(Sema &S);
+
+  bool CheckCountedByAttrOnField(
+      FieldDecl *FD, Expr *E,
+      llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls, bool CountInBytes,
+      bool OrNull);
+};
+
+} // namespace clang
+
+#endif //  LLVM_CLANG_SEMA_SEMABOUNDSSAFETY_H
diff --git a/clang/lib/Sema/CMakeLists.txt b/clang/lib/Sema/CMakeLists.txt
index 5934c8c30daf9..2cee4f5ef6e99 100644
--- a/clang/lib/Sema/CMakeLists.txt
+++ b/clang/lib/Sema/CMakeLists.txt
@@ -36,6 +36,7 @@ add_clang_library(clangSema
   SemaAvailability.cpp
   SemaBPF.cpp
   SemaBase.cpp
+  SemaBoundsSafety.cpp
   SemaCXXScopeSpec.cpp
   SemaCast.cpp
   SemaChecking.cpp
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index d6228718d53ae..9f6e1a887e40d 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -45,6 +45,7 @@
 #include "clang/Sema/SemaARM.h"
 #include "clang/Sema/SemaAVR.h"
 #include "clang/Sema/SemaBPF.h"
+#include "clang/Sema/SemaBoundsSafety.h"
 #include "clang/Sema/SemaCUDA.h"
 #include "clang/Sema/SemaCodeCompletion.h"
 #include "clang/Sema/SemaConsumer.h"
@@ -224,6 +225,7 @@ Sema::Sema(Preprocessor &pp, ASTContext &ctxt, ASTConsumer &consumer,
       AMDGPUPtr(std::make_unique<SemaAMDGPU>(*this)),
       ARMPtr(std::make_unique<SemaARM>(*this)),
       AVRPtr(std::make_unique<SemaAVR>(*this)),
+      BoundsSafetyPtr(std::make_unique<SemaBoundsSafety>(*this)),
       BPFPtr(std::make_unique<SemaBPF>(*this)),
       CodeCompletionPtr(
           std::make_unique<SemaCodeCompletion>(*this, CodeCompleter)),
diff --git a/clang/lib/Sema/SemaBoundsSafety.cpp b/clang/lib/Sema/SemaBoundsSafety.cpp
new file mode 100644
index 0000000000000..a7d88cc5073c6
--- /dev/null
+++ b/clang/lib/Sema/SemaBoundsSafety.cpp
@@ -0,0 +1,198 @@
+//===-- SemaBoundsSafety.cpp - Bounds Safety specific routines-*- C++ -*---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements semantic analysis functions specific to
+/// `-fbounds-safety` (Bounds Safety) and also its attributes when used without
+/// `-fbounds-safety` (e.g. `counted_by`)
+///
+//===----------------------------------------------------------------------===//
+
+#include "clang/Sema/SemaBoundsSafety.h"
+#include "clang/Basic/DiagnosticSema.h"
+#include "clang/Lex/Lexer.h"
+#include "clang/Sema/Sema.h"
+
+namespace clang {
+SemaBoundsSafety::SemaBoundsSafety(Sema &S) : SemaBase(S) {}
+
+static CountAttributedType::DynamicCountPointerKind
+getCountAttrKind(bool CountInBytes, bool OrNull) {
+  if (CountInBytes)
+    return OrNull ? CountAttributedType::SizedByOrNull
+                  : CountAttributedType::SizedBy;
+  return OrNull ? CountAttributedType::CountedByOrNull
+                : CountAttributedType::CountedBy;
+}
+
+static const RecordDecl *GetEnclosingNamedOrTopAnonRecord(const FieldDecl *FD) {
+  const auto *RD = FD->getParent();
+  // An unnamed struct is anonymous struct only if it's not instantiated.
+  // However, the struct may not be fully processed yet to determine
+  // whether it's anonymous or not. In that case, this function treats it as
+  // an anonymous struct and tries to find a named parent.
+  while (RD && (RD->isAnonymousStructOrUnion() ||
+                (!RD->isCompleteDefinition() && RD->getName().empty()))) {
+    const auto *Parent = dyn_cast<RecordDecl>(RD->getParent());
+    if (!Parent)
+      break;
+    RD = Parent;
+  }
+  return RD;
+}
+
+enum class CountedByInvalidPointeeTypeKind {
+  INCOMPLETE,
+  SIZELESS,
+  FUNCTION,
+  FLEXIBLE_ARRAY_MEMBER,
+  VALID,
+};
+
+bool SemaBoundsSafety::CheckCountedByAttrOnField(
+    FieldDecl *FD, Expr *E,
+    llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls, bool CountInBytes,
+    bool OrNull) {
+  // Check the context the attribute is used in
+
+  unsigned Kind = getCountAttrKind(CountInBytes, OrNull);
+
+  if (FD->getParent()->isUnion()) {
+    Diag(FD->getBeginLoc(), diag::err_count_attr_in_union)
+        << Kind << FD->getSourceRange();
+    return true;
+  }
+
+  const auto FieldTy = FD->getType();
+  if (FieldTy->isArrayType() && (CountInBytes || OrNull)) {
+    Diag(FD->getBeginLoc(),
+         diag::err_count_attr_not_on_ptr_or_flexible_array_member)
+        << Kind << FD->getLocation() << /* suggest counted_by */ 1;
+    return true;
+  }
+  if (!FieldTy->isArrayType() && !FieldTy->isPointerType()) {
+    Diag(FD->getBeginLoc(),
+         diag::err_count_attr_not_on_ptr_or_flexible_array_member)
+        << Kind << FD->getLocation() << /* do not suggest counted_by */ 0;
+    return true;
+  }
+
+  LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
+      LangOptions::StrictFlexArraysLevelKind::IncompleteOnly;
+  if (FieldTy->isArrayType() &&
+      !Decl::isFlexibleArrayMemberLike(getASTContext(), FD, FieldTy,
+                                       StrictFlexArraysLevel, true)) {
+    Diag(FD->getBeginLoc(),
+         diag::err_counted_by_attr_on_array_not_flexible_array_member)
+        << Kind << FD->getLocation();
+    return true;
+  }
+
+  CountedByInvalidPointeeTypeKind InvalidTypeKind =
+      CountedByInvalidPointeeTypeKind::VALID;
+  QualType PointeeTy;
+  int SelectPtrOrArr = 0;
+  if (FieldTy->isPointerType()) {
+    PointeeTy = FieldTy->getPointeeType();
+    SelectPtrOrArr = 0;
+  } else {
+    assert(FieldTy->isArrayType());
+    const ArrayType *AT = getASTContext().getAsArrayType(FieldTy);
+    PointeeTy = AT->getElementType();
+    SelectPtrOrArr = 1;
+  }
+  // Note: The `Decl::isFlexibleArrayMemberLike` check earlier on means
+  // only `PointeeTy->isStructureTypeWithFlexibleArrayMember()` is reachable
+  // when `FieldTy->isArrayType()`.
+  bool ShouldWarn = false;
+  if (PointeeTy->isIncompleteType() && !CountInBytes) {
+    InvalidTypeKind = CountedByInvalidPointeeTypeKind::INCOMPLETE;
+  } else if (PointeeTy->isSizelessType()) {
+    InvalidTypeKind = CountedByInvalidPointeeTypeKind::SIZELESS;
+  } else if (PointeeTy->isFunctionType()) {
+    InvalidTypeKind = CountedByInvalidPointeeTypeKind::FUNCTION;
+  } else if (PointeeTy->isStructureTypeWithFlexibleArrayMember()) {
+    if (FieldTy->isArrayType()) {
+      // This is a workaround for the Linux kernel that has already adopted
+      // `counted_by` on a FAM where the pointee is a struct with a FAM. This
+      // should be an error because computing the bounds of the array cannot be
+      // done correctly without manually traversing every struct object in the
+      // array at runtime. To allow the code to be built this error is
+      // downgraded to a warning.
+      ShouldWarn = true;
+    }
+    InvalidTypeKind = CountedByInvalidPointeeTypeKind::FLEXIBLE_ARRAY_MEMBER;
+  }
+
+  if (InvalidTypeKind != CountedByInvalidPointeeTypeKind::VALID) {
+    unsigned DiagID = ShouldWarn
+                          ? diag::warn_counted_by_attr_elt_type_unknown_size
+                          : diag::err_counted_by_attr_pointee_unknown_size;
+    Diag(FD->getBeginLoc(), DiagID)
+        << SelectPtrOrArr << PointeeTy << (int)InvalidTypeKind
+        << (ShouldWarn ? 1 : 0) << Kind << FD->getSourceRange();
+    return true;
+  }
+
+  // Check the expression
+
+  if (!E->getType()->isIntegerType() || E->getType()->isBooleanType()) {
+    Diag(E->getBeginLoc(), diag::err_count_attr_argument_not_integer)
+        << Kind << E->getSourceRange();
+    return true;
+  }
+
+  auto *DRE = dyn_cast<DeclRefExpr>(E);
+  if (!DRE) {
+    Diag(E->getBeginLoc(),
+         diag::err_count_attr_only_support_simple_decl_reference)
+        << Kind << E->getSourceRange();
+    return true;
+  }
+
+  auto *CountDecl = DRE->getDecl();
+  FieldDecl *CountFD = dyn_cast<FieldDecl>(CountDecl);
+  if (auto *IFD = dyn_cast<IndirectFieldDecl>(CountDecl)) {
+    CountFD = IFD->getAnonField();
+  }
+  if (!CountFD) {
+    Diag(E->getBeginLoc(), diag::err_count_attr_must_be_in_structure)
+        << CountDecl << Kind << E->getSourceRange();
+
+    Diag(CountDecl->getBeginLoc(),
+         diag::note_flexible_array_counted_by_attr_field)
+        << CountDecl << CountDecl->getSourceRange();
+    return true;
+  }
+
+  if (FD->getParent() != CountFD->getParent()) {
+    if (CountFD->getParent()->isUnion()) {
+      Diag(CountFD->getBeginLoc(), diag::err_count_attr_refer_to_union)
+          << Kind << CountFD->getSourceRange();
+      return true;
+    }
+    // Whether CountRD is an anonymous struct is not determined at this
+    // point. Thus, an additional diagnostic in case it's not anonymous struct
+    // is done later in `Parser::ParseStructDeclaration`.
+    auto *RD = GetEnclosingNamedOrTopAnonRecord(FD);
+    auto *CountRD = GetEnclosingNamedOrTopAnonRecord(CountFD);
+
+    if (RD != CountRD) {
+      Diag(E->getBeginLoc(), diag::err_count_attr_param_not_in_same_struct)
+          << CountFD << Kind << FieldTy->isArrayType() << E->getSourceRange();
+      Diag(CountFD->getBeginLoc(),
+           diag::note_flexible_array_counted_by_attr_field)
+          << CountFD << CountFD->getSourceRange();
+      return true;
+    }
+  }
+
+  Decls.push_back(TypeCoupledDeclRefInfo(CountFD, /*IsDref*/ false));
+  return false;
+}
+
+} // namespace clang
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 20f46c003a464..b371a2009f2e0 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -45,6 +45,7 @@
 #include "clang/Sema/SemaARM.h"
 #include "clang/Sema/SemaAVR.h"
 #include "clang/Sema/SemaBPF.h"
+#include "clang/Sema/SemaBoundsSafety.h"
 #include "clang/Sema/SemaCUDA.h"
 #include "clang/Sema/SemaHLSL.h"
 #include "clang/Sema/SemaInternal.h"
@@ -5852,181 +5853,6 @@ static void handleZeroCallUsedRegsAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(ZeroCallUsedRegsAttr::Create(S.Context, Kind, AL));
 }
 
-static const RecordDecl *GetEnclosingNamedOrTopAnonRecord(const FieldDecl *FD) {
-  const auto *RD = FD->getParent();
-  // An unnamed struct is anonymous struct only if it's not instantiated.
-  // However, the struct may not be fully processed yet to determine
-  // whether it's anonymous or not. In that case, this function treats it as
-  // an anonymous struct and tries to find a named parent.
-  while (RD && (RD->isAnonymousStructOrUnion() ||
-                (!RD->isCompleteDefinition() && RD->getName().empty()))) {
-    const auto *Parent = dyn_cast<RecordDecl>(RD->getParent());
-    if (!Parent)
-      break;
-    RD = Parent;
-  }
-  return RD;
-}
-
-static CountAttributedType::DynamicCountPointerKind
-getCountAttrKind(bool CountInBytes, bool OrNull) {
-  if (CountInBytes)
-    return OrNull ? CountAttributedType::SizedByOrNull
-                  : CountAttributedType::SizedBy;
-  return OrNull ? CountAttributedType::CountedByOrNull
-                : CountAttributedType::CountedBy;
-}
-
-enum class CountedByInvalidPointeeTypeKind {
-  INCOMPLETE,
-  SIZELESS,
-  FUNCTION,
-  FLEXIBLE_ARRAY_MEMBER,
-  VALID,
-};
-
-static bool
-CheckCountedByAttrOnField(Sema &S, FieldDecl *FD, Expr *E,
-                          llvm::SmallVectorImpl<TypeCoupledDeclRefInfo> &Decls,
-                          bool CountInBytes, bool OrNull) {
-  // Check the context the attribute is used in
-
-  unsigned Kind = getCountAttrKind(CountInBytes, OrNull);
-
-  if (FD->getParent()->isUnion()) {
-    S.Diag(FD->getBeginLoc(), diag::err_count_attr_in_union)
-        << Kind << FD->getSourceRange();
-    return true;
-  }
-
-  const auto FieldTy = FD->getType();
-  if (FieldTy->isArrayType() && (CountInBytes || OrNull)) {
-    S.Diag(FD->getBeginLoc(),
-           diag::err_count_attr_not_on_ptr_or_flexible_array_member)
-        << Kind << FD->getLocation() << /* suggest counted_by */ 1;
-    return true;
-  }
-  if (!FieldTy->isArrayType() && !FieldTy->isPointerType()) {
-    S.Diag(FD->getBeginLoc(),
-           diag::err_count_attr_not_on_ptr_or_flexible_array_member)
-        << Kind << FD->getLocation() << /* do not suggest counted_by */ 0;
-    return true;
-  }
-
-  LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
-      LangOptions::StrictFlexArraysLevelKind::IncompleteOnly;
-  if (FieldTy->isArrayType() &&
-      !Decl::isFlexibleArrayMemberLike(S.getASTContext(), FD, FieldTy,
-                                       StrictFlexArraysLevel, true)) {
-    S.Diag(FD->getBeginLoc(),
-           diag::err_counted_by_attr_on_array_not_flexible_array_member)
-        << Kind << FD->getLocation();
-    return true;
-  }
-
-  CountedByInvalidPointeeTypeKind InvalidTypeKind =
-      CountedByInvalidPointeeTypeKind::VALID;
-  QualType PointeeTy;
-  int SelectPtrOrArr = 0;
-  if (FieldTy->isPointerType()) {
-    PointeeTy = FieldTy->getPointeeType();
-    SelectPtrOrArr = 0;
-  } else {
-    assert(FieldTy->isArrayType());
-    const ArrayType *AT = S.getASTContext().getAsArrayType(FieldTy);
-    PointeeTy = AT->getElementType();
-    SelectPtrOrArr = 1;
-  }
-  // Note: The `Decl::isFlexibleArrayMemberLike` check earlier on means
-  // only `PointeeTy->isStructureTypeWithFlexibleArrayMember()` is reachable
-  // when `FieldTy->isArrayType()`.
-  bool ShouldWarn = false;
-  if (PointeeTy->isIncompleteType() && !CountInBytes) {
-    InvalidTypeKind = CountedByInvalidPointeeTypeKind::INCOMPLETE;
-  } else if (PointeeTy->isSizelessType()) {
-    InvalidTypeKind = CountedByInvalidPointeeTypeKind::SIZELESS;
-  } else if (PointeeTy->isFunctionType()) {
-    InvalidTypeKind = CountedByInvalidPointeeTypeKind::FUNCTION;
-  } else if (PointeeTy->isStructureTypeWithFlexibleArrayMember()) {
-    if (FieldTy->isArrayType()) {
-      // This is a workaround for the Linux kernel that has already adopted
-      // `counted_by` on a FAM where the pointee is a struct with a FAM. This
-      // should be an error because computing the bounds of the array cannot be
-      // done correctly without manually traversing every struct object in the
-      // array at runtime. To allow the code to be built this error is
-      // downgraded to a warning.
-      ShouldWarn = true;
-    }
-    InvalidTypeKind = CountedByInvalidPointeeTypeKind::FLEXIBLE_ARRAY_MEMBER;
-  }
-
-  if (InvalidTypeKind != CountedByInvalidPointeeTypeKind::VALID) {
-    unsigned DiagID = ShouldWarn
-                          ? diag::warn_counted_by_attr_elt_type_unknown_size
-                          : diag::err_counted_by_attr_pointee_unknown_size;
-    S.Diag(FD->getBeginLoc(), DiagID)
-        << SelectPtrOrArr << PointeeTy << (int)InvalidTypeKind
-        << (ShouldWarn ? 1 : 0) << Kind << FD->getSourceRange();
-    return true;
-  }
-
-  // Check the expression
-
-  if (!E->getType()->isIntegerType() || E->getType()->isBooleanType()) {
-    S.Diag(E->getBeginLoc(), diag::err_count_attr_argument_not_integer)
-        << Kind << E->getSourceRange();
-    return true;
-  }
-
-  auto *DRE = dyn_cast<DeclRefExpr>(E);
-  if (!DRE) {
-    S.Diag(E->getBeginLoc(),
-           diag::err_count_attr_only_support_simple_decl_reference)
-        << Kind << E->getSourceRange();
-    return true;
-  }
-
-  auto *CountDecl = DRE->getDecl();
-  FieldDecl *CountFD = dyn_cast<FieldDecl>(CountDecl);
-  if (auto *IFD = dyn_cast<IndirectFieldDecl>(CountDecl)) {
-    CountFD = IFD->getAnonField();
-  }
-  if (!CountFD) {
-    S.Diag(E->getBeginLoc(), diag::err_count_attr_must_be_in_structure)
-        << CountDecl << Kind << E->getSourceRange();
-
-    S.Diag(CountDecl->getBeginLoc(),
-           diag::note_flexible_array_counted_by_attr_field)
-        << CountDecl << CountDecl->getSourceRange();
-    return true;
-  }
-
-  if (FD->getParent() != CountFD->getParent()) {
-    if (CountFD->getParent()->isUnion()) {
-      S.Diag(CountFD->getBeginLoc(), diag::err_count_attr_refer_to_union)
-          << Kind << CountFD->getSourceRange();
-      return true;
-    }
-    // Whether CountRD is an anonymous struct is not determined at this
-    // point. Thus, an additional diagnostic in case it's not anonymous struct
-    // is done later in `Parser::ParseStructDeclaration`.
-    auto *RD = GetEnclosingNamedOrTopAnonRecord(FD);
-    auto *CountRD = GetEnclosingNamedOrTopAnonRecord(CountFD);
-
-    if (RD != CountRD) {
-      S.Diag(E->getBeginLoc(), diag::err_count_attr_param_not_in_same_struct)
-          << CountFD << Kind << FieldTy->isArrayType() << E->getSourceRange();
-      S.Diag(CountFD->getBeginLoc(),
-             diag::note_flexible_array_counted_by_attr_field)
-          << CountFD << CountFD->getSourceRange();
-      return true;
-    }
-  }
-
-  Decls.push_back(TypeCoupledDeclRefInfo(CountFD, /*IsDref*/ false));
-  return false;
-}
-
 static void handleCountedByAttrField(Sema &S, Decl *D, const ParsedAttr &AL) {
   auto *FD = dyn_cast<FieldDecl>(D);
   assert(FD);
@@ -6059,7 +5885,8 @@ static void handleCountedByAttrField(Sema &S, Decl *D, const ParsedAttr &AL) {
   }
 
   llvm::SmallVector<TypeCoupledDeclRefInfo, 1> Decls;
-  if (CheckCountedByAttrOnField(S, FD, CountExpr, Decls, CountInBytes, OrNull))
+  if (S.BoundsSafety().CheckCountedByAttrOnField(FD, CountExpr, Decls,
+                                                 CountInBytes, OrNull))
     return;
 
   QualType CAT = S.BuildCountAttributedArrayOrPointerType(
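
The attribute-kind selection and the pointee-type classification in the moved `CheckCountedByAttrOnField` can be read in isolation. What follows is a simplified sketch, not the Clang implementation: `CountAttrKind`, `PointeeProblem`, `PointeeTraits`, `Verdict`, `selectCountAttrKind` and `classifyPointee` are hypothetical names, and the boolean traits stand in for the `QualType` queries used in the diff.

```cpp
// Hypothetical mirror of the four attribute spellings handled by the check.
enum class CountAttrKind { CountedBy, SizedBy, CountedByOrNull, SizedByOrNull };

// Same 2x2 decision as getCountAttrKind() in the diff.
CountAttrKind selectCountAttrKind(bool CountInBytes, bool OrNull) {
  if (CountInBytes)
    return OrNull ? CountAttrKind::SizedByOrNull : CountAttrKind::SizedBy;
  return OrNull ? CountAttrKind::CountedByOrNull : CountAttrKind::CountedBy;
}

// Hypothetical mirror of CountedByInvalidPointeeTypeKind.
enum class PointeeProblem { Incomplete, Sizeless, Function, StructWithFAM, None };

// Stand-ins for isIncompleteType(), isSizelessType(), isFunctionType() and
// isStructureTypeWithFlexibleArrayMember() on the pointee or element type.
struct PointeeTraits {
  bool Incomplete;
  bool Sizeless;
  bool Function;
  bool StructWithFAM;
};

struct Verdict {
  PointeeProblem Problem;
  bool DowngradeToWarning;
};

// Mirrors the if/else-if chain in the diff: an incomplete pointee is only a
// problem when counting elements (not bytes), and a struct-with-FAM element
// type is downgraded to a warning only when the annotated field is an array.
Verdict classifyPointee(const PointeeTraits &T, bool FieldIsArray,
                        bool CountInBytes) {
  Verdict V{PointeeProblem::None, false};
  if (T.Incomplete && !CountInBytes) {
    V.Problem = PointeeProblem::Incomplete;
  } else if (T.Sizeless) {
    V.Problem = PointeeProblem::Sizeless;
  } else if (T.Function) {
    V.Problem = PointeeProblem::Function;
  } else if (T.StructWithFAM) {
    V.DowngradeToWarning = FieldIsArray;
    V.Problem = PointeeProblem::StructWithFAM;
  }
  return V;
}
```

The sketch keeps the ordering of the original chain, so a pointee that is incomplete and counted in bytes still falls through to the remaining tests.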

@rapidsna requested a review from devincoughlin on July 15, 2024 20:28

Member

@hnrklssn left a comment

LGTM, might want to give the rest some time to have a look though

Contributor

@Endilll left a comment

Unfortunately, I think this should be held off until we have a bigger picture for the future of Sema. I'll elaborate below.

If you take a close look at the existing set of Sema parts, it's almost entirely comprised of other languages or language extensions (from C and C++ standpoint) and backends. They were considered natural enough to be split off Sema, which, among other things, means that it's easy to explain their place in the big picture, like I did in the previous sentence.

I drove the effort to split Sema, and it has stalled, because it's not clear what to do with what's left there. We had several offline discussions, but we didn't find a solution that would substantially improve the status quo.

I don't think this patch makes it easier to explain to people where things are in this codebase, because (I think) counted_by is a declaration attribute, and we already have a place for them. Having only one function also makes it a very small part of Sema, which doesn't sound compelling. A bigger picture this fits in would be a compelling argument (like it was for small backend parts), but to my knowledge there's none. Coming up with one and getting maintainers on board is certainly out of scope of this PR, so don't feel obligated to do that. Hopefully I'll get back to this whole topic later.

CC @AaronBallman

#include "llvm/ADT/SmallVector.h"

namespace clang {
class CountAttributedType;
Contributor

I think only half of those forward declarations are really needed in the header.
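
As a rough illustration of this point: the only class declared in the new header mentions `FieldDecl`, `Expr` and `TypeCoupledDeclRefInfo` in its signature, and each appears only behind a pointer or reference, so forward declarations of those three are enough for the declaration to compile. The sketch below demonstrates that with standalone mock types in a hypothetical `sketch` namespace. `BoundsSafetyChecks`, the `InUnion` flag and the use of `std::vector` in place of `llvm::SmallVectorImpl` are assumptions made for the example, and the sketch says nothing about which names `SemaBase.h` already provides.

```cpp
#include <vector>

namespace sketch {

// "Header" part: incomplete types suffice because the declaration below only
// uses them through pointers and references.
class FieldDecl;
class Expr;
class TypeCoupledDeclRefInfo;

class BoundsSafetyChecks {
public:
  bool CheckCountedByAttrOnField(FieldDecl *FD, Expr *E,
                                 std::vector<TypeCoupledDeclRefInfo> &Decls,
                                 bool CountInBytes, bool OrNull);
};

// "Source file" part: complete types are only required where members are
// accessed or objects are created.
class FieldDecl {
public:
  bool InUnion = false; // hypothetical flag standing in for a parent query
};
class Expr {};
class TypeCoupledDeclRefInfo {};

bool BoundsSafetyChecks::CheckCountedByAttrOnField(
    FieldDecl *FD, Expr *E, std::vector<TypeCoupledDeclRefInfo> &Decls,
    bool /*CountInBytes*/, bool /*OrNull*/) {
  // As in the diff, returning true signals an error.
  if (!FD || !E || FD->InUnion)
    return true;
  Decls.push_back(TypeCoupledDeclRefInfo{});
  return false;
}

} // namespace sketch
```

The remaining forward declarations in the patch (`CountAttributedType`, `Decl`, `NamedDecl`, `ParsedAttr`) do not appear in the declared interface, which may be what the comment above refers to.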

@hnrklssn
Member

> If you take a close look at the existing set of Sema parts, it's almost entirely comprised of other languages or language extensions (from C and C++ standpoint) and backends. They were considered natural enough to be split off Sema, which, among other things, means that it's easy to explain their place in the big picture, like I did in the previous sentence.

I don't have a strong opinion on whether this should be split up, but I just wanted to point out that -fbounds-safety is also a language extension, so it does kind of fit your description of other things separated out. It's currently in the process of being upstreamed which is why it's small, but it'd be easier to split now than wait until it's reached a certain size, if we do want to split it eventually.

@Endilll
Contributor

Endilll commented Jul 15, 2024

> > If you take a close look at the existing set of Sema parts, it's almost entirely comprised of other languages or language extensions (from C and C++ standpoint) and backends. They were considered natural enough to be split off Sema, which, among other things, means that it's easy to explain their place in the big picture, like I did in the previous sentence.
>
> I don't have a strong opinion on whether this should be split up, but I just wanted to point out that -fbounds-safety is also a language extension, so it does kind of fit your description of other things separated out.

"Extension" is definitely quite broad. What I meant are (basically) languages that are based off C or C++. Would you argue that -fbounds-safety fits into a set of OpenMP, OpenACC, CUDA, HLSL, Objective-C, and Swift?

> It's currently in the process of being upstreamed which is why it's small, but it'd be easier to split now than wait until it's reached a certain size, if we do want to split it eventually.

I did that a lot with other parts of Sema, and it's not that hard, unless it grows comparable to SemaExprCXX.cpp (which I consider unlikely to happen). I'd rather see this being upstreamed into the existing Sema structure as it has been planned all along, and use it as an input for the design of further Sema splitting, rather than committing today that SemaBoundsSafety is going to be a thing.

@delcypher
Contributor Author

delcypher commented Jul 16, 2024

> Unfortunately, I think this should be held off until we have a bigger picture for the future of Sema. I'll elaborate below.
>
> If you take a close look at the existing set of Sema parts, it's almost entirely comprised of other languages or language extensions (from C and C++ standpoint) and backends. They were considered natural enough to be split off Sema, which, among other things, means that it's easy to explain their place in the big picture, like I did in the previous sentence.

As noted above, -fbounds-safety is a C language extension, which makes it seem like it would fit nicely into the existing division of Sema into multiple objects and relevant source files.

> I don't think this patch makes it easier to explain to people where things are in this codebase, because (I think) counted_by is a declaration attribute, and we already have a place for them. Having only one function also makes it a very small part of Sema, which doesn't sound compelling.

It doesn't right now because most of the -fbounds-safety implementation isn't upstream yet. I simply moved what is currently upstream. As we (Apple) upstream more and more of the implementation, it makes a lot of sense to try to keep as much of it in its own source file to:

  • Create a clean separation between -fbounds-safety Sema checks and everything else. This will ultimately make exploring the code easier once a significant amount of our implementation has been upstreamed.
  • Avoid growing all the existing Sema*.cpp files unnecessarily. They are already huge and we don't want to make the problem worse when there's an easy option available to avoid that.

> A bigger picture this fits in would be a compelling argument (like it was for small backend parts), but to my knowledge there's none.

The counted_by attribute is just one of several attributes that are part of the -fbounds-safety language extension, which is documented here: https://clang.llvm.org/docs/BoundsSafety.html. That's the "bigger picture" for the feature, or did you mean something else by "bigger picture this fits in"?

> Coming up with one and getting maintainers on board is certainly out of scope of this PR, so don't feel obligated to do that. Hopefully I'll get back to this whole topic later.

My end goal is to have "somewhere" to put all of the Sema code that supports -fbounds-safety that we will upstream. I think it's very much preferable to have this code go into its own source file and header file for the reasons I gave above.

We don't have to use the SemaBase subclass mechanism that I've used here if this is really a problem. However, the mechanism does seem like a good fit for -fbounds-safety and it would be a shame not to use it.

@delcypher
Contributor Author

delcypher commented Jul 16, 2024

> "Extension" is definitely quite broad. What I meant are (basically) languages that are based off C or C++. Would you argue that -fbounds-safety fits into a set of OpenMP, OpenACC, CUDA, HLSL, Objective-C, and Swift?

In my opinion it fits in the set because it is a (pretty large) C language extension.

> > It's currently in the process of being upstreamed which is why it's small, but it'd be easier to split now than wait until it's reached a certain size, if we do want to split it eventually.
>
> I did that a lot with other parts of Sema, and it's not that hard, unless it grows comparable to SemaExprCXX.cpp (which I consider unlikely to happen). I'd rather see this being upstreamed into the existing Sema structure as it has been planned all along, and use it as an input for the design of further Sema splitting, rather than committing today that SemaBoundsSafety is going to be a thing.

I don't agree with this approach. I outlined above why I think it makes a lot of sense to keep the -fbounds-safety Sema code in its own source file and blocking doing that on a potential future refactor that might never happen doesn't seem like the right approach to me.

If at some point we come up with some new design for Sema we can easily move the -fbounds-safety code out of its own source file (and class) and into the relevant locations required by the redesign. If the -fbounds-safety code is littered all over the other Sema files, the potential future refactor could be much harder because finding all of the -fbounds-safety code would then be much more time-consuming.

@Endilll
Contributor

Endilll commented Jul 16, 2024

Unfortunately, I think this should be held off until we have a bigger picture for the future of Sema. I'll elaborate below.
If you take a close look at the existing set of Sema parts, it's almost entirely comprised of other languages or language extensions (from C and C++ standpoint) and backends. They were considered natural enough to be split off Sema, which, among other things, means that it's easy to explain their place in the big picture, like I did in the previous sentence.

As noted above -fbounds-safety is a C language extension which makes it seem like it would fit nicely into the existing division of Sema into multiple objects and relevant source files.

No, it doesn't fit nicely into the division, which is the reason we're having this discussion.

I don't think this patch makes it easier to explain to people where things are in this codebase, because (I think) counted_by is a declaration attribute, and we already have a place for them. Having only one function also makes it a very small part of Sema, which doesn't sound compelling.

It doesn't right now because most of the -fbounds-safety implementation isn't upstream yet. I simply moved what is currently upstream. As we (Apple) upstream more and more of the implementation it makes a lot of sense to try to keep as much of it as possible in its own source file to:

* Create a clean separation between `-fbounds-safety` Sema checks and everything else. This will ultimately make exploring the code easier once a significant amount of our implementation has been upstreamed.

* Avoid growing all the existing `Sema*.cpp` files unnecessarily. They are already huge and we don't want to make the problem worse when there's an easy option available to avoid that.

You can have SemaBoundsSafety.cpp and your own "section" inside Sema.h, like C++ features do (like templates or lambdas). There's no reason to make separation physical by introducing SemaBoundsSafety class from the get-go.

A bigger picture this fits in would be a compelling argument (like it was for small backend parts), but to my knowledge there's none.

The counted_by attribute is just one of several attributes that are part of the -fbounds-safety language extension, which is documented at https://clang.llvm.org/docs/BoundsSafety.html. That's the "bigger picture" for the feature, or did you mean something else by "bigger picture this fits in"?

A bigger picture is "what are we going to do with the rest of the attributes in SemaDeclAttr.cpp?". Even bigger one is "what do we do about SemaDecl, SemaExpr, SemaDeclCXX, and SemaExprCXX?"

Coming up with one and getting maintainers on board is certainly out of scope of this PR, so don't feel obligated to do that. Hopefully I'll get back to this whole topic later.

My end goal is to have "somewhere" to put all of the Sema code that supports -fbounds-safety that we will upstream. I think it's very much preferable to have this code go into its own source file and header file for the reasons I gave above.

We don't have to use the sub-class SemaBase mechanism that I've used here if this is really a problem. However, the mechanism does seem like a good fit for -fbounds-safety and would be a shame not to use it.

"Extension" is definitely quite broad. What I meant are (basically) languages that are based off C or C++. Would you argue that -fbounds-safety fits into a set of OpenMP, OpenACC, CUDA, HLSL, Objective-C, and Swift?

In my opinion it fits in the set because it is a (pretty large) C language extension.

It's currently in the process of being upstreamed which is why it's small, but it'd be easier to split now than wait until it's reached a certain size, if we do want to split it eventually.

I did that a lot with other parts of Sema, and it's not that hard, unless it grows comparable to SemaExprCXX.cpp (which I consider unlikely to happen). I'd rather see this being upstreamed into the existing Sema structure as it has been planned all along, and use it as an input for the design of further Sema splitting, rather than committing today that SemaBoundsSafety is going to be a thing.

I don't agree with this approach. I outlined above why I think it makes a lot of sense to keep the -fbounds-safety Sema code in its own source file and blocking doing that on a potential future refactor that might never happen doesn't seem like the right approach to me.

I have to say that -fbounds-safety upstreaming is in the same boat of something that might never happen.

If at some point we come up with some new design for Sema we can easily move the -fbounds-safety code out of its own source file (and class) and into the relevant locations required by the redesign. If the -fbounds-safety code is littered all over the other Sema files, the potential future refactor could be much harder because finding all of the -fbounds-safety code would then be much more time-consuming.

That's why I propose to follow the long-established practice of doing SemaBoundsSafety.cpp, and move that around later. What I'd like to evaluate before deciding on a SemaBoundsSafety class is how big its interface is (what would be exposed via the SemaBoundsSafety class).

@AaronBallman
Collaborator

The separations we've been making so far in Sema have been at a higher level of granularity than this proposal. Vlad was calling it "language extensions" but perhaps a different way to phrase it would be "unique language dialects". e.g., Objective-C is its own language, but it's technically an extension of C. Similar for things like OpenMP, etc. I don't think bounds safety is the same kind of extension in that regard; it's a handful of features allowing for extra diagnostic checking more than it's a unique language. For example, thread safety analysis can be described in exactly the same way, so should it be split off too? If we keep doing that, will splitting semantic objects by language feature scale? What are the criteria for when a language feature should or should not be split? I think we can sidestep all of this for right now.

That said, I think grouping the bounds safety semantic bits together in a single source file is a totally reasonable thing to do (Vlad's suggestion of SemaBoundsSafety.cpp was along the exact lines I was thinking this should go).

@delcypher
Contributor Author

As noted above -fbounds-safety is a C language extension which makes it seem like it would fit nicely into the existing division of Sema into multiple objects and relevant source files.

No, it doesn't fit nicely into the division, which is the reason we're having this discussion.

If we don't agree on this then I must not fully understand what the criteria for dividing Sema currently are.

You can have SemaBoundsSafety.cpp and your own "section" inside Sema.h, like C++ features do (like templates or lambdas). There's no reason to make separation physical by introducing SemaBoundsSafety class from the get-go.

I'm ok with this. Having the implementation in a separate file is where the main benefit lies. The separation into a separate class is a nice-to-have, and it's ok to drop it.

That's why I propose to follow the long-established practice of doing SemaBoundsSafety.cpp, and move that around later. What I'd like to evaluate before deciding on a SemaBoundsSafety class is how big its interface is (what would be exposed via the SemaBoundsSafety class).

Sure. Let's go with that approach then.
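
For readers skimming the thread, the two layouts being weighed can be sketched roughly as below. This is a toy model, not Clang code: the `Sema`, `SemaBase`, and `SemaBoundsSafety` classes here are simplified stand-ins, the single `std::string` parameter is invented for illustration, and the "empty name is an error" rule is a placeholder for the real checks. Option A is a separate class reached through an accessor; option B is a method declared in its own section of `Sema` whose definition would live in `SemaBoundsSafety.cpp`.

```cpp
#include <memory>
#include <string>

class Sema; // toy stand-in for clang::Sema

// Option A: a separate "Sema part" holding a back-reference to Sema.
// (Simplified; the real SemaBase mechanism carries more than this.)
class SemaBase {
public:
  explicit SemaBase(Sema &S) : SemaRef(S) {}
  Sema &SemaRef;
};

class SemaBoundsSafety : public SemaBase {
public:
  using SemaBase::SemaBase;
  // Hypothetical signature; returns true when an error is found.
  bool CheckCountedByAttrOnField(const std::string &CountFieldName) const;
};

class Sema {
public:
  Sema() : BoundsSafetyPtr(std::make_unique<SemaBoundsSafety>(*this)) {}

  // Option A entry point: callers write S.BoundsSafety().Check...(...)
  SemaBoundsSafety &BoundsSafety() { return *BoundsSafetyPtr; }

  // Option B: the check stays a Sema member, grouped in its own section
  // of the header, with the definition kept in a dedicated source file.
  bool CheckCountedByAttrOnField(const std::string &CountFieldName) const;

private:
  std::unique_ptr<SemaBoundsSafety> BoundsSafetyPtr;
};

// Placeholder logic shared by both options: an empty name is an error.
inline bool SemaBoundsSafety::CheckCountedByAttrOnField(
    const std::string &CountFieldName) const {
  return CountFieldName.empty();
}

inline bool Sema::CheckCountedByAttrOnField(
    const std::string &CountFieldName) const {
  return CountFieldName.empty();
}
```

Either way the implementation can sit in one file; the difference is only whether callers go through a separate object or call `Sema` directly, which is why dropping the class costs little at this stage.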

@AaronBallman
Collaborator

As noted above -fbounds-safety is a C language extension which makes it seem like it would fit nicely into the existing division of Sema into multiple objects and relevant source files.

No, it doesn't fit nicely into the division, which is the reason we're having this discussion.

If we don't agree on this then I must not fully understand what the criteria for dividing Sema currently are.

The goal is to provide some layering within Sema to help break it up more. e.g., the base layer is C and C++ needs, but then there's a layer for Objective-C needs, a different layer for OpenMP needs, etc. This helps make it more clear where dependencies lie between the large-scale different semantic components. Bounds safety isn't really a "layer". Does that make some sense?

You can have SemaBoundsSafety.cpp and your own "section" inside Sema.h, like C++ features do (like templates or lambdas). There's no reason to make separation physical by introducing SemaBoundsSafety class from the get-go.

I'm ok with this. Having the implementation in a separate file is where the main benefit lies. The separation into a separate class is a nice-to-have, and it's ok to drop it.

That's why I propose to follow the long-established practice of doing SemaBoundsSafety.cpp, and move that around later. What I'd like to evaluate before deciding on a SemaBoundsSafety class is how big its interface is (what would be exposed via the SemaBoundsSafety class).

Sure. Let's go with that approach then.

Excellent, thank you!

@delcypher
Contributor Author

The separations we've been making so far in Sema have been at a higher level of granularity than this proposal. Vlad was calling it "language extensions" but perhaps a different way to phrase it would be "unique language dialects". e.g., Objective-C is its own language, but it's technically an extension of C. Similar for things like OpenMP, etc. I don't think bounds safety is the same kind of extension in that regard; it's a handful of features allowing for extra diagnostic checking more than it's a unique language.

Thanks for clarifying. I would describe -fbounds-safety as a lot more than extra diagnostic checking: it also injects runtime checks, and its programming model adds restrictions that make C feel quite different. I'm not really sure that counts as a dialect, but I guess it doesn't matter now given that I'll be going with a different approach.

Out of curiosity, how does the current division of Sema fit into the "unique language dialect" classification? I've noticed there are a bunch of architecture specific Sema classes (e.g. SemaRISCV, SemaX86) and those don't really fit the classification.

For example, thread safety analysis can be described in exactly the same way, so should it be split off too? If we keep doing that, will splitting semantic objects by language feature scale? What are the criteria for when a language feature should or should not be split? I think we can sidestep all of this for right now.

Interesting question. I guess it depends on the size of the language feature, and given that almost none of -fbounds-safety is currently upstream it is very difficult for you to make an informed decision in this particular case. So, yes, let's sidestep this for now. We can always revisit it once a significant portion of -fbounds-safety is upstream.

The goal is to provide some layering within Sema to help break it up more. e.g., the base layer is C and C++ needs, but then there's a layer for Objective-C needs, a different layer for OpenMP needs, etc. This helps make it more clear where dependencies lie between the large-scale different semantic components. Bounds safety isn't really a "layer". Does that make some sense?

Mostly. I have some doubts about the "Bounds safety isn't really a layer" part, but let's delay dealing with that until more of the implementation is upstream.

@Endilll
Contributor

Endilll commented Jul 17, 2024

Out of curiosity, how does the current division of Sema fit into the "unique language dialect" classification? I've noticed there are a bunch of architecture specific Sema classes (e.g. SemaRISCV, SemaX86) and those don't really fit the classification.

They are a different group of Sema parts, dedicated for backend-specific code. SemaX86 and SemaARM serve as a good example of what a populated backend-specific part looks like.

@AaronBallman
Collaborator

Out of curiosity, how does the current division of Sema fit into the "unique language dialect" classification? I've noticed there are a bunch of architecture specific Sema classes (e.g. SemaRISCV, SemaX86) and those don't really fit the classification.

They are a different group of Sema parts, dedicated for backend-specific code. SemaX86 and SemaARM serve as a good example of what a populated backend-specific part looks like.

Yup, and they're an example of where it helps to have layering (if parts of SemaARM need to call into SemaX86, it's good for that dependency to be more explicit).
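
A rough illustration of what "explicit dependency" means here, as a self-contained toy rather than real Clang code. The classes, the `X86()`/`ARM()` accessors as written here, and both member functions are hypothetical stand-ins, and the prefix test is only a placeholder for real builtin lookup. The point is that when one part needs another, the call has to go through `Sema` and name the other part, so the cross-layer edge is visible at the call site.

```cpp
#include <string>

class Sema; // toy stand-in for clang::Sema

// Shared base so each part keeps a back-reference to the owning Sema.
class PartBase {
public:
  explicit PartBase(Sema &S) : SemaRef(S) {}

protected:
  Sema &SemaRef;
};

class SemaX86 : public PartBase {
public:
  using PartBase::PartBase;
  // Hypothetical helper: placeholder prefix test, not real lookup logic.
  bool isKnownBuiltin(const std::string &Name) const {
    return Name.rfind("__builtin_ia32_", 0) == 0;
  }
};

class SemaARM : public PartBase {
public:
  using PartBase::PartBase;
  // Hypothetical helper that needs something owned by the X86 part.
  bool isForeignX86Builtin(const std::string &Name) const;
};

class Sema {
public:
  Sema() : X86Part(*this), ARMPart(*this) {}
  SemaX86 &X86() { return X86Part; }
  SemaARM &ARM() { return ARMPart; }

private:
  SemaX86 X86Part;
  SemaARM ARMPart;
};

// The ARM -> X86 dependency is spelled out: it goes through SemaRef.X86().
inline bool SemaARM::isForeignX86Builtin(const std::string &Name) const {
  return SemaRef.X86().isKnownBuiltin(Name);
}
```

With everything declared directly on `Sema`, the same call would look like any other member call and the dependency between the two components would be invisible.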

@delcypher
Contributor Author

Thanks all for the clarification. Closing this PR in favor of #99330

@delcypher delcypher closed this Jul 17, 2024
Labels
clang:bounds-safety (Issue/PR relating to the experimental -fbounds-safety feature in Clang)
clang:frontend (Language frontend issues, e.g. anything involving "Sema")
clang (Clang issues not falling into any other category)

5 participants