Skip to content

[Clang] Verify data layout consistency #144720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 1, 2025
Merged

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jun 18, 2025

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.

@nikic nikic requested review from rjmccall and efriedma-quic June 18, 2025 14:59
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jun 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-clang-codegen

Author: Nikita Popov (nikic)

Changes

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.


Full diff: https://github.com/llvm/llvm-project/pull/144720.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+73)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c27168e4c4bfe..aabc872e22df1 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -332,6 +332,76 @@ const TargetCodeGenInfo &CodeGenModule::getTargetCodeGenInfo() {
   return *TheTargetCodeGenInfo;
 }
 
+static void checkDataLayoutConsistency(const TargetInfo &Target,
+                                       llvm::LLVMContext &Context,
+                                       const LangOptions &Opts) {
+#ifndef NDEBUG
+  // Don't verify non-standard ABI configurations.
+  if (Opts.AlignDouble || Opts.OpenCL || Opts.HLSL)
+    return;
+
+  llvm::Triple Triple = Target.getTriple();
+  llvm::DataLayout DL(Target.getDataLayoutString());
+  auto Check = [&](const char *Name, llvm::Type *Ty, unsigned Alignment) {
+    llvm::Align DLAlign = DL.getABITypeAlign(Ty);
+    llvm::Align ClangAlign(Alignment / 8);
+    if (DLAlign != ClangAlign) {
+      llvm::errs() << "For target " << Triple.str() << " type " << Name
+                   << " mapping to " << *Ty << " has data layout alignment "
+                   << DLAlign.value() << " while clang specifies "
+                   << ClangAlign.value() << "\n";
+      abort();
+    }
+  };
+
+  Check("bool", llvm::Type::getIntNTy(Context, Target.BoolWidth),
+        Target.BoolAlign);
+  Check("short", llvm::Type::getIntNTy(Context, Target.ShortWidth),
+        Target.ShortAlign);
+  // FIXME: M68k specifies incorrect wrong int and long alignments in Clang
+  // and incorrect long long alignment in both LLVM and Clang.
+  if (Triple.getArch() != llvm::Triple::m68k) {
+    Check("int", llvm::Type::getIntNTy(Context, Target.IntWidth),
+          Target.IntAlign);
+    Check("long", llvm::Type::getIntNTy(Context, Target.LongWidth),
+          Target.LongAlign);
+    Check("long long", llvm::Type::getIntNTy(Context, Target.LongLongWidth),
+          Target.LongLongAlign);
+  }
+  // FIXME: There are int128 alignment mismatches on multiple targets.
+  if (Target.hasInt128Type() && !Target.getTargetOpts().ForceEnableInt128 &&
+      !Triple.isAMDGPU() && !Triple.isSPIRV() &&
+      Triple.getArch() != llvm::Triple::ve)
+    Check("__int128", llvm::Type::getIntNTy(Context, 128), Target.Int128Align);
+
+  if (Target.hasFloat16Type())
+    Check("half", llvm::Type::getFloatingPointTy(Context, *Target.HalfFormat),
+          Target.HalfAlign);
+  if (Target.hasBFloat16Type())
+    Check("bfloat", llvm::Type::getBFloatTy(Context), Target.BFloat16Align);
+  Check("float", llvm::Type::getFloatingPointTy(Context, *Target.FloatFormat),
+        Target.FloatAlign);
+  // FIXME: AIX specifies wrong double alignment in DataLayout
+  if (!Triple.isOSAIX()) {
+    Check("double",
+          llvm::Type::getFloatingPointTy(Context, *Target.DoubleFormat),
+          Target.DoubleAlign);
+    Check("long double",
+          llvm::Type::getFloatingPointTy(Context, *Target.LongDoubleFormat),
+          Target.LongDoubleAlign);
+  }
+  // FIXME: Wasm has a mismatch in f128 alignment between Clang and LLVM.
+  if (Target.hasFloat128Type() && !Triple.isWasm())
+    Check("__float128", llvm::Type::getFP128Ty(Context), Target.Float128Align);
+  if (Target.hasIbm128Type())
+    Check("__ibm128", llvm::Type::getPPC_FP128Ty(Context), Target.Ibm128Align);
+
+  // FIXME: Clang specifies incorrect pointer alignment for m68k.
+  if (Triple.getArch() != llvm::Triple::m68k)
+    Check("void*", llvm::PointerType::getUnqual(Context), Target.PointerAlign);
+#endif
+}
+
 CodeGenModule::CodeGenModule(ASTContext &C,
                              IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS,
                              const HeaderSearchOptions &HSO,
@@ -458,6 +528,9 @@ CodeGenModule::CodeGenModule(ASTContext &C,
   if (Context.getTargetInfo().getTriple().getArch() == llvm::Triple::x86)
     getModule().addModuleFlag(llvm::Module::Error, "NumRegisterParameters",
                               CodeGenOpts.NumRegisterParameters);
+
+  if (!Context.getAuxTargetInfo())
+    checkDataLayoutConsistency(Context.getTargetInfo(), LLVMContext, LangOpts);
 }
 
 CodeGenModule::~CodeGenModule() {}

@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

Verify that the alignments specified by clang TargetInfo match the alignments specified by LLVM data layout, which will hopefully prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like -malign-double are used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency in Basic.


Full diff: https://github.com/llvm/llvm-project/pull/144720.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+73)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c27168e4c4bfe..aabc872e22df1 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -332,6 +332,76 @@ const TargetCodeGenInfo &CodeGenModule::getTargetCodeGenInfo() {
   return *TheTargetCodeGenInfo;
 }
 
+static void checkDataLayoutConsistency(const TargetInfo &Target,
+                                       llvm::LLVMContext &Context,
+                                       const LangOptions &Opts) {
+#ifndef NDEBUG
+  // Don't verify non-standard ABI configurations.
+  if (Opts.AlignDouble || Opts.OpenCL || Opts.HLSL)
+    return;
+
+  llvm::Triple Triple = Target.getTriple();
+  llvm::DataLayout DL(Target.getDataLayoutString());
+  auto Check = [&](const char *Name, llvm::Type *Ty, unsigned Alignment) {
+    llvm::Align DLAlign = DL.getABITypeAlign(Ty);
+    llvm::Align ClangAlign(Alignment / 8);
+    if (DLAlign != ClangAlign) {
+      llvm::errs() << "For target " << Triple.str() << " type " << Name
+                   << " mapping to " << *Ty << " has data layout alignment "
+                   << DLAlign.value() << " while clang specifies "
+                   << ClangAlign.value() << "\n";
+      abort();
+    }
+  };
+
+  Check("bool", llvm::Type::getIntNTy(Context, Target.BoolWidth),
+        Target.BoolAlign);
+  Check("short", llvm::Type::getIntNTy(Context, Target.ShortWidth),
+        Target.ShortAlign);
+  // FIXME: M68k specifies incorrect wrong int and long alignments in Clang
+  // and incorrect long long alignment in both LLVM and Clang.
+  if (Triple.getArch() != llvm::Triple::m68k) {
+    Check("int", llvm::Type::getIntNTy(Context, Target.IntWidth),
+          Target.IntAlign);
+    Check("long", llvm::Type::getIntNTy(Context, Target.LongWidth),
+          Target.LongAlign);
+    Check("long long", llvm::Type::getIntNTy(Context, Target.LongLongWidth),
+          Target.LongLongAlign);
+  }
+  // FIXME: There are int128 alignment mismatches on multiple targets.
+  if (Target.hasInt128Type() && !Target.getTargetOpts().ForceEnableInt128 &&
+      !Triple.isAMDGPU() && !Triple.isSPIRV() &&
+      Triple.getArch() != llvm::Triple::ve)
+    Check("__int128", llvm::Type::getIntNTy(Context, 128), Target.Int128Align);
+
+  if (Target.hasFloat16Type())
+    Check("half", llvm::Type::getFloatingPointTy(Context, *Target.HalfFormat),
+          Target.HalfAlign);
+  if (Target.hasBFloat16Type())
+    Check("bfloat", llvm::Type::getBFloatTy(Context), Target.BFloat16Align);
+  Check("float", llvm::Type::getFloatingPointTy(Context, *Target.FloatFormat),
+        Target.FloatAlign);
+  // FIXME: AIX specifies wrong double alignment in DataLayout
+  if (!Triple.isOSAIX()) {
+    Check("double",
+          llvm::Type::getFloatingPointTy(Context, *Target.DoubleFormat),
+          Target.DoubleAlign);
+    Check("long double",
+          llvm::Type::getFloatingPointTy(Context, *Target.LongDoubleFormat),
+          Target.LongDoubleAlign);
+  }
+  // FIXME: Wasm has a mismatch in f128 alignment between Clang and LLVM.
+  if (Target.hasFloat128Type() && !Triple.isWasm())
+    Check("__float128", llvm::Type::getFP128Ty(Context), Target.Float128Align);
+  if (Target.hasIbm128Type())
+    Check("__ibm128", llvm::Type::getPPC_FP128Ty(Context), Target.Ibm128Align);
+
+  // FIXME: Clang specifies incorrect pointer alignment for m68k.
+  if (Triple.getArch() != llvm::Triple::m68k)
+    Check("void*", llvm::PointerType::getUnqual(Context), Target.PointerAlign);
+#endif
+}
+
 CodeGenModule::CodeGenModule(ASTContext &C,
                              IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS,
                              const HeaderSearchOptions &HSO,
@@ -458,6 +528,9 @@ CodeGenModule::CodeGenModule(ASTContext &C,
   if (Context.getTargetInfo().getTriple().getArch() == llvm::Triple::x86)
     getModule().addModuleFlag(llvm::Module::Error, "NumRegisterParameters",
                               CodeGenOpts.NumRegisterParameters);
+
+  if (!Context.getAuxTargetInfo())
+    checkDataLayoutConsistency(Context.getTargetInfo(), LLVMContext, LangOpts);
 }
 
 CodeGenModule::~CodeGenModule() {}

@rjmccall
Copy link
Contributor

So, devil's advocate: do we actually care about IR data layout alignments? Clang should be putting explicit alignments on everything in the IR. And, I mean, I can certainly imagine that a target might end up in an impossible situation for terrible historical reasons, because e.g. it has an int128_t with 8-byte alignment but an __int128_t with 16-byte alignment.

@nikic
Copy link
Contributor Author

nikic commented Jun 19, 2025

@rjmccall From a Clang perspective, largely no -- but not entirely. There are cases when LLVM materializes allocations based on the data layout in ways that are observable and not controllable by Clang. One of these is certain inline asm constraints (which can end up passing constant pool addresses that don't satisfy alignment requirements) and another is certain edge cases around libcall lowering (libcalls that result in sret demotion, which is why clang used to miscompile f128 libcalls).

From an LLVM perspective, we do care because non-Clang frontends use LLVM's data layout as their source of truth. For most targets, the information there is sufficient -- as long as it's actually correct. It seems that sometimes Clang developers also expect that the data layout is the source of truth, which is how we end up with situations like m68k forgetting to specify Clang-side alignments entirely (#144740).

@rjmccall
Copy link
Contributor

Okay. I'm not really opposed to trying to make this accurate, even if it were purely on general principle. I have to say that things like inline asm constraints and libcall lowering sound like gaps that we should be fixing to allow an explicit alignment, though.

@nikic nikic force-pushed the clang-data-layout-verify branch from c4082c4 to 4a70541 Compare June 27, 2025 15:36
@nikic
Copy link
Contributor Author

nikic commented Jun 27, 2025

Ping :)

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I see you opened a PR for m68k and aix; did you open a PR or issue report for wasm?

nikic added a commit that referenced this pull request Jul 1, 2025
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.

The long long alignment is also incorrect, but that would require a
change to the data layout as well.

Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.

This was found by #144720.
nikic added 2 commits July 1, 2025 09:07
Verify that the alignments specified by clang TargetInfo match
the alignments specified by LLVM data layout, which will hopefully
prevent accidental mismatches in the future.

This currently contains opt-outs for a lot of existing mismatches.

I'm also skipping the verification if options like `-malign-double`
are used, or a language that mandates sizes/alignments that differ
from C.
@nikic nikic force-pushed the clang-data-layout-verify branch from 4a70541 to e0b1d80 Compare July 1, 2025 07:17
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 1, 2025
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.

The long long alignment is also incorrect, but that would require a
change to the data layout as well.

Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.

This was found by llvm/llvm-project#144720.
@nikic nikic merged commit 5fa4eb1 into llvm:main Jul 1, 2025
7 checks passed
@nikic nikic deleted the clang-data-layout-verify branch July 1, 2025 08:43
@nikic
Copy link
Contributor Author

nikic commented Jul 1, 2025

I've put up #146494 for the wasm f128 alignment.

rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.

The long long alignment is also incorrect, but that would require a
change to the data layout as well.

Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.

This was found by llvm#144720.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Verify that the alignments specified by clang TargetInfo match the
alignments specified by LLVM data layout, which will hopefully prevent
accidental mismatches in the future.

This currently contains opt-outs for a number of of existing mismatches.

I'm also skipping the verification if options like `-malign-double` are
used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency
in Basic.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.

The long long alignment is also incorrect, but that would require a
change to the data layout as well.

Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.

This was found by llvm#144720.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
Verify that the alignments specified by clang TargetInfo match the
alignments specified by LLVM data layout, which will hopefully prevent
accidental mismatches in the future.

This currently contains opt-outs for a number of of existing mismatches.

I'm also skipping the verification if options like `-malign-double` are
used, or a language that mandates sizes/alignments that differ from C.

The verification happens in CodeGen, as we can't have an IR dependency
in Basic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants