Skip to content

[NVPTX] cap param alignment at 128 (max supported by ptx) #96117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

AlexMaclean
Copy link
Member

Cap the alignment to 128 bytes as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the PTX Writer's Guide to Interoperability

D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.

@AlexMaclean AlexMaclean requested a review from jlebar June 19, 2024 21:55
@AlexMaclean AlexMaclean self-assigned this Jun 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 19, 2024

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Cap the alignment to 128 bytes as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the PTX Writer's Guide to Interoperability

> D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.


Full diff: https://github.com/llvm/llvm-project/pull/96117.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+5-3)
  • (added) llvm/test/CodeGen/NVPTX/max-align.ll (+14)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index f4ef7c9914f13..982c191875750 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -5038,7 +5038,9 @@ bool NVPTXTargetLowering::getTgtMemIntrinsic(
 /// ensures that alignment is 16 or greater.
 Align NVPTXTargetLowering::getFunctionParamOptimizedAlign(
     const Function *F, Type *ArgTy, const DataLayout &DL) const {
-  const uint64_t ABITypeAlign = DL.getABITypeAlign(ArgTy).value();
+  // Capping the alignment to 128 bytes as that is the maximum alignment
+  // supported by PTX.
+  const Align ABITypeAlign = std::min(Align(128), DL.getABITypeAlign(ArgTy));
 
   // If a function has linkage different from internal or private, we
   // must use default ABI alignment as external users rely on it. Same
@@ -5048,10 +5050,10 @@ Align NVPTXTargetLowering::getFunctionParamOptimizedAlign(
                          /*IgnoreCallbackUses=*/false,
                          /*IgnoreAssumeLikeCalls=*/true,
                          /*IgnoreLLVMUsed=*/true))
-    return Align(ABITypeAlign);
+    return ABITypeAlign;
 
   assert(!isKernelFunction(*F) && "Expect kernels to have non-local linkage");
-  return Align(std::max(uint64_t(16), ABITypeAlign));
+  return std::max(Align(16), ABITypeAlign);
 }
 
 /// Helper for computing alignment of a device function byval parameter.
diff --git a/llvm/test/CodeGen/NVPTX/max-align.ll b/llvm/test/CodeGen/NVPTX/max-align.ll
new file mode 100644
index 0000000000000..c8b1cb12dee5f
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/max-align.ll
@@ -0,0 +1,14 @@
+; RUN: llc < %s -march=nvptx64 -O0 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -O0 | %ptxas-verify %}
+
+
+; CHECK: .visible .func  (.param .align 128 .b8 func_retval0[256]) repro()
+define <64 x i32> @repro() {
+
+  ; CHECK: .param .align 128 .b8 retval0[256];
+  %1 = tail call <64 x i32> @test(i32 0)
+  ret <64 x i32> %1
+}
+
+; Function Attrs: nounwind
+declare <64 x i32> @test(i32)

@AlexMaclean
Copy link
Member Author

@jlebar ping

Copy link
Member

@jlebar jlebar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this one. LGTM!

@AlexMaclean AlexMaclean merged commit 5c9513a into llvm:main Jun 25, 2024
9 checks passed
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Cap the alignment to 128 bytes as that is the maximum alignment
supported by PTX. The restriction is mentioned in the parameter passing
section (Note D) of the [PTX Writer's Guide to Interoperability]
(https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing)

> D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants