Skip to content

[NVPTX] Fix generic address in st.bulk intrinsic #130740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Wolfram70
Copy link
Contributor

@Wolfram70 Wolfram70 commented Mar 11, 2025

This PR fixes an oversight from the previous change (PR #128856) that introduced the st.bulk intrinsic where llvm_global_ptr_ty was used instead of llvm_ptr_ty for generic addressing.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-st-bulk

This fixes an oversight in the previous change that introduced the `st.bulk`
intrinsic where `llvm_global_ptr_ty` was used instead of `llvm_ptr_ty`.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-st-bulk
@llvmbot
Copy link
Member

llvmbot commented Mar 11, 2025

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-llvm-ir

Author: Srinivasa Ravi (Wolfram70)

Changes

This fixes an oversight in a previous change that introduced the st.bulk intrinsic where llvm_global_ptr_ty was used instead of llvm_ptr_ty for generic addressing.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-st-bulk


Full diff: https://github.com/llvm/llvm-project/pull/130740.diff

2 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st_bulk.ll (+3-3)
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 0b183fc30068e..ea58985cbebda 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5274,7 +5274,7 @@ foreach shape = ["16x64b", "16x128b", "16x256b", "32x32b", "16x32bx2"] in {
 //
 
 def int_nvvm_st_bulk: DefaultAttrsIntrinsic<[],
-  [llvm_global_ptr_ty, llvm_i64_ty, llvm_i64_ty],
+  [llvm_ptr_ty, llvm_i64_ty, llvm_i64_ty],
   [IntrArgMemOnly, IntrWriteMem,
     WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
 
diff --git a/llvm/test/CodeGen/NVPTX/st_bulk.ll b/llvm/test/CodeGen/NVPTX/st_bulk.ll
index 085df7f1d8d3f..785f78a6f9519 100644
--- a/llvm/test/CodeGen/NVPTX/st_bulk.ll
+++ b/llvm/test/CodeGen/NVPTX/st_bulk.ll
@@ -4,8 +4,8 @@
 ; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 | %ptxas-verify -arch=sm_100 %}
 ; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100 %}
 
-declare void @llvm.nvvm.st.bulk(ptr addrspace(1), i64, i64)
-define void @st_bulk(ptr addrspace(1) %dest_addr, i64 %size) {
+declare void @llvm.nvvm.st.bulk(ptr, i64, i64)
+define void @st_bulk(ptr %dest_addr, i64 %size) {
 ; CHECK-LABEL: st_bulk(
 ; CHECK:       {
 ; CHECK-NEXT:    .reg .b64 %rd<3>;
@@ -15,7 +15,7 @@ define void @st_bulk(ptr addrspace(1) %dest_addr, i64 %size) {
 ; CHECK-NEXT:    ld.param.u64 %rd2, [st_bulk_param_1];
 ; CHECK-NEXT:    st.bulk [%rd1], %rd2, 0;
 ; CHECK-NEXT:    ret;
-  call void @llvm.nvvm.st.bulk(ptr addrspace(1) %dest_addr, i64 %size, i64 0)
+  call void @llvm.nvvm.st.bulk(ptr %dest_addr, i64 %size, i64 0)
   ret void
 }
 

@Wolfram70 Wolfram70 merged commit 7257261 into llvm:main Mar 12, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants