[AMDGPU] - Add address space for strided buffers #74471

OutOfCache · 2023-12-05T14:04:45Z

This is an experimental address space for strided buffers. These buffers can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they point at buffer[index * stride].
Thus, we can use the idxen modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.

llvmbot · 2023-12-05T14:05:11Z

@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Jessica Del (OutOfCache)

Changes

This is an experimental address space for strided buffers. These buffers can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they point at buffer[index * stride].
Thus, we can use the idxen modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.

Full diff: https://github.com/llvm/llvm-project/pull/74471.diff

8 Files Affected:

(modified) llvm/docs/AMDGPUUsage.rst (+30-18)
(modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+18-14)
(modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+5-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+1-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+2-1)
(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+11-2)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll (+70)
(modified) llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll (+17-2)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7fb3d70bbeffe..ff45efac7e848 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -703,23 +703,24 @@ supported for the ``amdgcn`` target.
   .. table:: AMDGPU Address Spaces
      :name: amdgpu-address-spaces-table
 
-     ================================= =============== =========== ================ ======= ============================
-     ..                                                                                     64-Bit Process Address Space
-     --------------------------------- --------------- ----------- ---------------- ------------------------------------
-     Address Space Name                LLVM IR Address HSA Segment Hardware         Address NULL Value
-                                       Space Number    Name        Name             Size
-     ================================= =============== =========== ================ ======= ============================
-     Generic                           0               flat        flat             64      0x0000000000000000
-     Global                            1               global      global           64      0x0000000000000000
-     Region                            2               N/A         GDS              32      *not implemented for AMDHSA*
-     Local                             3               group       LDS              32      0xFFFFFFFF
-     Constant                          4               constant    *same as global* 64      0x0000000000000000
-     Private                           5               private     scratch          32      0xFFFFFFFF
-     Constant 32-bit                   6               *TODO*                               0x00000000
-     Buffer Fat Pointer (experimental) 7               *TODO*
-     Buffer Resource (experimental)    8               *TODO*
-     Streamout Registers               128             N/A         GS_REGS
-     ================================= =============== =========== ================ ======= ============================
+     ===================================== =============== =========== ================ ======= ============================
+     ..                                                                                         64-Bit Process Address Space
+     ------------------------------------- --------------- ----------- ---------------- ------------------------------------
+     Address Space Name                    LLVM IR Address HSA Segment Hardware         Address NULL Value
+                                           Space Number    Name        Name             Size
+     ===================================== =============== =========== ================ ======= ============================
+     Generic                               0               flat        flat             64      0x0000000000000000
+     Global                                1               global      global           64      0x0000000000000000
+     Region                                2               N/A         GDS              32      *not implemented for AMDHSA*
+     Local                                 3               group       LDS              32      0xFFFFFFFF
+     Constant                              4               constant    *same as global* 64      0x0000000000000000
+     Private                               5               private     scratch          32      0xFFFFFFFF
+     Constant 32-bit                       6               *TODO*                               0x00000000
+     Buffer Fat Pointer (experimental)     7               *TODO*
+     Buffer Resource (experimental)        8               *TODO*
+     Buffer Strided Pointer (experimental) 9               *TODO*
+     Streamout Registers                   128             N/A         GS_REGS
+     ===================================== =============== =========== ================ ======= ============================
 
 **Generic**
   The generic address space is supported unless the *Target Properties* column
@@ -836,7 +837,7 @@ supported for the ``amdgcn`` target.
   the backend.
 
   The buffer descriptor used to construct a buffer fat pointer must be *raw*:
-  the stride must be 0, the "add tid" flag bust be 0, the swizzle enable bits
+  the stride must be 0, the "add tid" flag must be 0, the swizzle enable bits
   must be off, and the extent must be measured in bytes. (On subtargets where
   bounds checking may be disabled, buffer fat pointers may choose to enable
   it or not).
@@ -864,6 +865,17 @@ supported for the ``amdgcn`` target.
   (bits `127:96`). The specific interpretation of these fields varies by the
   target architecture and is detailed in the ISA descriptions.
 
+**Buffer Strided Pointer**
+  The buffer index pointer is an experimental address space. It is supposed to
+  model a 128-bit buffer descriptor and a 32-bit offset, like the **Buffer Fat
+  Pointer**. Additionally, it contains an index into the descriptor, which
+  allows the direct addressing of structured elements.
+
+  The buffer descriptor must be *raw*:
+  the stride is the size of a structured element, the "add tid" flag must be 0, the
+  swizzle eneable bits must be off, and the extent (NumRecords) must be measured in
+  elements.
+
 **Streamout Registers**
   Dedicated registers used by the GS NGG Streamout Instructions. The register
   file is modelled as a memory in a distinct address space because it is indexed
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 323560a46f31d..d05d464e95729 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -401,7 +401,7 @@ enum TargetIndex {
 namespace AMDGPUAS {
 enum : unsigned {
   // The maximum value for flat, generic, local, private, constant and region.
-  MAX_AMDGPU_ADDRESS = 8,
+  MAX_AMDGPU_ADDRESS = 9,
 
   FLAT_ADDRESS = 0,   ///< Address space for flat memory.
   GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0).
@@ -418,6 +418,9 @@ enum : unsigned {
 
   BUFFER_RESOURCE = 8, ///< Address space for 128-bit buffer resources.
 
+  BUFFER_STRIDED_POINTER = 9, ///< Address space for 192-bit fat buffer
+                              ///< pointers with an additional index.
+
   /// Internal address spaces. Can be freely renumbered.
   STREAMOUT_REGISTER = 128, ///< Address space for GS NGG Streamout registers.
   /// end Internal address spaces.
@@ -472,24 +475,25 @@ inline bool isExtendedGlobalAddrSpace(unsigned AS) {
 }
 
 static inline bool addrspacesMayAlias(unsigned AS1, unsigned AS2) {
-  static_assert(AMDGPUAS::MAX_AMDGPU_ADDRESS <= 8, "Addr space out of range");
+  static_assert(AMDGPUAS::MAX_AMDGPU_ADDRESS <= 9, "Addr space out of range");
 
   if (AS1 > AMDGPUAS::MAX_AMDGPU_ADDRESS || AS2 > AMDGPUAS::MAX_AMDGPU_ADDRESS)
     return true;
 
-  // This array is indexed by address space value enum elements 0 ... to 8
+  // This array is indexed by address space value enum elements 0 ... to 9
   // clang-format off
-  static const bool ASAliasRules[9][9] = {
-    /*                   Flat   Global Region  Group Constant Private Const32 BufFatPtr BufRsrc */
-    /* Flat     */        {true,  true,  false, true,  true,  true,  true,  true,  true},
-    /* Global   */        {true,  true,  false, false, true,  false, true,  true,  true},
-    /* Region   */        {false, false, true,  false, false, false, false, false, false},
-    /* Group    */        {true,  false, false, true,  false, false, false, false, false},
-    /* Constant */        {true,  true,  false, false, false, false, true,  true,  true},
-    /* Private  */        {true,  false, false, false, false, true,  false, false, false},
-    /* Constant 32-bit */ {true,  true,  false, false, true,  false, false, true,  true},
-    /* Buffer Fat Ptr  */ {true,  true,  false, false, true,  false, true,  true,  true},
-    /* Buffer Resource */ {true,  true,  false, false, true,  false, true,  true,  true},
+  static const bool ASAliasRules[10][10] = {
+    /*                       Flat   Global Region  Group Constant Private Const32 BufFatPtr BufRsrc BufStrdPtr */
+    /* Flat     */            {true,  true,  false, true,  true,  true,  true,  true,  true,  true},
+    /* Global   */            {true,  true,  false, false, true,  false, true,  true,  true,  true},
+    /* Region   */            {false, false, true,  false, false, false, false, false, false, false},
+    /* Group    */            {true,  false, false, true,  false, false, false, false, false, false},
+    /* Constant */            {true,  true,  false, false, false, false, true,  true,  true,  true},
+    /* Private  */            {true,  false, false, false, false, true,  false, false, false, false},
+    /* Constant 32-bit */     {true,  true,  false, false, true,  false, false, true,  true,  true},
+    /* Buffer Fat Ptr  */     {true,  true,  false, false, true,  false, true,  true,  true,  true},
+    /* Buffer Resource */     {true,  true,  false, false, true,  false, true,  true,  true,  true},
+    /* Buffer Strided Ptr  */ {true,  true,  false, false, true,  false, true,  true,  true,  true},
   };
   // clang-format on
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index ccdbd3216e260..55293f463a79d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -633,6 +633,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
   const LLT PrivatePtr = GetAddrSpacePtr(AMDGPUAS::PRIVATE_ADDRESS);
   const LLT BufferFatPtr = GetAddrSpacePtr(AMDGPUAS::BUFFER_FAT_POINTER);
   const LLT RsrcPtr = GetAddrSpacePtr(AMDGPUAS::BUFFER_RESOURCE);
+  const LLT BufferStridedPtr =
+      GetAddrSpacePtr(AMDGPUAS::BUFFER_STRIDED_POINTER);
 
   const LLT CodePtr = FlatPtr;
 
@@ -1103,7 +1105,7 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
   }
 
   getActionDefinitionsBuilder(G_PTR_ADD)
-      .unsupportedFor({BufferFatPtr, RsrcPtr})
+      .unsupportedFor({BufferFatPtr, BufferStridedPtr, RsrcPtr})
       .legalIf(all(isPointer(0), sameSize(0, 1)))
       .scalarize(0)
       .scalarSameSizeAs(1, 0);
@@ -1393,7 +1395,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
     // The custom pointers (fat pointers, buffer resources) don't work with load
     // and store at this level. Fat pointers should have been lowered to
     // intrinsics before the translation to MIR.
-    Actions.unsupportedIf(typeInSet(1, {BufferFatPtr, RsrcPtr}));
+    Actions.unsupportedIf(
+        typeInSet(1, {BufferFatPtr, BufferStridedPtr, RsrcPtr}));
 
     // Address space 8 pointers are handled by a 4xs32 load, bitcast, and
     // ptrtoint. This is needed to account for the fact that we can't have i128
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 0c38fa32c6f33..6506f05cffa29 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -540,7 +540,7 @@ static StringRef computeDataLayout(const Triple &TT) {
   return "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
          "-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:"
          "128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-"
-         "G1-ni:7:8";
+         "G1-ni:7:8:9";
 }
 
 LLVM_READNONE
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index cb877a4695f1e..c2caa83422330 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -368,7 +368,8 @@ unsigned GCNTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
       AddrSpace == AMDGPUAS::CONSTANT_ADDRESS ||
       AddrSpace == AMDGPUAS::CONSTANT_ADDRESS_32BIT ||
       AddrSpace == AMDGPUAS::BUFFER_FAT_POINTER ||
-      AddrSpace == AMDGPUAS::BUFFER_RESOURCE) {
+      AddrSpace == AMDGPUAS::BUFFER_RESOURCE ||
+      AddrSpace == AMDGPUAS::BUFFER_STRIDED_POINTER) {
     return 512;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a7f4d63229b7e..1c49c27c753fb 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1040,12 +1040,20 @@ static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) {
 MVT SITargetLowering::getPointerTy(const DataLayout &DL, unsigned AS) const {
   if (AMDGPUAS::BUFFER_FAT_POINTER == AS && DL.getPointerSizeInBits(AS) == 160)
     return MVT::v5i32;
+  if (AMDGPUAS::BUFFER_STRIDED_POINTER == AS &&
+      DL.getPointerSizeInBits(AS) == 192)
+    return MVT::v6i32;
   return AMDGPUTargetLowering::getPointerTy(DL, AS);
 }
 /// Similarly, the in-memory representation of a p7 is {p8, i32}, aka
 /// v8i32 when padding is added.
+/// The in-memory representation of a p9 is {p8, i32, i32}, which is
+/// also v8i32 with padding.
 MVT SITargetLowering::getPointerMemTy(const DataLayout &DL, unsigned AS) const {
-  if (AMDGPUAS::BUFFER_FAT_POINTER == AS && DL.getPointerSizeInBits(AS) == 160)
+  if ((AMDGPUAS::BUFFER_FAT_POINTER == AS &&
+       DL.getPointerSizeInBits(AS) == 160) ||
+      (AMDGPUAS::BUFFER_STRIDED_POINTER == AS &&
+       DL.getPointerSizeInBits(AS) == 192))
     return MVT::v8i32;
   return AMDGPUTargetLowering::getPointerMemTy(DL, AS);
 }
@@ -1405,7 +1413,8 @@ bool SITargetLowering::isLegalAddressingMode(const DataLayout &DL,
 
   if (AS == AMDGPUAS::CONSTANT_ADDRESS ||
       AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT ||
-      AS == AMDGPUAS::BUFFER_FAT_POINTER || AS == AMDGPUAS::BUFFER_RESOURCE) {
+      AS == AMDGPUAS::BUFFER_FAT_POINTER || AS == AMDGPUAS::BUFFER_RESOURCE ||
+      AS == AMDGPUAS::BUFFER_STRIDED_POINTER) {
     // If the offset isn't a multiple of 4, it probably isn't going to be
     // correctly aligned.
     // FIXME: Can we get the real alignment here?
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll
index 2453b1f415ec3..a13eb5c6d085f 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll
@@ -248,3 +248,73 @@ define void @test_8_5(ptr %p) {
   load i8, ptr addrspace(3) @shm
   ret void
 }
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8* %p1
+define void @test_9_0(ptr addrspace(9) %p, ptr addrspace(0) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(0) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(1)* %p1
+define void @test_9_1(ptr addrspace(9) %p, ptr addrspace(1) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(1) %p1
+  ret void
+}
+
+; CHECK: NoAlias:  i8 addrspace(9)* %p, i8 addrspace(2)* %p1
+define void @test_9_2(ptr addrspace(9) %p, ptr addrspace(2) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(2) %p1
+  ret void
+}
+
+; CHECK: NoAlias:  i8 addrspace(9)* %p, i8 addrspace(3)* %p1
+define void @test_9_3(ptr addrspace(9) %p, ptr addrspace(3) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(3) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(4)* %p1
+define void @test_9_4(ptr addrspace(9) %p, ptr addrspace(4) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(4) %p1
+  ret void
+}
+
+; CHECK: NoAlias:  i8 addrspace(9)* %p, i8 addrspace(5)* %p1
+define void @test_9_5(ptr addrspace(9) %p, ptr addrspace(5) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(5) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(6)* %p1
+define void @test_9_6(ptr addrspace(9) %p, ptr addrspace(6) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(6) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(7)* %p1
+define void @test_9_7(ptr addrspace(9) %p, ptr addrspace(7) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(7) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(8)* %p1
+define void @test_9_8(ptr addrspace(9) %p, ptr addrspace(8) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(8) %p1
+  ret void
+}
+
+; CHECK: MayAlias:  i8 addrspace(9)* %p, i8 addrspace(9)* %p1
+define void @test_9_9(ptr addrspace(9) %p, ptr addrspace(9) %p1) {
+  load i8, ptr addrspace(9) %p
+  load i8, ptr addrspace(9) %p1
+  ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll b/llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll
index c109d38b9cb2f..4aab097229a47 100644
--- a/llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll
+++ b/llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll
@@ -1,7 +1,7 @@
 ; RUN: opt -S -mtriple=amdgcn-- -passes=load-store-vectorizer < %s | FileCheck -check-prefix=OPT %s
 
-; OPT-LABEL: @func(
-define void @func(ptr addrspace(7) %out) {
+; OPT-LABEL: @buffer_fat_ptrs(
+define void @buffer_fat_ptrs(ptr addrspace(7) %out) {
 entry:
   %a1 = getelementptr i32, ptr addrspace(7) %out, i32 1
   %a2 = getelementptr i32, ptr addrspace(7) %out, i32 2
@@ -14,3 +14,18 @@ entry:
   store i32 3, ptr addrspace(7) %a3
   ret void
 }
+
+; OPT-LABEL: @buffer_strided_ptrs(
+define void @buffer_strided_ptrs(ptr addrspace(9) %out) {
+entry:
+  %a1 = getelementptr i32, ptr addrspace(9) %out, i32 1
+  %a2 = getelementptr i32, ptr addrspace(9) %out, i32 2
+  %a3 = getelementptr i32, ptr addrspace(9) %out, i32 3
+
+; OPT: store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, ptr addrspace(9) %out, align 4
+  store i32 0, ptr addrspace(9) %out
+  store i32 1, ptr addrspace(9) %a1
+  store i32 2, ptr addrspace(9) %a2
+  store i32 3, ptr addrspace(9) %a3
+  ret void
+}

OutOfCache · 2023-12-05T14:06:05Z

llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll

@@ -1,7 +1,7 @@
 ; RUN: opt -S -mtriple=amdgcn-- -passes=load-store-vectorizer < %s | FileCheck -check-prefix=OPT %s


Should I rename the file? Left it as is for now to see actual changes more easily.

I think it's fine. It's still a kind of buffer fat pointer in a sense.

krzysz00

How do you intend to rewrite these operations down to the underlying instructions?

That is, what's your planned equivalent to https://reviews.llvm.org/D158463 ?

krzysz00 · 2023-12-05T15:49:42Z

llvm/docs/AMDGPUUsage.rst

+  Pointer**. Additionally, it contains an index into the descriptor, which
+  allows the direct addressing of structured elements.
+
+  The buffer descriptor must be *raw*:


I wouldn't describe this as "raw"

And why the requirement for no swizzling?

It's the same language as for addrspace 7, and I think it makes sense: it's the raw bits as expected by the hardware.

I recall that the buffer intrinsics have a swz bit in their flags -- probably due to some HW issue at some point? I don't remember the details.

I changed the wording to avoid confusion.

OutOfCache · 2023-12-07T09:31:54Z

How do you intend to rewrite these operations down to the underlying instructions?

That is, what's your planned equivalent to https://reviews.llvm.org/D158463 ?

Thank you for the link to the code review, I was not aware of your changes before. Up until now, we intended to use the address space in llpc exclusively and lower the pointers to buffer load instructions in PatchBufferOp. If we also need to lower these pointers like you did for the fat buffers, maybe we can extend your new pass to also handle these strided buffer pointers?

nhaehnle

Thanks!

Since the plan for now is to have the lowering in LLPC, it's unclear to me how many of the codegen-related changes we actually want in here, though I suppose they don't hurt.

llvm/docs/AMDGPUUsage.rst

nhaehnle · 2023-12-07T09:43:44Z

llvm/docs/AMDGPUUsage.rst

+  Pointer**. Additionally, it contains an index into the descriptor, which
+  allows the direct addressing of structured elements.
+
+  The buffer descriptor must be *raw*:


It's the same language as for addrspace 7, and I think it makes sense: it's the raw bits as expected by the hardware.

I recall that the buffer intrinsics have a swz bit in their flags -- probably due to some HW issue at some point? I don't remember the details.

llvm/docs/AMDGPUUsage.rst

nhaehnle · 2023-12-07T09:45:29Z

llvm/test/CodeGen/AMDGPU/vectorize-buffer-fat-pointer.ll

@@ -1,7 +1,7 @@
 ; RUN: opt -S -mtriple=amdgcn-- -passes=load-store-vectorizer < %s | FileCheck -check-prefix=OPT %s


I think it's fine. It's still a kind of buffer fat pointer in a sense.

krzysz00 · 2023-12-07T15:13:16Z

I'm going to ask the annoying questions:

Isn't a strided buffer one where the field that's named something like stride (bits 61:48 or 63:48) is non-zero
And therefore it uses structured buffers and the llvm.struct[.ptr].buffer.* intrinsics?
So, with LLVM's gep, how do you tell which field you're GEPing on to?
Therefore, is 32-bit index length the right one?

This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.

nhaehnle · 2023-12-11T20:41:10Z

1. Isn't a strided buffer one where the field that's named something like `stride` (bits 61:48 or 63:48) is non-zero

Yes, and one that is meant to use with buffer instructions using idxen.

2. And therefore it uses structured buffers and the `llvm.struct[.ptr].buffer.*` intrinsics?

Yes.

3. So, with LLVM's gep, how do you tell which field you're GEPing on to?

The intended use is that GEPs affect only the offset part of the fat pointer. The index part is changed using a separate instruction/operation/intrinsic.

4. Therefore, is 32-bit index length the right one?

Yes, because that's the index size in the underlying HW.

nhaehnle

FWIW, this LGTM. Give it another day or two for other feedback.

krzysz00

Looks good to me, aside from a documentation nit.

krzysz00 · 2023-12-12T15:27:07Z

llvm/docs/AMDGPUUsage.rst

+  The buffer index pointer is an experimental address space. It is supposed to
+  model a 128-bit buffer descriptor and a 32-bit offset, like the **Buffer Fat
+  Pointer**. Additionally, it contains an index into the buffer, which
+  allows the direct addressing of structured elements.


Document the order of the fields please

OutOfCache added the backend:AMDGPU label Dec 5, 2023

OutOfCache requested review from arsenm, krzysz00, nhaehnle and rovka December 5, 2023 14:04

OutOfCache commented Dec 5, 2023

View reviewed changes

krzysz00 reviewed Dec 5, 2023

View reviewed changes

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" mlir:llvm mlir llvm:transforms clang:openmp OpenMP related changes to Clang llvm:ir labels Dec 5, 2023

nhaehnle reviewed Dec 7, 2023

View reviewed changes

OutOfCache added 6 commits December 11, 2023 11:38

fixup! [AMDGPU] - Add address space for strided buffers

b22d73f

fixup! [AMDGPU] - Add address space for strided buffers

cdd441f

fixup! [AMDGPU] - Add address space for strided buffers

0ae0606

fixup! [AMDGPU] - Add address space for strided buffers

6cd2f72

fixup! [AMDGPU] - Add address space for strided buffers

985683f

OutOfCache force-pushed the addrspace-9 branch from cc1850b to 985683f Compare December 11, 2023 10:43

llvmbot added the llvm:support label Dec 11, 2023

fixup! [AMDGPU] - Add address space for strided buffers

1572812

nhaehnle approved these changes Dec 11, 2023

View reviewed changes

krzysz00 approved these changes Dec 12, 2023

View reviewed changes

fixup! [AMDGPU] - Add address space for strided buffers

9387ad9

OutOfCache merged commit 32f9983 into llvm:main Dec 15, 2023

OutOfCache deleted the addrspace-9 branch December 15, 2023 14:49

		@@ -1,7 +1,7 @@
		; RUN: opt -S -mtriple=amdgcn-- -passes=load-store-vectorizer < %s \| FileCheck -check-prefix=OPT %s

[AMDGPU] - Add address space for strided buffers #74471

[AMDGPU] - Add address space for strided buffers #74471

Uh oh!

Conversation

OutOfCache commented Dec 5, 2023

Uh oh!

llvmbot commented Dec 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krzysz00 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

OutOfCache commented Dec 7, 2023

Uh oh!

nhaehnle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krzysz00 commented Dec 7, 2023

Uh oh!

nhaehnle commented Dec 11, 2023

Uh oh!

nhaehnle left a comment

Choose a reason for hiding this comment

Uh oh!

krzysz00 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvmbot commented Dec 5, 2023 •

edited

Loading