Skip to content

[AMDGPU] Support tfe operand in image_atomic instructions #92469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions llvm/lib/Target/AMDGPU/MIMGInstructions.td
Original file line number Diff line number Diff line change
Expand Up @@ -1101,6 +1101,10 @@ multiclass MIMG_Atomic <mimgopc op, string asm, bit isCmpSwap = 0, bit isFP = 0,
defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_64, VGPR_32), 1, isFP, renamed>;
let VDataDwords = !if(isCmpSwap, 4, 2) in
defm _V2 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_128, VReg_64), 0, isFP, renamed>;
let VDataDwords = !if(isCmpSwap, 2, 2) in
defm _V3 : MIMG_Atomic_Addr_Helper_m <op, asm, VReg_96, 0, isFP, renamed>;
let VDataDwords = !if(isCmpSwap, 4, 4) in
defm _V4 : MIMG_Atomic_Addr_Helper_m <op, asm, VReg_160, 0, isFP, renamed>;
}
} // End IsAtomicRet = 1
}
Expand Down
91 changes: 91 additions & 0 deletions llvm/test/MC/AMDGPU/gfx10_asm_mimg.s
Original file line number Diff line number Diff line change
Expand Up @@ -654,3 +654,94 @@ image_sample_c_d_o_g16 v[0:1], [v0, v1, v2, v4, v6, v7, v8], s[0:7], s[8:11] dma

image_sample_d v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
; GFX10: image_sample_d v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16 ; encoding: [0x08,0x0f,0x88,0xf0,0x00,0x00,0x40,0x40]

; Test dmask + tfe for image_atomic instructions
image_atomic_add v0, v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D
; GFX10: image_atomic_add v0, v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x01,0x44,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x45,0xf0,0x0a,0x00,0x04,0x00]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests didn't check dmask 0x2? Also, should comprehensively test all the image opcodes, not just add and cmpswap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests didn't check dmask 0x2? Also, should comprehensively test all the image opcodes, not just add and cmpswap

For image_atomic instructions, code in AMDGPUAsmParser::validateMIMGAtomicDMask requires DMask to be 0x1, 0x3, or 0xf.

bool AMDGPUAsmParser::validateMIMGAtomicDMask(const MCInst &Inst) {
...
  // This is an incomplete check because image_atomic_cmpswap
  // may only use 0x3 and 0xf while other atomic operations
  // may use 0x1 and 0x3. However these limitations are
  // verified when we check that dmask matches dst size.
  return DMask == 0x1 || DMask == 0x3 || DMask == 0xf;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should still test all the various other opcodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have been added for all other image_atomic opcodes.

image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D
; GFX10: image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x03,0x44,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_add v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_add v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x45,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_swap v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_swap v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x3d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_swap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_swap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x3d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_sub v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_sub v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x49,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_sub v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_sub v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x49,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_smin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_smin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x51,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_smin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_smin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x51,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_umin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_umin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x55,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_umin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_umin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x55,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_smax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_smax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x59,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_smax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_smax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x59,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_umax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_umax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x5d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_umax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_umax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x5d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_and v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_and v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x61,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_and v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_and v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x61,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_or v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_or v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x65,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_or v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_or v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x65,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_xor v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_xor v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x69,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_xor v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_xor v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x69,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_inc v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_inc v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x6d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_inc v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_inc v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x6d,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_dec v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_dec v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x01,0x71,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_dec v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_dec v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x71,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D
; GFX10: image_atomic_cmpswap v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x03,0x40,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_cmpswap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x03,0x41,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:3], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D
; GFX10: image_atomic_cmpswap v[0:3], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x40,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:4], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D tfe
; GFX10: image_atomic_cmpswap v[0:4], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D tfe ; encoding: [0x08,0x0f,0x41,0xf0,0x0a,0x00,0x04,0x00]
91 changes: 91 additions & 0 deletions llvm/test/MC/AMDGPU/gfx11_asm_mimg.s
Original file line number Diff line number Diff line change
Expand Up @@ -5603,3 +5603,94 @@ image_store_pck v1, v[2:3], s[96:103] dmask:0x4 dim:SQ_RSRC_IMG_2D_MSAA unorm a1

image_store_pck v255, v[254:255], ttmp[8:15] dmask:0x4 dim:SQ_RSRC_IMG_2D_MSAA unorm glc slc dlc a16 lwe
// GFX11: [0x98,0x74,0x21,0xf0,0xfe,0xff,0x5d,0x00]

; Test dmask + tfe for image_atomic instructions
image_atomic_add v0, v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D
// GFX11: [0x04,0x01,0x30,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x30,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_add v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D
// GFX11: [0x04,0x03,0x30,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_add v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x30,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_swap v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x28,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_swap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x28,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_sub v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x34,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_sub v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x34,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_smin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x38,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_smin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x38,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_umin v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x3c,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_umin v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x3c,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_smax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x40,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_smax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x40,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_umax v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x44,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_umax v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x44,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_and v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x48,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_and v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x48,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_or v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x4c,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_or v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x4c,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_xor v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x50,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_xor v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x50,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_inc v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x54,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_inc v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x54,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_dec v[0:1], v[10:11], s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x01,0x58,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_dec v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x58,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_cmpswap v[0:1], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D
// GFX11: [0x04,0x03,0x2c,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:2], v[10:11], s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x03,0x2c,0xf0,0x0a,0x00,0x24,0x00]

image_atomic_cmpswap v[0:3], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D
// GFX11: [0x04,0x0f,0x2c,0xf0,0x0a,0x00,0x04,0x00]

image_atomic_cmpswap v[0:4], v[10:11], s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D tfe
// GFX11: [0x04,0x0f,0x2c,0xf0,0x0a,0x00,0x24,0x00]
Loading
Loading