[Targets] Migrate from atomic_load_8/16/32/64 to atomic_load_nonext_8/16/32/64. NFC #137428
Conversation
…/16/32/64. NFC
atomic_load_8/16/32/64 will be removed in a separate patch, as it will affect out-of-tree targets.
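Because the old fragment names stay defined until that follow-up removal, downstream code keeps building for now, and the eventual update is purely mechanical. As a rough sketch of what an out-of-tree target would change (the instruction name MY_LD32 and register class GPR below are hypothetical placeholders, not anything defined by this patch):
// Before: pattern written against the old fragment name.
def : Pat<(atomic_load_32 GPR:$ptr), (MY_LD32 GPR:$ptr)>;
// After: the same pattern using the non-extending fragment name.
def : Pat<(atomic_load_nonext_32 GPR:$ptr), (MY_LD32 GPR:$ptr)>;
The in-tree hunks below follow the same shape.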
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-backend-powerpc
Author: Craig Topper (topperc)
Changes: atomic_load_8/16/32/64 will be removed in a separate patch as it will affect out-of-tree targets.
Patch is 48.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137428.diff
25 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
index 28d45fe25d30c..f3734e05ae667 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
@@ -55,9 +55,9 @@ let Predicates = [HasRCPC] in {
// 16-bit loads
def : Pat<(acquiring_load<atomic_load_azext_16> GPR64sp:$ptr), (LDAPRH GPR64sp:$ptr)>;
// 32-bit loads
- def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
// 64-bit loads
- def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
}
// 8-bit loads
@@ -93,62 +93,66 @@ def : Pat<(relaxed_load<atomic_load_azext_16>
(LDURHHi GPR64sp:$Rn, simm9:$offset)>;
// 32-bit loads
-def : Pat<(seq_cst_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)),
(LDRWroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)),
(LDRWroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s4:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)),
(LDRWui GPR64sp:$Rn, uimm12s4:$offset)>;
-def : Pat<(relaxed_load<atomic_load_32>
+def : Pat<(relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDURWi GPR64sp:$Rn, simm9:$offset)>;
// 64-bit loads
-def : Pat<(seq_cst_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)),
(LDRXroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)),
(LDRXroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(LDRXui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(relaxed_load<atomic_load_64>
+def : Pat<(relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDURXi GPR64sp:$Rn, simm9:$offset)>;
// FP 32-bit loads
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend))))),
(LDRSroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend))))),
(LDRSroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRSui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32>
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset))))),
(LDURSi GPR64sp:$Rn, simm9:$offset)>;
// FP 64-bit loads
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend))))),
(LDRDroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend))))),
(LDRDroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64>
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset))))),
(LDURDi GPR64sp:$Rn, simm9:$offset)>;
@@ -561,16 +565,16 @@ let Predicates = [HasLSFE] in {
let Predicates = [HasRCPC3, HasNEON] in {
// LDAP1 loads
def : Pat<(vector_insert (v2i64 VecListOne128:$Rd),
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(vector_insert (v2f64 VecListOne128:$Rd),
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(v1i64 (scalar_to_vector
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))),
(EXTRACT_SUBREG (LDAP1 (v2i64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
def : Pat<(v1f64 (scalar_to_vector
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))))),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))))),
(EXTRACT_SUBREG (LDAP1 (v2f64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
// STL1 stores
@@ -597,10 +601,10 @@ let Predicates = [HasRCPC_IMMO, UseLDAPUR] in {
def : Pat<(acquiring_load<atomic_load_azext_16>
(am_unscaled16 GPR64sp:$Rn, simm9:$offset)),
(LDAPURHi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_32>
+ def : Pat<(acquiring_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDAPURi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_64>
+ def : Pat<(acquiring_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDAPURXi GPR64sp:$Rn, simm9:$offset)>;
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index 6cc76b44f1e14..78a92d85cfd8e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -502,15 +502,15 @@ def zextloadi16_#as : PatFrag<(ops node:$ptr), (zextloadi16 node:$ptr)> {
let IsLoad = 1;
}
-def atomic_load_16_#as : PatFrag<(ops node:$ptr), (atomic_load_16 node:$ptr)> {
+def atomic_load_nonext_16_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_16 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_32_#as : PatFrag<(ops node:$ptr), (atomic_load_32 node:$ptr)> {
+def atomic_load_nonext_32_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_32 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_64_#as : PatFrag<(ops node:$ptr), (atomic_load_64 node:$ptr)> {
+def atomic_load_nonext_64_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_64 node:$ptr)> {
let IsAtomic = 1;
}
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 7d64a3dd240c8..efcc81716a0f1 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -959,7 +959,7 @@ defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_aext_16_glo
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_zext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_aext_8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_zext_8_global>;
-defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_16_global>;
+defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_nonext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, extloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, zextloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_SBYTE", i32, sextloadi8_global>;
@@ -1933,8 +1933,8 @@ def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_SSHORT_ADDR64, i32, sextloadi16_const
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, extloadi16_constant>;
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, zextloadi16_constant>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_32_global>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_64_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_nonext_32_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_nonext_64_global>;
} // End SubtargetPredicate = isGFX6GFX7
multiclass MUBUFLoad_PatternOffset_Common <string Instr, ValueType vt,
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index 74884a2207079..604eb7f2c3878 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -859,12 +859,12 @@ defm : DSReadPat_t16 <DS_READ_U8, i16, "atomic_load_zext_8_local">;
defm : DSReadPat_mc <DS_READ_U8, i32, "atomic_load_zext_8_local">;
defm : DSReadPat_t16 <DS_READ_I8, i16, "atomic_load_sext_8_local">;
defm : DSReadPat_mc <DS_READ_I8, i32, "atomic_load_sext_8_local">;
-defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_16_local">;
+defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_nonext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_aext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_zext_16_local">;
defm : DSReadPat_mc <DS_READ_I16, i32, "atomic_load_sext_16_local">;
-defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_32_local">;
-defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_64_local">;
+defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_nonext_32_local">;
+defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_nonext_64_local">;
let OtherPredicates = [D16PreservesUnusedBits] in {
// TODO: Atomic loads
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index d8bb6e4378924..c17fda1346115 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -1541,7 +1541,7 @@ def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_aext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_aext_16_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_16_flat, i16>;
+def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_nonext_16_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_zext_16_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i32>;
@@ -1573,8 +1573,8 @@ let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predi
def : FlatStorePat <FLAT_STORE_SHORT_t16, store_flat, i16>;
} // End let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predicate = UseRealTrue16Insts
-def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_32_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_64_flat, i64>;
+def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_nonext_32_flat, i32>;
+def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_nonext_64_flat, i64>;
def : FlatStorePat <FLAT_STORE_BYTE, truncstorei8_flat, i32>;
def : FlatStorePat <FLAT_STORE_SHORT, truncstorei16_flat, i32>;
@@ -1682,7 +1682,7 @@ defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_aext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_aext_16_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_16_global, i16>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_nonext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, atomic_load_sext_8_global, i32>;
@@ -1733,8 +1733,8 @@ defm : GlobalFLATStorePats <GLOBAL_STORE_DWORDX4, store_global, vt>;
// There is no distinction for atomic load lowering during selection;
// the memory legalizer will set the cache bits and insert the
// appropriate waits.
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_32_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_64_global, i64>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_nonext_32_global, i32>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_nonext_64_global, i64>;
defm : GlobalFLATStorePats <GLOBAL_STORE_BYTE, truncstorei8_global, i32>;
defm : GlobalFLATStorePats <GLOBAL_STORE_SHORT, truncstorei16_global, i32>;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index ec1fd6fb60d57..5d837d853ac98 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -361,6 +361,12 @@ def load_glue : PatFrag <(ops node:$ptr), (unindexedload_glue node:$ptr)> {
let IsNonExtLoad = 1;
}
+def atomic_load_nonext_glue :
+ PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
+ let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
+ let IsNonExtLoad = true;
+}
+
def atomic_load_zext_glue :
PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
@@ -379,20 +385,20 @@ def atomic_load_aext_glue :
let IsAnyExtLoad = true;
}
-def atomic_load_16_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_16_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i16;
}
-def atomic_load_32_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_32_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i32;
}
-def atomic_load_64_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_64_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i64;
}
@@ -506,12 +512,12 @@ def load_align16_local_m0 : PatFrag<(ops node:$ptr),
}
let IsAtomic = 1, AddressSpaces = LoadAddress_local.AddrSpaces in {
-def atomic_load_16_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_16_glue node:$ptr)>;
-def atomic_load_32_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_32_glue node:$ptr)>;
-def atomic_load_64_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_64_glue node:$ptr)>;
+def atomic_load_nonext_16_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_16_glue node:$ptr)>;
+def atomic_load_nonext_32_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_32_glue node:$ptr)>;
+def atomic_load_nonext_64_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_64_glue node:$ptr)>;
def atomic_load_zext_8_local_m0 : PatFrag<(ops node:$ptr),
(atomic_load_zext_8_glue node:$ptr)>;
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 1ce9190a68f3c..c682f597401ec 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -5384,7 +5384,7 @@ class acquiring_load<PatFrags base>
def atomic_load_azext_acquire_8 : acquiring_load<atomic_load_azext_8>;
def atomic_load_azext_acquire_16 : acquiring_load<atomic_load_azext_16>;
-def atomic_load_acquire_32 : acquiring_load<atomic_load_32>;
+def atomic_load_nonext_acquire_32 : acquiring_load<atomic_load_nonext_32>;
class releasing_store<PatFrag base>
: PatFrag<(ops node:$ptr, node:$val), (base node:$val, node:$ptr), [{
@@ -5399,7 +5399,7 @@ def atomic_store_release_32 : releasing_store<atomic_store_32>;
let AddedComplexity = 8 in {
def : ARMPat<(atomic_load_azext_acquire_8 addr_offset_none:$addr), (LDAB addr_offset_none:$addr)>;
def : ARMPat<(atomic_load_azext_acquire_16 addr_offset_none:$addr), (LDAH addr_offset_none:$addr)>;
- def : ARMPat<(atomic_load_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
+ def : ARMPat<(atomic_load_nonext_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_8 addr_offset_none:$addr, GPR:$val), (STLB GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_16 addr_offset_none:$addr, GPR:$val), (STLH GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_32 addr_offset_none:$addr, GPR:$val), (STL GPR:$val, addr_offset_none:$addr)>;
@@ -6220,9 +6220,9 @@ def : ARMPat<(atomic_load_azext_8 addrmode_imm12:$src),
(LDRBi12 addrmode_imm12:$src)>;
def : ARMPat<(atomic_load_azext_1...
[truncated]
|
@llvm/pr-subscribers-backend-risc-v Author: Craig Topper (topperc) Changesatomic_load_8/16/32/64 will be removed in a separate patch as it will affect out of tree targets. Patch is 48.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137428.diff 25 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
index 28d45fe25d30c..f3734e05ae667 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
@@ -55,9 +55,9 @@ let Predicates = [HasRCPC] in {
// 16-bit loads
def : Pat<(acquiring_load<atomic_load_azext_16> GPR64sp:$ptr), (LDAPRH GPR64sp:$ptr)>;
// 32-bit loads
- def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
// 64-bit loads
- def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
}
// 8-bit loads
@@ -93,62 +93,66 @@ def : Pat<(relaxed_load<atomic_load_azext_16>
(LDURHHi GPR64sp:$Rn, simm9:$offset)>;
// 32-bit loads
-def : Pat<(seq_cst_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)),
(LDRWroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)),
(LDRWroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s4:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)),
(LDRWui GPR64sp:$Rn, uimm12s4:$offset)>;
-def : Pat<(relaxed_load<atomic_load_32>
+def : Pat<(relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDURWi GPR64sp:$Rn, simm9:$offset)>;
// 64-bit loads
-def : Pat<(seq_cst_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)),
(LDRXroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)),
(LDRXroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(LDRXui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(relaxed_load<atomic_load_64>
+def : Pat<(relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDURXi GPR64sp:$Rn, simm9:$offset)>;
// FP 32-bit loads
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend))))),
(LDRSroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend))))),
(LDRSroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRSui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32>
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset))))),
(LDURSi GPR64sp:$Rn, simm9:$offset)>;
// FP 64-bit loads
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend))))),
(LDRDroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend))))),
(LDRDroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64>
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset))))),
(LDURDi GPR64sp:$Rn, simm9:$offset)>;
@@ -561,16 +565,16 @@ let Predicates = [HasLSFE] in {
let Predicates = [HasRCPC3, HasNEON] in {
// LDAP1 loads
def : Pat<(vector_insert (v2i64 VecListOne128:$Rd),
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(vector_insert (v2f64 VecListOne128:$Rd),
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(v1i64 (scalar_to_vector
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))),
(EXTRACT_SUBREG (LDAP1 (v2i64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
def : Pat<(v1f64 (scalar_to_vector
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))))),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))))),
(EXTRACT_SUBREG (LDAP1 (v2f64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
// STL1 stores
@@ -597,10 +601,10 @@ let Predicates = [HasRCPC_IMMO, UseLDAPUR] in {
def : Pat<(acquiring_load<atomic_load_azext_16>
(am_unscaled16 GPR64sp:$Rn, simm9:$offset)),
(LDAPURHi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_32>
+ def : Pat<(acquiring_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDAPURi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_64>
+ def : Pat<(acquiring_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDAPURXi GPR64sp:$Rn, simm9:$offset)>;
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index 6cc76b44f1e14..78a92d85cfd8e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -502,15 +502,15 @@ def zextloadi16_#as : PatFrag<(ops node:$ptr), (zextloadi16 node:$ptr)> {
let IsLoad = 1;
}
-def atomic_load_16_#as : PatFrag<(ops node:$ptr), (atomic_load_16 node:$ptr)> {
+def atomic_load_nonext_16_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_16 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_32_#as : PatFrag<(ops node:$ptr), (atomic_load_32 node:$ptr)> {
+def atomic_load_nonext_32_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_32 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_64_#as : PatFrag<(ops node:$ptr), (atomic_load_64 node:$ptr)> {
+def atomic_load_nonext_64_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_64 node:$ptr)> {
let IsAtomic = 1;
}
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 7d64a3dd240c8..efcc81716a0f1 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -959,7 +959,7 @@ defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_aext_16_glo
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_zext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_aext_8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_zext_8_global>;
-defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_16_global>;
+defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_nonext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, extloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, zextloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_SBYTE", i32, sextloadi8_global>;
@@ -1933,8 +1933,8 @@ def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_SSHORT_ADDR64, i32, sextloadi16_const
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, extloadi16_constant>;
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, zextloadi16_constant>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_32_global>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_64_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_nonext_32_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_nonext_64_global>;
} // End SubtargetPredicate = isGFX6GFX7
multiclass MUBUFLoad_PatternOffset_Common <string Instr, ValueType vt,
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index 74884a2207079..604eb7f2c3878 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -859,12 +859,12 @@ defm : DSReadPat_t16 <DS_READ_U8, i16, "atomic_load_zext_8_local">;
defm : DSReadPat_mc <DS_READ_U8, i32, "atomic_load_zext_8_local">;
defm : DSReadPat_t16 <DS_READ_I8, i16, "atomic_load_sext_8_local">;
defm : DSReadPat_mc <DS_READ_I8, i32, "atomic_load_sext_8_local">;
-defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_16_local">;
+defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_nonext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_aext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_zext_16_local">;
defm : DSReadPat_mc <DS_READ_I16, i32, "atomic_load_sext_16_local">;
-defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_32_local">;
-defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_64_local">;
+defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_nonext_32_local">;
+defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_nonext_64_local">;
let OtherPredicates = [D16PreservesUnusedBits] in {
// TODO: Atomic loads
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index d8bb6e4378924..c17fda1346115 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -1541,7 +1541,7 @@ def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_aext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_aext_16_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_16_flat, i16>;
+def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_nonext_16_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_zext_16_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i32>;
@@ -1573,8 +1573,8 @@ let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predi
def : FlatStorePat <FLAT_STORE_SHORT_t16, store_flat, i16>;
} // End let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predicate = UseRealTrue16Insts
-def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_32_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_64_flat, i64>;
+def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_nonext_32_flat, i32>;
+def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_nonext_64_flat, i64>;
def : FlatStorePat <FLAT_STORE_BYTE, truncstorei8_flat, i32>;
def : FlatStorePat <FLAT_STORE_SHORT, truncstorei16_flat, i32>;
@@ -1682,7 +1682,7 @@ defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_aext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_aext_16_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_16_global, i16>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_nonext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, atomic_load_sext_8_global, i32>;
@@ -1733,8 +1733,8 @@ defm : GlobalFLATStorePats <GLOBAL_STORE_DWORDX4, store_global, vt>;
// There is no distinction for atomic load lowering during selection;
// the memory legalizer will set the cache bits and insert the
// appropriate waits.
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_32_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_64_global, i64>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_nonext_32_global, i32>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_nonext_64_global, i64>;
defm : GlobalFLATStorePats <GLOBAL_STORE_BYTE, truncstorei8_global, i32>;
defm : GlobalFLATStorePats <GLOBAL_STORE_SHORT, truncstorei16_global, i32>;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index ec1fd6fb60d57..5d837d853ac98 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -361,6 +361,12 @@ def load_glue : PatFrag <(ops node:$ptr), (unindexedload_glue node:$ptr)> {
let IsNonExtLoad = 1;
}
+def atomic_load_nonext_glue :
+ PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
+ let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
+ let IsNonExtLoad = true;
+}
+
def atomic_load_zext_glue :
PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
@@ -379,20 +385,20 @@ def atomic_load_aext_glue :
let IsAnyExtLoad = true;
}
-def atomic_load_16_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_16_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i16;
}
-def atomic_load_32_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_32_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i32;
}
-def atomic_load_64_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_64_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i64;
}
@@ -506,12 +512,12 @@ def load_align16_local_m0 : PatFrag<(ops node:$ptr),
}
let IsAtomic = 1, AddressSpaces = LoadAddress_local.AddrSpaces in {
-def atomic_load_16_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_16_glue node:$ptr)>;
-def atomic_load_32_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_32_glue node:$ptr)>;
-def atomic_load_64_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_64_glue node:$ptr)>;
+def atomic_load_nonext_16_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_16_glue node:$ptr)>;
+def atomic_load_nonext_32_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_32_glue node:$ptr)>;
+def atomic_load_nonext_64_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_64_glue node:$ptr)>;
def atomic_load_zext_8_local_m0 : PatFrag<(ops node:$ptr),
(atomic_load_zext_8_glue node:$ptr)>;
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 1ce9190a68f3c..c682f597401ec 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -5384,7 +5384,7 @@ class acquiring_load<PatFrags base>
def atomic_load_azext_acquire_8 : acquiring_load<atomic_load_azext_8>;
def atomic_load_azext_acquire_16 : acquiring_load<atomic_load_azext_16>;
-def atomic_load_acquire_32 : acquiring_load<atomic_load_32>;
+def atomic_load_nonext_acquire_32 : acquiring_load<atomic_load_nonext_32>;
class releasing_store<PatFrag base>
: PatFrag<(ops node:$ptr, node:$val), (base node:$val, node:$ptr), [{
@@ -5399,7 +5399,7 @@ def atomic_store_release_32 : releasing_store<atomic_store_32>;
let AddedComplexity = 8 in {
def : ARMPat<(atomic_load_azext_acquire_8 addr_offset_none:$addr), (LDAB addr_offset_none:$addr)>;
def : ARMPat<(atomic_load_azext_acquire_16 addr_offset_none:$addr), (LDAH addr_offset_none:$addr)>;
- def : ARMPat<(atomic_load_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
+ def : ARMPat<(atomic_load_nonext_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_8 addr_offset_none:$addr, GPR:$val), (STLB GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_16 addr_offset_none:$addr, GPR:$val), (STLH GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_32 addr_offset_none:$addr, GPR:$val), (STL GPR:$val, addr_offset_none:$addr)>;
@@ -6220,9 +6220,9 @@ def : ARMPat<(atomic_load_azext_8 addrmode_imm12:$src),
(LDRBi12 addrmode_imm12:$src)>;
def : ARMPat<(atomic_load_azext_1...
[truncated]
|
@llvm/pr-subscribers-backend-aarch64 Author: Craig Topper (topperc) Changesatomic_load_8/16/32/64 will be removed in a separate patch as it will affect out of tree targets. Patch is 48.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137428.diff 25 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
index 28d45fe25d30c..f3734e05ae667 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrAtomics.td
@@ -55,9 +55,9 @@ let Predicates = [HasRCPC] in {
// 16-bit loads
def : Pat<(acquiring_load<atomic_load_azext_16> GPR64sp:$ptr), (LDAPRH GPR64sp:$ptr)>;
// 32-bit loads
- def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr), (LDAPRW GPR64sp:$ptr)>;
// 64-bit loads
- def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
+ def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr), (LDAPRX GPR64sp:$ptr)>;
}
// 8-bit loads
@@ -93,62 +93,66 @@ def : Pat<(relaxed_load<atomic_load_azext_16>
(LDURHHi GPR64sp:$Rn, simm9:$offset)>;
// 32-bit loads
-def : Pat<(seq_cst_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_32> GPR64sp:$ptr), (LDARW GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_32> GPR64sp:$ptr),
+ (LDARW GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)),
(LDRWroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)),
(LDRWroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s4:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)),
(LDRWui GPR64sp:$Rn, uimm12s4:$offset)>;
-def : Pat<(relaxed_load<atomic_load_32>
+def : Pat<(relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDURWi GPR64sp:$Rn, simm9:$offset)>;
// 64-bit loads
-def : Pat<(seq_cst_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(acquiring_load<atomic_load_64> GPR64sp:$ptr), (LDARX GPR64sp:$ptr)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend)),
+def : Pat<(seq_cst_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(acquiring_load<atomic_load_nonext_64> GPR64sp:$ptr),
+ (LDARX GPR64sp:$ptr)>;
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)),
(LDRXroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)),
(LDRXroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset)),
+def : Pat<(relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset)),
(LDRXui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(relaxed_load<atomic_load_64>
+def : Pat<(relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDURXi GPR64sp:$Rn, simm9:$offset)>;
// FP 32-bit loads
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Windexed32 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend))))),
(LDRSroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend32:$extend))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (ro_Xindexed32 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend))))),
(LDRSroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend32:$extend)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32> (am_indexed32 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
+ (am_indexed32 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRSui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_32>
+def : Pat<(f32 (bitconvert (i32 (relaxed_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset))))),
(LDURSi GPR64sp:$Rn, simm9:$offset)>;
// FP 64-bit loads
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm,
- ro_Wextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Windexed64 GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend))))),
(LDRDroW GPR64sp:$Rn, GPR32:$Rm, ro_Wextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm,
- ro_Xextend64:$extend))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (ro_Xindexed64 GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend))))),
(LDRDroX GPR64sp:$Rn, GPR64:$Rm, ro_Xextend64:$extend)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64> (am_indexed64 GPR64sp:$Rn,
- uimm12s8:$offset))))),
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
+ (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))))),
(LDRDui GPR64sp:$Rn, uimm12s8:$offset)>;
-def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_64>
+def : Pat<(f64 (bitconvert (i64 (relaxed_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset))))),
(LDURDi GPR64sp:$Rn, simm9:$offset)>;
@@ -561,16 +565,16 @@ let Predicates = [HasLSFE] in {
let Predicates = [HasRCPC3, HasNEON] in {
// LDAP1 loads
def : Pat<(vector_insert (v2i64 VecListOne128:$Rd),
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(vector_insert (v2f64 VecListOne128:$Rd),
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))), (i64 VectorIndexD:$idx)),
(LDAP1 VecListOne128:$Rd, VectorIndexD:$idx, GPR64sp:$Rn)>;
def : Pat<(v1i64 (scalar_to_vector
- (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))),
+ (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))),
(EXTRACT_SUBREG (LDAP1 (v2i64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
def : Pat<(v1f64 (scalar_to_vector
- (f64 (bitconvert (i64 (acquiring_load<atomic_load_64> GPR64sp:$Rn)))))),
+ (f64 (bitconvert (i64 (acquiring_load<atomic_load_nonext_64> GPR64sp:$Rn)))))),
(EXTRACT_SUBREG (LDAP1 (v2f64 (IMPLICIT_DEF)), (i64 0), GPR64sp:$Rn), dsub)>;
// STL1 stores
@@ -597,10 +601,10 @@ let Predicates = [HasRCPC_IMMO, UseLDAPUR] in {
def : Pat<(acquiring_load<atomic_load_azext_16>
(am_unscaled16 GPR64sp:$Rn, simm9:$offset)),
(LDAPURHi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_32>
+ def : Pat<(acquiring_load<atomic_load_nonext_32>
(am_unscaled32 GPR64sp:$Rn, simm9:$offset)),
(LDAPURi GPR64sp:$Rn, simm9:$offset)>;
- def : Pat<(acquiring_load<atomic_load_64>
+ def : Pat<(acquiring_load<atomic_load_nonext_64>
(am_unscaled64 GPR64sp:$Rn, simm9:$offset)),
(LDAPURXi GPR64sp:$Rn, simm9:$offset)>;
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index 6cc76b44f1e14..78a92d85cfd8e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -502,15 +502,15 @@ def zextloadi16_#as : PatFrag<(ops node:$ptr), (zextloadi16 node:$ptr)> {
let IsLoad = 1;
}
-def atomic_load_16_#as : PatFrag<(ops node:$ptr), (atomic_load_16 node:$ptr)> {
+def atomic_load_nonext_16_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_16 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_32_#as : PatFrag<(ops node:$ptr), (atomic_load_32 node:$ptr)> {
+def atomic_load_nonext_32_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_32 node:$ptr)> {
let IsAtomic = 1;
}
-def atomic_load_64_#as : PatFrag<(ops node:$ptr), (atomic_load_64 node:$ptr)> {
+def atomic_load_nonext_64_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_64 node:$ptr)> {
let IsAtomic = 1;
}
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 7d64a3dd240c8..efcc81716a0f1 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -959,7 +959,7 @@ defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_aext_16_glo
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i32, atomic_load_zext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_aext_8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i16, atomic_load_zext_8_global>;
-defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_16_global>;
+defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_USHORT", i16, atomic_load_nonext_16_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, extloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_UBYTE", i32, zextloadi8_global>;
defm : MUBUF_Pseudo_Load_Pats<"BUFFER_LOAD_SBYTE", i32, sextloadi8_global>;
@@ -1933,8 +1933,8 @@ def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_SSHORT_ADDR64, i32, sextloadi16_const
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, extloadi16_constant>;
def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, zextloadi16_constant>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_32_global>;
-defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_64_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, atomic_load_nonext_32_global>;
+defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, atomic_load_nonext_64_global>;
} // End SubtargetPredicate = isGFX6GFX7
multiclass MUBUFLoad_PatternOffset_Common <string Instr, ValueType vt,
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index 74884a2207079..604eb7f2c3878 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -859,12 +859,12 @@ defm : DSReadPat_t16 <DS_READ_U8, i16, "atomic_load_zext_8_local">;
defm : DSReadPat_mc <DS_READ_U8, i32, "atomic_load_zext_8_local">;
defm : DSReadPat_t16 <DS_READ_I8, i16, "atomic_load_sext_8_local">;
defm : DSReadPat_mc <DS_READ_I8, i32, "atomic_load_sext_8_local">;
-defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_16_local">;
+defm : DSReadPat_t16 <DS_READ_U16, i16, "atomic_load_nonext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_aext_16_local">;
defm : DSReadPat_mc <DS_READ_U16, i32, "atomic_load_zext_16_local">;
defm : DSReadPat_mc <DS_READ_I16, i32, "atomic_load_sext_16_local">;
-defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_32_local">;
-defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_64_local">;
+defm : DSReadPat_mc <DS_READ_B32, i32, "atomic_load_nonext_32_local">;
+defm : DSReadPat_mc <DS_READ_B64, i64, "atomic_load_nonext_64_local">;
let OtherPredicates = [D16PreservesUnusedBits] in {
// TODO: Atomic loads
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index d8bb6e4378924..c17fda1346115 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td
+++ b/llvm/lib/Target/AMDGPU/FLATInstructions.td
@@ -1541,7 +1541,7 @@ def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_aext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, atomic_load_zext_8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_aext_16_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_16_flat, i16>;
+def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_nonext_16_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, atomic_load_zext_16_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i32>;
@@ -1573,8 +1573,8 @@ let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predi
def : FlatStorePat <FLAT_STORE_SHORT_t16, store_flat, i16>;
} // End let OtherPredicates = [D16PreservesUnusedBits, HasFlatAddressSpace], True16Predicate = UseRealTrue16Insts
-def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_32_flat, i32>;
-def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_64_flat, i64>;
+def : FlatLoadPat <FLAT_LOAD_DWORD, atomic_load_nonext_32_flat, i32>;
+def : FlatLoadPat <FLAT_LOAD_DWORDX2, atomic_load_nonext_64_flat, i64>;
def : FlatStorePat <FLAT_STORE_BYTE, truncstorei8_flat, i32>;
def : FlatStorePat <FLAT_STORE_SHORT, truncstorei16_flat, i32>;
@@ -1682,7 +1682,7 @@ defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_aext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, atomic_load_zext_8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_aext_16_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_16_global, i16>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_nonext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_USHORT, atomic_load_zext_16_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, atomic_load_sext_8_global, i32>;
@@ -1733,8 +1733,8 @@ defm : GlobalFLATStorePats <GLOBAL_STORE_DWORDX4, store_global, vt>;
// There is no distinction for atomic load lowering during selection;
// the memory legalizer will set the cache bits and insert the
// appropriate waits.
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_32_global, i32>;
-defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_64_global, i64>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORD, atomic_load_nonext_32_global, i32>;
+defm : GlobalFLATLoadPats <GLOBAL_LOAD_DWORDX2, atomic_load_nonext_64_global, i64>;
defm : GlobalFLATStorePats <GLOBAL_STORE_BYTE, truncstorei8_global, i32>;
defm : GlobalFLATStorePats <GLOBAL_STORE_SHORT, truncstorei16_global, i32>;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index ec1fd6fb60d57..5d837d853ac98 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -361,6 +361,12 @@ def load_glue : PatFrag <(ops node:$ptr), (unindexedload_glue node:$ptr)> {
let IsNonExtLoad = 1;
}
+def atomic_load_nonext_glue :
+ PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
+ let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
+ let IsNonExtLoad = true;
+}
+
def atomic_load_zext_glue :
PatFrag<(ops node:$ptr), (AMDGPUatomic_ld_glue node:$ptr)> {
let IsAtomic = true; // FIXME: Should be IsLoad and/or IsAtomic?
@@ -379,20 +385,20 @@ def atomic_load_aext_glue :
let IsAnyExtLoad = true;
}
-def atomic_load_16_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_16_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i16;
}
-def atomic_load_32_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_32_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i32;
}
-def atomic_load_64_glue : PatFrag<(ops node:$ptr),
- (AMDGPUatomic_ld_glue node:$ptr)> {
+def atomic_load_nonext_64_glue : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_glue node:$ptr)> {
let IsAtomic = 1;
let MemoryVT = i64;
}
@@ -506,12 +512,12 @@ def load_align16_local_m0 : PatFrag<(ops node:$ptr),
}
let IsAtomic = 1, AddressSpaces = LoadAddress_local.AddrSpaces in {
-def atomic_load_16_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_16_glue node:$ptr)>;
-def atomic_load_32_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_32_glue node:$ptr)>;
-def atomic_load_64_local_m0 : PatFrag<(ops node:$ptr),
- (atomic_load_64_glue node:$ptr)>;
+def atomic_load_nonext_16_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_16_glue node:$ptr)>;
+def atomic_load_nonext_32_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_32_glue node:$ptr)>;
+def atomic_load_nonext_64_local_m0 : PatFrag<(ops node:$ptr),
+ (atomic_load_nonext_64_glue node:$ptr)>;
def atomic_load_zext_8_local_m0 : PatFrag<(ops node:$ptr),
(atomic_load_zext_8_glue node:$ptr)>;
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 1ce9190a68f3c..c682f597401ec 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -5384,7 +5384,7 @@ class acquiring_load<PatFrags base>
def atomic_load_azext_acquire_8 : acquiring_load<atomic_load_azext_8>;
def atomic_load_azext_acquire_16 : acquiring_load<atomic_load_azext_16>;
-def atomic_load_acquire_32 : acquiring_load<atomic_load_32>;
+def atomic_load_nonext_acquire_32 : acquiring_load<atomic_load_nonext_32>;
class releasing_store<PatFrag base>
: PatFrag<(ops node:$ptr, node:$val), (base node:$val, node:$ptr), [{
@@ -5399,7 +5399,7 @@ def atomic_store_release_32 : releasing_store<atomic_store_32>;
let AddedComplexity = 8 in {
def : ARMPat<(atomic_load_azext_acquire_8 addr_offset_none:$addr), (LDAB addr_offset_none:$addr)>;
def : ARMPat<(atomic_load_azext_acquire_16 addr_offset_none:$addr), (LDAH addr_offset_none:$addr)>;
- def : ARMPat<(atomic_load_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
+ def : ARMPat<(atomic_load_nonext_acquire_32 addr_offset_none:$addr), (LDA addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_8 addr_offset_none:$addr, GPR:$val), (STLB GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_16 addr_offset_none:$addr, GPR:$val), (STLH GPR:$val, addr_offset_none:$addr)>;
def : ARMPat<(atomic_store_release_32 addr_offset_none:$addr, GPR:$val), (STL GPR:$val, addr_offset_none:$addr)>;
@@ -6220,9 +6220,9 @@ def : ARMPat<(atomic_load_azext_8 addrmode_imm12:$src),
(LDRBi12 addrmode_imm12:$src)>;
def : ARMPat<(atomic_load_azext_1...
[truncated]
@llvm/pr-subscribers-backend-powerpc
Will they be reintroduced with new semantics, or why remove them?
I don't plan to reintroduce them. I guess I can leave them as dead code to avoid breaking out-of-tree targets?
This is more of a question about #137401, which I slept through. Why do we need to spell out nonext?
If I add the IsNonExtLoad flag to atomic_load_8/16/32/64, it will cause "cannot select" failures for out-of-tree targets that haven't implemented the equivalent of #137279. Spelling it out explicitly and deleting the old names gives them build-time failures instead. I figured a build-time failure was preferable to a test failure.
I see, thanks.
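To make the impact concrete: for same-width atomic loads, the out-of-tree fix is a mechanical rename of the fragment in each pattern. A minimal sketch, assuming a hypothetical target instruction MY_LD16 and an addr ComplexPattern (neither comes from this patch):

    // Old fragment name, which the follow-up patch deletes.
    def : Pat<(i16 (atomic_load_16 addr:$ptr)), (MY_LD16 addr:$ptr)>;
    // New fragment name; same non-extending i16 load, but the fragment also
    // sets IsNonExtLoad, matching what the regular load fragments already check.
    def : Pat<(i16 (atomic_load_nonext_16 addr:$ptr)), (MY_LD16 addr:$ptr)>;

Patterns that relied on the old fragment with a wider result type would instead move to the explicit atomic_load_aext_16 / atomic_load_zext_16 / atomic_load_sext_16 fragments, as the in-tree patterns above already do.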
LGTM
@@ -502,15 +502,15 @@ def zextloadi16_#as : PatFrag<(ops node:$ptr), (zextloadi16 node:$ptr)> {
let IsLoad = 1;
}

-def atomic_load_16_#as : PatFrag<(ops node:$ptr), (atomic_load_16 node:$ptr)> {
+def atomic_load_nonext_16_#as : PatFrag<(ops node:$ptr), (atomic_load_nonext_16 node:$ptr)> {
Could leave the old name (atomic_load_16_#as) if we ever change atomic_load_nonext_16 back to atomic_load_16.
Not a strong objection.
That would be better, but requires touching more code at once
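For reference, keeping the old spelling alive would only need a thin forwarding fragment; a rough sketch of what that could look like inside the existing foreach over address spaces in AMDGPUInstructions.td (illustrative only, not part of this patch):

    // Hypothetical compatibility shim: keep the old identifier but define it in
    // terms of the new non-extending fragment so downstream patterns still build.
    def atomic_load_16_#as : PatFrag<(ops node:$ptr),
                                     (atomic_load_nonext_16 node:$ptr)> {
      let IsAtomic = 1;
    }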
LLVM Buildbot has detected a new failure. Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/15181
…/16/32/64. NFC (llvm#137428) This makes them more consistent with the checks performed by regular loads. We can't simply add IsNonExtLoad to the existing atomic_load_8/16/32/64 as that would affect out of tree targets.