[AMDGPU] GFX12: select @llvm.prefetch intrinsic #74576
@@ -814,6 +814,14 @@ def smrd_load : PatFrag <(ops node:$ptr), (load node:$ptr), [{ return isUniformL
  }];
}

def smrd_prefetch : PatFrag <(ops node:$ptr, node:$rw, node:$loc, node:$type),
                             (prefetch node:$ptr, node:$rw, node:$loc, node:$type),
                             [{ return !N->getOperand(1)->isDivergent();}]> {
  let GISelPredicateCode = [{
    return isInstrUniform(MI);
  }];
}

def SMRDImm : ComplexPattern<iPTR, 2, "SelectSMRDImm">;
def SMRDImm32 : ComplexPattern<iPTR, 2, "SelectSMRDImm32">;
def SMRDSgpr : ComplexPattern<iPTR, 2, "SelectSMRDSgpr">;

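For orientation, the four operands matched by smrd_prefetch mirror the arguments of the llvm.prefetch intrinsic: the address, rw (0 = read, 1 = write), locality (0 to 3), and cache type (0 = instruction, 1 = data). From C or C++ the usual way to reach this intrinsic is __builtin_prefetch, which, to the best of my understanding, Clang lowers to llvm.prefetch with the data cache type. The sketch below is a host-side illustration only, assuming a GCC- or Clang-compatible compiler; the function name and the prefetch distance are hypothetical, and whether any such call is selected to a scalar prefetch on GFX12 depends on the address being uniform, which this example does not demonstrate.

```cpp
#include <cstddef>
#include <vector>

// Sums the buffer while hinting that an element a few slots ahead will be
// read soon. The hint never changes the result; it is purely advisory.
long sumWithPrefetch(const std::vector<int> &values) {
  constexpr std::size_t kAhead = 16;  // hypothetical prefetch distance
  long total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i + kAhead < values.size()) {
      // Arguments: address, rw (0 = read), locality (0..3, 3 = keep cached).
      __builtin_prefetch(&values[i + kAhead], 0, 3);
    }
    total += values[i];
  }
  return total;
}
```

The rw and locality arguments must be compile-time constants, which matches the timm operands in the selection patterns.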
@@ -959,6 +967,32 @@ def : GCNPat <
}
} // let OtherPredicates = [HasShaderCyclesRegister]

def SIMM24bitPtr : ImmLeaf <iPTR,
  [{return isInt<24>(Imm);}]
>;

multiclass SMPrefetchPat<string type, int cache_type> {
  def : GCNPat <
    (smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, (i32 cache_type)),
    (!cast<SM_Prefetch_Pseudo>("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0))
  >;

  def : GCNPat <
    (smrd_prefetch (i64 SReg_64:$sbase), timm, timm, (i32 cache_type)),
    (!cast<SM_Prefetch_Pseudo>("S_PREFETCH_"#type) $sbase, 0, (i32 SGPR_NULL), (i8 0))
  >;

  def : GCNPat <
    (prefetch SIMM24bitPtr:$offset, timm, timm, (i32 cache_type)),
    (!cast<SM_Prefetch_Pseudo>("S_PREFETCH_"#type#"_PC_REL") (as_i32timm $offset), (i32 SGPR_NULL), (i8 0))
  > {
    let AddedComplexity = 10;
  }
I don't understand this pattern.

If you do not have a pointer (essentially, you pass null to the prefetch intrinsic as the base pointer), this pc_rel pattern will be used. It may have no value as a data prefetch, but it makes sense as an instruction prefetch, say if you want to prefetch the next page of code. Like this:

I would interpret this as using the absolute address; you would need something else to represent a PC-relative input.

Maybe for now I will remove the PC_REL part.

Prefetch on an absolute address is practically useless.

But that is how

So you want a target intrinsic?

I really don't know. What would the use cases look like? Maybe it could be a generic intrinsic, if there is consensus that it is useful. For the existing llvm.prefetch intrinsic, the only useful case I can think of for instruction prefetching is:
to prefetch the code at the start of a function you are going to call. We could codegen that case using the _pc_rel form of the instruction.

I do not think we need to use the PC_REL form to prefetch a function's address. The instruction can take a full 64-bit address, so one can just use that address. My understanding is that the PC_REL form can be useful if you expect something like a huge loop or a local branch and want to prefetch something like 1K from the PC. I am not sure, though, how useful this can be at the high-level language level or even in IR.

}

defm : SMPrefetchPat<"INST", 0>;
defm : SMPrefetchPat<"DATA", 1>;

//===----------------------------------------------------------------------===//
// GFX10.
//===----------------------------------------------------------------------===//

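The SIMM24bitPtr leaf in this hunk accepts only immediates for which isInt<24>(Imm) holds, that is, values that fit in a signed 24-bit field. As a rough illustration of that range check, here is a standalone sketch; fitsSigned is a hypothetical stand-in written so the block compiles without LLVM headers, not the helper the patch itself calls.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isInt<N>: true when x is representable
// as a signed N-bit two's-complement value.
template <unsigned N>
constexpr bool fitsSigned(std::int64_t x) {
  static_assert(N > 0 && N < 64, "width must be in 1..63");
  constexpr std::int64_t lo = -(std::int64_t{1} << (N - 1));
  constexpr std::int64_t hi = (std::int64_t{1} << (N - 1)) - 1;
  return x >= lo && x <= hi;
}

// A signed 24-bit field spans [-8388608, 8388607].
static_assert(fitsSigned<24>(8388607), "largest 24-bit offset fits");
static_assert(!fitsSigned<24>(8388608), "one past the maximum does not fit");
static_assert(fitsSigned<24>(-8388608), "smallest 24-bit offset fits");
```

Under this reading, an immediate outside that range would not match the PC_REL pattern and would have to be handled by one of the register-based patterns instead.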
This should have a G_ instruction.
Why erase here and not in the legalizer?

It should have G_PREFETCH, but we do not currently have it.
I honestly do not remember why I erase it here. Likely because I am erasing it here for VGPRBank anyway.