-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[OpenMP][CodeExtractor]Add align metadata to load instructions #131131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP][CodeExtractor]Add align metadata to load instructions #131131
Conversation
Moving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details. One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code. If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization.
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-llvm-transforms Author: Dominik Adamski (DominikAdamski) ChangesMoving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details. One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code. If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization. Full diff: https://github.com/llvm/llvm-project/pull/131131.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index 7277603b3ec2b..61d70a1500028 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -1604,8 +1604,26 @@ void CodeExtractor::emitFunctionBody(
Idx[1] = ConstantInt::get(Type::getInt32Ty(header->getContext()), aggIdx);
GetElementPtrInst *GEP = GetElementPtrInst::Create(
StructArgTy, AggArg, Idx, "gep_" + inputs[i]->getName(), newFuncRoot);
- RewriteVal = new LoadInst(StructArgTy->getElementType(aggIdx), GEP,
- "loadgep_" + inputs[i]->getName(), newFuncRoot);
+ LoadInst *LoadGEP =
+ new LoadInst(StructArgTy->getElementType(aggIdx), GEP,
+ "loadgep_" + inputs[i]->getName(), newFuncRoot);
+ PointerType *ItemType =
+ dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx));
+ if (ItemType && !LoadGEP->getMetadata(LLVMContext::MD_align)) {
+ unsigned AddressSpace = ItemType->getAddressSpace();
+ unsigned AlignmentValue = oldFunction->getDataLayout()
+ .getPointerPrefAlignment(AddressSpace)
+ .value();
+
+ MDBuilder MDB(header->getContext());
+ LoadGEP->setMetadata(
+ LLVMContext::MD_align,
+ MDNode::get(
+ header->getContext(),
+ MDB.createConstant(ConstantInt::get(
+ Type::getInt64Ty(header->getContext()), AlignmentValue))));
+ }
+ RewriteVal = LoadGEP;
++aggIdx;
} else
RewriteVal = &*ScalarAI++;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some questions about alignment:
-
The StructArgTy is allocated at
Struct = new AllocaInst(StructArgTy, DL.getAllocaAddrSpace(), nullptr, getPointerPrefAlignment
? If so, could add a comment? -
Shouldn't there be a pass or similar that propagates alignment information from the alloca to the load? Interprocedurally we have the attributor, so what is doing this within a function?It's not the same function, the LoadGEP is in the extracted function.
"loadgep_" + inputs[i]->getName(), newFuncRoot); | ||
PointerType *ItemType = | ||
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx)); | ||
if (ItemType && !LoadGEP->getMetadata(LLVMContext::MD_align)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the !LoadGEP->getMetadata()
condition superfluous? How can there be metadata if we just created the instruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks for pointing out.
I added a comment and ensured that we use the same function for alignment calculation. I modified the struct allocation - I explicitly set the alignment for struct allocation. Changes in the struct allocation constructors are non-functional. I explicitly set the alignment parameter in the constructor just to ease code analysis. Previous constructor: llvm-project/llvm/lib/IR/Instructions.cpp Line 1260 in 0a5847f
calls DL.getPrefTypeAlign :llvm-project/llvm/lib/IR/Instructions.cpp Line 1246 in 0a5847f
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LoadInst is loading a member of the struct whose alignment is not necessarily the same as the struct's alignment itself. for instance:
struct {
char a,b;
int c;
} MyStruct;
alignof(MyStruct)
should be 4, but b
will offset 1 and only char-aligned.
What the alignment of each member is, is determined by the ABI, and I am not sure it will always be the processor's preferred alignment.
Consider adding alignment info to the store for live-out values as well.
PointerType *ItemType = | ||
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx)); | ||
if (ItemType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PointerType *ItemType = | |
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx)); | |
if (ItemType) { | |
if (PointerType *ItemType = | |
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx))) { |
This reverts commit 3a41608.
This reverts commit 802fef5.
LLVM IR language reference manual states that align metadata tells the optimizer that the value loaded is known to be aligned to a boundary specified by the integer value in the metadata node. This information is used by the optimizer, for example, to generate more efficient memcpy calls. The LLVM Optimizer requires align metadata to generate optimized code because information about the alignment of objects is lost during OpenMP target code generation (outlining of loop body helper function).
Scope of changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider also updatating the PR summary, you probably want to use it as commit message.
AlignmentValue = | ||
inputs[i]->stripPointerCasts()->getPointerAlignment(DL).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make getPointerAlignment
to strip irrelevant casts itsself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The getPointerAlignment
function is from the Value
class. Do you think it's worth adding another version of this function to this basic class? If yes, I will create a separate patch that will contain a version of the getPointerAlignment
function with two arguments. One of them will be a flag to get the value alignment without pointer casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think improving getPointerAlignment
itself would be generally useful. I was using it myself in
llvm-project/llvm/lib/Transforms/Utils/BuildBuiltins.cpp
Lines 379 to 380 in 70c65b3
// TODO: Would be great if this could determine alignment through a GEP | |
EffectiveAlign = AtomicPtr->getPointerAlignment(EmitOptions.DL); |
But also maybe does not belong into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you
…131131) Moving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details. One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code. If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization. Scope of changes: 1. Analyze MLIR map operations. Add information about the alignment of objects that are passed by reference to OpenMP GPU kernels. 2. Propagate alignment information to the outlined by `CodeExtractor` helper functions.
Moving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details.
One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code.
If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization.
Scope of changes:
CodeExtractor
helper functions.