Skip to content

[OpenMP][CodeExtractor]Add align metadata to load instructions #131131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

DominikAdamski
Copy link
Contributor

@DominikAdamski DominikAdamski commented Mar 13, 2025

Moving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details.

One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code.

If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization.

Scope of changes:

  1. Analyze MLIR map operations. Add information about the alignment of objects that are passed by reference to OpenMP GPU kernels.
  2. Propagate alignment information to the outlined by CodeExtractor helper functions.

Moving code to another function can lead to missed optimization opportunities,
because function passes operate on smaller chunks of code,
and they cannot figure out all details.

One example of missed optimization opportunities after code extraction
is information about pointer alignment. The instruction combine pass
adds information about pointer alignment to LLVM intrinsic memcpy calls
if it can deduce it from the code or if align metadata is added.
If this information is not present, then further optimization passes
can generate inefficient code.

If we add align metadata to extracted pointers, then the instruction
combine pass can add the align attribute to the LLVM intrinsic memcpy call
and unblock further optimization.
@llvmbot
Copy link
Member

llvmbot commented Mar 13, 2025

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-llvm-transforms

Author: Dominik Adamski (DominikAdamski)

Changes

Moving code to another function can lead to missed optimization opportunities, because function passes operate on smaller chunks of code, and they cannot figure out all details.

One example of missed optimization opportunities after code extraction is information about pointer alignment. The instruction combine pass adds information about pointer alignment to LLVM intrinsic memcpy calls if it can deduce it from the code or if align metadata is added. If this information is not present, then further optimization passes can generate inefficient code.

If we add align metadata to extracted pointers, then the instruction combine pass can add the align attribute to the LLVM intrinsic memcpy call and unblock further optimization.


Full diff: https://github.com/llvm/llvm-project/pull/131131.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+20-2)
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index 7277603b3ec2b..61d70a1500028 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -1604,8 +1604,26 @@ void CodeExtractor::emitFunctionBody(
       Idx[1] = ConstantInt::get(Type::getInt32Ty(header->getContext()), aggIdx);
       GetElementPtrInst *GEP = GetElementPtrInst::Create(
           StructArgTy, AggArg, Idx, "gep_" + inputs[i]->getName(), newFuncRoot);
-      RewriteVal = new LoadInst(StructArgTy->getElementType(aggIdx), GEP,
-                                "loadgep_" + inputs[i]->getName(), newFuncRoot);
+      LoadInst *LoadGEP =
+          new LoadInst(StructArgTy->getElementType(aggIdx), GEP,
+                       "loadgep_" + inputs[i]->getName(), newFuncRoot);
+      PointerType *ItemType =
+          dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx));
+      if (ItemType && !LoadGEP->getMetadata(LLVMContext::MD_align)) {
+        unsigned AddressSpace = ItemType->getAddressSpace();
+        unsigned AlignmentValue = oldFunction->getDataLayout()
+                                      .getPointerPrefAlignment(AddressSpace)
+                                      .value();
+
+        MDBuilder MDB(header->getContext());
+        LoadGEP->setMetadata(
+            LLVMContext::MD_align,
+            MDNode::get(
+                header->getContext(),
+                MDB.createConstant(ConstantInt::get(
+                    Type::getInt64Ty(header->getContext()), AlignmentValue))));
+      }
+      RewriteVal = LoadGEP;
       ++aggIdx;
     } else
       RewriteVal = &*ScalarAI++;

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some questions about alignment:

  • The StructArgTy is allocated at

    Struct = new AllocaInst(StructArgTy, DL.getAllocaAddrSpace(), nullptr,
    . Does this guaranteed to have the alignment returned by getPointerPrefAlignment? If so, could add a comment?

  • Shouldn't there be a pass or similar that propagates alignment information from the alloca to the load? Interprocedurally we have the attributor, so what is doing this within a function? It's not the same function, the LoadGEP is in the extracted function.

"loadgep_" + inputs[i]->getName(), newFuncRoot);
PointerType *ItemType =
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx));
if (ItemType && !LoadGEP->getMetadata(LLVMContext::MD_align)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the !LoadGEP->getMetadata() condition superfluous? How can there be metadata if we just created the instruction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for pointing out.

@DominikAdamski
Copy link
Contributor Author

I had some questions about alignment:

  • The StructArgTy is allocated at
    Struct = new AllocaInst(StructArgTy, DL.getAllocaAddrSpace(), nullptr,

    . Does this guaranteed to have the alignment returned by getPointerPrefAlignment? If so, could add a comment?
  • Shouldn't there be a pass or similar that propagates alignment information from the alloca to the load? Interprocedurally we have the attributor, so what is doing this within a function? It's not the same function, the LoadGEP is in the extracted function.

I added a comment and ensured that we use the same function for alignment calculation. I modified the struct allocation - I explicitly set the alignment for struct allocation. Changes in the struct allocation constructors are non-functional. I explicitly set the alignment parameter in the constructor just to ease code analysis.

Previous constructor:

AllocaInst::AllocaInst(Type *Ty, unsigned AddrSpace, Value *ArraySize,

calls DL.getPrefTypeAlign:
static Align computeAllocaDefaultAlign(Type *Ty, InsertPosition Pos) {

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LoadInst is loading a member of the struct whose alignment is not necessarily the same as the struct's alignment itself. for instance:

struct {
  char a,b;
  int c;
} MyStruct;

alignof(MyStruct) should be 4, but b will offset 1 and only char-aligned.

What the alignment of each member is, is determined by the ABI, and I am not sure it will always be the processor's preferred alignment.

Consider adding alignment info to the store for live-out values as well.

Comment on lines 1610 to 1612
PointerType *ItemType =
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx));
if (ItemType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PointerType *ItemType =
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx));
if (ItemType) {
if (PointerType *ItemType =
dyn_cast<PointerType>(StructArgTy->getElementType(aggIdx))) {

This reverts commit 3a41608.
LLVM IR language reference manual states that align metadata
tells the optimizer that the value loaded is known to be aligned
to a boundary specified by the integer value in the metadata node.

This information is used by the optimizer, for example,
to generate more efficient memcpy calls. The LLVM Optimizer requires
align metadata to generate optimized code because information about
the alignment of objects is lost during OpenMP target code generation
(outlining of loop body helper function).
@DominikAdamski DominikAdamski changed the title [CodeExtractor] Add align metadata to extracted pointers [OpenMP][CodeExtractor]Add align metadata to load instructions Apr 7, 2025
@DominikAdamski
Copy link
Contributor Author

Scope of changes:

  1. Analyze MLIR map operations. Add information about the alignment of objects that are passed by reference to OpenMP GPU kernels.
  2. Propagate alignment information to the outlined helper functions.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider also updatating the PR summary, you probably want to use it as commit message.

Comment on lines +1640 to +1641
AlignmentValue =
inputs[i]->stripPointerCasts()->getPointerAlignment(DL).value();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make getPointerAlignment to strip irrelevant casts itsself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getPointerAlignment function is from the Value class. Do you think it's worth adding another version of this function to this basic class? If yes, I will create a separate patch that will contain a version of the getPointerAlignment function with two arguments. One of them will be a flag to get the value alignment without pointer casts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think improving getPointerAlignment itself would be generally useful. I was using it myself in

// TODO: Would be great if this could determine alignment through a GEP
EffectiveAlign = AtomicPtr->getPointerAlignment(EmitOptions.DL);
and was disappointed how quickly it gives up.

But also maybe does not belong into this PR.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

@DominikAdamski DominikAdamski merged commit adfc577 into llvm:main Apr 10, 2025
11 checks passed
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…131131)

Moving code to another function can lead to missed optimization
opportunities, because function passes operate on smaller chunks of
code, and they cannot figure out all details.

One example of missed optimization opportunities after code extraction
is information about pointer alignment. The instruction combine pass
adds information about pointer alignment to LLVM intrinsic memcpy calls
if it can deduce it from the code or if align metadata is added. If this
information is not present, then further optimization passes can
generate inefficient code.

If we add align metadata to extracted pointers, then the instruction
combine pass can add the align attribute to the LLVM intrinsic memcpy
call and unblock further optimization.

Scope of changes:
1. Analyze MLIR map operations. Add information about the alignment of
objects that are passed by reference to OpenMP GPU kernels.
2. Propagate alignment information to the outlined by `CodeExtractor`
helper functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants