Skip to content

[OMPIRBuilder] - Handle dependencies in createTarget #93977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bhandarkar-pranav
Copy link
Contributor

This PR handles dependencies specified by the depend clause on an OpenMP target construct. It does this much the same way clang does it by materializing an OpenMP task that is tagged with the dependencies.

To aid review, here are the relevant implementation details

  1. createTarget - This function itself is largely unchanged except that it now accepts a vector of DependData objects that it simply forwards to emitTargetCall
  2. emitTargetCall - This function has changed now to check if an outer target-task needs to be materialized (Yes, if target construct has nowait or has depend clause). If yes, it calls emitTargetTask to do all the heavy lifting for creating and dispatching the task.
  3. emitTargetTask - Bulk of the PR is here. See the large comment explaining what it does at the beginning of the function.

Things that will/should follow in subsequent PRs

  1. Handle this for target enter data, exit data and update data constructs
  2. (Nice to have) - Outline logic for creating code for a task and use in createTask and emitTargetTask

@bhandarkar-pranav
Copy link
Contributor Author

ping! Gentle reminder, I'd appreciate reviews on this.

Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pranavb-ca! I did a first round and have a few comments.

DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr");

unsigned P = 0;
for (const OpenMPIRBuilder::DependData &Dep : Dependencies) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there is a similar loop in OpenMPIRBuilder::createTask. Can we outline this to a shared util used in both locations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, my intention was to switch OpenMPIRBuilder::createTask to use this function but i had a couple of other changes to OpenMPIRBuilder::createTask in mind so I decided to leave that out and roll it into a different PR. If you prefer, I can make the switch in this PR itself though. Just a preference, no strong opinions on this.

RTLoc, TargetTaskAllocaIP));

OI.ExitBB = Builder.saveIP().getBlock();
OI.PostOutlineCB = [this, ToBeDeleted, Dependencies,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I there are pieces in this lambda that can be outlined into shared utils and called from here and OpenMPIRBuilder::createTask. For example, the logic to setup TaskData (but, other parts as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is an active Todo which I was planning to add in a different PR because now this PR doesn't touch the handling of the task construct at all. But that's only a preference, I'll be happy to roll those changes into this PR too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes ToBeDeleted and Dependencies by-value to the lambda, i.e. makes a copy of those lists. If called multiple times, items in ToBeDeleted will be erased multiple times. I assume this is not intended?

Suggested change
OI.PostOutlineCB = [this, ToBeDeleted, Dependencies,
OI.PostOutlineCB = [this, &ToBeDeleted, &Dependencies,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes ToBeDeleted and Dependencies by-value to the lambda, i.e. makes a copy of those lists. If called multiple times, items in ToBeDeleted will be erased multiple times. I assume this is not intended?

In fact a copy is necessary to "own" the data. PostOutlineCB is called only once so multiple deletions is not a problem. However, and what is more important to note, is that it is called long after ToBeDeleted is destroyed (that is after we have returned from emitTargetTask) making it necessary to keep a copy around.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Pranav, I think this looks good. I just have a few minor comments.

dds.emplace_back(dd);
}
}
if (!taskOp.getDependVars().empty() && taskOp.getDepends())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd suggest moving this check into buildDependData, since it always needs to be done. It would just return without adding anything to dds if the condition is not met.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this comment hasn't been addressed. Let me know if you just missed it, if you disagree with it or if it isn't very clear and I should try to explain better.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full a review, but the first notes that I started with

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, looks quite good. Just some style comments.

Comment on lines 5314 to 5315
AllocaInst *ArgStructAlloca =
dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AllocaInst *ArgStructAlloca =
dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
auto *ArgStructAlloca =
dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));

Comment on lines 5626 to 5629
while (!ToBeDeleted.empty()) {
ToBeDeleted.top()->eraseFromParent();
ToBeDeleted.pop();
}
Copy link
Member

@Meinersbur Meinersbur Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (!ToBeDeleted.empty()) {
ToBeDeleted.top()->eraseFromParent();
ToBeDeleted.pop();
}
llvm::for_each(ToBeDeleted, [](Instruction *I) { I->ereaseFromParent(); } );

or a simple foreach-loop. ToBeDeleted goes out of scope anyway.

OI.OuterAllocaBB = AllocaIP.getBlock();

// Add the thread ID argument.
std::stack<Instruction *> ToBeDeleted;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::stack<Instruction *> ToBeDeleted;
SmallVector<Instruction *> ToBeDeleted;

std::stack uses std::deque (by default), which is quite malloc-heavy. From cppreference:

deques typically have large minimal memory cost; a deque holding just one element has to allocate its full internal array (e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meinersbur -> The only reason, I didnt do it is because createFakeIntVal is used by a few other functions not related to this PR. So changing ToBeDeleted from std::stack to SmallVector would involve changing other functions (such as createTeams) which are not related to this PR. I could go ahead and add that to this PR. Or simply open a separate PR just to make this change which should be quick. Let me know what you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meinersbur - Please take a look. I have gone ahead and made this change. If you agree with my reason above then I'll simply back out that one commit and make it a new PR.

static void
buildDependData(std::optional<ArrayAttr> depends, OperandRange dependVars,
LLVM::ModuleTranslation &moduleTranslation,
SmallVector<llvm::OpenMPIRBuilder::DependData> &dds) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallVector<llvm::OpenMPIRBuilder::DependData> &dds) {
SmallVectorImpl<llvm::OpenMPIRBuilder::DependData> &dds) {

Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bhandarkar-pranav
Copy link
Contributor Author

@Meinersbur, @ergawy, @skatrak - Could you please give this another look. I have updated the testcase target-depend.f90 because the previous version was flaky; I had used depend incorrectly which I have fixed now. (Previously, depend was used on tasks that were not sibling tasks)

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on my previous comments. I just have a couple more minor ones.

dds.emplace_back(dd);
}
}
if (!taskOp.getDependVars().empty() && taskOp.getDepends())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this comment hasn't been addressed. Let me know if you just missed it, if you disagree with it or if it isn't very clear and I should try to explain better.

@@ -3088,10 +3102,13 @@ convertOmpTarget(Operation &opInst, llvm::IRBuilderBase &builder,
if (!mapData.IsDeclareTarget[i] && !mapData.IsAMember[i])
kernelInput.push_back(mapData.OriginalValue[i]);
}
SmallVector<llvm::OpenMPIRBuilder::DependData> dds;
buildDependData(targetOp.getOperation(), moduleTranslation, dds);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skatrak - Apologies for missing your previous comment about moving the empty depends list check into buildDependData. I have done that now in this iteration of the PR.

@bhandarkar-pranav
Copy link
Contributor Author

@skatrak - I have updated the PR with changes based on your feedback. Could you please take a look?

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Pranav, LGTM. I have just one more comment but no need to wait for another review by me before merging this. Just make sure @Meinersbur is happy with it as well!

@@ -682,6 +682,41 @@ convertOmpTeams(omp::TeamsOp op, llvm::IRBuilderBase &builder,
return bodyGenStatus;
}

static void
buildDependData(Operation *op, LLVM::ModuleTranslation &moduleTranslation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest taking std::optional<ArrayAttr> depends and OperandRange dependVars as arguments instead of Operation *op here. This way you don't need to try casting op to each supported operation below, but rather make the body of this function what the processDepends lambda inside currently has.

dds.emplace_back(dd);
}
}
buildDependData(taskOp.getOperation(), moduleTranslation, dds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to how this call could look if following my comment above:

Suggested change
buildDependData(taskOp.getOperation(), moduleTranslation, dds);
buildDependData(moduleTranslation, taskOp.getDepends(), taskOp.getDependVars(), dds);

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bhandarkar-pranav bhandarkar-pranav merged commit d7e185c into llvm:main Jul 22, 2024
7 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.

The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251117
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants