Skip to content

[SYCL][L0][Plugin] Add support for batching copy commands #5155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 20, 2021
Merged

[SYCL][L0][Plugin] Add support for batching copy commands #5155

merged 8 commits into from
Dec 20, 2021

Conversation

asudarsa
Copy link
Contributor

Signed-off-by: Arvind Sudarsanam [email protected]

@asudarsa
Copy link
Contributor Author

/summary:run

pi_uint32 QueueBatchSize = {0};
} command_batch;

command_batch ComputeCommandBatch, CopyCommandBatch;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

// be batched together.
// For copy commands, IsCopy is set to 'true'.
// For non-copy commands, IsCopy is set to 'false'.
bool isBatchingAllowed(bool IsCopy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool isBatchingAllowed(bool IsCopy);
bool isBatchingAllowed(bool IsCopy) const;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as part of bigger commit.

pi_result executeOpenCommandList(bool IsCopy);

// Wrapper function to execute both OpenCommandLists (Copy and Compute).
pi_result executeOpenCommandLists() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more distinct from the executeOpenCommandList:

Suggested change
pi_result executeOpenCommandLists() {
pi_result executeAllOpenCommandLists() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add comment about how one should choose between the 2 versions. I.e. in what circumstances both batches need to be closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@@ -2817,6 +2831,15 @@ pi_result piContextRelease(pi_context Context) {
return ContextReleaseHelper(Context);
}

// This function will initialize batch sizes for the compute anc copy command
// lists
static void initializeQueueBatchSizes(pi_queue Queue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move _pi_queue ctor implementation to pi_level_zero.cpp, and remove this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

if (Event->Queue->hasOpenCommandList(false /* IsCopy */) &&
Event->Queue->ComputeCommandBatch.OpenCommandList->first ==
Event->ZeCommandList) {
if (auto Res = Event->Queue->executeOpenCommandList(false /* IsCopy */))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

using IsCopy = bool;

and then you can write arguments like

IsCopy{false}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

Comment on lines 5558 to 5560
bool UseCopyEngine =
(!(Queue->isInOrderQueue()) || UseCopyEngineForInOrderQueue) &&
PreferCopyEngine && Queue->Device->hasCopyEngine();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it into utility pi_queue::useCopyEngine(bool PreferCopyEngine = true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

@@ -5551,7 +5590,8 @@ static pi_result enqueueMemCopyHelper(pi_command_type CommandType,
pi_cast<std::uintptr_t>(ZeEvent));
printZeEventList(WaitList);

if (auto Res = Queue->executeCommandList(CommandList, BlockingWrite))
if (auto Res =
Queue->executeCommandList(CommandList, BlockingWrite, OkToBatch))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the commands that we still don't batch? Maybe we should just make the last argument (OkToBatch) true by default now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have considerable number of call-sites where this argument is 'false' (wait commands, membuffermap, memadvise, etc). So, we don't need to change the default now.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, few minor suggestions

@asudarsa
Copy link
Contributor Author

Thanks Sergey for all your comments. I have tried my best to handle them. Please take a look.

}();

// Static variable that holds batch config info for copy command batching.
static const zeCommandListBatchConfig ZeCommandListBatchCopyConfig = [] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please re-use the initialization lambda from the compute config

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean turn lambda into a named function and us it to init both of configs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it....I made the appropriate changes. Thanks

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one comment to re-use code for env var inititalization

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/doc/EnvironmentVariables.md looks good to me.

@againull againull merged commit 4c3e699 into intel:sycl Dec 20, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Dec 23, 2021
* upstream/sycl: (1382 commits)
  [SYCL][XPTI] Report memory allocation info from SYCL runtime (intel#5172)
  [CI] Switch labels for OCL x64 job (intel#5185)
  [SYCL] Add basic support for the generic_space address space (intel#5148)
  [CI] Update CODEOWNERS for SYCL printf support passes (intel#5199)
  [SYCL][Matrix] Enable wi_slice for joint_matrix (intel#4979)
  [SYCL][Group algorithms] Move group sort extension to experimental (intel#5169)
  [SYCL] Fix kernel bundles don't really carry kernel IDs (intel#5121)
  [SYCL] Initial printf support for non-constant AS format strings (intel#5069)
  [SYCL][NFC] Fix static code analysis concerns (intel#5189)
  [SYCL][Doc] Fix typos to fix doc build (intel#5190)
  [Driver][SYCL] Turn on -fsycl-dead-args-optimization by default (intel#3004)
  [SYCL][L0][Plugin] Add support for batching copy commands (intel#5155)
  [CI] Add cache checkout script to docker containers (intel#5184)
  [SYCL][Doc] Add HIP backend to the filter selector (intel#5176)
  [Doc] Add documentation for Docker images (intel#4778)
  [LIBCLC] Add functionality for in-kernel asserts for CUDA backend (intel#5174)
  Force opt to use new pass manager in exponential-deferred-inlining test after a8c2ba1
  [SYCL] Add vec and marray support to known_identity type trait (intel#5163)
  Correctly resolve merge conflicts
  Update SPV_INTEL_hw_thread_queries to rev 2
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants