[SYCL] discard_write optimization #2854
Conversation
…used, we do not need to perform a memcpy. This can result in a significant performance improvement when using the discard_write access mode flag. Signed-off-by: Chris Perkins <[email protected]>
…d write membuffer operations are enqueued instead of Map/Unmap. But the read is unnecessary. This avoids the redundant op.
Signed-off-by: Chris Perkins <[email protected]>
MQueue, MDstReq.MDims, MDstReq.MMemoryRange, MDstReq.MAccessRange,
MDstReq.MOffset, MDstReq.MElemSize, std::move(RawEvents), Event);
}
MemoryManager::copy(
CUDA requires no change. OpenCL will need to be changed in the library itself.
Does it mean that this unconditional copy is expected to slow down OpenCL execution currently?
And why does CUDA not need a change?
Maybe my commit comment could be clearer. There are two sides: unified memory and not.
This particular line of code you are commenting on is for when we do not have unified memory. In that case, the underlying backend doesn't really matter. SYCL itself is scheduling individual mem read/writes to maintain the coherency, and this optimization will avoid the needless mem read.
The comment that CUDA requires no change applies when there is unified memory. In that case we are scheduling paired map/unmap operations, so any optimization has to be performed by the backend (or its PI interface). For Level Zero, this PR adds the optimization to its PI interface. For CUDA, it looks like the PI interface is already performing the optimization (and microbenchmarks confirm that). For OpenCL, the CL_MAP_WRITE_INVALIDATE_REGION flag is being passed by the PI to the OpenCL plugin, but no optimization seems to be occurring (as tested by a simple benchmark).
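To make the non-unified-memory case concrete, here is a minimal sketch of the decision being described (illustrative C++ only; `access_mode` and `needs_device_to_host_copy` are hypothetical names for this comment, not the actual DPC++ scheduler API):

```cpp
#include <cassert>

// Hypothetical names for illustration; not the actual DPC++ scheduler API.
enum class access_mode { read, write, read_write, discard_write, discard_read_write };

// A host accessor needs the buffer's current device contents only when the
// access mode can observe them. For the discard modes the previous contents
// are dead, so the device-to-host read can be skipped entirely.
bool needs_device_to_host_copy(access_mode m) {
  return m != access_mode::discard_write &&
         m != access_mode::discard_read_write;
}
```

The point is simply that the scheduler, not the backend, decides whether to enqueue the read in this path, so the optimization works the same on every backend.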
In the case of CUDA, it looks like the PI interface is already performing the optimization (and microbenchmarks confirm that)
Could you double check and spot it in the CUDA PI plugin sources?
For OpenCL, the CL_MAP_WRITE_INVALIDATE_REGION flag is being passed by the PI to the OpenCL plugin, but no optimization seems to be occurring (as tested by a simple benchmark).
It is strange that OpenCL doesn't optimize this; are you going to follow up with the OpenCL team?
Could you double check and spot it in the CUDA PI plugin sources?
It's at llvm/sycl/plugins/cuda/pi_cuda.cpp, line 4120 (commit 6733c8b):

CL_MAP_WRITE_INVALIDATE_REGION))) {
It is strange that OpenCL doesn't optimize this; are you going to follow up with the OpenCL team?
Agreed. That is the plan.
LGTM
@cperkinsintel Please add a test for the read/write part of the change as a separate PR.
When using discard_write with a host accessor, we do not need to keep the buffer updated with the latest changes on the device. Skipping that operation results in a significant speed-up.

When we have host unified memory, Map/Unmap operations are enqueued. As these must always match, it is up to the backend (or its plugin) to take advantage of any optimization. In the case of Level Zero, this can be done in the SYCL plugin interface code. This PR includes a fix for Level Zero.
CUDA requires no change. OpenCL will need to be changed in the library itself.
When we are not using host unified memory, instead of Map/Unmap operations, we enqueue basic Read/Write ops. When using discard_write on the host, the Read op is unnecessary. This PR schedules an empty operation instead.
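For the unified-memory path, a minimal sketch of how a runtime might pick map flags from the access mode (the flag values mirror the OpenCL cl_map_flags bitfield; `map_flags_for` is a hypothetical helper for illustration, not code from this PR):

```cpp
#include <cassert>
#include <cstdint>

// Flag values mirror the OpenCL cl_map_flags bitfield.
constexpr std::uint64_t MAP_READ                    = 1u << 0; // CL_MAP_READ
constexpr std::uint64_t MAP_WRITE                   = 1u << 1; // CL_MAP_WRITE
constexpr std::uint64_t MAP_WRITE_INVALIDATE_REGION = 1u << 2; // CL_MAP_WRITE_INVALIDATE_REGION

enum class access_mode { read, write, read_write, discard_write };

// Hypothetical helper: for discard_write the runtime can request
// WRITE_INVALIDATE_REGION, which tells the backend it may map the region
// without transferring the buffer's current contents to the host.
std::uint64_t map_flags_for(access_mode m) {
  switch (m) {
  case access_mode::read:          return MAP_READ;
  case access_mode::write:         // partial writes must preserve untouched bytes
  case access_mode::read_write:    return MAP_READ | MAP_WRITE;
  case access_mode::discard_write: return MAP_WRITE_INVALIDATE_REGION;
  }
  return 0;
}
```

Whether passing the invalidate flag actually avoids the transfer is then up to the backend, which is why Level Zero needed the plugin change while CUDA's PI already handled it.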