
[SYCL] Fix broken uniqueness in make_device #6204


Closed
wants to merge 5 commits into from

Conversation

JackAKirk
Copy link
Contributor

This is a general fix for #6055.
CUDA device interop is not available yet, but a corresponding fix will be added to the CUDA specialization of make_device in #6202 shortly.

If it is uncontroversial, I will add an overload of device::get_devices that takes a backend argument and returns only the devices corresponding to that backend. This would allow removing the dev.get_backend() == Backend check introduced in this PR and would make the implementation more efficient.

Note that I expect there could be similar uniqueness issues for other interop types, such as context. We will consider such issues in #6055.

I think that interop tests essentially comprising the code sample in #6055 would be useful. I can add such a test for device to intel/llvm-test-suite.

Signed-off-by: JackAKirk [email protected]

@smaslov-intel
Copy link
Contributor

Why is this not relying on the existing device-cache? I would expect that the new "device_impl" would not be created because of this code:

    // If we've already seen this device, return the impl

> If a device is created via the interop API it will always create a new device_impl even when there already exists a device_impl with the same backend handle.

I am not saying it is not a problem, but where in the SYCL 2020 OpenCL backend spec does it say what should be returned by "make_device"? Do we need to update it?

> Note that I expect there could be similar uniqueness issues for other interop types, such as context. We will consider such issues in #6055.

Why context? I think only platform and device should be such, no?
Others would create unique objects.

> I think that interop tests essentially comprising the code sample in #6055 would be useful. I can add such a test for device to intel/llvm-test-suite.

Yes, please add such test(s).

@JackAKirk
Copy link
Contributor Author

JackAKirk commented May 31, 2022

> Why is this not relying on the existing device-cache? I would expect that the new "device_impl" would not be created because of this code:
>
>     // If we've already seen this device, return the impl
>
> > If a device is created via the interop API it will always create a new device_impl even when there already exists a device_impl with the same backend handle.
>
> I am not saying it is not a problem, but where in the SYCL 2020 OpenCL backend spec does it say what should be returned by "make_device"? Do we need to update it?
>
> > Note that I expect there could be similar uniqueness issues for other interop types, such as context. We will consider such issues in #6055.
>
> Why context? I think only platform and device should be such, no? Others would create unique objects.
>
> > I think that interop tests essentially comprising the code sample in #6055 would be useful. I can add such a test for device to intel/llvm-test-suite.
>
> Yes, please add such test(s).

It is relying on the device-cache in the platform class since this patch.
You can see here:

    std::make_shared<device_impl>(PiDevice, Plugin));

that createSyclObjFromImpl<device> does not call the Platform->getOrMakeDeviceImpl method that checks the device-cache. This is why we needed to call device::get_devices, which does check the device-cache and, as far as I can tell, is the only existing route that checks the device-cache of all platforms within a backend.

I think you're right that this use-case and expected behaviour are not described in the OpenCL backend spec. However, the runtime currently allows the creation of a second shared_ptr that takes ownership of the native object in the sample code here: #6055.
It doesn't seem to be a good idea to allow multiple independent shared_ptrs to own the same object if it can be helped. What do you think?

I don't see any difference between the above motivation for device and the context case. However, a sycl::context maps to multiple CUDA contexts, so I'm not sure we can easily do much about this issue for context, at least for the CUDA backend (the same is true of queue and event). We are still finalizing the correct behaviour of interop within the CUDA backend. Device is the most straightforward interop case for the CUDA backend.

@smaslov-intel
Copy link
Contributor

> that createSyclObjFromImpl<device> does not call the Platform->getOrMakeDeviceImpl method that checks the device-cache. This is why we needed to call device::get_devices, which does check the device-cache and, as far as I can tell, is the only existing route that checks the device-cache of all platforms within a backend.

That's why I am asking: should we rather change "make_device" to be aware of the device-cache, instead of using get_devices where there was no need to read all the devices?

> I think you're right that this use-case and expected behaviour are not described in the OpenCL backend spec.

Would you spawn a spec issue to clarify this, please?

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jun 2, 2022

> > that createSyclObjFromImpl<device> does not call the Platform->getOrMakeDeviceImpl method that checks the device-cache. This is why we needed to call device::get_devices, which does check the device-cache and, as far as I can tell, is the only existing route that checks the device-cache of all platforms within a backend.
>
> That's why I am asking: should we rather change "make_device" to be aware of the device-cache, instead of using get_devices where there was no need to read all the devices?

MDeviceCache is a private member of platform_impl, so I thought it could be controversial to let make_device access it directly. It is true that if make_device were to access it directly the impl could be made a bit more efficient, if that is worth it.
Or did you mean just iterating, directly within make_device, over only the devices formed from platforms within the specified backend, as I have done in the latest commit, rather than overloading device::get_devices to take a backend argument?

> > I think you're right that this use-case and expected behaviour are not described in the OpenCL backend spec.
>
> Would you spawn a spec issue to clarify this, please?

OK

Signed-off-by: JackAKirk <[email protected]>
@smaslov-intel
Copy link
Contributor

> MDeviceCache is a private member of platform_impl, so I thought it could be controversial to let make_device access it directly. It is true that if make_device were to access it directly the impl could be made a bit more efficient, if that is worth it.
> Or did you mean just iterating, directly within make_device, over only the devices formed from platforms within the specified backend, as I have done in the latest commit, rather than overloading device::get_devices to take a backend argument?

platform.get_devices() uses platform_impl.getOrMakeDeviceImpl to do the caching. I think having make_device similarly obtain a device impl through platform_impl.getOrMakeDeviceImpl makes for a more consistent (and more performant) implementation.

@JackAKirk JackAKirk marked this pull request as draft July 14, 2022 14:08
@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jul 14, 2022

> > MDeviceCache is a private member of platform_impl, so I thought it could be controversial to let make_device access it directly. It is true that if make_device were to access it directly the impl could be made a bit more efficient, if that is worth it.
> > Or did you mean just iterating, directly within make_device, over only the devices formed from platforms within the specified backend, as I have done in the latest commit, rather than overloading device::get_devices to take a backend argument?
>
> platform.get_devices() uses platform_impl.getOrMakeDeviceImpl to do the caching. I think having make_device similarly obtain a device impl through platform_impl.getOrMakeDeviceImpl makes for a more consistent (and more performant) implementation.

I don't think we can directly use platform_impl.getOrMakeDeviceImpl, because a backend can contain more than one platform. Therefore, if we do not find the device in MDeviceCache, we do not want to call:

  // Otherwise make the impl
  std::shared_ptr<device_impl> Result =
      std::make_shared<device_impl>(PiDevice, PlatformImpl);
  MDeviceCache.emplace_back(Result);

until we have checked all platforms in the backend.

I've followed your suggestion with a rough draft implementation using a new member, platform_impl.getDeviceImpl, which performs only the first part of platform_impl.getOrMakeDeviceImpl. My latest commit will fix the assert failure in:

    #include <sycl/sycl.hpp>

    #include <cassert>

    int main() {
      sycl::device sdev(sycl::default_selector{});
      auto ocl_dev = sycl::get_native<sycl::backend::opencl>(sdev);
      sycl::device dev(sycl::make_device<sycl::backend::opencl>(ocl_dev));
      assert(sdev == dev);
    }

However if we have something like this (perhaps unusual) case:

    #include "cuda_helper.hpp"
    #define SYCL_EXT_ONEAPI_BACKEND_CUDA_EXPERIMENTAL 1
    #include <sycl/ext/oneapi/experimental/backend/cuda.hpp>
    #include <sycl/sycl.hpp>

    #include <cassert>

    int main() {
      sycl::device sdev(sycl::default_selector{});
      CUdevice m_cu_device;
      CUDA_CHECK(cuDeviceGet(&m_cu_device, 0));
      sycl::device dev1(
          sycl::make_device<sycl::backend::ext_oneapi_cuda>(m_cu_device));

      auto m_cu_device_clone =
          sycl::get_native<sycl::backend::ext_oneapi_cuda>(dev1);
      sycl::device dev1_clone(
          sycl::make_device<sycl::backend::ext_oneapi_cuda>(m_cu_device_clone));

      assert(dev1 == dev1_clone);
    }

we will hit the assert, because we have not updated MDeviceCache (first we would have to find the platform associated with the device_impl that was made by the plugin). So we will need to create another route to find the correct platform and update platform_impl.MDeviceCache.
This change would also mean that platform_impl.getOrMakeDeviceImpl becomes redundant; we should instead call platform_impl.getDeviceImpl from platform_impl::get_devices and remove platform_impl.getOrMakeDeviceImpl.

I'm not sure that make_device is performance-critical code, since I don't imagine it will be called many times per application; perhaps I am wrong? This solution could be improving the performance of non-performance-critical code at the expense of reduced encapsulation of platform_impl, additional class methods on platform_impl, and probably worse code readability.
I want to check that I did not misunderstand what you meant: do you have a better solution, is this still the correct direction, or is my original solution that made use of platform::get_devices (which I have added to the CUDA implementation in the last commit for reference, beginning from

    auto plts = platform::get_platforms();

) better?

Thanks

@JackAKirk
Copy link
Contributor Author

@smaslov-intel What do you think about my above comment/questions?

@AerialMantis
Copy link
Contributor

@smaslov-intel friendly ping on this, what are your thoughts on @JackAKirk's comments above?
