Skip to content

[SYCL][CUDA][HIP] Add support for querying device architecture #10573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 16, 2023

Conversation

MartinWehking
Copy link
Contributor

Implements sycl_ext_oneapi_device_architecture for AMD and NVIDIA.
The device architecture can be queried through
dev.get_info<sycl::ext::oneapi::experimental::info::device::architecture>()
( dev is a sycl::device)

@MartinWehking MartinWehking requested a review from a team as a code owner July 26, 2023 10:12
@MartinWehking MartinWehking requested a review from bso-intel July 26, 2023 10:12
@MartinWehking MartinWehking changed the title Device info [SYCL][CUDA][HIP] Add support for querying device architecture Jul 26, 2023
@MartinWehking MartinWehking temporarily deployed to aws July 26, 2023 10:28 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws July 26, 2023 16:34 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws August 1, 2023 13:36 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws August 1, 2023 13:37 — with GitHub Actions Inactive
@gmlueck
Copy link
Contributor

gmlueck commented Aug 1, 2023

Can you also update the extension specification to remove the following limitation:

The two APIs device::ext_oneapi_architecture_is and the ext::oneapi::experimental::info::device::architecture device descriptor are currently supported only for Intel devices (both GPU and CPU). There is no support yet for Nvidia or AMD devices.

@MartinWehking MartinWehking requested a review from a team as a code owner August 3, 2023 13:29
@MartinWehking MartinWehking temporarily deployed to aws August 3, 2023 14:19 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws August 3, 2023 15:01 — with GitHub Actions Inactive
Copy link
Contributor

@dm-vodopyanov dm-vodopyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few stylistic comments should be fixed.

@@ -568,64 +568,108 @@ struct get_device_info_impl<range<Dimensions>,
}
};

// This macro is only for AMD, NVIDIA and Intel GPU architectures
#define ARCHES(X) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done in a separate PR later during some refactoring but we can have two std::unordered_maps: one for string keys and another for uint32_t keys: this way in case of Intel platforms we will not convert number to string and compare strings - it will be faster. For AMD and NVIDIA archs with std::unordered_map the access time will be O(1) in a good case and O(N) in a bad case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the way the uint32_t keys were converted was wrong anyways. The converted strings were not in hexadecimal form anymore and that's also why the CI failed.
I've added another macro for now to avoid the conversion and leave the DeviceIP as an integer.

@MartinWehking MartinWehking temporarily deployed to aws August 3, 2023 17:06 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws August 3, 2023 18:09 — with GitHub Actions Inactive
@MartinWehking MartinWehking temporarily deployed to aws August 4, 2023 09:08 — with GitHub Actions Inactive
Copy link
Contributor

@dm-vodopyanov dm-vodopyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MartinWehking MartinWehking temporarily deployed to aws August 4, 2023 09:48 — with GitHub Actions Inactive
@dm-vodopyanov
Copy link
Contributor

@aelovikov-intel re-triggering of SYCL Pre Commit on Linux / test (AMD/HIP and Intel) didn't help, how it can be fixed? Possibly, some problem with CI machines.

@dm-vodopyanov
Copy link
Contributor

@aelovikov-intel re-triggering of SYCL Pre Commit on Linux / test (AMD/HIP and Intel) didn't help, how it can be fixed? Possibly, some problem with CI machines.

Looks like a common problem at least with Intel devices: #9845 (comment)

@dm-vodopyanov
Copy link
Contributor

Restarted the checks, it was reported that issue with Intel devices should be fixed now.

@dm-vodopyanov
Copy link
Contributor

dm-vodopyanov commented Aug 16, 2023

Unrelated failure on HIP:

Failed Tests (1):
  SYCL :: AtomicRef/assignment_atomic64_generic.cpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants