
[SYCL-MLIR] Merge from intel/llvm sycl branch #8431


Merged
merged 25 commits into from
Feb 23, 2023


whitneywhtsang
Contributor

#8395 merged sycl branch to sycl-mlir branch.
One of the upstream commits, 505aa7d, caused build failures, so it was temporarily reverted in the sycl-mlir branch. The build failures are fixed by #8411, so we can add 505aa7d back.

Please do not squash and merge this PR.

Brox Chen and others added 25 commits February 17, 2023 15:16
Extract alignment information from compile-time properties.

Read the alignment decoration applied on a pointer
using `__sycl_detail__::add_ir_annotations_member("sycl-alignment", "")`
and apply that information to the load/store instructions that use
this pointer.

This patch includes:
1. Create a utility function, "parseSYCLPropertiesString".
2. Parse the string in llvm.ptr.annotation, extract the alignment
information, and apply it to the load/store instructions.
Reduce the amount of testing being performed in the single intelfpga
test by breaking out the 'link' specific tests that cover -fsycl-link
behaviors.
CUDA math provides a series of SIMD intrinsics:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html#group__CUDA__MATH__INTRINSIC__SIMD
We provide corresponding APIs in SYCL libdevice that emulate the
behavior of these CUDA SIMD intrinsics.
This PR adds these APIs to the sycl_ext_intel_math header file, so users can
invoke them when porting CUDA code to SYCL.
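
As an illustration of what emulating one of these intrinsics involves, here is a minimal sketch of the per-byte addition that CUDA's `__vadd4` performs. The name `emulate_vadd4` is hypothetical and this is not the libdevice implementation; it only shows the lane-wise semantics on a packed 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the libdevice code): adds the four packed 8-bit
// lanes of a and b independently, wrapping each lane modulo 256, which is the
// per-byte behavior of CUDA's __vadd4.
inline std::uint32_t emulate_vadd4(std::uint32_t a, std::uint32_t b) {
  std::uint32_t result = 0;
  for (int lane = 0; lane < 4; ++lane) {
    const int shift = lane * 8;
    const std::uint32_t x = (a >> shift) & 0xFFu; // extract lane of a
    const std::uint32_t y = (b >> shift) & 0xFFu; // extract lane of b
    // Mask the sum so a carry never leaks into the neighboring lane.
    result |= ((x + y) & 0xFFu) << shift;
  }
  return result;
}
```

The masking step is the point of the sketch: a plain 32-bit `a + b` would let a carry out of one byte corrupt the next.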

---------

Signed-off-by: jinge90 <[email protected]>
This commit adds a design document for the `any_device_has` and
`all_devices_have` SYCL 2020 traits.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
Using ubuntu-latest still causes long delays due to missing runners.
For x86 target, vector types (both result and arguments) can be coerced
to scalars of the same size, e.g:

      define zeroext i1 @_Z18convert_ulong4_rteDv4_t(<4 x i16> %x)
      ; becomes
      define zeroext i1 @_Z18convert_ulong4_rteDv4_t(i64 %x.coerced)

Such behavior is completely valid for x86, but the backend vectorizer
cannot work with scalars instead of vectors.

With this patch, argument and result types are left unchanged in
CodeGen.

A new option, fopencl-force-vector-abi, is also added; when provided, it
force-disables vector-to-scalar coercion.

---------

Co-authored-by: Wenju He <[email protected]>
Co-authored-by: Alexey Bader <[email protected]>
Using ubuntu-latest still causes long delays due to missing runners.
…uide (intel#8411)

The use of deduction guides in the `ReducerAccess` helper class causes
problems when building with a compiler that does not support them. This
commit changes the implementation to use a helper function instead.
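
A minimal sketch of the general technique, assuming a simplified stand-in class: `AccessWrapper` and `makeAccessWrapper` are hypothetical names, not the actual `ReducerAccess` code. It shows how a helper function template gives the same type deduction without relying on class template argument deduction.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a helper class such as ReducerAccess.
template <typename ReducerT> class AccessWrapper {
public:
  explicit AccessWrapper(ReducerT &Ref) : MRef(Ref) {}
  ReducerT &get() { return MRef; }

private:
  ReducerT &MRef;
};

// With C++17 deduction guides one could write `AccessWrapper W{Value};`.
// A helper function relies only on ordinary function template argument
// deduction, which pre-C++17 compilers also support.
template <typename ReducerT>
AccessWrapper<ReducerT> makeAccessWrapper(ReducerT &Ref) {
  return AccessWrapper<ReducerT>{Ref};
}
```

A caller then writes `auto W = makeAccessWrapper(Value);` instead of naming the template argument explicitly.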

Signed-off-by: Larsen, Steffen <[email protected]>
Addresses to support host-device memcpy2D copies
…YCL (intel#8257)

This PR addresses an issue where using `__CUDA_ARCH__` causes
intrinsics not to be defined in the CUDA include files.
- Replace `__CUDA_ARCH__` with `__SYCL_CUDA_ARCH__` for SYCL device
- Update the `sycl-macro.cpp` test to check the appropriate macro.

---

As far as I could find, the original issue was introduced by PR
[intel#6524](intel@7b47ebb),
which enabled bfloat16 support by moving it out of the experimental
extension, and it breaks some codebases with CUDA interop calls.
Current reports include GitHub issues
[intel#7722](intel#7722),
[intel#8133](intel#8133) and
[uxlfoundation/oneMath#257](uxlfoundation/oneMath#257).

For that reason we define a unique `__SYCL_CUDA_ARCH__` macro and use it
instead for SYCL device targets and leave `__CUDA_ARCH__` as before for
CUDA targets.
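
A minimal sketch of how user code might distinguish the two cases after this change. The function name `compilationTarget` is hypothetical, and checking `__SYCL_DEVICE_ONLY__` alongside `__SYCL_CUDA_ARCH__` is an assumption made for illustration, not something this PR prescribes.

```cpp
#include <cassert>
#include <cstring>

// Sketch only: reports which kind of compilation pass sees this code.
// compilationTarget is a hypothetical name used for illustration.
inline const char *compilationTarget() {
#if defined(__SYCL_DEVICE_ONLY__) && defined(__SYCL_CUDA_ARCH__)
  // SYCL device code compiled for a CUDA backend.
  return "sycl-cuda-device";
#elif defined(__CUDA_ARCH__)
  // Native CUDA device compilation, unchanged by this PR.
  return "cuda-device";
#else
  // Host compilation: neither macro is defined.
  return "host";
#endif
}
```

Because `__CUDA_ARCH__` is no longer defined for SYCL device code, CUDA headers included in interop code see the same environment they would in a host pass.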
The test can fail if the working directory where the test was launched has
an `error` substring in its path.
… on (intel#8374)

Fixes two bugs in CUDA PI and HIP PI that can cause waiting for events to
do nothing:
- The first one is an off-by-one error when checking if an event needs
to be waited on
- The second one is setting `last_sync_compute_streams_` /
`last_sync_transfer_streams_` to a new value before checking the streams,
which can read these variables expecting the old values.

Both of these are synchronization related and therefore hard to test
for.
When a pointer to be promoted is stored, internalization is no longer
safe to perform. In this case, simply bail out and do not promote the
given pointer.

Signed-off-by: Victor Perez <[email protected]>
Co-authored-by: Alexey Bader <[email protected]>
This commit fixes an issue where an integer conversion placed inside
an assert would not happen when assertions were
disabled.
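
The general pitfall can be sketched as follows. The names `tryConvert`, `convertInsideAssert`, and `convertOutsideAssert` are hypothetical and are not taken from the patch; the sketch only shows why work with side effects should not live inside `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: converts In to int, returning false if it does not fit.
inline bool tryConvert(std::size_t In, int &Out) {
  if (In > static_cast<std::size_t>(std::numeric_limits<int>::max()))
    return false;
  Out = static_cast<int>(In);
  return true;
}

// Problematic pattern: when NDEBUG is defined, assert expands to nothing,
// so tryConvert is never called and Out keeps its initial value.
inline int convertInsideAssert(std::size_t In) {
  int Out = -1;
  assert(tryConvert(In, Out));
  return Out;
}

// Safer pattern: always perform the conversion, then assert on the result.
inline int convertOutsideAssert(std::size_t In) {
  int Out = -1;
  const bool Ok = tryConvert(In, Out);
  assert(Ok && "value does not fit in int");
  (void)Ok; // avoid an unused-variable warning when assertions are disabled
  return Out;
}
```

In a build with assertions disabled, `convertInsideAssert` returns the initial `-1`, while `convertOutsideAssert` still performs the conversion.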

Signed-off-by: Larsen, Steffen <[email protected]>
This commit implements the copy and memcpy operations to and from
device_global. If the device_global does not have device_image_scope, the
memory operation will be on the underlying USM memory. If the
device_global has device_image_scope, the runtime will
try to find a suitable program in the program cache or build a new
program from the image that uses it.

---------

Signed-off-by: Steffen Larsen <[email protected]>
Co-authored-by: Alexey Bader <[email protected]>
…intel#8264)

Support for this extension was removed from the translator in favor of the
Khronos extension in KhronosGroup/SPIRV-LLVM-Translator@ac7a7596. Since
it was never used in the compiler pipeline, the design can be removed.

Specification of SPV_EXT_relaxed_printf_string_address_space:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_relaxed_printf_string_address_space.asciidoc

Signed-off-by: Maksimova, Viktoria <[email protected]>
One of the patches missing in intel/llvm

Signed-off-by: Sidorov, Dmitry [email protected]

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@c5b3c8e3283b

Second attempt
Updated the document to use the up-to-date extension template.
Added revision 2 of the extension, which adds the ability for users to
construct `sub_group_mask` from specific values.

---------

Co-authored-by: John Pennycook <[email protected]>
…el#8419)

When using -fsycl-targets=intel_gpu* -Xsycl-target-backend=intel_gpu*
"opts", be sure to pass "opts" to the ocloc call. These specially handled
target values imply spir64_gen but were not processed properly to
scrutinize the various target possibilities.
This commit does the following for marray:
1. Add overloads on all binary operators with scalars as the left
operand.
2. Allow half, float, double in && and || operators.
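
The two items above can be sketched with a miniature stand-in type. `MiniArray` is hypothetical and is not `sycl::marray`; the sketch only shows the shape of a scalar-on-the-left overload and an element-wise `&&` that accepts floating-point elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical miniature stand-in for an marray-like type.
template <typename T, std::size_t N> struct MiniArray {
  std::array<T, N> Data;
};

// Scalar as the left operand: computes Lhs - Rhs[i] for every element.
template <typename T, std::size_t N>
MiniArray<T, N> operator-(const T &Lhs, const MiniArray<T, N> &Rhs) {
  MiniArray<T, N> Result{};
  for (std::size_t I = 0; I < N; ++I)
    Result.Data[I] = Lhs - Rhs.Data[I];
  return Result;
}

// Element-wise logical AND; elements may be floating point, where any
// nonzero value counts as true.
template <typename T, std::size_t N>
MiniArray<bool, N> operator&&(const MiniArray<T, N> &Lhs,
                              const MiniArray<T, N> &Rhs) {
  MiniArray<bool, N> Result{};
  for (std::size_t I = 0; I < N; ++I)
    Result.Data[I] = Lhs.Data[I] && Rhs.Data[I];
  return Result;
}
```

Without the scalar-left overload, an expression such as `10.0f - M` would not compile even though `M - 10.0f` might, which matters for non-commutative operators.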

fixes intel#8331

---------

Signed-off-by: Larsen, Steffen <[email protected]>
@whitneywhtsang whitneywhtsang added disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch labels Feb 22, 2023
@whitneywhtsang whitneywhtsang self-assigned this Feb 22, 2023
@whitneywhtsang whitneywhtsang linked an issue Feb 22, 2023 that may be closed by this pull request
@whitneywhtsang whitneywhtsang merged commit d226c17 into intel:sycl-mlir Feb 23, 2023
@whitneywhtsang whitneywhtsang deleted the merge branch February 23, 2023 15:03
whitneywhtsang added a commit to intel/llvm-test-suite that referenced this pull request Mar 1, 2023
Fixed by intel/llvm#8431.

Signed-off-by: Tsang, Whitney <[email protected]>
whitneywhtsang added a commit to intel/llvm-test-suite that referenced this pull request Mar 27, 2023
Fixed by intel/llvm#8431.

Signed-off-by: Tsang, Whitney <[email protected]>
Development

Successfully merging this pull request may close these issues.

[SYCL-MLIR] Investigate build failures from upstream commit