[Driver][SYCL]Emit an error if c compilation is forced using -x c or -x c-header when -fsycl mode is used #1416

Closed
wants to merge 4,290 commits into from

Conversation

hchilama
Contributor

No description provided.

Fznamznon and others added 30 commits February 19, 2020 11:53
  CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp
This patch improves the tool's diagnostic upon finding a
SPIR kernel within an LLVM module. Although the tool's
only current use is within the SYCL FPGA flow, it's important
to make the message target-agnostic, so that the tool is not
tied to a particular device BE.
A related commit to the Clang driver has extended these diagnostics
with SYCL FPGA specifics without affecting the tool itself.

This patch also introduces testing for the return code value. For
example, this should allow the Clang driver users/developers to
differentiate between the two possible causes of llvm-no-spir-kernel
failure.

Signed-off-by: Artem Gindinson <[email protected]>
Move internal headers from include/CL/sycl to source directory to
prevent implementation details leak to user application and enforce
stable ABI.

A few more changes were applied to make the movement possible:

- addHostAccessorAndWait functions in accessor to avoid calls to RT
  internals from header file
- Removed getImageInfo
- Move buffer size acquisition from buffer constructor to SYCLMemObjT
  cpp to avoid calls to PI
- getPluginFromContext function in context
- Standard containers replaced with SYCL variants in sycl_mem_obj_i.hpp.
  Unique ptr replaced with shared
- A few implementations moved from queue.hpp to queue.cpp
- Some LIT tests temporarily include implementation-specific headers.
  They will be converted to unit tests later.

Signed-off-by: Alexander Batashev <[email protected]>
intel#1144)

Since we really just want to be able to memcpy the type to the device,
'is-trivially-copyable' is not the correct trait. Since CWG1734, if we want
to support trivially copyable types, we would be required to create 1 of 4
different mechanisms for having a type on the device (depending on the
way the type is structured). Additionally, 2 of these ways require us to
ALSO have the type be default constructible.

This patch transitions to trivially-copy-constructible, so that we can
simply memcpy from the existing one into new memory.

Signed-off-by: Erich Keane <[email protected]>
LowerWGScope pass performs required transformations to enable
hierarchical parallelism semantics. This pass should not be skipped even
if optimizations are disabled.

Also some typos in the comments are fixed.

Signed-off-by: Artur Gainullin <[email protected]>
…el#1156)

After intel#1068 has included the Demangle header, this fix to CMakeLists
should guarantee successful builds in all configurations

Signed-off-by: Artem Gindinson <[email protected]>
SPIR-V OpGroupBroadcast accepts three forms of local ID:
- scalar integer
- vector integer with 2 components
- vector integer with 3 components

Signed-off-by: John Pennycook <[email protected]>
Also remove idle semicolon.

Signed-off-by: Alexey Bader <[email protected]>
…#1162)

Fix the cl_device_unified_shared_memory_capabilities_intel bitfield type
name.

Signed-off-by: Alexey Bader <[email protected]>
* [SYCL][LIBCLC] Additional libclc builtins to support SYCL work

Adds builtins to libclc to support the CUDA backend for SYCL.

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] CMake and lit support for SYCL CUDA backend

Defines the CMake and lit variables used for SYCL CUDA backend
development and testing.

Contributors
Alexander Johnston <[email protected]>
Bjoern Knafla <[email protected]>
Ruyman Reyes <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Local Accessor Support for CUDA

Provides the LocalAccessorToSharedMemory compiler pass required
for supporting SYCL local accessors in CUDA.

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Change __spirv_BuiltIn.. to functions

Changes the following builtins to functions

__spirv_BuiltInGlobalSize
__spirv_BuiltInWorkgroupSize
__spirv_BuiltInNumWorkgroups
__spirv_BuiltInLocalInvocationId
__spirv_BuiltInWorkgroupId
__spirv_BuiltInGlobalOffset

Contributors
David Wood <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Add SYCL CUDA support to clang driver

Adds CUDA support for sycl compilation in the clang driver

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Initial Implementation of the CUDA backend

Contributors
Alan Forbes <[email protected]>
Alexander Johnston <[email protected]>
Bjoern Knafla <[email protected]>
Daniel Soutar <[email protected]>
David Wood <[email protected]>
Kumudha Narasimhan <[email protected]>
Mehdi Goli <[email protected]>
Przemek Malon <[email protected]>
Ruyman Reyes <[email protected]>
Stuart Adams <[email protected]>
Svetlozar Georgiev <[email protected]>
Steffen Larsen <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Update libclc install rules

Have libclc install clc-* and libspirv-* to lib and share

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage

Synchronise the CUDA backend with the general SYCL changes from intel#974.

Signed-off-by: Andrea Bocci <[email protected]>

* Added missing flags for device-side builtins

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Removing unnecessary tool from the tree

Acked-by: Victor Lomuller <[email protected]>
Signed-off-by: Ruyman <[email protected]>

* [SYCL][PI] Fix kernel group info parameter conversion

Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL][CUDA] Refactor __SYCL_INLINE macro

Synchronise the CUDA backend with the general SYCL changes from intel#1121.

Signed-off-by: Andrea Bocci <[email protected]>

* [SYCL] Have default_selector consider SYCL_BE

Have the default_selector consider the env var SYCL_BE when rating
device scores to make choosing a backend easier.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Select GlobalPlugin based on SYCL_BE

Rather than choose the last found plugin as GlobalPlugin, select
it depending on the SYCL_BE env var.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Improve default device selection checks

Better checks for CUDA and OpenCL devices to match with SYCL_BE in the
default device selection, based on the platform version info.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Formatting update for device_selector.cpp

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Changed CUDA unit tests to call through plugin

Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl

To ensure that the check-sycl targets test OpenCL devices, pass
SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which
passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which
device is tested by check-sycl.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Remove PI_CUDA specific details from clang

Removes PI_CUDA specific code paths and tests from clang, opting to
always enable them.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Further fixes to CUDA device selection

Fix platform string comparison for CUDA platform detection.
Fix device info platform query so that it uses the device's plugin,
rather than the GlobalPlugin.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Code style and cleanup to CUDA support

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Enable asserts in all buildbot builds

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Minor test and build configuration

Fix minor test and build configuration issues introduced in the
development of the CUDA backend.

Signed-off-by: Alexander Johnston <[email protected]>

Co-authored-by: Andrea Bocci <[email protected]>
Co-authored-by: Ruyman <[email protected]>
Co-authored-by: Steffen Larsen <[email protected]>
Signed-off-by: Alexey Bader [email protected]

Co-Authored-By: Alexander Batashev <[email protected]>
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
Error was reproducible in two cases:
- using something like `numeric_limits<half>::min()` within another
  `constexpr`
- not treating SYCL headers as system ones with `-Winvalid-constexpr`
  treated as error

Signed-off-by: Alexey Sachkov <[email protected]>
Event type triggers are misspelled "open"->"opened", etc.
Default event type triggers should work fine.

Signed-off-by: Alexey Bader <[email protected]>
…1053)

We had an issue with wrong mangling of s_upsample. I fixed it a long time ago, so we can delete the workaround now.

Signed-off-by: Ilya Mashkov <[email protected]>
While building the x64 Debug configuration on Windows using scripts from the buildbot folder, there were two issues:
1. OpenCL ICD Loader failed to build because of the missing OpenCL headers
2. Fatal error C1128: clang\lib\Sema\SemaTemplateDeduction.cpp : number of sections exceeded object file format limit: compile with /bigobj

Signed-off-by: Dmitry Vodopyanov <[email protected]>
It turns out that my original implementation was correct and I just
misunderstood the double dot commit range description from ProGit
https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection.

Signed-off-by: Alexey Bader <[email protected]>
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
Naghasan and others added 26 commits March 24, 2020 20:35
Define __SPIRV_BUILTIN_DECLARATIONS__ when passing
-fdeclare-spirv-builtins to clang.

Signed-off-by: Victor Lomuller <[email protected]>
Added OpenCL SPIR-V extended set builtins bindings and
part of the core SPIR-V (mostly missing Images and Pipes)

Known vendor extensions are not implemented yet.

Signed-off-by: Victor Lomuller <[email protected]>
Co-Authored-By: Alexey Bader <[email protected]>
…l#1252)

Implementation of piEventSetCallback with tests

GlueEvent now uses the correct plugins

The SYCL RT code for GlueEvent now calls
the right plugin to create the event that triggers the
dependency chain.
Renamed variables to clarify the source code and avoid
confusion between Context and Plugin

Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Stuart Adams <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
…#1376)

NOTE: This flag is not exposed to the driver and not intended for users.
It's added to make experiments and identify issues with optimizations.

Signed-off-by: Alexey Bader <[email protected]>
…#1383)

By emitting the legacy variant of the LLVM IR alongside the newer
representation of the attribute, backwards compatibility with any
existing BE implementation is restored. A smooth transition period
is thus achieved for the aforementioned BE - until it's able to consume
the new LLVM IR, it has an option to simply ignore the unknown metadata.

Signed-off-by: Artem Gindinson <[email protected]>
If the alloca command found is not a sub-buffer alloca, then
it is the parent alloca, which has the same context

Signed-off-by: Ivan Karachun <[email protected]>
Enable -fdeclare-spirv-builtins for SYCL device compilation mode

For device compilation, SPIR-V builtins are now looked up by
the device compiler. They no longer need to be forward declared.

[SYCL-PTX] Revert manual mangling of some SPIR-V builtins
[SYCL-PTX] Add fmod builtin
[SYCL-PTX] Update Atomic mangling

Signed-off-by: Victor Lomuller <[email protected]>
…<dir> (intel#1346)

When using /Fo<dir> the improper dependency file name was generated, causing
the bundle step to not be able to locate the dependency file when compiling
to object

Signed-off-by: Michael D Toguchi <[email protected]>
This patch introduces the following loop attributes:
- loop_coalesce:
  Indicates that the loop nest should be coalesced into a single loop without
  affecting functionality
- speculated_iterations:
  Specifies the number of concurrent speculated iterations that will be in
  flight for a loop invocation
- disable_loop_pipelining:
  Disables pipelining of the loop data path, causing the loop to be executed
  serially
- max_interleaving:
  Places a maximum limit N on the number of interleaved invocations of an inner
  loop by an outer loop

Signed-off-by: Viktoria Maksimova <[email protected]>
Fixed the buffer constructor called with a pair of iterators.
The current implementation has a problem due to ambiguous spec.
The buffer should never write back data unless there is a call to set_final_data(), but the current implementation does it.
I corrected the spec in KhronosGroup/SYCL-Docs#76.
So, now we can change the buffer implementation according to the clarified spec.

The test case buffer.cpp also needed change because of this change.
The user should not expect the automatic write-back of data upon destruction of buffer.

Signed-off-by: Byoungro So <[email protected]>
Co-authored-by: Ronan Keryell <[email protected]>
A simple library that allows constructing and serializing/deserializing
a sequence of typed property sets, where each property is a <name,typed value>
pair. To be used in offload tools.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
)

The library allows creating and serializing/deserializing tables of strings,
inserting/deleting/replacing/renaming columns, and adding rows. To be used in
offload tools.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
This reverts commit d357add.

Signed-off-by: Vladimir Lazarev <[email protected]>
…for (intel#1348)

The kernel callable being invoked from an nd_range parallel_for is accepting an id argument, while it should be nd_item.

After my analysis, I found we check arguments' type for kernel_parallel_for instead of parallel_for. But that check is useless, because the compiler can still find a candidate for kernel_parallel_for with nd_range and id which is a wrong combination.

In my solution, parallel_for with nd_range calls kernel_parallel_for_nd_range(...) which is only available for nd_item.

Signed-off-by: Bing1 Yu <[email protected]>
Implements a few code simplification/unification for LowerWGScope.

Signed-off-by: Victor Lomuller <[email protected]>
…tel#1405)

For NVPTX target address space inference for kernel arguments and
allocas is happening in the backend (NVPTXLowerArgs and
NVPTXLowerAlloca passes). After frontend these pointers are in LLVM
default address space 0 which is the generic address space for NVPTX
target. Perform address space cast of a pointer to the shadow global
variable from the local to the generic address space before replacing
all usages of a byval argument.

Signed-off-by: Artur Gainullin <[email protected]>
- Adds static members to sub_group class.
- sub_group member functions marked deprecated, to be removed later.
- SPIR-V helpers expanded to convert SYCL group to SPIR-V scope.
- Add workaround for half types

Signed-off-by: John Pennycook <[email protected]>
Since it is not possible to generate a vector of bools in the FE,
we have to change the return type of the corresponding instructions in the SPIRV
translator to a vector of bools. The SPIRV translator already did this for
some instructions; this patch extends this behaviour to handle more
instructions.
Adding doxygen documentation to PI CUDA backend.
Some code is re-ordered in the file to help sorting the
doxygen.

Co-Authored-By: Alexey Bader <[email protected]>
Co-Authored-By: Alexander Batashev <[email protected]>
Co-Authored-By: Romanov Vlad <[email protected]>

Signed-off-by: Ruyman Reyes <[email protected]>
Based on
https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md

* [SYCL] PI changes:

1. Add specialization constant API to the SYCL RT Plugin Interface.
New PI API added:
pi_result piProgramSetSpecializationConstant(pi_program prog, pi_uint32 spec_id,
                                             size_t spec_size,
                                             const void *spec_value);
2. Add property set fields to the binary image descriptor, bump PI version.
This change breaks backward binary compatibility of device binary image descriptors.
3. Add convenience C++ wrappers for PI binary image hierarchy objects.

* [SYCL] Support device binary properties and file tables in the offload wrapper.

1. New option - "-properties=<file>". <file> must be a property set registry
file, as defined by llvm/Support/PropertySetIO.h. The wrapper will add the
property sets to the binary image descriptor and make them available to the
runtime.

2. New option - "-batch". With this option the only input can be a file table,
as defined by llvm/Support/SimpleTable.h. Column names are a part of interface
between this tool and the sycl-post-link, which produces the file table.

3. Binary image descriptor LLVM type updated to resemble changes in Plugin
Interface v1.2.

* [SYCL] Specialization constants support in the Front End.

1. Detect kernel lambda object captures corresponding to specialization
constants and (a) don't create kernel arguments for them (b) generate
specializations of the SpecConstantInfo structure into the integration
header.

2. Recognize the __unique_stable_name intrinsic and replace
it with a string literal uniquely identifying the type of the typename
template parameter to this intrinsic.

3. FE-related changes in the runtime:
- new SpecConstantInfo templated struct for type->name translation for
  specialization constants used by integration header
- define the __sycl_fe_getStableUniqueTypeName intrinsic

* [SYCL] Add specialization constant support in SYCL runtime.

1. Define SYCL API (sycl/include/CL/sycl/experimental/spec_constant.hpp)
2. Add convenience C++ wrappers for PI device binary structures and refactor
   runtime to use the wrappers. Get rid of custom deleters for binary images.
3. Implement SYCL spec constant APIs in program and program manager.

* [SYCL] Use file-table-tform in SYCL offload processing in clang driver.

Clang driver's design can't handily model
(1) multiple inputs/outputs in the action graph. Because of that, for
example, sycl-post-link tool is invoked twice - once to split the code
and produce multiple bitcode files, and secondly - to generate symbol
files for the split modules.
(2) "Clusters" of inputs/outputs, when subsets of inputs/outputs are
associated and describe different aspects of the same data. Example of
such clustering is the split module + its symbol file above. Clustering
would require support both in the driver and the tools invoked in
response to actions.

This commit moves SYCL offload processing to the "file table concept."
sycl-post-link instead of
(1) being invoked n times, once per each output type requested (once for
    device split and once for symbol file generation)
(2) outputting multiple file lists each listing outputs from the
    corresponding invocation above
is now invoked once and produces single file table output. E.g.
  [Code|Symbols|Properties]
  a_0.bc|a_0.sym|a_0.props
  a_1.bc|a_1.sym|a_1.props
This solves both problems - multiple input/output and clustering.
Combined with the file-table-tform tool, this allows for efficient handling
of multiple clusters of files (each represented as a row in the table file)
in the clang driver infrastructure.
For example, there is a real offload processing problem:
step1. sycl-post-link outputs N clusters of files
step2. "Code" file of each cluster resulting from step1 ({a_0.bc, a_1.bc}
       in the example above) must undergo further transformations -
       translation to SPIRV and optional ahead-of-time compilation.
step3. In each cluster resulting from step1 the "Code" file needs to be
       replaced with the result of step2
step4. All the clusters are processed by the ClangOffloadWrapper tool, which
       needs to know how files are distributed into clusters and what
       the role of each file in a cluster is - whether it is "Code", "Symbol"
       or "Properties".
To solve this, the following action graph is constructed in the clang driver:

                        column:"Code"
t1 -> [file-table-tform:extract column] -> t1a -> [for-each:] -> t1b
                                                  llvm-spirv
                                                   aot-comp
t1
   \                  column:"Code"
    [file-table-tform:replace column] -> t2 -> [ClangOffloadWrapper]
   /
t1b

where t1b is ["Code"]  and t2 is  [Code|Symbols|Properties]
              a_0.bin             a_0.bin|a_0.sym|a_0.props
              a_1.bin             a_1.bin|a_1.sym|a_1.props

Note that the graph does not change with a growing number of clusters, nor does
it change when more files are added to each cluster (e.g. a "Manifest" file).

* [SYCL] Process specialization constants in sycl-post-link tool.

Add a spec constant lowering pass to sycl-post-link tool. Support
file table output format.

* [SYCL] Temporarily disable spec_const_hw.cpp on CPU.

CPU OpenCL Runtime on build machines is not updated yet.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@hchilama hchilama closed this Mar 28, 2020
@hchilama hchilama deleted the intel_llvm branch November 19, 2021 19:37
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Feb 23, 2023
Test integration of kernel fusion into the SYCL runtime scheduler.
    
Check that cancellation of the fusion happens if required by synchronization rules, as described in the [extension proposal](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc#synchronization-in-the-sycl-application).

Spec: intel#7098
Implementation: intel#7531

Signed-off-by: Lukas Sommer <[email protected]>