Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW47 2024) #16165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1,946 commits into from
Dec 3, 2024
Merged

Conversation

iclsrc
Copy link
Contributor

@iclsrc iclsrc commented Nov 22, 2024

ldionne and others added 30 commits November 5, 2024 13:18
…4988)

[5.2:625:17] The syntax of the DESTROY clause on the DEPOBJ construct
with no argument was deprecated.
See #105195 as well as the big comment in DynamicRecursiveASTVisitor.cpp
for more context.
…ate handlers. NFC.

Cleanup the SHLI/SRLI/SRAI handlers to be more consistent - prep for a future patch.
…andedBits on SSE shift-by-immediate nodes.

Attempt to peek through multiple-use SHLI/SRLI/SRAI source vectors.
Some tests were including LibcTest.h directly. Instead you should
include Test.h which does proper indirection for other test frameworks
we support (zxtest, gtest). Also added some license headers to tests
that were missing them.
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OPSDot
is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version
or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to
DXIL.td op Dot4AddI8Packed

- add tests for HLSL intrinsic lowering to dx/spv intrinsic in
dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll
    
 Resolves #99220
This retries the PR 113521 skipping a test in a remote environment.
When you set a "next branch breakpoint" and run to it while stepping,
you have to claim the stop at that breakpoint to be the top of the
inlined call stack, or you will seem to "step in" and then plans might
try to step back out again.

This records the PrefferedLineEntry for next branch breakpoints and adds
a test to make sure this works.
For GFX10+, image_gather4 instructions that have v[254:255] as dst reg
and the d16 bit on can be assembled correctly but the generated binary
fails to disassemble (e.g. image_gather4 v[254:255], v[1:2], s[8:15], s[12:15]
 dmask:0x8 dim:SQ_RSRC_IMG_2D d16).  This patch fixes this problem.
Until now, suppression of `DT_DEBUG` has been hardcoded as a downstream
patch in lld. This can instead be achieved by passing `-z rodynamic`.
Have the driver do this so that the private patch can be removed.

If the scope of lld's `-z rodynamic` is broadened (within reason) to do
more in future, that's likely to be fine as `PT_DYNAMIC` isn't writable
on PlayStation.

PS5 only. On PS4, the equivalent hardcoded configuration will remain in
the proprietary linker.

SIE tracker: TOOLCHAIN-16704
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.

One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if compiler can't benefit from it on a particular GPU
variant.
The assumed-rank array are represented by DIGenericSubrange in debug
metadata. We have to provide 2 things.

1. Expression to get rank value at the runtime from descriptor.

2. Assuming the dimension number for which we want the array information
has been put on the DWARF expression stack, expressions which will
extract the lowerBound, count and stride information from the descriptor
for the said dimension.

With this patch in place, this is how I see an assumed_rank variable
being evaluated by GDB.

```
function mean(x) result(y)
integer, intent(in) :: x(..)
...
end

program main
use mod
implicit none
integer :: x1,xvec(3),xmat(3,3),xtens(3,3,3)
x1 = 5
xvec = 6
xmat = 7
xtens = 8
print *,mean(xvec), mean(xmat), mean(xtens), mean(x1)
end program main

(gdb) p x
$1 = (6, 6, 6)

(gdb) p x
$2 = ((7, 7, 7) (7, 7, 7) (7, 7, 7))

(gdb) p x
$3 = (((8, 8, 8) (8, 8, 8) (8, 8, 8)) ((8, 8, 8) (8, 8, 8) (8, 8, 8)) ((8, 8, 8) (8, 8, 8) (8, 8, 8)))

(gdb) p x
$4 = 5
```
…114559)

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
    %pack = tensor.pack %src
      padding_value(%pad : i32)
      inner_dims_pos = [0, 1]
      inner_tiles = [%c8, 1]
      into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
    %res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
```
as (note the static trailing dim in the result and dynamic tile
dimension that corresponds to that):
```mlir
    %res = tensor.pack %src
      padding_value(%pad : i32)
      inner_dims_pos = [0, 1]
      inner_tiles = [%c8, 1]
      into %cast : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
```

This triggers an Op verification failure and is due to the fact that the
folder does not update the inner tile sizes in the pack Op. This PR
addresses that.

Note, supporting other Ops with size-like attributes is left as a TODO.
This patch fixes:

  mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:4781:17: error: unused
  variable 'tileSize' [-Werror,-Wunused-variable]
Finish hooking up ClangIR code gen into the Clang control flow,
initializing enough that basic code gen is possible.

Add an almost empty `cir.func` op to the ClangIR dialect. Currently the
only property of the function is its name. Add the code necessary to
code gen a cir.func op.

Create essentially empty files
clang/lib/CIR/Dialect/IR/{CIRAttrs.cpp,CIRTypes.cpp}. These will be
filled in later as attributes and types are defined in the ClangIR
dialect.

(Part of upstreaming the ClangIR incubator project into LLVM.)
I have to check for the sc list size being changed by the call-site
search, not just that it had more than one element.

Added a test for multiple CU's with the same name in a given module,
which would have caught this mistake.

We were also doing all the work to find call sites when the found decl
and specified decl's only difference was a column, but the incoming
specification hadn't specified a column (column number == 0).
…e"" (#115034)

In C++ it's UB to use undeclared values as enum.
And there is support __ATOMIC_HLE_ACQUIRE and
__ATOMIC_HLE_RELEASE need such values.

So use `int` in TSAN interface, and mask out
irrelevant bits and cast to enum ASAP.

`ThreadSanitizer.cpp` already declare morder parameterd
in these functions as `i32`.

This may looks like a slight change, as we
previously didn't mask out additional bits for `fmo`,
and `NoTsanAtomic` call. But from implementation
it's clear that they are expecting exact enum.


Reverts llvm/llvm-project#115032
Reapply llvm/llvm-project#114724
…5023)

Data transfer from a variable with a descriptor to a pointer. We create
a descriptor for the pointer so we can use the flang runtime to perform
the transfer. The Assign function handles all corner cases. We add a new
entry points `CUFDataTransferDescDescNoRealloc` to avoid reallocation
since the variable on the LHS is not an allocatable.
I plan to remove s32 as a legal type to match SelectionDAG
and to remove i32 from the GPR regclass on RV64.
The lit test fmuladd-soft-float.ll only specifies s390x as platform,
but the test is Linux specific, causing problems when run on z/OS.
This change updates the triple to fix this.
Adds the runtime support routines for XRay on SystemZ. Only function
entry/exit is implemented.
Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET
into a small instruction sequence which calls into the XRay library.
…groups (#113751)

0 does not make sense as a value for this to be, much less the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.
With the support for xray for SystemZ in place, the option can now be
enabled in clang.
…4436)

Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file
@jsji
Copy link
Contributor

jsji commented Dec 2, 2024

This is ready for review.

Copy link
Contributor

@frasercrmck frasercrmck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libclc LGTM

@jsji
Copy link
Contributor

jsji commented Dec 3, 2024

Ping @intel/llvm-reviewers-runtime @intel/dpcpp-spirv-reviewers @intel/llvm-reviewers-cuda

// CHECK-SPIRV: Name [[#InvokeFunc3:]] "__device_side_enqueue_block_invoke_3_kernel"
// CHECK-SPIRV: Name [[#InvokeFunc4:]] "__device_side_enqueue_block_invoke_4_kernel"
// CHECK-SPIRV: Name [[#InvokeFunc5:]] "__device_side_enqueue_block_invoke_5_kernel"
// CHECK-SPIRV: Name [[#InvokeFunc1:]] "__device_side_enqueue_block_invoke"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different from https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/enqueue_kernel.cl . I'm ok to merge it as is, I've created internal tracker to investigate this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MrSidims

Copy link
Contributor

@uditagarwal97 uditagarwal97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in sycl/test/check_device_code/vector/vector_math_ops.cpp due to use of splat LGTM.

@jsji
Copy link
Contributor

jsji commented Dec 3, 2024

@intel/llvm-gatekeepers I think this is ready for merge. Thanks.

@sarnex
Copy link
Contributor

sarnex commented Dec 3, 2024

/merge

@sarnex
Copy link
Contributor

sarnex commented Dec 3, 2024

Is the command wrong or is the bot broken?

@jsji
Copy link
Contributor

jsji commented Dec 3, 2024

Is the command wrong or is the bot broken?

Looks like the bot is failing again due to old git version. @DoyleLi

@sarnex sarnex merged commit d7b2605 into sycl Dec 3, 2024
24 of 30 checks passed
@bb-sycl
Copy link
Contributor

bb-sycl commented Dec 4, 2024

Wed 04 Dec 2024 01:18:19 AM UTC --- Merge failed with error: PR is not clean for merge. Please examine ci check status before merge.

@DoyleLi
Copy link
Contributor

DoyleLi commented Dec 4, 2024

Is the command wrong or is the bot broken?

Looks like the bot is failing again due to old git version. @DoyleLi

Hi, The issue is resolved. Will take effect in next pulldown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs
Projects
None yet
Development

Successfully merging this pull request may close these issues.