LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

dwoodwor-intel · 2022-06-24T15:44:34Z

LLVM: llvm/llvm-project@9aaba9d
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@ae90401

`std::function` has been deprecated for a few releases now. Remove it with an option to opt-back-in with a note that this option will be removed in LLVM 16. Reviewed By: ldionne, #libc Spies: #libc_vendors, EricWF, jloser, libcxx-commits Differential Revision: https://reviews.llvm.org/D127908

Fixes #37402 Reviewed By: ldionne Spies: EricWF, avogelsgesang, libcxx-commits, arphaman Differential Revision: https://reviews.llvm.org/D124346

Previously `#pragma STDC FENV_ACCESS ON` always set dynamic rounding mode and strict exception handling. It is not correct in the presence of other pragmas that also modify rounding mode and exception handling. For example, the effect of previous pragma FENV_ROUND could be cancelled, which is not conformant with the C standard. Also `#pragma STDC FENV_ACCESS OFF` turned off only FEnvAccess flag, leaving rounding mode and exception handling unchanged, which is incorrect in general case. Concrete rounding and exception mode depend on a combination of several factors like various pragmas and command-line options. During the review of this patch an idea was proposed that the semantic actions associated with such pragmas should only set appropriate flags. Actual rounding mode and exception handling should be calculated taking into account the state of all relevant options. In such implementation the pragma FENV_ACCESS should not override properties set by other pragmas but should set them if such setting is absent. To implement this approach the following main changes are made: - Field `FPRoundingMode` is removed from `LangOptions`. Actually there are no options that set it to arbitrary rounding mode, the choice was only `dynamic` or `tonearest`. Instead, a new boolean flag `RoundingMath` is added, with the same meaning as the corresponding command-line option. - Type `FPExceptionModeKind` now has possible value `FPE_Default`. It does not represent any particular exception mode but indicates that such mode was not set and default value should be used. It allows to distinguish the case: { #pragma STDC FENV_ACCESS ON ... } where the pragma must set FPE_Strict, from the case: { #pragma clang fp exceptions(ignore) #pragma STDC FENV_ACCESS ON ... } where exception mode should remain `FPE_Ignore`. - Class `FPOptions` has now methods `getRoundingMode` and `getExceptionMode`, which calculates the respective properties from other specified FP properties. - Class `LangOptions` has now methods `getDefaultRoundingMode` and `getDefaultExceptionMode`, which calculates default modes from the specified options and should be used instead of `getRoundingMode` and `getFPExceptionMode` of the same class. Differential Revision: https://reviews.llvm.org/D126364

…One methods This will allow implementing state-dependent behavior in the future. Differential Revision: https://reviews.llvm.org/D128327

As branch on undef is immediate undefined behavior, there is no need to mark one of the edges as feasible. We can leave all the edges non-feasible. In IPSCCP, we can replace the branch with an unreachable terminator. Differential Revision: https://reviews.llvm.org/D126962

…of SFINAE For an lvalue reference to a move only view x, views::all(x) gives hard error because the expression inside noexcept is not well formed and it is not SFINAE friendly. Given a move only view type `V`, and a concept ``` template <class R> concept can_all = requires { std::views::all(std::declval<R>()); }; ``` The expression `can_all<V&>` returns libstdc++: false msvc stl : false libc++ : error: static_cast from 'V' to 'typename decay<decltype((std::forward<V &>(__t)))>::type' (aka 'V') uses deleted function noexcept(noexcept(_LIBCPP_AUTO_CAST(std::forward<_Tp>(__t)))) The standard spec has its own problem, the spec says it is expression equivalent to `decay-copy(E)` but the spec of `decay-copy` does not have any constraint, which means the expression `decay-copy(declval<V&>())` is well-formed and the concept `can_all<V&>` should return true and should error when instantiating the function body of decay-copy. This is clearly wrong behaviour in the spec and we will probably create an LWG issue. But the libc++'s behaviour is clearly not correct. The `noexcept` is an "extension" in libc++ which is not in the spec, but the expression inside `noexpect` triggers hard error, which is not right. Reviewed By: #libc, ldionne, var-const Differential Revision: https://reviews.llvm.org/D128281

This patch adds support for: * Querying the PSTATE.SM state with @llvm.aarch64.sme.get.pstatesm * Reading/writing the TPIDR2 register with new @llvm.aarch64.sme.get.tpidr2 and @llvm.aarch64.sme.set.tpidr2 intrinsics. Tests added here: CodeGen/AArch64/sme-get-pstatesm.ll CodeGen/AArch64/sme-read-write-tpidr2.ll Differential Revision: https://reviews.llvm.org/D127957

Regenerate the test with current O2 result and only run CGP.

This fixes the combining of constant vector GEP operands in the optimization of MVE gather/scatter addresses, when opaque pointers are enabled. As opaque pointers reduce the number of bitcasts between geps, more can be folded than before. This can cause problems if the index types are now different between the two geps. This fixes that by making sure each constant is scaled appropriately, which has the effect of transforming the geps to have a scale of 1, changing [r0, q0, uxtw intel#1] gathers to [r0, q0] with a larger q0. This helps use a simpler instruction that doesn't need the extra uxtw. Differential Revision: https://reviews.llvm.org/D127733

The symbol table starts with all the C_FILE symbols. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D126623

Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 This time without any manual fixup.

Update performed using (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 memcpy-discriminator.ll was fixed up to use named instructions and drop the no longer needed bitcasts.

This patch is a subpart of D125768 intented to make the review easier. The `SizedOp` struct represents operations to be performed on a certain number of bytes. It is responsible for breaking them down into platform types and forwarded to the `Backend`. The `Backend` struct represents a lower level abstraction that works only on types (`uint8_t`, `__m128i`, ...). It is similar to instruction selection. Differential Revision: https://reviews.llvm.org/D126768

Tests were updated with (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp CONFLICT (content): Merge conflict in llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h

Previously left these behind due to the required instruction renumbering, drop them now. This more accurately represents opaque pointer input IR. Also drop duplicate opaque pointer check lines in one SROA test.

This is a reland of D126773 / b2a9ea4. The removal of `-mllvm -combiner-global-alias-analysis` has landed separately in D128051 / 7b73f53. And the removal of `-mllvm --tail-merge-threshold=0` is scheduled for removal in a subsequent patch.

Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).

…ndedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.

This helps to preserve the debug information of global variables. Differential Revision: https://reviews.llvm.org/D127510

The misc-unused-parameters check would trigger false positive warnings about the parameter being unused when the parameter declaration was invalid. No longer issue the warning in that case on the assumption that most parameters are used in practice, so the extra diagnostic is most likely a false positive. Fixes #56152

This comment became outdated in 053eb35 (but was moved along); that commit moved the code and the comment to a separate function, with a separate local variable `num_of_bytes_read`. On error, the possibly garbage value is never copied back to the caller's reference, thus the comment is no longer relevant (and slightly confusing as is). Differential Revision: https://reviews.llvm.org/D128226

Run the test through -instnamer, which makes it simpler to modify it.

The linker wrapper currently eagerly extracts all identified offloading binaries to a file. This isn't ideal because we will soon open these files again to examine their symbols for LTO and other things. Additionally, we may not use every extracted file in the case of static libraries. This would be very noisy in the case of static libraries that may contain code for several targets not participating in the current link. Recent changes allow us to treat an Offloading binary as a standard binary class. So that allows us to use an OwningBinary to model the file. Now we keep it in memory and only write it once we know which files will be participating in the final link job. This also reworks a lot of the structure around how we handle this by removing the old DeviceFile class. The main benefit from this is that the following doesn't output 32+ files and instead will only output a single temp file for the linked module. ``` $ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps ``` Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127246

This patch updates the `--[no-]offload-arch` command line arguments to allow the user to pass in many architectures in a single argument rather than specifying it multiple times. This means that the following command line, ``` clang foo.cu --offload-arch=sm_70 --offload-arch=sm_80 ``` can become: ``` clang foo.cu --offload-arch=sm_70,sm_80 ``` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D128206

This patch adds a new transferToOtherSystem helper that tries to transfer information from signed predicates to the unsigned system and vice versa. The initial version adds A >=u B for A >=s B && B >=s 0 https://alive2.llvm.org/ce/z/8b6F9i

Add support for mapping the SPV_KHR_subgroup_rotate extension operations to and from SPIR-V friendly IR, as well as to and from the cl_khr_subgroup_rotate OpenCL extension. Specifications: - https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.html - https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_subgroup_rotate Original commit: KhronosGroup/SPIRV-LLVM-Translator@a49de2a

The metadata had 7 entries whereas the function only has 6 parameters, causing foreachKernelArgMD to read past the end of an std::vector. Original commit: KhronosGroup/SPIRV-LLVM-Translator@14b843b

When passing `spirv-ext=,`, SPVExt will contain empty strings. Calling `front` on an empty string is undefined behavior, so first ensure that the string is not empty. Original commit: KhronosGroup/SPIRV-LLVM-Translator@30fb15c

This ensures the llvm-spirv tool is added to the "all" target, so that it is included when installing the project. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ae90401

dwoodwor-intel · 2022-06-24T21:14:15Z

These test failures seem to all be caused by 4d3c010, but I'm not sure how to update the tests for this. I'll be getting back to this on Monday, but of course fixes from people more familiar with the cuda target and/or these tests are welcome in the meantime.

pvchupin · 2022-06-27T22:32:24Z

@AerialMantis, could somebody take a look please?

npmiller · 2022-06-28T17:27:54Z

Hello @dwoodwor-intel so I had a quick look, I don't have a full patch yet but the main problem of 4d3c010 seems to be in this part of the patch: 4d3c010#diff-6ffb509f195719a3d311315a823f7f9b222b273a58e6d5a80d251773eead7026L696

You see beforehand it getInputFilename was only using the cubin for OpenMP, and everything else still got the standard .o, but now it is also doing it for SYCL.

Where the problems seem to come from is that in the SYCL toolchain we're by-passing this getInputFilename function and simply getting the filename from the action, which means that the extension gets translated to cubin in the CUDA toolchain when building ptxas commands and such, but it doesn't get translated in the SYCL toolchain when building the llvm-foreach commands.

So I think what we need to do here is to make sure that the SYCL toolchain always does, for example TC.getInputFilename(II) rather than II.getFilename(), that way we should get the correctly transformed filename. And the toolchain class as a default implementation of getInputFilename that simply returns II.getFilename() for toolchains that don't make use of this.

For example this patch seems to fix the issue for llvm-foreach commands built by the SYCL toolchain:

diff --git a/clang/lib/Driver/ToolChains/SYCL.cpp b/clang/lib/Driver/ToolChains/SYCL.cpp
index 272462ff2260..b4462c8bf219 100644
--- a/clang/lib/Driver/ToolChains/SYCL.cpp
+++ b/clang/lib/Driver/ToolChains/SYCL.cpp
@@ -68,10 +68,10 @@ void SYCL::constructLLVMForeachCommand(Compilation &C, const JobAction &JA,
   // The llvm-foreach command looks like this:
   // llvm-foreach --in-file-list=a.list --in-replace='{}' -- echo '{}'
   ArgStringList ForeachArgs;
-  std::string OutputFileName(Output.getFilename());
+  std::string OutputFileName(T->getToolChain().getInputFilename(Output));
   ForeachArgs.push_back(C.getArgs().MakeArgString("--out-ext=" + Ext));
   for (auto &I : InputFiles) {
-    std::string Filename(I.getFilename());
+    std::string Filename(T->getToolChain().getInputFilename(I));
     ForeachArgs.push_back(
         C.getArgs().MakeArgString("--in-file-list=" + Filename));
     ForeachArgs.push_back(

I need to finish fixing this and test it a bit more but hopefully I should be able to get you a full patch for this tomorrow.

npmiller · 2022-06-29T15:48:25Z

Okay, so I think this should be the minimal patch to make it work:

npmiller@ccea66d

I've tested it a bit on AMD to see if there wasn't any obvious issues with other toolchains but it seems fine.

I'm curious to see how it goes in the CI, please feel free to ping me if the CI flags further issues after this patch is applied.

In a recent patch clang was aligned with OpenMP to rename CUDA fat-binaries to `.cubin` files rather than `.o` files. To get the correct filenames use `getInputFilename` from the toolchain object.

dwoodwor-intel · 2022-06-29T17:12:29Z

Thanks for the patch, @npmiller! I am seeing a new fail in Clang :: Driver/sycl-device-lib.cpp; can you take a look at it?

npmiller · 2022-06-29T17:57:27Z

Aaah of course, this should fix it:

npmiller@0f72d92

`getInputFilename` returns a `std::string` so just taking the `c_str` without keeping track of the string object doesn't work.

dwoodwor-intel · 2022-06-29T18:51:20Z

@npmiller, it looks like there is one problem on AMD in SYCL :: Reduction/reduction_span_pack.cpp

npmiller · 2022-06-30T09:18:50Z

I had a quick look, it's not related to the other patch as far as I can tell.

I also couldn't reproduce it on MI100 so I'm not too sure what's going on. We've had such cases in the past, MI100 and the CI GPU are pretty different (32 threads vs 64 threads wavefronts for example, #5386), and I don't have access to a GPU similar to what the CI uses at the moment, so it's difficult to investigate.

Would you be able to try track down which revision broke it like you did for the other bugs? That might give us a lead on what's going on.

dwoodwor-intel · 2022-07-01T21:59:57Z

I'm having a lot of trouble reproducing this too; @pvchupin also tried re-running the CI but we're seeing the same test failure and also a few new cuda ones. I'll have to get back to this on Tuesday.

dwoodwor-intel · 2022-07-06T18:42:41Z

The SYCL :: Reduction/reduction_span_pack.cpp failure is being tracked in #6409 and has been temporarily diabled in intel/llvm-test-suite#1079; all other tests are passing. It should be safe to merge this now.

dwoodwor-intel · 2022-07-06T18:43:30Z

/merge

bb-sycl · 2022-07-06T18:43:59Z

Wed 06 Jul 2022 06:43:59 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

bb-sycl · 2022-07-06T18:47:39Z

Wed 06 Jul 2022 06:47:39 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

philnik777 and others added 30 commits June 22, 2022 10:02

Merge from 'sycl' to 'sycl-web'

1a60e5f

[libc++] Complete the implementation of N4190

681cde7

Fixes #37402 Reviewed By: ldionne Spies: EricWF, avogelsgesang, libcxx-commits, arphaman Differential Revision: https://reviews.llvm.org/D124346

[mlir][Transform] NFC - Pass TransformState as an argument to applyTo…

74f0660

…One methods This will allow implementing state-dependent behavior in the future. Differential Revision: https://reviews.llvm.org/D128327

[X86] Don't run O2 pipeline in codegen test (NFC)

f986976

Regenerate the test with current O2 result and only run CGP.

[XCOFF] write the real source file name in C_FILE symbol.

d29e986

The symbol table starts with all the C_FILE symbols. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D126623

[ADCE] Update tests to use opaque pointers (NFC)

563cf93

Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 This time without any manual fixup.

[AggressiveInstCombine] Update tests to use opaque pointers (NFC)

7c802f9

Update performed using (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

[AddDiscriminators] Migrate tests to opaque pointers (NFC)

fc655a9

Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 memcpy-discriminator.ll was fixed up to use named instructions and drop the no longer needed bitcasts.

[AlignmentFromAssumptions] Migrate tests to opaque pointers (NFC)

615efc3

Tests were updated with (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

Merge from 'main' to 'sycl-web' (235 commits)

9065cb2

CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp CONFLICT (content): Merge conflict in llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h

Add -opaque-pointers for failed clang test

4e83986

[Tests] Remove unnecessary bitcasts from opaque pointer tests (NFC)

60a3215

Previously left these behind due to the required instruction renumbering, drop them now. This more accurately represents opaque pointer input IR. Also drop duplicate opaque pointer check lines in one SROA test.

[DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDema…

2c3a4a9

…ndedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.

[NVPTX] Keep metadata attached to module-scope variables

8958e70

This helps to preserve the debug information of global variables. Differential Revision: https://reviews.llvm.org/D127510

[NFC] Remove dead code

9803db8

[X86] Name instructions in test (NFC)

7e6af44

Run the test through -instnamer, which makes it simpler to modify it.

svenvh added 4 commits June 24, 2022 11:32

Fix excess kernel_arg_type_qual entry in test

99af174

The metadata had 7 entries whereas the function only has 6 parameters, causing foreachKernelArgMD to read past the end of an std::vector. Original commit: KhronosGroup/SPIRV-LLVM-Translator@14b843b

Fix UB when validating spirv-ext option

272c12b

When passing `spirv-ext=,`, SPVExt will contain empty strings. Calling `front` on an empty string is undefined behavior, so first ensure that the string is not empty. Original commit: KhronosGroup/SPIRV-LLVM-Translator@30fb15c

Enable LLVM_BUILD_TOOLS for out-of-tree builds

04fed93

This ensures the llvm-spirv tool is added to the "all" target, so that it is included when installing the project. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ae90401

dwoodwor-intel added the disable-lint Skip linter check step and proceed with build jobs label Jun 24, 2022

dwoodwor-intel requested review from a team and pvchupin as code owners June 24, 2022 15:44

[SYCL] Fix file names in SYCL toolchain

2c217cf

In a recent patch clang was aligned with OpenMP to rename CUDA fat-binaries to `.cubin` files rather than `.o` files. To get the correct filenames use `getInputFilename` from the toolchain object.

[SYCL] Fix filename string data going out of scope

f6667e1

`getInputFilename` returns a `std::string` so just taking the `c_str` without keeping track of the string object doesn't work.

dwoodwor-intel mentioned this pull request Jul 6, 2022

SYCL :: Reduction/reduction_span_pack.cpp fails on hip/amd after pulldown #6409

Closed

bb-sycl approved these changes Jul 6, 2022

View reviewed changes

bb-sycl merged commit 5331441 into intel:sycl Jul 6, 2022

pvchupin mentioned this pull request Jul 7, 2022

[SYCL][L0] Fix piEnqueueEventsWaitWithBarrier for L0 #6359

Merged

pvchupin mentioned this pull request Aug 5, 2022

LLVM and SPIRV-LLVM-Translator pulldown (WW33) #6536

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

Uh oh!

dwoodwor-intel commented Jun 24, 2022

Uh oh!

dwoodwor-intel commented Jun 24, 2022

Uh oh!

pvchupin commented Jun 27, 2022

Uh oh!

npmiller commented Jun 28, 2022

Uh oh!

npmiller commented Jun 29, 2022

Uh oh!

dwoodwor-intel commented Jun 29, 2022

Uh oh!

npmiller commented Jun 29, 2022

Uh oh!

dwoodwor-intel commented Jun 29, 2022

Uh oh!

npmiller commented Jun 30, 2022

Uh oh!

dwoodwor-intel commented Jul 1, 2022

Uh oh!

dwoodwor-intel commented Jul 6, 2022

Uh oh!

dwoodwor-intel commented Jul 6, 2022

Uh oh!

bb-sycl commented Jul 6, 2022

Uh oh!

bb-sycl commented Jul 6, 2022

Uh oh!

Uh oh!

LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

Uh oh!

Conversation

dwoodwor-intel commented Jun 24, 2022

Uh oh!

dwoodwor-intel commented Jun 24, 2022

Uh oh!

pvchupin commented Jun 27, 2022

Uh oh!

npmiller commented Jun 28, 2022

Uh oh!

npmiller commented Jun 29, 2022

Uh oh!

dwoodwor-intel commented Jun 29, 2022

Uh oh!

npmiller commented Jun 29, 2022

Uh oh!

dwoodwor-intel commented Jun 29, 2022

Uh oh!

npmiller commented Jun 30, 2022

Uh oh!

dwoodwor-intel commented Jul 1, 2022

Uh oh!

dwoodwor-intel commented Jul 6, 2022

Uh oh!

dwoodwor-intel commented Jul 6, 2022

Uh oh!

bb-sycl commented Jul 6, 2022

Uh oh!

bb-sycl commented Jul 6, 2022

Uh oh!

Uh oh!