Skip to content

LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 692 commits into from
Jul 6, 2022

Conversation

dwoodwor-intel
Copy link
Contributor

philnik777 and others added 30 commits June 22, 2022 10:02
`std::function` has been deprecated for a few releases now. Remove it with an option to opt-back-in with a note that this option will be removed in LLVM 16.

Reviewed By: ldionne, #libc

Spies: #libc_vendors, EricWF, jloser, libcxx-commits

Differential Revision: https://reviews.llvm.org/D127908
Fixes #37402

Reviewed By: ldionne

Spies: EricWF, avogelsgesang, libcxx-commits, arphaman

Differential Revision: https://reviews.llvm.org/D124346
Previously `#pragma STDC FENV_ACCESS ON` always set dynamic rounding
mode and strict exception handling. It is not correct in the presence
of other pragmas that also modify rounding mode and exception handling.
For example, the effect of previous pragma FENV_ROUND could be
cancelled, which is not conformant with the C standard. Also
`#pragma STDC FENV_ACCESS OFF` turned off only FEnvAccess flag, leaving
rounding mode and exception handling unchanged, which is incorrect in
general case.

Concrete rounding and exception mode depend on a combination of several
factors like various pragmas and command-line options. During the review
of this patch an idea was proposed that the semantic actions associated
with such pragmas should only set appropriate flags. Actual rounding
mode and exception handling should be calculated taking into account the
state of all relevant options. In such implementation the pragma
FENV_ACCESS should not override properties set by other pragmas but
should set them if such setting is absent.

To implement this approach the following main changes are made:

- Field `FPRoundingMode` is removed from `LangOptions`. Actually there
  are no options that set it to arbitrary rounding mode, the choice was
  only `dynamic` or `tonearest`. Instead, a new boolean flag
  `RoundingMath` is added, with the same meaning as the corresponding
  command-line option.

- Type `FPExceptionModeKind` now has possible value `FPE_Default`. It
  does not represent any particular exception mode but indicates that
  such mode was not set and default value should be used. It allows to
  distinguish the case:

    {
        #pragma STDC FENV_ACCESS ON
	...
    }

  where the pragma must set FPE_Strict, from the case:

    {
        #pragma clang fp exceptions(ignore)
        #pragma STDC FENV_ACCESS ON
        ...
    }

  where exception mode should remain `FPE_Ignore`.

  - Class `FPOptions` has now methods `getRoundingMode` and
  `getExceptionMode`, which calculates the respective properties from
  other specified FP properties.

  - Class `LangOptions` has now methods `getDefaultRoundingMode` and
  `getDefaultExceptionMode`, which calculates default modes from the
  specified options and should be used instead of `getRoundingMode` and
  `getFPExceptionMode` of the same class.

Differential Revision: https://reviews.llvm.org/D126364
…One methods

This will allow implementing state-dependent behavior in the future.

Differential Revision: https://reviews.llvm.org/D128327
As branch on undef is immediate undefined behavior, there is no need
to mark one of the edges as feasible. We can leave all the edges
non-feasible. In IPSCCP, we can replace the branch with an unreachable
terminator.

Differential Revision: https://reviews.llvm.org/D126962
…of SFINAE

For an lvalue reference to a move only view x, views::all(x) gives hard error because the expression inside noexcept is not well formed and it is not SFINAE friendly.

Given a move only view type `V`, and a concept

```
template <class R>
concept can_all = requires {
    std::views::all(std::declval<R>());
};
```

The expression `can_all<V&>` returns
libstdc++: false
msvc stl : false
libc++   : error: static_cast from 'V' to 'typename decay<decltype((std::forward<V &>(__t)))>::type' (aka 'V') uses deleted function
      noexcept(noexcept(_LIBCPP_AUTO_CAST(std::forward<_Tp>(__t))))

The standard spec has its own problem, the spec says it is expression equivalent to `decay-copy(E)` but the spec of `decay-copy` does not have any constraint, which means the expression `decay-copy(declval<V&>())` is well-formed and the concept `can_all<V&>` should return true and should error when instantiating the function body of decay-copy. This is clearly wrong behaviour in the spec and we will probably create an LWG issue. But the libc++'s behaviour is clearly not correct. The `noexcept` is an "extension" in libc++ which is not in the spec, but the expression inside `noexpect` triggers hard error, which is not right.

Reviewed By: #libc, ldionne, var-const

Differential Revision: https://reviews.llvm.org/D128281
This patch adds support for:

* Querying the PSTATE.SM state with @llvm.aarch64.sme.get.pstatesm
* Reading/writing the TPIDR2 register with new
@llvm.aarch64.sme.get.tpidr2 and @llvm.aarch64.sme.set.tpidr2
intrinsics.

Tests added here:

  CodeGen/AArch64/sme-get-pstatesm.ll
  CodeGen/AArch64/sme-read-write-tpidr2.ll

Differential Revision: https://reviews.llvm.org/D127957
Regenerate the test with current O2 result and only run CGP.
This fixes the combining of constant vector GEP operands in the
optimization of MVE gather/scatter addresses, when opaque pointers are
enabled. As opaque pointers reduce the number of bitcasts between geps,
more can be folded than before. This can cause problems if the index
types are now different between the two geps.

This fixes that by making sure each constant is scaled appropriately,
which has the effect of transforming the geps to have a scale of 1,
changing [r0, q0, uxtw intel#1] gathers to [r0, q0] with a larger q0. This
helps use a simpler instruction that doesn't need the extra uxtw.

Differential Revision: https://reviews.llvm.org/D127733
The symbol table starts with all the C_FILE symbols.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D126623
Update performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

memcpy-discriminator.ll was fixed up to use named instructions
and drop the no longer needed bitcasts.
This patch is a subpart of D125768 intented to make the review easier.

The `SizedOp` struct represents operations to be performed on a certain number of bytes.
It is responsible for breaking them down into platform types and forwarded to the `Backend`.

The `Backend` struct represents a lower level abstraction that works only on types (`uint8_t`, `__m128i`, ...).
It is similar to instruction selection.

Differential Revision: https://reviews.llvm.org/D126768
  CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
  CONFLICT (content): Merge conflict in llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h
Previously left these behind due to the required instruction
renumbering, drop them now. This more accurately represents
opaque pointer input IR.

Also drop duplicate opaque pointer check lines in one SROA test.
This is a reland of D126773 / b2a9ea4.

The removal of `-mllvm -combiner-global-alias-analysis` has landed separately
in D128051 / 7b73f53.

And the removal of `-mllvm --tail-merge-threshold=0` is scheduled for
removal in a subsequent patch.
Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test passed without
further modification (which is almost all of them, as the backend
is largely pointer-type agnostic).
…ndedBits

Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
This helps to preserve the debug information of global variables.

Differential Revision: https://reviews.llvm.org/D127510
The misc-unused-parameters check would trigger false positive warnings
about the parameter being unused when the parameter declaration was
invalid. No longer issue the warning in that case on the assumption
that most parameters are used in practice, so the extra diagnostic is
most likely a false positive.

Fixes #56152
This comment became outdated in 053eb35
(but was moved along); that commit moved the code and the comment
to a separate function, with a separate local variable
`num_of_bytes_read`. On error, the possibly garbage value is never
copied back to the caller's reference, thus the comment is no longer
relevant (and slightly confusing as is).

Differential Revision: https://reviews.llvm.org/D128226
Run the test through -instnamer, which makes it simpler to modify
it.
The linker wrapper currently eagerly extracts all identified offloading
binaries to a file. This isn't ideal because we will soon open these
files again to examine their symbols for LTO and other things.
Additionally, we may not use every extracted file in the case of static
libraries. This would be very noisy in the case of static libraries that
may contain code for several targets not participating in the current
link.

Recent changes allow us to treat an Offloading binary as a standard
binary class. So that allows us to use an OwningBinary to model the
file. Now we keep it in memory and only write it once we know which
files will be participating in the final link job. This also reworks a
lot of the structure around how we handle this by removing the old
DeviceFile class.

The main benefit from this is that the following doesn't output 32+ files and
instead will only output a single temp file for the linked module.
```
$ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps
```

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127246
This patch updates the `--[no-]offload-arch` command line arguments to
allow the user to pass in many architectures in a single argument rather
than specifying it multiple times. This means that the following command
line,
```
clang foo.cu --offload-arch=sm_70 --offload-arch=sm_80
```
can become:
```
clang foo.cu --offload-arch=sm_70,sm_80
```

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D128206
This patch adds a new transferToOtherSystem helper that tries to
transfer information from signed predicates to the unsigned system and
vice versa.

The initial version adds A >=u B for A >=s B && B >=s 0

https://alive2.llvm.org/ce/z/8b6F9i
svenvh added 4 commits June 24, 2022 11:32
Add support for mapping the SPV_KHR_subgroup_rotate extension
operations to and from SPIR-V friendly IR, as well as to and from the
cl_khr_subgroup_rotate OpenCL extension.

Specifications:
 - https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.html
 - https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_subgroup_rotate

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@a49de2a
The metadata had 7 entries whereas the function only has 6 parameters,
causing foreachKernelArgMD to read past the end of an std::vector.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@14b843b
When passing `spirv-ext=,`, SPVExt will contain empty strings.
Calling `front` on an empty string is undefined behavior, so first
ensure that the string is not empty.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@30fb15c
This ensures the llvm-spirv tool is added to the "all" target, so that
it is included when installing the project.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@ae90401
@dwoodwor-intel dwoodwor-intel added the disable-lint Skip linter check step and proceed with build jobs label Jun 24, 2022
@dwoodwor-intel dwoodwor-intel requested review from a team and pvchupin as code owners June 24, 2022 15:44
@dwoodwor-intel
Copy link
Contributor Author

These test failures seem to all be caused by 4d3c010, but I'm not sure how to update the tests for this. I'll be getting back to this on Monday, but of course fixes from people more familiar with the cuda target and/or these tests are welcome in the meantime.

@pvchupin
Copy link
Contributor

@AerialMantis, could somebody take a look please?

@npmiller
Copy link
Contributor

Hello @dwoodwor-intel so I had a quick look, I don't have a full patch yet but the main problem of 4d3c010 seems to be in this part of the patch: 4d3c010#diff-6ffb509f195719a3d311315a823f7f9b222b273a58e6d5a80d251773eead7026L696

You see beforehand it getInputFilename was only using the cubin for OpenMP, and everything else still got the standard .o, but now it is also doing it for SYCL.

Where the problems seem to come from is that in the SYCL toolchain we're by-passing this getInputFilename function and simply getting the filename from the action, which means that the extension gets translated to cubin in the CUDA toolchain when building ptxas commands and such, but it doesn't get translated in the SYCL toolchain when building the llvm-foreach commands.

So I think what we need to do here is to make sure that the SYCL toolchain always does, for example TC.getInputFilename(II) rather than II.getFilename(), that way we should get the correctly transformed filename. And the toolchain class as a default implementation of getInputFilename that simply returns II.getFilename() for toolchains that don't make use of this.

For example this patch seems to fix the issue for llvm-foreach commands built by the SYCL toolchain:

diff --git a/clang/lib/Driver/ToolChains/SYCL.cpp b/clang/lib/Driver/ToolChains/SYCL.cpp
index 272462ff2260..b4462c8bf219 100644
--- a/clang/lib/Driver/ToolChains/SYCL.cpp
+++ b/clang/lib/Driver/ToolChains/SYCL.cpp
@@ -68,10 +68,10 @@ void SYCL::constructLLVMForeachCommand(Compilation &C, const JobAction &JA,
   // The llvm-foreach command looks like this:
   // llvm-foreach --in-file-list=a.list --in-replace='{}' -- echo '{}'
   ArgStringList ForeachArgs;
-  std::string OutputFileName(Output.getFilename());
+  std::string OutputFileName(T->getToolChain().getInputFilename(Output));
   ForeachArgs.push_back(C.getArgs().MakeArgString("--out-ext=" + Ext));
   for (auto &I : InputFiles) {
-    std::string Filename(I.getFilename());
+    std::string Filename(T->getToolChain().getInputFilename(I));
     ForeachArgs.push_back(
         C.getArgs().MakeArgString("--in-file-list=" + Filename));
     ForeachArgs.push_back(

I need to finish fixing this and test it a bit more but hopefully I should be able to get you a full patch for this tomorrow.

@npmiller
Copy link
Contributor

Okay, so I think this should be the minimal patch to make it work:

I've tested it a bit on AMD to see if there wasn't any obvious issues with other toolchains but it seems fine.

I'm curious to see how it goes in the CI, please feel free to ping me if the CI flags further issues after this patch is applied.

In a recent patch clang was aligned with OpenMP to rename CUDA
fat-binaries to `.cubin` files rather than `.o` files.

To get the correct filenames use `getInputFilename` from the toolchain
object.
@dwoodwor-intel
Copy link
Contributor Author

Thanks for the patch, @npmiller! I am seeing a new fail in Clang :: Driver/sycl-device-lib.cpp; can you take a look at it?

@npmiller
Copy link
Contributor

Aaah of course, this should fix it:

`getInputFilename` returns a `std::string` so just taking the `c_str`
without keeping track of the string object doesn't work.
@dwoodwor-intel
Copy link
Contributor Author

@npmiller, it looks like there is one problem on AMD in SYCL :: Reduction/reduction_span_pack.cpp

@npmiller
Copy link
Contributor

I had a quick look, it's not related to the other patch as far as I can tell.

I also couldn't reproduce it on MI100 so I'm not too sure what's going on. We've had such cases in the past, MI100 and the CI GPU are pretty different (32 threads vs 64 threads wavefronts for example, #5386), and I don't have access to a GPU similar to what the CI uses at the moment, so it's difficult to investigate.

Would you be able to try track down which revision broke it like you did for the other bugs? That might give us a lead on what's going on.

@dwoodwor-intel
Copy link
Contributor Author

I'm having a lot of trouble reproducing this too; @pvchupin also tried re-running the CI but we're seeing the same test failure and also a few new cuda ones. I'll have to get back to this on Tuesday.

@dwoodwor-intel
Copy link
Contributor Author

The SYCL :: Reduction/reduction_span_pack.cpp failure is being tracked in #6409 and has been temporarily diabled in intel/llvm-test-suite#1079; all other tests are passing. It should be safe to merge this now.

@dwoodwor-intel
Copy link
Contributor Author

/merge

@bb-sycl
Copy link
Contributor

bb-sycl commented Jul 6, 2022

Wed 06 Jul 2022 06:43:59 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

@bb-sycl
Copy link
Contributor

bb-sycl commented Jul 6, 2022

Wed 06 Jul 2022 06:47:39 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs
Projects
None yet
Development

Successfully merging this pull request may close these issues.