-
Notifications
You must be signed in to change notification settings - Fork 787
LLVM and SPIRV-LLVM-Translator pulldown (WW27) #6358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`std::function` has been deprecated for a few releases now. Remove it with an option to opt-back-in with a note that this option will be removed in LLVM 16. Reviewed By: ldionne, #libc Spies: #libc_vendors, EricWF, jloser, libcxx-commits Differential Revision: https://reviews.llvm.org/D127908
Fixes #37402 Reviewed By: ldionne Spies: EricWF, avogelsgesang, libcxx-commits, arphaman Differential Revision: https://reviews.llvm.org/D124346
Previously `#pragma STDC FENV_ACCESS ON` always set dynamic rounding mode and strict exception handling. It is not correct in the presence of other pragmas that also modify rounding mode and exception handling. For example, the effect of previous pragma FENV_ROUND could be cancelled, which is not conformant with the C standard. Also `#pragma STDC FENV_ACCESS OFF` turned off only FEnvAccess flag, leaving rounding mode and exception handling unchanged, which is incorrect in general case. Concrete rounding and exception mode depend on a combination of several factors like various pragmas and command-line options. During the review of this patch an idea was proposed that the semantic actions associated with such pragmas should only set appropriate flags. Actual rounding mode and exception handling should be calculated taking into account the state of all relevant options. In such implementation the pragma FENV_ACCESS should not override properties set by other pragmas but should set them if such setting is absent. To implement this approach the following main changes are made: - Field `FPRoundingMode` is removed from `LangOptions`. Actually there are no options that set it to arbitrary rounding mode, the choice was only `dynamic` or `tonearest`. Instead, a new boolean flag `RoundingMath` is added, with the same meaning as the corresponding command-line option. - Type `FPExceptionModeKind` now has possible value `FPE_Default`. It does not represent any particular exception mode but indicates that such mode was not set and default value should be used. It allows to distinguish the case: { #pragma STDC FENV_ACCESS ON ... } where the pragma must set FPE_Strict, from the case: { #pragma clang fp exceptions(ignore) #pragma STDC FENV_ACCESS ON ... } where exception mode should remain `FPE_Ignore`. - Class `FPOptions` has now methods `getRoundingMode` and `getExceptionMode`, which calculates the respective properties from other specified FP properties. - Class `LangOptions` has now methods `getDefaultRoundingMode` and `getDefaultExceptionMode`, which calculates default modes from the specified options and should be used instead of `getRoundingMode` and `getFPExceptionMode` of the same class. Differential Revision: https://reviews.llvm.org/D126364
…One methods This will allow implementing state-dependent behavior in the future. Differential Revision: https://reviews.llvm.org/D128327
As branch on undef is immediate undefined behavior, there is no need to mark one of the edges as feasible. We can leave all the edges non-feasible. In IPSCCP, we can replace the branch with an unreachable terminator. Differential Revision: https://reviews.llvm.org/D126962
…of SFINAE For an lvalue reference to a move only view x, views::all(x) gives hard error because the expression inside noexcept is not well formed and it is not SFINAE friendly. Given a move only view type `V`, and a concept ``` template <class R> concept can_all = requires { std::views::all(std::declval<R>()); }; ``` The expression `can_all<V&>` returns libstdc++: false msvc stl : false libc++ : error: static_cast from 'V' to 'typename decay<decltype((std::forward<V &>(__t)))>::type' (aka 'V') uses deleted function noexcept(noexcept(_LIBCPP_AUTO_CAST(std::forward<_Tp>(__t)))) The standard spec has its own problem, the spec says it is expression equivalent to `decay-copy(E)` but the spec of `decay-copy` does not have any constraint, which means the expression `decay-copy(declval<V&>())` is well-formed and the concept `can_all<V&>` should return true and should error when instantiating the function body of decay-copy. This is clearly wrong behaviour in the spec and we will probably create an LWG issue. But the libc++'s behaviour is clearly not correct. The `noexcept` is an "extension" in libc++ which is not in the spec, but the expression inside `noexpect` triggers hard error, which is not right. Reviewed By: #libc, ldionne, var-const Differential Revision: https://reviews.llvm.org/D128281
This patch adds support for: * Querying the PSTATE.SM state with @llvm.aarch64.sme.get.pstatesm * Reading/writing the TPIDR2 register with new @llvm.aarch64.sme.get.tpidr2 and @llvm.aarch64.sme.set.tpidr2 intrinsics. Tests added here: CodeGen/AArch64/sme-get-pstatesm.ll CodeGen/AArch64/sme-read-write-tpidr2.ll Differential Revision: https://reviews.llvm.org/D127957
Regenerate the test with current O2 result and only run CGP.
This fixes the combining of constant vector GEP operands in the optimization of MVE gather/scatter addresses, when opaque pointers are enabled. As opaque pointers reduce the number of bitcasts between geps, more can be folded than before. This can cause problems if the index types are now different between the two geps. This fixes that by making sure each constant is scaled appropriately, which has the effect of transforming the geps to have a scale of 1, changing [r0, q0, uxtw intel#1] gathers to [r0, q0] with a larger q0. This helps use a simpler instruction that doesn't need the extra uxtw. Differential Revision: https://reviews.llvm.org/D127733
The symbol table starts with all the C_FILE symbols. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D126623
Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 This time without any manual fixup.
Update performed using (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
Update performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 memcpy-discriminator.ll was fixed up to use named instructions and drop the no longer needed bitcasts.
This patch is a subpart of D125768 intented to make the review easier. The `SizedOp` struct represents operations to be performed on a certain number of bytes. It is responsible for breaking them down into platform types and forwarded to the `Backend`. The `Backend` struct represents a lower level abstraction that works only on types (`uint8_t`, `__m128i`, ...). It is similar to instruction selection. Differential Revision: https://reviews.llvm.org/D126768
Tests were updated with (without manual fixup): https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
CONFLICT (content): Merge conflict in llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp CONFLICT (content): Merge conflict in llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h
Previously left these behind due to the required instruction renumbering, drop them now. This more accurately represents opaque pointer input IR. Also drop duplicate opaque pointer check lines in one SROA test.
Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).
…ndedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
This helps to preserve the debug information of global variables. Differential Revision: https://reviews.llvm.org/D127510
The misc-unused-parameters check would trigger false positive warnings about the parameter being unused when the parameter declaration was invalid. No longer issue the warning in that case on the assumption that most parameters are used in practice, so the extra diagnostic is most likely a false positive. Fixes #56152
This comment became outdated in 053eb35 (but was moved along); that commit moved the code and the comment to a separate function, with a separate local variable `num_of_bytes_read`. On error, the possibly garbage value is never copied back to the caller's reference, thus the comment is no longer relevant (and slightly confusing as is). Differential Revision: https://reviews.llvm.org/D128226
Run the test through -instnamer, which makes it simpler to modify it.
The linker wrapper currently eagerly extracts all identified offloading binaries to a file. This isn't ideal because we will soon open these files again to examine their symbols for LTO and other things. Additionally, we may not use every extracted file in the case of static libraries. This would be very noisy in the case of static libraries that may contain code for several targets not participating in the current link. Recent changes allow us to treat an Offloading binary as a standard binary class. So that allows us to use an OwningBinary to model the file. Now we keep it in memory and only write it once we know which files will be participating in the final link job. This also reworks a lot of the structure around how we handle this by removing the old DeviceFile class. The main benefit from this is that the following doesn't output 32+ files and instead will only output a single temp file for the linked module. ``` $ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps ``` Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127246
This patch updates the `--[no-]offload-arch` command line arguments to allow the user to pass in many architectures in a single argument rather than specifying it multiple times. This means that the following command line, ``` clang foo.cu --offload-arch=sm_70 --offload-arch=sm_80 ``` can become: ``` clang foo.cu --offload-arch=sm_70,sm_80 ``` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D128206
This patch adds a new transferToOtherSystem helper that tries to transfer information from signed predicates to the unsigned system and vice versa. The initial version adds A >=u B for A >=s B && B >=s 0 https://alive2.llvm.org/ce/z/8b6F9i
Add support for mapping the SPV_KHR_subgroup_rotate extension operations to and from SPIR-V friendly IR, as well as to and from the cl_khr_subgroup_rotate OpenCL extension. Specifications: - https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.html - https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_subgroup_rotate Original commit: KhronosGroup/SPIRV-LLVM-Translator@a49de2a
The metadata had 7 entries whereas the function only has 6 parameters, causing foreachKernelArgMD to read past the end of an std::vector. Original commit: KhronosGroup/SPIRV-LLVM-Translator@14b843b
When passing `spirv-ext=,`, SPVExt will contain empty strings. Calling `front` on an empty string is undefined behavior, so first ensure that the string is not empty. Original commit: KhronosGroup/SPIRV-LLVM-Translator@30fb15c
This ensures the llvm-spirv tool is added to the "all" target, so that it is included when installing the project. Original commit: KhronosGroup/SPIRV-LLVM-Translator@ae90401
These test failures seem to all be caused by 4d3c010, but I'm not sure how to update the tests for this. I'll be getting back to this on Monday, but of course fixes from people more familiar with the cuda target and/or these tests are welcome in the meantime. |
@AerialMantis, could somebody take a look please? |
Hello @dwoodwor-intel so I had a quick look, I don't have a full patch yet but the main problem of 4d3c010 seems to be in this part of the patch: 4d3c010#diff-6ffb509f195719a3d311315a823f7f9b222b273a58e6d5a80d251773eead7026L696 You see beforehand it Where the problems seem to come from is that in the SYCL toolchain we're by-passing this So I think what we need to do here is to make sure that the SYCL toolchain always does, for example For example this patch seems to fix the issue for diff --git a/clang/lib/Driver/ToolChains/SYCL.cpp b/clang/lib/Driver/ToolChains/SYCL.cpp
index 272462ff2260..b4462c8bf219 100644
--- a/clang/lib/Driver/ToolChains/SYCL.cpp
+++ b/clang/lib/Driver/ToolChains/SYCL.cpp
@@ -68,10 +68,10 @@ void SYCL::constructLLVMForeachCommand(Compilation &C, const JobAction &JA,
// The llvm-foreach command looks like this:
// llvm-foreach --in-file-list=a.list --in-replace='{}' -- echo '{}'
ArgStringList ForeachArgs;
- std::string OutputFileName(Output.getFilename());
+ std::string OutputFileName(T->getToolChain().getInputFilename(Output));
ForeachArgs.push_back(C.getArgs().MakeArgString("--out-ext=" + Ext));
for (auto &I : InputFiles) {
- std::string Filename(I.getFilename());
+ std::string Filename(T->getToolChain().getInputFilename(I));
ForeachArgs.push_back(
C.getArgs().MakeArgString("--in-file-list=" + Filename));
ForeachArgs.push_back( I need to finish fixing this and test it a bit more but hopefully I should be able to get you a full patch for this tomorrow. |
Okay, so I think this should be the minimal patch to make it work: I've tested it a bit on AMD to see if there wasn't any obvious issues with other toolchains but it seems fine. I'm curious to see how it goes in the CI, please feel free to ping me if the CI flags further issues after this patch is applied. |
In a recent patch clang was aligned with OpenMP to rename CUDA fat-binaries to `.cubin` files rather than `.o` files. To get the correct filenames use `getInputFilename` from the toolchain object.
Thanks for the patch, @npmiller! I am seeing a new fail in |
Aaah of course, this should fix it: |
`getInputFilename` returns a `std::string` so just taking the `c_str` without keeping track of the string object doesn't work.
@npmiller, it looks like there is one problem on AMD in |
I had a quick look, it's not related to the other patch as far as I can tell. I also couldn't reproduce it on MI100 so I'm not too sure what's going on. We've had such cases in the past, MI100 and the CI GPU are pretty different (32 threads vs 64 threads wavefronts for example, #5386), and I don't have access to a GPU similar to what the CI uses at the moment, so it's difficult to investigate. Would you be able to try track down which revision broke it like you did for the other bugs? That might give us a lead on what's going on. |
I'm having a lot of trouble reproducing this too; @pvchupin also tried re-running the CI but we're seeing the same test failure and also a few new cuda ones. I'll have to get back to this on Tuesday. |
The |
/merge |
Wed 06 Jul 2022 06:43:59 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes. |
Wed 06 Jul 2022 06:47:39 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later. |
LLVM: llvm/llvm-project@9aaba9d
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@ae90401