LLVM and SPIRV-LLVM-Translator pulldown (WW27) #4006

Merged: 610 commits into intel:sycl on Jun 29, 2021

Conversation

@vmaksimo vmaksimo commented Jun 28, 2021

RosieSumpter and others added 30 commits June 24, 2021 12:02
OR, XOR and AND entries are added to the cost table. An extra cost
is added when vector splitting occurs.

This is done to address the issue of a missed SLP vectorization
opportunity due to unreasonably high costs being attributed to the vector
Or reduction (see: https://bugs.llvm.org/show_bug.cgi?id=44593).

Differential Revision: https://reviews.llvm.org/D104538
This patch generalizes MatchBinaryAddToConst to support matching
(A + C1), (A + C2), instead of just matching (A + C1), A.

The existing cases can be handled by treating non-add expressions A as
A + 0.
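
The normalization described above can be sketched as follows. This is a simplified model, not the LLVM implementation: the tuple-based expression encoding and both function names are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple, Union

# Hypothetical expression model: a symbol name, or ("add", symbol, constant).
Expr = Union[str, Tuple[str, str, int]]


def split_add_const(expr: Expr) -> Tuple[str, int]:
    """Return (base, constant), treating a non-add expression A as A + 0."""
    if isinstance(expr, tuple) and expr[0] == "add":
        _, base, const = expr
        return base, const
    return expr, 0


def match_binary_add_to_const(lhs: Expr, rhs: Expr) -> Optional[Tuple[int, int]]:
    """Match lhs = A + C1 and rhs = A + C2 over the same base A.

    Returns (C1, C2) on success and None when the bases differ.
    """
    base_l, c1 = split_add_const(lhs)
    base_r, c2 = split_add_const(rhs)
    if base_l != base_r:
        return None
    return c1, c2


print(match_binary_add_to_const(("add", "A", 5), ("add", "A", 2)))  # general case
print(match_binary_add_to_const(("add", "A", 5), "A"))              # old case, as A + 0
```

Because a bare `A` is normalized to `A + 0`, the previously supported (A + C1), A pattern falls out of the general match without a separate code path.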

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D104634
It looks like the fold introduced in 63f3383 can cause crashes
if the type of the bitcasted value is not a valid vector element type,
like x86_mmx.

To resolve the crash, reject invalid vector element types. The way it is
done in the patch is a bit clunky. Perhaps there's a better way to
check?

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104792
Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.
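
A minimal sketch of the new option semantics, assuming the timeline is just a list of per-cycle entries. The function name and data shape are hypothetical and do not mirror the llvm-mca sources.

```python
from typing import List


def visible_timeline(cycles: List[str], max_timeline_cycles: int) -> List[str]:
    """Truncate a timeline to max_timeline_cycles entries; 0 means no limit."""
    if max_timeline_cycles == 0:
        return list(cycles)
    return cycles[:max_timeline_cycles]


timeline = ["c0", "c1", "c2", "c3", "c4"]
print(visible_timeline(timeline, 3))  # truncated view
print(visible_timeline(timeline, 0))  # full view
```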

Differential Revision: https://reviews.llvm.org/D104846
As a minor adjustment to the existing lowering of offset scatters, this
extends any smaller-than-legal vectors into full vectors using a zext,
so that the truncating scatters can be used. Due to the way MVE
legalizes the vectors this should be cheap in most situations, and will
prevent the vector from being scalarized.

Differential Revision: https://reviews.llvm.org/D103704
This patch enables the salvaging of debug values that may be calculated
from more than one SSA value, such as with binary operators that do not
use a constant argument. The actual functionality for this behaviour is
added in a previous commit (c727056), but with the ability to actually
emit the resulting debug values switched off.

The reason for this is that the prior patch has been reverted several
times due to issues discovered downstream, some time after the actual
landing of the patch. The patch in question is rather large and touches
several widely used header files, and all issues discovered are more
related to the handling of variadic debug values as a whole rather than
the details of the patch itself. Therefore, to minimize the build time
impact and risk of conflicts involved in any potential future
revert/reapply of that patch, this significantly smaller patch (that
touches no header files) will instead be used as the capstone to enable
variadic debug value salvaging.

The review linked to this patch is mostly implemented by the previous
commit, c727056, but also contains the changes in this patch.

Differential Revision: https://reviews.llvm.org/D91722
Extend the yaml code generation to support the index attributes that https://reviews.llvm.org/D104711 added to the OpDSL.

Differential Revision: https://reviews.llvm.org/D104712
This patch handles sinking a replicate region after another replicate
region. In that case, we can connect the sink region after the target
region. This properly handles the case for which an assertion has been
added in 337d765.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D103514
This adds the MemoryTagManager class and a specialisation
of that class for AArch64 MTE tags. It provides a generic
interface for various tagging operations: adding/removing
tags, diffing tagged pointers, etc.

Later patches will use this manager to handle memory tags
in generic code in both lldb and lldb-server.
Since it will be used in both, the base class header is in
lldb/Target.
(MemoryRegionInfo is another example of this pattern)

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D97281
Add an index_dim annotation to specify the shape-to-loop mapping of shape-only tensors. A shape-only tensor is not accessed within the body of the operation but is required to span the iteration space of certain operations such as pooling.

Differential Revision: https://reviews.llvm.org/D104767
Everything includes clang/Config/config.h by qualified "clang/Config/config.h"
path, so there's no need for `-Igen/clang/include/clang/Config/clang/include`.

No behavior change.
This feature "memory-tagging+" indicates that lldb-server
supports memory tagging packets. (added in a later patch)

We check HWCAP2_MTE to decide whether to enable this
feature for Linux.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D97282
Use %zu to print size_t vars.
…ion (7/n)

scf::ForOp bufferization analysis proceeds just like for any other op (including FuncOp) at its boundaries; i.e. if:

1. The tensor operand is inplaceable.
2. The matching result has no subsequent read (i.e. all reads dominate the scf::ForOp).
3. Bufferizing inplace does not create a RAW interference.

then it can bufferize inplace.

Still there are a few differences:

1. bbArgs for an scf::ForOp are always considered inplaceable when seen from ops inside the body. This is because a) either the matching tensor operand is not inplaceable and an alloc will be inserted (which makes bbArg itself inplaceable); or b) the tensor operand and bbArg are both already inplaceable.
2. Bufferization within the scf::ForOp body has implications for the outside world: the scf.yield terminator may well ping-pong values of the same type. This muddies the water for alias analysis and is not supported at the moment. Such cases result in a pass failure.
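
The boundary conditions and the bbArg rule can be summarized in a small decision sketch. All names below are hypothetical stand-ins for the results of the analysis; this is not the MLIR implementation and it ignores the unsupported scf.yield ping-pong case.

```python
from dataclasses import dataclass


@dataclass
class ForOpOperand:
    # Hypothetical flags standing in for analysis results.
    operand_inplaceable: bool       # condition 1
    result_read_after_loop: bool    # negation of condition 2
    creates_raw_interference: bool  # negation of condition 3


def can_bufferize_inplace(op: ForOpOperand) -> bool:
    """All three boundary conditions must hold for inplace bufferization."""
    return (op.operand_inplaceable
            and not op.result_read_after_loop
            and not op.creates_raw_interference)


def plan_bbarg(operand_inplaceable: bool) -> dict:
    """Seen from inside the body, the bbArg is always inplaceable.

    If the operand is not inplaceable, an alloc is inserted first,
    which is what makes the bbArg inplaceable.
    """
    return {"bbarg_inplaceable": True,
            "alloc_inserted": not operand_inplaceable}


print(can_bufferize_inplace(ForOpOperand(True, False, False)))
print(plan_bbarg(operand_inplaceable=False))
```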

Differential revision: https://reviews.llvm.org/D104490
This adds memory tag reading using the new "qMemTags"
packet and ptrace on AArch64 Linux.

This new packet follows the one used by GDB.
(https://sourceware.org/gdb/current/onlinedocs/gdb/General-Query-Packets.html)

On AArch64 Linux we use ptrace's PEEKMTETAGS to read
tags and we assume that lldb has already checked that the
memory region actually has tagging enabled.

We do not assume that lldb has expanded the requested range
to granules and expand it again to be sure.
(although lldb will be sending aligned ranges because it happens
to need them client side anyway)
Also we don't assume untagged addresses. So for AArch64 we'll
remove the top byte before using them. (the top byte includes
MTE and other non address data)
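
The address handling described above can be sketched as below. The sketch assumes the 16-byte MTE granule and a 56-bit address mask for removing the top byte; the function names are hypothetical and are not the lldb-server API.

```python
from typing import Tuple

MTE_GRANULE_SIZE = 16         # bytes covered by one MTE allocation tag
ADDRESS_MASK = (1 << 56) - 1  # clears the top byte (bits 56-63)


def remove_top_byte(addr: int) -> int:
    """Drop the top byte, which holds the MTE tag and other non-address data."""
    return addr & ADDRESS_MASK


def expand_to_granules(addr: int, length: int) -> Tuple[int, int]:
    """Return (aligned_start, aligned_length) covering [addr, addr + length)."""
    start = remove_top_byte(addr)
    end = start + length
    aligned_start = start - (start % MTE_GRANULE_SIZE)
    # Round the end up to the next granule boundary.
    aligned_end = -(-end // MTE_GRANULE_SIZE) * MTE_GRANULE_SIZE
    return aligned_start, aligned_end - aligned_start


start, length = expand_to_granules(0x0900FFFFF7FFA004, 20)
print(hex(start), length)
```

Re-expanding an already aligned range is a no-op, which is why the server can do it unconditionally without caring whether the client aligned the request.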

To do the ptrace read NativeProcessLinux will ask the native
register context for a memory tag manager based on the
type in the packet. This also gives you the ptrace numbers you need.
(it's called a register context but it also has non register data,
so it saves adding another per platform sub class)

The only supported platform for this is AArch64 Linux and the only
supported tag type is MTE allocation tags. Anything else will
error.

Ptrace can return a partial result but for lldb-server we will
be treating that as an error. To succeed we need to get all the tags
we expect.

(Note that the protocol leaves room for logical tags to be
read via qMemTags but this is not going to be implemented for lldb
at this time.)

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D95601
This commit moves the type translator from LLVM to MLIR to a public header for use by external projects or other code.

Unlike a previous attempt (https://reviews.llvm.org/D104726), this patch moves the type conversion into separate files which remedies the linker error which was only caught by CI.

Differential Revision: https://reviews.llvm.org/D104834
Currently, when .llvm.call-graph-profile is created by LLVM, it explicitly encodes the symbol indices. This section is basically a black box for post-processing tools. For example, if we run strip -s on the object files, the symbol table changes but the indices in that section do not. In the silent case, the indices point to the wrong symbols. In the visible case, the indices point outside the symbol table: "invalid symbol index".

This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.
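
A toy model of the difference between the two formats follows. It assumes two relocations (from, to) per weight entry and models relocations as bare symbol indices; this layout and all names are illustrative assumptions, not the ELF encoding or the LLD code.

```python
from typing import Dict, List, Tuple


def remove_symbol(symtab: List[str], name: str) -> Tuple[List[str], Dict[int, int]]:
    """Drop a symbol; return the new table and an old-to-new index map."""
    new_symtab = [s for s in symtab if s != name]
    remap = {old: new_symtab.index(s)
             for old, s in enumerate(symtab) if s != name}
    return new_symtab, remap


def rewrite_relocations(relocs: List[int], remap: Dict[int, int]) -> List[int]:
    """What a relocation-aware tool does: update each referenced symbol index."""
    return [remap[r] for r in relocs]


def reconstruct_profile(symtab: List[str], relocs: List[int],
                        weights: List[int]) -> List[Tuple[str, str, int]]:
    """Pair consecutive relocations (from, to) with each weight."""
    return [(symtab[relocs[2 * i]], symtab[relocs[2 * i + 1]], w)
            for i, w in enumerate(weights)]


def lookup_stale(symtab: List[str], index: int) -> str:
    """Old format: raw indices that no tool knew how to update."""
    return symtab[index] if index < len(symtab) else "invalid symbol index"


symtab = ["unused", "main", "foo", "bar"]
weights = [10, 3]
relocs = [1, 2, 2, 3]  # main -> foo, foo -> bar

new_symtab, remap = remove_symbol(symtab, "unused")
print(reconstruct_profile(new_symtab, rewrite_relocations(relocs, remap), weights))
print(lookup_stale(new_symtab, 1), lookup_stale(new_symtab, 3))
```

In the stale lookups, index 1 silently resolves to the wrong symbol and index 3 falls outside the table, which are the two failure modes the commit message describes.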

In a quick experiment with clang-13, the size, aggregated over all the input sections, went up from 107KB to 322KB. The size of the clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, but the final binary size will not be affected.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D104080
This adds GDB client support for the qMemTags packet,
which reads memory tags, following the design
that was recently committed to GDB.

https://sourceware.org/gdb/current/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets
(look for qMemTags)

lldb commands will use the new Process methods
GetMemoryTagManager and ReadMemoryTags.

The former takes a range and checks that:
* The current process architecture has an architecture plugin
* That plugin provides a MemoryTagManager
* That the range of memory requested lies in a tagged range
  (it will expand it to granules for you)

If all that was true you get a MemoryTagManager you
can give to ReadMemoryTags.

This two step process is done to allow commands to get the
tag manager without having to read tags as well. For example
you might just want to remove a logical tag, or error early
if a range with tagged addresses is inverted.

Note that getting a MemoryTagManager doesn't mean that the process
or a specific memory range is tagged. Those are separate checks.
Having a tag manager just means this architecture *could* have
a tagging feature enabled.

An architecture plugin has been added for AArch64 which
will return a MemoryTagManagerAArch64MTE, which was added in a
previous patch.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D95602
This new command looks much like "memory read"
and mirrors its basic behaviour.

(lldb) memory tag read new_buf_ptr new_buf_ptr+32
Logical tag: 0x9
Allocation tags:
[0x900fffff7ffa000, 0x900fffff7ffa010): 0x9
[0x900fffff7ffa010, 0x900fffff7ffa020): 0x0

Important properties:
* The end address is optional and defaults to reading
  1 tag if omitted
* It is an error to try to read tags if the architecture
  or process doesn't support it, or if the range asked
  for is not tagged.
* It is an error to read an inverted range (end < begin)
  (logical tags are removed for this check so you can
  pass tagged addresses here)
* The range will be expanded to fit the tagging granule,
  so you can get more tags than simply (end-begin)/granule size.
  Whatever you get back will always cover the original range.
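
The range rules in the list above can be sketched as follows, assuming a 16-byte granule and a 56-bit address mask. The function is a hypothetical model of the command's range handling, not lldb code, and it reports ranges with the logical tag removed, unlike the command output shown earlier.

```python
from typing import List, Optional, Tuple

GRANULE = 16
ADDRESS_MASK = (1 << 56) - 1  # removes the top byte, including the logical tag


def tag_read_ranges(begin: int, end: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return the granule-aligned [start, stop) ranges a tag read would cover."""
    begin_addr = begin & ADDRESS_MASK
    # The end address is optional; by default exactly one tag is read.
    end_addr = begin_addr + 1 if end is None else end & ADDRESS_MASK
    # Tags were stripped above, so differently tagged pointers compare fairly.
    if end_addr < begin_addr:
        raise ValueError("inverted range: end < begin")
    start = begin_addr - begin_addr % GRANULE
    stop = -(-end_addr // GRANULE) * GRANULE
    return [(a, a + GRANULE) for a in range(start, stop, GRANULE)]


ptr = 0x0900FFFFF7FFA000
print([(hex(a), hex(b)) for a, b in tag_read_ranges(ptr, ptr + 32)])
print(len(tag_read_ranges(ptr + 8, ptr + 24)))  # 16 bytes, but 2 granules
```

The second call shows why the number of tags can exceed (end-begin)/granule size: an unaligned 16-byte request straddles two granules.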

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D97285
Updates Bazel build files to match
llvm/llvm-project@929189a499

Differential Revision: https://reviews.llvm.org/D104864
- Currently, the emitting of labels in the parsePrimaryExpr function is case sensitive. It just takes the identifier and emits it as written.
- However, for HLASM the emitting of labels is case independent. Labels are emitted in upper case only, to enforce case independence. So the label must be emitted in upper case both at the time it is parsed (in `parseAsHLASMLabel`) and when a PC-relative relocatable expression is processed (in `parsePrimaryExpr`).
- To achieve this a new MCAsmInfo attribute has been introduced which corresponding targets can override if needed.
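
A minimal sketch of the idea, assuming a per-target boolean that both code paths consult. The class and field names are hypothetical placeholders for the new MCAsmInfo attribute.

```python
from dataclasses import dataclass


@dataclass
class AsmInfo:
    # Hypothetical flag; targets such as HLASM would override it to True.
    emit_labels_in_upper_case: bool = False


def emitted_symbol_name(identifier: str, info: AsmInfo) -> str:
    """Name used both when defining a label and when referencing it."""
    return identifier.upper() if info.emit_labels_in_upper_case else identifier


hlasm = AsmInfo(emit_labels_in_upper_case=True)
default = AsmInfo()
print(emitted_symbol_name("loopStart", hlasm))
print(emitted_symbol_name("loopStart", default))
```

Routing the label definition and the expression reference through the same helper is what keeps the two spellings in agreement.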

Reviewed By: abhina.sreeskantharajan, uweigand

Differential Revision: https://reviews.llvm.org/D104715
This is a follow up to D102732 which also expands the logic to Darwin.

Differential Revision: https://reviews.llvm.org/D104764
…:ShrinkDemandedConstant.

We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.

The affected ARM test changes because a constant became non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D104832
vmaksimo and others added 14 commits June 28, 2021 09:00
There were 3 assumptions made during translation of llvm.loop metadata
into LoopMerge instruction:
1. A latch block for a 'for' or 'while' loop ends with an unconditional
   branch instruction;
2. A latch block for a 'do-while' loop ends with a conditional branch
   instruction;
3. For a 'conditional' latch basic block, the first successor is the
   loop header and the second one is the exit block.

All three of them can be violated when optimized IR is passed
to the translator. In this case LoopMerge:
a. can be placed in the wrong basic block;
b. can have continue target and merge block parameters sharing the same id.

This patch makes the translator assume less and do more checks via
the LLVM LoopInfo infrastructure.
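
The following sketch illustrates deriving the targets from loop structure instead of from branch shape or successor order. It is a toy CFG model limited to loops with one latch and one exit block, and the function name is hypothetical; the translator itself queries LLVM LoopInfo.

```python
from typing import Dict, List, Set, Tuple


def loop_merge_targets(succs: Dict[str, List[str]], header: str,
                       loop_blocks: Set[str]) -> Tuple[str, str]:
    """Return (continue_target, merge_block) for a one-latch, one-exit loop."""
    # A latch is any loop block with a back edge to the header,
    # regardless of branch kind or successor order.
    latches = sorted(b for b in loop_blocks if header in succs.get(b, []))
    # An exit block is a successor of a loop block that lies outside the loop.
    exits = sorted({s for b in loop_blocks for s in succs.get(b, [])
                    if s not in loop_blocks})
    if len(latches) != 1 or len(exits) != 1:
        raise ValueError("sketch only handles one latch and one exit block")
    return latches[0], exits[0]


# The latch branches conditionally and lists the exit block first,
# which breaks the third assumption listed above.
cfg = {
    "header": ["body"],
    "body": ["latch"],
    "latch": ["exit", "header"],
    "exit": [],
}
print(loop_merge_targets(cfg, "header", {"header", "body", "latch"}))
```

Because the latch and the exit are found by different criteria, the continue target and the merge block cannot end up sharing the same id in this model.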

Signed-off-by: Dmitry Sidorov <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@7670633
Signed-off-by: Dmitry Sidorov <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@2b8b4a8
Three metadata nodes map onto LoopControlLoopCountINTELMask. The mask
has 3 parameters, whose default values are -1.

Signed-off-by: Leonid Pauzin <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@b4d6f0b
The SYCL/OpenCL specifications require that 3-element vectors
are sized equally to the 4-element ones (see 4.10.2.1 and 4.10.2.6,
khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf). Meanwhile, the
logic for translating SPIR-V debug info into LLVM IR obeys a
trivial `vec_size = num_elems * elem_size` formula, which fails
for the 3-element edge case.

Example LLVM IR pre-translation:
```
!0 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, size: 128, flags: DIFlagVector, elements: !1)
!1 = !{!2}
!2 = !DISubrange(count: 3)
!3 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
```
Faulty LLVM IR after bi-directional translation (note the size in `!0`):
```
!0 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, size: 96, flags: DIFlagVector, elements: !1)
!1 = !{!2}
!2 = !DISubrange(count: 3)
!3 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
```
Until SPIR-V DI instructions are re-designed to store the array
size information, handle the 3-element case explicitly to favor
OpenCL/SYCL requirements.
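
The explicit handling of the 3-element case can be sketched as below. The function name is hypothetical and this is not the translator source; it only reproduces the size arithmetic from the example above.

```python
def debug_vector_size_bits(num_elems: int, elem_size_bits: int) -> int:
    """Size of a vector debug type, sizing 3-element vectors like 4-element ones."""
    effective_elems = 4 if num_elems == 3 else num_elems
    return effective_elems * elem_size_bits


print(debug_vector_size_bits(3, 32))  # matches size: 128 in the original IR
print(3 * 32)                         # the naive formula gives the faulty 96
```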

Signed-off-by: Artem Gindinson <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@ddb5c96
It should be a valid case if alias.scope and noalias mask/decoration
are applied to the same instruction.

Signed-off-by: Dmitry Sidorov <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@d5573a0
This refactoring phase comes down to moving the translation algorithm
out of the already-cluttered `SPIRVToLLVM::setLLVMLoopMetadata()` body.

For now, a static member-only class is employed: it provides encapsulation
for helper functions while avoiding the unnecessary complexity that "true"
class entities would bring. A simple `LoopsEmitted` set is used to guard
against duplication of metadata for a particular loop by the callers.
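
A rough sketch of that shape: a class used purely as a namespace for helpers, with the caller owning the guard set. The Python names and the metadata payload are hypothetical and do not mirror the translator's code.

```python
from typing import List, Set


class LoopMetadataEmitter:
    """Static-member-only helper: groups functions, holds no instance state."""

    @staticmethod
    def build_metadata(loop_id: str) -> dict:
        # Placeholder payload; the real helper translates loop controls.
        return {"loop": loop_id, "kind": "llvm.loop"}


def set_loop_metadata(loop_id: str, loops_emitted: Set[str],
                      out: List[dict]) -> bool:
    """Emit metadata once per loop; return False if it was already emitted."""
    if loop_id in loops_emitted:
        return False
    out.append(LoopMetadataEmitter.build_metadata(loop_id))
    loops_emitted.add(loop_id)
    return True


emitted: Set[str] = set()
metadata: List[dict] = []
print(set_loop_metadata("loop0", emitted, metadata))
print(set_loop_metadata("loop0", emitted, metadata))  # duplicate is skipped
print(len(metadata))
```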

Signed-off-by: Artem Gindinson <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@348d3cf
Since there's a dedicated `// Enums` section in the `OCLUtil.h` header,
move the enum definition there.

Signed-off-by: Artem Gindinson <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@f8a2e49
Some LLVM optimizations started generating an ascast -> gep -> load
sequence; this patch adds support for it.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@fc71302
@vmaksimo

/summary:run

svenvh and others added 4 commits June 28, 2021 18:29
Bring spirv.hpp in sync with f95c3b3 ("Merge pull request intel#219 from
cmarcelo/SPV_EXT_shader_atomic_float16_add", 2021-06-23) from
github.com/KhronosGroup/SPIRV-Headers .

Notably, this brings the SPV_KHR_integer_dot_product extension enum
values and CPP_for_OpenCL source language enum value, together with
other additions.  There are some reorderings too.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@6847551
Regenerate SPIRVIsValidEnum.h and SPIRVNameMapEnum.h using
tools/spirv-tool/gen_spirv.bash after syncing spirv.hpp and manually
fix the following:

 - internal:: values are not handled, so they had to be added manually
   again.  Move all internal values to the end of the generated
   enum/function, so that they are together.

 - NameMap entries for IOPipesINTEL and FuncParamIOKindINTEL do not
   follow the convention of the other values, so they had to be
   changed back to their original values.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@c4c9b0c
…insic

The correct translation of this case will be done later, once the spec
is updated.

Signed-off-by: Dmitry Sidorov <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@a84f589
@vmaksimo

/summary:run

@vmaksimo vmaksimo marked this pull request as ready for review June 29, 2021 08:08
@vladimirlaz vladimirlaz merged commit e946a0f into intel:sycl Jun 29, 2021