[LowerMemIntrinsics] Lower llvm.memmove to wide memory accesses (#100122)
So far, the IR-level lowering of llvm.memmove intrinsics has generated loops
that copy each byte individually. This can be wasteful for targets that
provide wider memory access operations.
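For reference, the pre-patch lowering is equivalent to the following C sketch (the emitted code is LLVM IR; the function and parameter names here are illustrative, not from the patch):

```c
#include <stddef.h>

/* Byte-wise copy loop equivalent to the pre-patch lowering: one load
   and one store per byte, regardless of the access widths the target
   supports. Forward direction shown. */
static void byte_copy(unsigned char *dst, const unsigned char *src,
                      size_t len) {
  for (size_t i = 0; i < len; ++i)
    dst[i] = src[i];
}
```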
This patch makes the memmove lowering more similar to the lowering of
memcpy with unknown length.
TargetTransformInfo::getMemcpyLoopLoweringType() is queried for an
adequate type for the memory accesses, and if it is wider than a single
byte, the greatest multiple of the type's size that is less than or
equal to the length is copied with corresponding wide memory accesses. A
residual loop with byte-wise accesses (or, when the length is statically
known, a sequence of suitable memory accesses) is introduced for the
remaining bytes.
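In C terms, the forward variant of the new lowering has roughly the following shape (a hedged sketch, not the emitted IR: WIDTH stands in for the size of the access type chosen by the target, and the helper name is made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the size of the access type returned by
   TargetTransformInfo::getMemcpyLoopLoweringType(). */
#define WIDTH sizeof(uint64_t)

static void wide_copy_forward(unsigned char *dst, const unsigned char *src,
                              size_t len) {
  /* Greatest multiple of the access width that is <= len. */
  size_t bulk = len - (len % WIDTH);
  size_t i = 0;
  for (; i < bulk; i += WIDTH) {  /* wide main loop */
    uint64_t tmp;
    memcpy(&tmp, src + i, WIDTH); /* wide load */
    memcpy(dst + i, &tmp, WIDTH); /* wide store */
  }
  for (; i < len; ++i)            /* byte-wise residual loop */
    dst[i] = src[i];
}
```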
For memmove, this construct is required in two variants: one for copying
forward and one for copying backwards, to handle overlapping memory
ranges. For the backwards case, the residual code still covers the bytes
at the end of the copied region and is therefore executed before the
wide main loop. This implementation choice is based on the assumption
that we are more likely to encounter memory ranges whose start aligns
with the access width than ones whose end does.
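Continuing the sketch above (reusing WIDTH and the includes), the backward variant then looks roughly like this, with the byte-wise residual for the tail of the region executed before the wide loop:

```c
static void wide_copy_backward(unsigned char *dst, const unsigned char *src,
                               size_t len) {
  size_t bulk = len - (len % WIDTH);
  /* Residual bytes at the end of the region are copied first ... */
  for (size_t i = len; i > bulk; --i)
    dst[i - 1] = src[i - 1];
  /* ... then the wide loop walks backward over the remaining
     multiple-of-WIDTH prefix, whose start is assumed to be the
     better-aligned end of the region. */
  for (size_t i = bulk; i >= WIDTH; i -= WIDTH) {
    uint64_t tmp;
    memcpy(&tmp, src + i - WIDTH, WIDTH);
    memcpy(dst + i - WIDTH, &tmp, WIDTH);
  }
}
```

In the lowered code, a runtime comparison of the source and destination pointers selects between the two variants, since copying forward is only safe when the destination does not overlap the source from above.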
In microbenchmarks on gfx1030 (AMDGPU), this change yields speedups of up
to 16x for memmoves with variable or large constant lengths.
Part of SWDEV-455845.