
Commit 8b25542

[docs][PerformanceTips] Add text on allocas and alignment
This summarizes two recent llvm-dev discussions. Most of the text was provided
by David Chisnall and Benoit Belley, with minor editing by me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247301 91177308-0d34-0410-b5e6-96231b3b80d8
1 parent a8d8dba commit 8b25542

1 file changed: +41 -0 lines changed


docs/Frontend/PerformanceTips.rst

Lines changed: 41 additions & 0 deletions
@@ -46,6 +46,22 @@ The Basics
perform badly when confronted with such structures. The only exception to
this guidance is that a unified return block with high in-degree is fine.

Use of allocas
^^^^^^^^^^^^^^

An alloca instruction can be used to represent a function-scoped stack slot,
but can also represent dynamic frame expansion. When representing
function-scoped variables or locations, placing alloca instructions at the
beginning of the entry block should be preferred. In particular, place them
before any call instructions. Call instructions might get inlined and
replaced with multiple basic blocks, and as a result a following alloca
instruction would no longer be in the entry basic block.

The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
to eliminate alloca instructions that are in the entry basic block. Given
that SSA is the canonical form expected by much of the optimizer, if allocas
cannot be eliminated by Mem2Reg or SROA, the optimizer is likely to be less
effective than it could be.
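
The following sketch (the function and @helper callee are hypothetical names,
not part of the original text) shows the layout this suggests, with the alloca
at the top of the entry block, ahead of any call:

.. code-block:: llvm

  define i32 @example(i32 %x) {
  entry:
    ; Function-scoped local: placed at the top of the entry block, before the
    ; call below, which might later be inlined into several basic blocks.
    %local = alloca i32, align 4
    store i32 %x, i32* %local, align 4
    %c = call i32 @helper(i32 %x)
    %l = load i32, i32* %local, align 4
    %r = add i32 %c, %l
    ret i32 %r
  }

  declare i32 @helper(i32)

Even if the call to @helper is inlined into multiple blocks, %local stays in
the entry block where Mem2Reg and SROA can still eliminate it.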

Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -79,6 +95,31 @@ operations for safety. If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.
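
For instance, a minimal sketch (hypothetical function, assuming a 64-bit
target and a non-negative index) of widening an i32 index with zext before it
feeds a getelementptr:

.. code-block:: llvm

  define float @element(float* %base, i32 %i) {
  entry:
    ; Widen the index to the 64-bit machine register width up front rather
    ; than leaving an i32 GEP index for the optimizer to reason about.
    %idx = zext i32 %i to i64
    %addr = getelementptr float, float* %base, i64 %idx
    %val = load float, float* %addr, align 4
    ret float %val
  }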

When to specify alignment
^^^^^^^^^^^^^^^^^^^^^^^^^
LLVM will always generate correct code if you don't specify alignment, but may
generate inefficient code. For example, if you are targeting MIPS (or older
ARM ISAs) then the hardware does not handle unaligned loads and stores, and
so you will enter a trap-and-emulate path if you do a load or store with
lower-than-natural alignment. To avoid this, LLVM will emit a slower
sequence of loads, shifts and masks (or load-right + load-left on MIPS) for
all cases where the load/store does not have a sufficiently high alignment
in the IR.
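
As a sketch (pointer names are hypothetical), the difference is only the
alignment stated on otherwise identical loads:

.. code-block:: llvm

  define i32 @sum(i32* %p, i32* %q) {
  entry:
    ; Only 1-byte alignment is promised here, so a target without hardware
    ; support for unaligned accesses must use a slower lowering.
    %a = load i32, i32* %p, align 1
    ; Natural 4-byte alignment lets the backend emit an ordinary word load.
    %b = load i32, i32* %q, align 4
    %s = add i32 %a, %b
    ret i32 %s
  }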

The alignment is used to guarantee the alignment on allocas and globals,
though in most cases this is unnecessary (most targets have a sufficiently
high default alignment that they'll be fine). It is also used to provide a
contract to the back end saying 'either this load/store has this alignment, or
it is undefined behavior'. This means that the back end is free to emit
instructions that rely on that alignment (and mid-level optimizers are free to
perform transforms that require that alignment). For x86, it doesn't make
much difference, as almost all instructions are alignment-independent. For
MIPS, it can make a big difference.

Note that if your loads and stores are atomic, the backend will be unable to
lower an under-aligned access into a sequence of natively aligned accesses.
As a result, alignment is mandatory for atomic loads and stores.
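
A minimal sketch (hypothetical function) of an atomic load carrying the
natural alignment it is required to state:

.. code-block:: llvm

  define i32 @read_flag(i32* %flag) {
  entry:
    ; The backend cannot split an under-aligned atomic load into several
    ; smaller aligned accesses, so natural alignment is stated explicitly.
    %v = load atomic i32, i32* %flag acquire, align 4
    ret i32 %v
  }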

Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^
