@@ -46,6 +46,22 @@ The Basics
perform badly when confronted with such structures. The only exception to
this guidance is that a unified return block with high in-degree is fine.

+ Use of allocas
+ ^^^^^^^^^^^^^^
+
+ An alloca instruction can be used to represent a function-scoped stack slot,
+ but it can also represent dynamic frame expansion (for example, an alloca
+ whose size is only known at run time). When representing function-scoped
+ variables or locations, prefer placing alloca instructions at the beginning
+ of the entry block. In particular, place them before any call instructions:
+ a call might be inlined and replaced with multiple basic blocks, after which
+ any alloca that followed it would no longer be in the entry block.
+
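A minimal sketch of entry-block placement, assuming a frontend that emits LLVM IR as text. The `FunctionEmitter` class and its `declare_local`, `emit`, and `finish` methods are hypothetical names for illustration, not an LLVM API: allocas go into a separate list that is always printed at the top of the entry block, even when the source declares a local after a call.

```python
from typing import List


class FunctionEmitter:
    """Hypothetical toy emitter that builds the text of one LLVM IR function.

    Allocas are kept in their own list and always printed at the top of the
    entry block, regardless of when the frontend declared the local.
    """

    def __init__(self, name: str, ret_type: str = "void") -> None:
        self.name = name
        self.ret_type = ret_type
        self.allocas: List[str] = []  # function-scoped stack slots
        self.body: List[str] = []     # all other instructions, in order

    def declare_local(self, var: str, ty: str) -> str:
        # Do not emit at the current insertion point (which may follow a
        # call); record the slot for the start of the entry block instead.
        slot = f"%{var}.addr"
        self.allocas.append(f"  {slot} = alloca {ty}")
        return slot

    def emit(self, instr: str) -> None:
        self.body.append(f"  {instr}")

    def finish(self) -> str:
        lines = [f"define {self.ret_type} @{self.name}() {{", "entry:"]
        lines += self.allocas  # before any call that might later be inlined
        lines += self.body
        lines.append("}")
        return "\n".join(lines)


f = FunctionEmitter("example")
f.emit("call void @setup()")
x = f.declare_local("x", "i32")  # declared after the call in source order
f.emit(f"store i32 42, ptr {x}")
f.emit("ret void")
print(f.finish())
```

Running it prints `%x.addr = alloca i32` ahead of `call void @setup()`. The output uses opaque-pointer (`ptr`) syntax and is only a function fragment; a complete module would also need `declare void @setup()`. Frontends built on LLVM's C++ `IRBuilder` get the same effect with a second builder positioned at the start of the entry block, as the Kaleidoscope tutorial's `CreateEntryBlockAlloca` helper does.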
+ The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
+ to eliminate alloca instructions that are in the entry basic block. Since SSA
+ is the canonical form expected by much of the optimizer, if allocas cannot
+ be eliminated by Mem2Reg or SROA, the optimizer is likely to be less
+ effective than it could be.
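To make "eliminating an alloca" concrete, here is a deliberately tiny, hypothetical model of promotion to SSA. It assumes a single straight-line block (so, unlike the real Mem2Reg, it never needs phi nodes), uses an invented tuple encoding for instructions, and only promotes a slot whose address is used solely as the pointer operand of loads and stores. The names `promotable` and `promote` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tuple encoding (not LLVM's):
#   ("alloca", slot)           slot = alloca
#   ("store", value, slot)     store value -> slot
#   ("load", dst, slot)        dst = load slot
#   (opcode, operands...)      any other instruction
Instr = Tuple[str, ...]


def promotable(block: List[Instr], slot: str) -> bool:
    """True if the slot's address never escapes: it may only appear as the
    pointer operand of a load or store."""
    for ins in block:
        op, args = ins[0], ins[1:]
        if op == "alloca" and args[0] == slot:
            continue
        if op in ("load", "store") and args[1] == slot and args[0] != slot:
            continue
        if slot in args:
            return False  # e.g. passed to a call: the address escapes
    return True


def promote(block: List[Instr]) -> List[Instr]:
    slots: Set[str] = {
        ins[1] for ins in block if ins[0] == "alloca" and promotable(block, ins[1])
    }
    current: Dict[str, str] = {s: "undef" for s in slots}  # value held in slot
    rename: Dict[str, str] = {}  # eliminated load result -> replacement value
    out: List[Instr] = []
    for ins in block:
        # Rewrite operands that referred to an eliminated load.
        ins = (ins[0],) + tuple(rename.get(a, a) for a in ins[1:])
        op = ins[0]
        if op == "alloca" and ins[1] in slots:
            continue  # the stack slot disappears
        if op == "store" and ins[2] in slots:
            current[ins[2]] = ins[1]  # remember the stored SSA value
            continue
        if op == "load" and ins[2] in slots:
            rename[ins[1]] = current[ins[2]]  # load becomes the stored value
            continue
        out.append(ins)
    return out


block: List[Instr] = [
    ("alloca", "%x"),
    ("store", "%arg", "%x"),     # x = arg
    ("load", "%t1", "%x"),
    ("add", "%t2", "%t1", "1"),  # x = x + 1
    ("store", "%t2", "%x"),
    ("load", "%t3", "%x"),
    ("ret", "%t3"),
]
for ins in promote(block):
    print(ins)
```

The printed result is `('add', '%t2', '%arg', '1')` followed by `('ret', '%t2')`: the slot, both stores, and both loads are gone. The real pass also requires the alloca to be in the entry block and places phi nodes where control flow merges; both are out of scope for this sketch.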
Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -79,6 +95,31 @@ operations for safety. If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.

+ When to specify alignment
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ LLVM will always generate correct code if you don’t specify alignment, but it
+ may generate inefficient code. For example, if you are targeting MIPS (or
+ older ARM ISAs), the hardware does not handle unaligned loads and stores, so a
+ load or store with lower-than-natural alignment will enter a trap-and-emulate
+ path. To avoid this, LLVM will emit a slower sequence of loads, shifts, and
+ masks (or load-right + load-left on MIPS) for every load or store that does
+ not have a sufficiently high alignment in the IR.
+
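The shift-and-mask idea can be sketched in Python. The model is hypothetical: `AlignedMemory` stands in for a little-endian, strict-alignment target whose only load is a naturally aligned 32-bit word load, and `load_u32_unaligned` reads a 32-bit value at any byte address by loading the one or two aligned words that contain it and merging them with shifts and a mask. It illustrates the general technique, not the exact instruction sequence LLVM emits for any particular target.

```python
import struct


class AlignedMemory:
    """Hypothetical strict-alignment memory: only 4-byte-aligned word loads."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def load_word(self, addr: int) -> int:
        if addr % 4 != 0:
            raise ValueError(f"unaligned word load at {addr}")  # the "trap"
        return struct.unpack_from("<I", self.data, addr)[0]


def load_u32_unaligned(mem: AlignedMemory, addr: int) -> int:
    """Read a little-endian u32 at any byte address using aligned loads only."""
    offset = addr % 4
    base = addr - offset
    lo = mem.load_word(base)
    if offset == 0:
        return lo  # already aligned: one native load suffices
    hi = mem.load_word(base + 4)
    shift = 8 * offset
    # Low bytes come from the top of `lo`, high bytes from the bottom of `hi`.
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF


mem = AlignedMemory(bytes(range(12)))
print(hex(load_u32_unaligned(mem, 1)))  # 0x4030201
```

An aligned address costs one load; any other address costs two loads plus the shifts, OR, and mask. On MIPS, the load-left/load-right pair mentioned above plays the role of the two partial loads and the merge.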
+ On allocas and globals, the alignment guarantees the alignment of the
+ allocated storage, though in most cases this is unnecessary (most targets
+ have a sufficiently high default alignment). On loads and stores, it also
+ provides a contract to the backend: ‘either this load/store has this
+ alignment, or the behavior is undefined’. This means that the backend is free
+ to emit instructions that rely on that alignment (and mid-level optimizers are
+ free to perform transforms that require it). On x86 this makes little
+ difference, as almost all instructions are alignment-independent. On MIPS, it
+ can make a big difference.
+
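One way to see why the contract matters on a strict-alignment target: the declared alignment bounds the widest access the backend can issue without risking a trap. The hypothetical `plan_load` and `combine_ops` functions below model a target with no unaligned support that splits a load into naturally aligned chunks no wider than the declared alignment. This is a simplified model, not LLVM's actual legalization logic, which may use target-specific sequences such as MIPS load-left/load-right instead.

```python
from typing import List, Tuple


def plan_load(size: int, align: int) -> List[Tuple[int, int]]:
    """Split a `size`-byte load, whose address is only known to be a multiple
    of `align`, into (offset, width) chunks that are each naturally aligned.

    Assumes `size` and `align` are powers of two. A chunk of width w at offset
    o is naturally aligned when w divides both `align` and o, so the widest
    safe width is min(size, align).
    """
    width = min(size, align)
    return [(off, width) for off in range(0, size, width)]


def combine_ops(size: int, align: int) -> int:
    """Shift and OR operations needed to merge the chunks into one value
    (ignoring any zero-extension of the narrow loads)."""
    chunks = len(plan_load(size, align))
    return 2 * (chunks - 1)  # one shift and one OR for each chunk after the first


print(plan_load(4, 4))    # [(0, 4)]
print(plan_load(4, 1))    # [(0, 1), (1, 1), (2, 1), (3, 1)]
print(combine_ops(8, 2))  # 6
```

In this model, a 4-byte load declared `align 4` is a single access, while the same load declared `align 1` becomes four byte loads plus six shift/OR operations, which is why stating the true alignment pays off on strict-alignment targets.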
+ Note that if your loads and stores are atomic, the backend will be unable to
+ lower an under-aligned access into a sequence of natively aligned accesses.
+ As a result, alignment is mandatory for atomic loads and stores.
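The reason splitting is not an option for atomics can be shown with a small deterministic simulation. In this hypothetical model a Python dict stands in for memory, and the generator `split_store_u32` performs a 32-bit store as two 16-bit halves, yielding between them to mark where another thread could run; `read_u32` plays the concurrent reader.

```python
from typing import Dict, Iterator


def split_store_u32(mem: Dict[int, int], addr: int, value: int) -> Iterator[None]:
    """Store a u32 as two 16-bit halves (little-endian), yielding between them
    to model a point where another thread may run."""
    mem[addr] = value & 0xFFFF                # low half
    yield
    mem[addr + 2] = (value >> 16) & 0xFFFF    # high half
    yield


def read_u32(mem: Dict[int, int], addr: int) -> int:
    return mem[addr] | (mem[addr + 2] << 16)


mem = {0: 0x0000, 2: 0x0000}  # old value 0x00000000
writer = split_store_u32(mem, 0, 0xFFFFFFFF)
next(writer)                  # only the low half has been written
observed = read_u32(mem, 0)   # a concurrent reader runs here
print(hex(observed))          # 0xffff: neither the old nor the new value
```

Splitting the access is exactly what opens this window, so an under-aligned atomic access cannot be rescued by the lowering used for ordinary loads and stores.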
+
Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^