Commit b723d5a

[llvm-exegesis][x86] Add option to prevent use of xmm8-xmm15 upper SSE registers

Noticed while trying to use llvm-exegesis to capture accurate numbers on some old Atom/Silvermont hardware as part of the work on D103695. These targets' frontends are particularly poor, and the use of the xmm8-xmm15 SSE registers results in longer instruction encodings (legacy SSE instructions need a REX prefix to address them), which were affecting the latency/throughput estimates.

Thanks to @lebedev.ri for the --skip-measurements command line argument, which made testing much easier!

Differential Revision: https://reviews.llvm.org/D138832
1 parent 80e8f2b commit b723d5a

3 files changed (+36 -0 lines)


llvm/docs/CommandGuide/llvm-exegesis.rst

Lines changed: 9 additions & 0 deletions
@@ -213,6 +213,15 @@ OPTIONS
   could occur if the sampling is too frequent. A prime number should be used to
   avoid consistently skipping certain blocks.
 
+.. option:: -x86-disable-upper-sse-registers
+
+  Using the upper xmm registers (xmm8-xmm15) forces a longer instruction encoding,
+  which may put greater pressure on the frontend fetch and decode stages,
+  potentially reducing the rate at which instructions are dispatched to the
+  backend, particularly on older hardware. Comparing baseline results with this
+  mode enabled can help determine the effects of the frontend and can be used to
+  improve latency and throughput estimates.
+
 .. option:: -repetition-mode=[duplicate|loop|min]
 
   Specify the repetition mode. `duplicate` will create a large, straight line
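To make the "longer instruction encoding" point concrete, here is a minimal C++ sketch (not llvm-exegesis code) that hand-encodes the register-register form of the legacy SSE ADDPS instruction used by the test below (opcode NP 0F 58 /r). It assumes XMM operands only and ignores VEX/EVEX forms; encodeAddpsRR is a hypothetical helper. The ModRM register fields are 3 bits wide, so naming any of XMM8-XMM15 sets REX.R or REX.B and costs one extra prefix byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llvm-exegesis): encodes the register-direct
// form of legacy SSE ADDPS (NP 0F 58 /r) for destination/source XMM0-XMM15.
std::vector<std::uint8_t> encodeAddpsRR(unsigned dst, unsigned src) {
  assert(dst < 16 && src < 16 && "only XMM0-XMM15 are modelled");
  std::vector<std::uint8_t> bytes;

  // ModRM.reg and ModRM.rm are only 3 bits wide, so XMM8-XMM15 need a fourth
  // register bit from a REX prefix (0100WRXB): R extends reg, B extends rm.
  std::uint8_t rex = 0x40;
  if (dst & 8)
    rex |= 0x04; // REX.R
  if (src & 8)
    rex |= 0x01; // REX.B
  if (rex != 0x40)
    bytes.push_back(rex); // the extra byte appears only for upper registers

  bytes.push_back(0x0F); // two-byte opcode escape
  bytes.push_back(0x58); // ADDPS
  // ModRM: mod=11 (register direct), reg=dst[2:0], rm=src[2:0].
  bytes.push_back(
      static_cast<std::uint8_t>(0xC0 | ((dst & 7) << 3) | (src & 7)));
  return bytes;
}
```

For example, ADDPS xmm0, xmm1 encodes as 0F 58 C1 (3 bytes), while ADDPS xmm8, xmm1 becomes 44 0F 58 C1 (4 bytes).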

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# RUN: llvm-exegesis -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mode=inverse_throughput --skip-measurements -x86-disable-upper-sse-registers -opcode-name=ADDPSrr -repetition-mode=loop | FileCheck %s
+
+CHECK: ---
+CHECK-NEXT: mode: inverse_throughput
+CHECK-NEXT: key:
+CHECK-NEXT: instructions:
+CHECK-NEXT: - 'ADDPSrr [[LHS0:XMM[0-7]]] [[LHS0]] [[RHS0:XMM[0-7]]]'
+CHECK-NEXT: - 'ADDPSrr [[LHS1:XMM[0-7]]] [[LHS1]] [[RHS1:XMM[0-7]]]'
+CHECK-NEXT: - 'ADDPSrr [[LHS2:XMM[0-7]]] [[LHS2]] [[RHS2:XMM[0-7]]]'
+CHECK-NEXT: - 'ADDPSrr [[LHS3:XMM[0-7]]] [[LHS3]] [[RHS3:XMM[0-7]]]'
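The FileCheck patterns above capture every operand with XMM[0-7], so the test fails if the generated snippet names any of XMM8-XMM15 ([[LHSn]] additionally requires the destination to repeat as the tied first source). The same condition can be written as a small standalone C++ check. usesOnlyLowerXmm is a hypothetical helper for illustration, and it assumes register names appear in the upper-case XMMn form printed in the snippet.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check mirroring the test's XMM[0-7] patterns: returns true only
// if every XMMn register named in the instruction string has n <= 7.
bool usesOnlyLowerXmm(const std::string &inst) {
  static const std::regex xmmReg("XMM([0-9]+)");
  for (std::sregex_iterator it(inst.begin(), inst.end(), xmmReg), end;
       it != end; ++it) {
    const int index = std::stoi((*it)[1].str());
    if (index > 7)
      return false;
  }
  return true;
}
```

Applied to the snippet lines, "ADDPSrr XMM3 XMM3 XMM5" passes, while any line mentioning XMM8 or above is rejected.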

llvm/tools/llvm-exegesis/lib/X86/Target.cpp

Lines changed: 17 additions & 0 deletions
@@ -54,6 +54,11 @@ static cl::opt<unsigned> LbrSamplingPeriod(
     cl::desc("The sample period (nbranches/sample), used for LBR sampling"),
     cl::cat(BenchmarkOptions), cl::init(0));
 
+static cl::opt<bool>
+    DisableUpperSSERegisters("x86-disable-upper-sse-registers",
+                             cl::desc("Disable XMM8-XMM15 register usage"),
+                             cl::cat(BenchmarkOptions), cl::init(false));
+
 // FIXME: Validates that repetition-mode is loop if LBR is requested.
 
 // Returns a non-null reason if we cannot handle the memory references in this
@@ -708,6 +713,11 @@ class ExegesisX86Target : public ExegesisTarget {
                                 const APInt &Value) const override;
 
   ArrayRef<unsigned> getUnavailableRegisters() const override {
+    if (DisableUpperSSERegisters)
+      return makeArrayRef(kUnavailableRegistersSSE,
+                          sizeof(kUnavailableRegistersSSE) /
+                              sizeof(kUnavailableRegistersSSE[0]));
+
     return makeArrayRef(kUnavailableRegisters,
                         std::size(kUnavailableRegisters));
   }
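In this hunk the flag does nothing more than switch which denylist getUnavailableRegisters() returns; the test above then shows the generated snippets using only XMM0-XMM7. The following simplified model assumes registers are plain enum values and uses std::vector in place of LLVM's ArrayRef; unavailableRegisters and allowedXmmRegisters are hypothetical names, not llvm-exegesis APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified register model; the real code uses LLVM's MCPhysReg numbers.
enum Reg : unsigned {
  AH, BH, CH, DH,
  XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
  XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15
};

// Mirrors getUnavailableRegisters(): the flag only selects which denylist is
// returned. The upper-SSE list still contains AH-DH.
const std::vector<Reg> &unavailableRegisters(bool disableUpperSSE) {
  static const std::vector<Reg> kBase = {AH, BH, CH, DH};
  static const std::vector<Reg> kBaseAndUpperSSE = {
      AH, BH, CH, DH, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15};
  return disableUpperSSE ? kBaseAndUpperSSE : kBase;
}

// Hypothetical consumer: the XMM registers a snippet generator could still
// assign to operands once the denylist is applied.
std::vector<Reg> allowedXmmRegisters(bool disableUpperSSE) {
  const std::vector<Reg> &denied = unavailableRegisters(disableUpperSSE);
  std::vector<Reg> allowed;
  for (unsigned r = XMM0; r <= XMM15; ++r) {
    const Reg reg = static_cast<Reg>(r);
    if (std::find(denied.begin(), denied.end(), reg) == denied.end())
      allowed.push_back(reg);
  }
  return allowed;
}
```

With the flag off, all 16 XMM registers remain candidates; with it on, only XMM0-XMM7 do, while AH-DH stay excluded in both modes.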
@@ -772,13 +782,20 @@ class ExegesisX86Target : public ExegesisTarget {
   }
 
   static const unsigned kUnavailableRegisters[4];
+  static const unsigned kUnavailableRegistersSSE[12];
 };
 
 // We disable a few registers that cannot be encoded on instructions with a REX
 // prefix.
 const unsigned ExegesisX86Target::kUnavailableRegisters[4] = {X86::AH, X86::BH,
                                                               X86::CH, X86::DH};
 
+// Optionally, also disable the upper (x86_64) SSE registers to reduce frontend
+// decoder load.
+const unsigned ExegesisX86Target::kUnavailableRegistersSSE[12] = {
+    X86::AH,    X86::BH,    X86::CH,    X86::DH,    X86::XMM8,  X86::XMM9,
+    X86::XMM10, X86::XMM11, X86::XMM12, X86::XMM13, X86::XMM14, X86::XMM15};
+
 // We're using one of R8-R15 because these registers are never hardcoded in
 // instructions (e.g. MOVS writes to EDI, ESI, EDX), so they have less
 // conflicts.
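Note that kUnavailableRegistersSSE repeats AH, BH, CH and DH: enabling the flag must not re-enable the high-byte registers that the REX-prefix comment excludes, so the SSE list has to be a superset of kUnavailableRegisters. The commit simply spells out all 12 entries. One alternative that keeps the two lists from drifting apart is to derive the larger one at compile time; the C++17 sketch below uses placeholder register numbers and a hypothetical concat helper, not LLVM's X86 enum.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Placeholder register numbers standing in for X86::AH, X86::XMM8, etc.
enum : unsigned {
  AH = 1, BH, CH, DH,
  XMM8 = 100, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15
};

// Concatenate two fixed-size arrays at compile time (needs C++17 for the
// constexpr non-const std::array::operator[]).
template <std::size_t N, std::size_t M>
constexpr std::array<unsigned, N + M>
concat(const std::array<unsigned, N> &a, const std::array<unsigned, M> &b) {
  std::array<unsigned, N + M> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = a[i];
  for (std::size_t i = 0; i < M; ++i)
    out[N + i] = b[i];
  return out;
}

// High-byte registers that cannot be encoded alongside a REX prefix.
constexpr std::array<unsigned, 4> kUnavailableRegisters = {AH, BH, CH, DH};
// Upper SSE registers, which need an extra register-extension bit.
constexpr std::array<unsigned, 8> kUpperSSERegisters = {
    XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15};
// Superset used when the upper SSE registers are disabled.
constexpr auto kUnavailableRegistersSSE =
    concat(kUnavailableRegisters, kUpperSSERegisters);

static_assert(kUnavailableRegistersSSE.size() == 12,
              "4 high-byte GPRs + 8 upper XMM registers");
static_assert(kUnavailableRegistersSSE[4] == XMM8, "upper SSE block follows");
```

Either way the result is the same 12-entry list; the derived form only removes the need to keep the first four entries in sync by hand.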
