Commit 46e7823

[lldb][debugserver] Read/write SME registers on arm64 (#119171)
**Note:** The register reading and writing depends on new register flavor support in thread_get_state/thread_set_state in the kernel, which will be first available in macOS 15.4.

The Apple M4 line of cores includes the Scalable Matrix Extension (SME) feature. The M4s do not implement Scalable Vector Extension (SVE), although the processor is in Streaming SVE Mode when the SME is being used. The most obvious side effects of being in SSVE Mode are that (on the M4 cores) NEON instructions cannot be used, and watchpoints may get false positives because the address comparisons are done at a lowered granularity.

When SSVE mode is enabled, the kernel will provide the Streaming Vector Length register, which is a maximum of 64 bytes with the M4. Also provided are SVCR (with bits indicating if SSVE mode and SME mode are enabled), TPIDR2, SVL. Then the SVE registers Z0..31 (SVL bytes long), P0..15 (SVL/8 bytes), the ZA matrix register (SVL*SVL bytes), and, because the M4 supports SME2, the ZT0 register (64 bytes).

When SSVE/SME are disabled, none of these registers are provided by the kernel - reads and writes of them will fail.

Unlike Linux, lldb cannot modify the SVL through a thread_set_state call, or change the processor state's SSVE/SME status. There is also no way for a process to request a lowered SVL size today, so the work that David did to handle VL/SVL changing while stepping through a process is not an issue on Darwin today. But debugserver should be providing everything necessary so we can reuse all of David's work on resizing the register contexts in lldb if it happens in the future. debugserver sends svl, svcr, and tpidr2 in the expedited registers when a thread stops, if SSVE|SME mode are enabled (if the kernel allows it to read the ARM_SME_STATE register set).

While the maximum SVL is 64 bytes on M4, the AArch64 maximum possible SVL is 256; this would give us a 64k ZA register. If debugserver sized all of its register contexts assuming the largest possible SVL, we could easily use 2MB more memory for the register contexts of all threads in a process -- and on iOS et al, processes must run within a small memory allotment and this would push us over that.

Much of the work in debugserver was changing the arm64 register context from being a static compile-time array of register sets, to being initialized at runtime if debugserver is running on a machine with SME. The ZA is only created to the machine's actual maximum SVL. The size of the 32 SVE Z registers is less significant so I am statically allocating those to the architecturally largest possible SVL value today.

Also, debugserver includes information about registers that share the same part of the register file. e.g. S0 and D0 are the lower parts of the NEON 128-bit V0 register. And when running on an SME machine, v0 is the lower 128 bits of the SVE Z0 register. So the register maps used when defining the VFP registers must differ depending on the capabilities of the cpu at runtime.

I also changed register reading in debugserver. Formerly, when debugserver was asked to read a register and the thread_get_state read of that register failed, it would return all zeros. This is necessary when constructing a `g` packet that gets all registers - because there is no separation between register bytes, the offsets are fixed. But when we are asking for a single register (e.g. Z0) when not in SSVE/SME mode, this should return an error.

This does mean that when you're running on an SME capable machine, but not in SME mode, and do `register read -a`, lldb will report that 48 SVE registers were unavailable and 5 SME registers were unavailable. But that's only when `-a` is used.

The register reading and writing depends on new register flavor support in thread_get_state/thread_set_state in the kernel, which is not yet in a release. The test case I wrote is skipped on current OSes. I pilfered the SME register setup from some of David's existing SME test files; there were a few Linux-specific details in those tests, so they weren't easy to reuse on Darwin.

rdar://121608074
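The register sizes quoted in the commit message all follow from the Streaming Vector Length. The sketch below only restates that arithmetic; the names `SMERegisterSizes` and `sme_register_sizes` are hypothetical and do not appear in debugserver, and the multiple-of-8 check is an added sanity check rather than something the commit describes.

```python
from dataclasses import dataclass

ZT0_BYTES = 64            # ZT0 is a fixed 64 bytes, per the commit message
MAX_ARCH_SVL_BYTES = 256  # architectural maximum SVL, per the commit message


@dataclass(frozen=True)
class SMERegisterSizes:
    """Sizes in bytes for one thread's SME/SSVE registers (hypothetical type)."""
    z_bytes: int    # each of Z0..Z31
    p_bytes: int    # each of P0..P15
    za_bytes: int   # the ZA matrix register
    zt0_bytes: int  # ZT0 (SME2)


def sme_register_sizes(svl_bytes: int) -> SMERegisterSizes:
    """Derive register sizes from the SVL, which Darwin reports in bytes."""
    if svl_bytes <= 0 or svl_bytes > MAX_ARCH_SVL_BYTES or svl_bytes % 8:
        raise ValueError(f"unexpected SVL: {svl_bytes} bytes")
    return SMERegisterSizes(
        z_bytes=svl_bytes,
        p_bytes=svl_bytes // 8,
        za_bytes=svl_bytes * svl_bytes,
        zt0_bytes=ZT0_BYTES,
    )


if __name__ == "__main__":
    print(sme_register_sizes(64))   # the M4 maximum SVL
    print(sme_register_sizes(256))  # the architectural maximum SVL
```

With an SVL of 64 bytes the ZA register is 4096 bytes, while at the architectural maximum of 256 bytes it grows to 65536 bytes, which is the "64k ZA register" the message refers to.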
1 parent 254ba78 commit 46e7823

12 files changed

+1401
-271
lines changed


lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp

Lines changed: 19 additions & 0 deletions
@@ -100,6 +100,25 @@ bool ArchitectureAArch64::ReconfigureRegisterInfo(DynamicRegisterInfo &reg_info,
     if (reg_value != fail_value && reg_value <= 32)
       svg_reg_value = reg_value;
   }
+  if (!svg_reg_value) {
+    const RegisterInfo *darwin_svg_reg_info = reg_info.GetRegisterInfo("svl");
+    if (darwin_svg_reg_info) {
+      uint32_t svg_reg_num = darwin_svg_reg_info->kinds[eRegisterKindLLDB];
+      uint64_t reg_value =
+          reg_context.ReadRegisterAsUnsigned(svg_reg_num, fail_value);
+      // UpdateARM64SVERegistersInfos and UpdateARM64SMERegistersInfos
+      // expect the number of 8-byte granules; darwin provides number of
+      // bytes.
+      if (reg_value != fail_value && reg_value <= 256) {
+        svg_reg_value = reg_value / 8;
+        // Apple hardware only implements Streaming SVE mode, so
+        // the non-streaming Vector Length is not reported by the
+        // kernel. Set both svg and vg to this svl value.
+        if (!vg_reg_value)
+          vg_reg_value = reg_value / 8;
+      }
+    }
+  }
 
   if (!vg_reg_value && !svg_reg_value)
     return false;
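The unit conversion in this hunk can be restated as a small Python sketch. It assumes the "unset" state of `vg_reg_value` and `svg_reg_value` can be modelled as `None`; the function name `granules_from_darwin_svl` and the `FAIL_VALUE` constant are hypothetical stand-ins, not lldb API.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the fail_value sentinel used in the C++ hunk.
FAIL_VALUE = 0xFFFFFFFFFFFFFFFF
MAX_SVL_BYTES = 256  # upper bound checked in the hunk
GRANULE_BYTES = 8    # the SVE/SME update helpers expect 8-byte granules


def granules_from_darwin_svl(
    svl_bytes: int,
    vg: Optional[int] = None,
    svg: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (vg, svg) in 8-byte granules, given Darwin's svl in bytes."""
    if svg is not None:
        # An svg value was already found; the Darwin fallback is skipped.
        return vg, svg
    if svl_bytes == FAIL_VALUE or svl_bytes > MAX_SVL_BYTES:
        # The svl read failed or is out of range; leave both values alone.
        return vg, svg
    svg = svl_bytes // GRANULE_BYTES
    if vg is None:
        # No non-streaming vector length is reported, so reuse the svl value.
        vg = svg
    return vg, svg


if __name__ == "__main__":
    print(granules_from_darwin_svl(64))
```

An SVL of 64 bytes therefore becomes 8 granules for both values, which is the form the existing SVE/SME register-info update code consumes.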

lldb/test/API/commands/register/register/register_command/TestRegisters.py

Lines changed: 28 additions & 2 deletions
@@ -21,6 +21,24 @@ def tearDown(self):
         self.dbg.GetSelectedTarget().GetProcess().Destroy()
         TestBase.tearDown(self)
 
+    # on macOS, detect if the current machine is arm64 and supports SME
+    def get_sme_available(self):
+        if self.getArchitecture() != "arm64":
+            return None
+        try:
+            sysctl_output = subprocess.check_output(
+                ["sysctl", "hw.optional.arm.FEAT_SME"]
+            ).decode("utf-8")
+        except subprocess.CalledProcessError:
+            return None
+        m = re.match(r"hw\.optional\.arm\.FEAT_SME: (\w+)", sysctl_output)
+        if m:
+            if int(m.group(1)) == 1:
+                return True
+            else:
+                return False
+        return None
+
     @skipIfiOSSimulator
     @skipIf(archs=no_match(["amd64", "arm", "i386", "x86_64"]))
     @expectedFailureAll(oslist=["freebsd", "netbsd"], bugnumber="llvm.org/pr48371")
@@ -32,11 +50,19 @@ def test_register_commands(self):
         # verify that logging does not assert
         self.log_enable("registers")
 
+        error_str_matched = False
+        if self.get_sme_available() and self.platformIsDarwin():
+            # On Darwin AArch64 SME machines, we will have unavailable
+            # registers when not in Streaming SVE Mode/SME, so
+            # `register read -a` will report that some registers
+            # could not be read. This is expected.
+            error_str_matched = True
+
         self.expect(
             "register read -a",
             MISSING_EXPECTED_REGISTERS,
             substrs=["registers were unavailable"],
-            matching=False,
+            matching=error_str_matched,
         )
 
         all_registers = self.res.GetOutput()
@@ -60,7 +86,7 @@ def test_register_commands(self):
         self.runCmd("register read q15")  # may be available
 
         self.expect(
-            "register read -s 4", substrs=["invalid register set index: 4"], error=True
+            "register read -s 8", substrs=["invalid register set index: 8"], error=True
         )
 
     @skipIfiOSSimulator
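The parsing step inside `get_sme_available` can be exercised without running `sysctl`. The sketch below is a standalone variant, not the code from the commit: the name `parse_feat_sme` is hypothetical, and it matches `\d+` instead of `\w+` so that a non-numeric value yields `None` rather than raising in `int()`.

```python
import re
from typing import Optional

# Same key the test queries; only the captured value pattern differs (\d+).
_FEAT_SME_RE = re.compile(r"hw\.optional\.arm\.FEAT_SME: (\d+)")


def parse_feat_sme(sysctl_output: str) -> Optional[bool]:
    """Return True/False if the output reports FEAT_SME, or None if unknown."""
    m = _FEAT_SME_RE.match(sysctl_output)
    if m is None:
        return None
    return int(m.group(1)) == 1


if __name__ == "__main__":
    print(parse_feat_sme("hw.optional.arm.FEAT_SME: 1\n"))
```

Returning three states mirrors the helper in the diff: `None` means the answer could not be determined, which the caller treats the same as "no SME" when deciding whether unavailable registers are expected.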
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+C_SOURCES := main.c
+
+CFLAGS_EXTRAS := -mcpu=apple-m4
+
+include Makefile.rules
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.decorators import *
+import lldbsuite.test.lldbutil as lldbutil
+import os
+
+
+class TestSMERegistersDarwin(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+    mydir = TestBase.compute_mydir(__file__)
+
+    @skipIfRemote
+    @skipUnlessDarwin
+    @skipUnlessFeature("hw.optional.arm.FEAT_SME")
+    @skipUnlessFeature("hw.optional.arm.FEAT_SME2")
+    # thread_set_state/thread_get_state only avail in macOS 15.4+
+    @skipIf(macos_version=["<", "15.4"])
+    def test(self):
+        """Test that we can read the contents of the SME/SVE registers on Darwin"""
+        self.build()
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "break before sme", lldb.SBFileSpec("main.c")
+        )
+        frame = thread.GetFrameAtIndex(0)
+        self.assertTrue(frame.IsValid())
+
+        self.assertTrue(
+            target.BreakpointCreateBySourceRegex(
+                "break while sme", lldb.SBFileSpec("main.c")
+            ).IsValid()
+        )
+        self.assertTrue(
+            target.BreakpointCreateBySourceRegex(
+                "break after sme", lldb.SBFileSpec("main.c")
+            ).IsValid()
+        )
+
+        if self.TraceOn():
+            self.runCmd("reg read -a")
+
+        self.assertTrue(frame.register["svl"].GetError().Fail())
+        self.assertTrue(frame.register["z0"].GetError().Fail())
+        self.assertTrue(frame.register["p0"].GetError().Fail())
+        self.assertTrue(frame.register["za"].GetError().Fail())
+        self.assertTrue(frame.register["zt0"].GetError().Fail())
+
+        process.Continue()
+        frame = thread.GetFrameAtIndex(0)
+        self.assertEqual(thread.GetStopReason(), lldb.eStopReasonBreakpoint)
+
+        # Now in SME enabled mode
+        self.assertTrue(frame.register["svl"].GetError().Success())
+        self.assertTrue(frame.register["z0"].GetError().Success())
+        self.assertTrue(frame.register["p0"].GetError().Success())
+        self.assertTrue(frame.register["za"].GetError().Success())
+        self.assertTrue(frame.register["zt0"].GetError().Success())
+
+        # SSVE and SME modes should be enabled (reflecting PSTATE.SM and PSTATE.ZA)
+        svcr = frame.register["svcr"]
+        self.assertEqual(svcr.GetValueAsUnsigned(), 3)
+
+        svl_reg = frame.register["svl"]
+        svl = svl_reg.GetValueAsUnsigned()
+
+        z0 = frame.register["z0"]
+        self.assertEqual(z0.GetNumChildren(), svl)
+        self.assertEqual(z0.GetChildAtIndex(0).GetValueAsUnsigned(), 0x1)
+        self.assertEqual(z0.GetChildAtIndex(svl - 1).GetValueAsUnsigned(), 0x1)
+
+        z31 = frame.register["z31"]
+        self.assertEqual(z31.GetNumChildren(), svl)
+        self.assertEqual(z31.GetChildAtIndex(0).GetValueAsUnsigned(), 32)
+        self.assertEqual(z31.GetChildAtIndex(svl - 1).GetValueAsUnsigned(), 32)
+
+        p0 = frame.register["p0"]
+        self.assertEqual(p0.GetNumChildren(), svl / 8)
+        self.assertEqual(p0.GetChildAtIndex(0).GetValueAsUnsigned(), 0xFF)
+        self.assertEqual(
+            p0.GetChildAtIndex(p0.GetNumChildren() - 1).GetValueAsUnsigned(), 0xFF
+        )
+
+        p15 = frame.register["p15"]
+        self.assertEqual(p15.GetNumChildren(), svl / 8)
+        self.assertEqual(p15.GetChildAtIndex(0).GetValueAsUnsigned(), 0xFF)
+        self.assertEqual(
+            p15.GetChildAtIndex(p15.GetNumChildren() - 1).GetValueAsUnsigned(), 0xFF
+        )
+
+        za = frame.register["za"]
+        self.assertEqual(za.GetNumChildren(), (svl * svl))
+        za_0 = za.GetChildAtIndex(0)
+        self.assertEqual(za_0.GetValueAsUnsigned(), 4)
+        za_final = za.GetChildAtIndex(za.GetNumChildren() - 1)
+        self.assertEqual(za_final.GetValueAsUnsigned(), 67)
+
+        zt0 = frame.register["zt0"]
+        self.assertEqual(zt0.GetNumChildren(), 64)
+        zt0_0 = zt0.GetChildAtIndex(0)
+        self.assertEqual(zt0_0.GetValueAsUnsigned(), 0)
+        zt0_final = zt0.GetChildAtIndex(63)
+        self.assertEqual(zt0_final.GetValueAsUnsigned(), 63)
+
+        # Modify all of the registers, instruction step, confirm that the
+        # registers have the new values. Without the instruction step, it's
+        # possible debugserver or lldb could lie about the write succeeding.
+
+        z0_old_values = []
+        z0_new_values = []
+        z0_new_str = '"{'
+        for i in range(svl):
+            z0_old_values.append(z0.GetChildAtIndex(i).GetValueAsUnsigned())
+            z0_new_values.append(z0_old_values[i] + 5)
+            z0_new_str = z0_new_str + ("0x%02x " % z0_new_values[i])
+        z0_new_str = z0_new_str + '}"'
+        self.runCmd("reg write z0 %s" % z0_new_str)
+
+        z31_old_values = []
+        z31_new_values = []
+        z31_new_str = '"{'
+        for i in range(svl):
+            z31_old_values.append(z31.GetChildAtIndex(i).GetValueAsUnsigned())
+            z31_new_values.append(z31_old_values[i] + 3)
+            z31_new_str = z31_new_str + ("0x%02x " % z31_new_values[i])
+        z31_new_str = z31_new_str + '}"'
+        self.runCmd("reg write z31 %s" % z31_new_str)
+
+        p0_old_values = []
+        p0_new_values = []
+        p0_new_str = '"{'
+        for i in range(int(svl / 8)):
+            p0_old_values.append(p0.GetChildAtIndex(i).GetValueAsUnsigned())
+            p0_new_values.append(p0_old_values[i] - 5)
+            p0_new_str = p0_new_str + ("0x%02x " % p0_new_values[i])
+        p0_new_str = p0_new_str + '}"'
+        self.runCmd("reg write p0 %s" % p0_new_str)
+
+        p15_old_values = []
+        p15_new_values = []
+        p15_new_str = '"{'
+        for i in range(int(svl / 8)):
+            p15_old_values.append(p15.GetChildAtIndex(i).GetValueAsUnsigned())
+            p15_new_values.append(p15_old_values[i] - 8)
+            p15_new_str = p15_new_str + ("0x%02x " % p15_new_values[i])
+        p15_new_str = p15_new_str + '}"'
+        self.runCmd("reg write p15 %s" % p15_new_str)
+
+        za_old_values = []
+        za_new_values = []
+        za_new_str = '"{'
+        for i in range(svl * svl):
+            za_old_values.append(za.GetChildAtIndex(i).GetValueAsUnsigned())
+            za_new_values.append(za_old_values[i] + 7)
+            za_new_str = za_new_str + ("0x%02x " % za_new_values[i])
+        za_new_str = za_new_str + '}"'
+        self.runCmd("reg write za %s" % za_new_str)
+
+        zt0_old_values = []
+        zt0_new_values = []
+        zt0_new_str = '"{'
+        for i in range(64):
+            zt0_old_values.append(zt0.GetChildAtIndex(i).GetValueAsUnsigned())
+            zt0_new_values.append(zt0_old_values[i] + 2)
+            zt0_new_str = zt0_new_str + ("0x%02x " % zt0_new_values[i])
+        zt0_new_str = zt0_new_str + '}"'
+        self.runCmd("reg write zt0 %s" % zt0_new_str)
+
+        thread.StepInstruction(False)
+        frame = thread.GetFrameAtIndex(0)
+
+        if self.TraceOn():
+            self.runCmd("reg read -a")
+
+        z0 = frame.register["z0"]
+        for i in range(z0.GetNumChildren()):
+            self.assertEqual(
+                z0_new_values[i], z0.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        z31 = frame.register["z31"]
+        for i in range(z31.GetNumChildren()):
+            self.assertEqual(
+                z31_new_values[i], z31.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        p0 = frame.register["p0"]
+        for i in range(p0.GetNumChildren()):
+            self.assertEqual(
+                p0_new_values[i], p0.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        p15 = frame.register["p15"]
+        for i in range(p15.GetNumChildren()):
+            self.assertEqual(
+                p15_new_values[i], p15.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        za = frame.register["za"]
+        for i in range(za.GetNumChildren()):
+            self.assertEqual(
+                za_new_values[i], za.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        zt0 = frame.register["zt0"]
+        for i in range(zt0.GetNumChildren()):
+            self.assertEqual(
+                zt0_new_values[i], zt0.GetChildAtIndex(i).GetValueAsUnsigned()
+            )
+
+        process.Continue()
+        frame = thread.GetFrameAtIndex(0)
+        self.assertEqual(thread.GetStopReason(), lldb.eStopReasonBreakpoint)
+
+        self.assertTrue(frame.register["svl"].GetError().Fail())
+        self.assertTrue(frame.register["z0"].GetError().Fail())
+        self.assertTrue(frame.register["p0"].GetError().Fail())
+        self.assertTrue(frame.register["za"].GetError().Fail())
+        self.assertTrue(frame.register["zt0"].GetError().Fail())
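The test builds the argument for each `reg write` with the same loop six times. As an illustration only, the pattern could be factored into two small helpers; `bumped_bytes` and `format_vector_bytes` are hypothetical names that are not part of the commit or of lldbsuite. Two details are assumptions of this sketch rather than behaviour of the test: values wrap modulo 256, and there is no trailing space before the closing brace.

```python
from typing import Iterable, List


def bumped_bytes(old_values: Iterable[int], delta: int) -> List[int]:
    """Add delta to every byte, wrapping to 0..255 (an assumption of this sketch)."""
    return [(value + delta) & 0xFF for value in old_values]


def format_vector_bytes(values: Iterable[int]) -> str:
    """Format bytes as a quoted, brace-delimited list of two-digit hex values."""
    return '"{' + " ".join("0x%02x" % value for value in values) + '}"'


if __name__ == "__main__":
    new_values = bumped_bytes([0x01] * 4, 5)
    # Hypothetical usage: the argument that would follow "reg write z0".
    print("reg write z0 %s" % format_vector_bytes(new_values))
```

Keeping the list of new values separate from the formatted string matches how the test later compares each child of the register against the expected byte after the instruction step.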
