Commit 2bc3907

Merge pull request #5850 from TylerMSFT/arm
goes with PR #5090 at https://github.com/MicrosoftDocs/cpp-docs/pull/…
2 parents 8afbc60 + 7c4babe commit 2bc3907

3 files changed: +40 -22 lines changed

docs/build/arm64-windows-abi-conventions.md

Lines changed: 22 additions & 17 deletions
@@ -1,15 +1,15 @@
 ---
 description: "Learn more about: Overview of ARM64 ABI conventions"
 title: "Overview of ARM64 ABI conventions"
-ms.date: "03/27/2019"
+ms.date: 04/08/2025
 ---
 # Overview of ARM64 ABI conventions

-The basic application binary interface (ABI) for Windows when compiled and run on ARM processors in 64-bit mode (ARMv8 or later architectures), for the most part, follows ARM's standard AArch64 EABI. This article highlights some of the key assumptions and changes from what is documented in the EABI. For information about the 32-bit ABI, see [Overview of ARM ABI conventions](overview-of-arm-abi-conventions.md). For more information about the standard ARM EABI, see [Application Binary Interface (ABI) for the ARM Architecture](https://github.com/ARM-software/abi-aa) (external link).
+The basic application binary interface (ABI) for Windows code compiled and run on ARM processors in 64-bit mode (ARMv8 or later architectures) usually follows ARM's standard AArch64 EABI. This article highlights some of the key assumptions and changes from what is documented in the EABI. For information about the 32-bit ABI, see [Overview of ARM ABI conventions](overview-of-arm-abi-conventions.md). For more information about the standard ARM EABI, see [Application Binary Interface (ABI) for the ARM Architecture](https://github.com/ARM-software/abi-aa) (external link).

 ## Definitions

-With the introduction of 64-bit support, ARM has defined several terms:
+With the introduction of 64-bit support, ARM defined several terms:

 - **AArch32** – the legacy 32-bit instruction set architecture (ISA) defined by ARM, including Thumb mode execution.
 - **AArch64** – the new 64-bit instruction set architecture (ISA) defined by ARM.
@@ -19,8 +19,9 @@ With the introduction of 64-bit support, ARM has defined several terms:
 Windows also uses these terms:

 - **ARM** – refers to the 32-bit ARM architecture (AArch32), sometimes referred to as WoA (Windows on ARM).
-- **ARM32** – same as ARM, above; used in this document for clarity.
+- **ARM32** – same as **ARM**. Used in this document for clarity.
 - **ARM64** – refers to the 64-bit ARM architecture (AArch64). There's no such thing as WoA64.
+- **ARM64EC** – code built as ARM64EC can interoperate with x64 code running under emulation in the same process. The ARM64EC code in the process runs with native performance, while the x64 code runs using emulation.

 Finally, when referring to data types, the following definitions from ARM are referenced:

@@ -30,7 +31,7 @@ Finally, when referring to data types, the following definitions from ARM are re

 ## Base requirements

-The ARM64 version of Windows presupposes that it's running on an ARMv8 or later architecture at all times. Both floating-point and NEON support are presumed to be present in hardware.
+The ARM64 version of Windows always presupposes that it's running on an ARMv8 or later architecture. Both floating-point and NEON support are presumed to be present in hardware.

 The ARMv8 specification describes new optional crypto and CRC helper opcodes for both AArch32 and AArch64. Support for them is currently optional, but recommended. To take advantage of these opcodes, apps should first make runtime checks for their existence.
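
The runtime check recommended above can be sketched in C. This minimal illustration assumes a Windows SDK whose `<windows.h>` defines the `PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE` and `PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE` feature constants for `IsProcessorFeaturePresent`. The wrapper names are hypothetical. The non-Windows fallback is also an assumption: it reports the optional features as absent, so callers take the portable code path.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical wrapper: true if the ARMv8 CRC32 instructions may be used. */
bool crc32_instructions_available(void) {
#if defined(_WIN32) && defined(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE)
    return IsProcessorFeaturePresent(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE) != FALSE;
#else
    return false; /* assumption: no OS query available, so treat as absent */
#endif
}

/* Hypothetical wrapper: true if the ARMv8 crypto instructions may be used. */
bool crypto_instructions_available(void) {
#if defined(_WIN32) && defined(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE)
    return IsProcessorFeaturePresent(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE) != FALSE;
#else
    return false; /* assumption: no OS query available, so treat as absent */
#endif
}
```

A caller queries once at startup, then picks either the accelerated routine or a portable fallback based on the result.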

@@ -74,7 +75,7 @@ The AArch64 architecture supports 32 integer registers:
 | x18 | N/A | Reserved platform register: in kernel mode, points to KPCR for the current processor; In user mode, points to TEB |
 | x19-x28 | Non-volatile | Scratch registers |
 | x29/fp | Non-volatile | Frame pointer |
-| x30/lr | Both | Link Register: Callee function must preserve it for its own return, but caller's value will be lost. |
+| x30/lr | Both | Link Register: Callee function must preserve it for its own return, but caller's value is lost. |

 Each register may be accessed as a full 64-bit value (via x0-x30) or as a 32-bit value (via w0-w30). 32-bit operations zero-extend their results up to 64 bits.
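
The zero-extension rule can be made concrete with a C function that models what a 32-bit `add` targeting a w register leaves in the corresponding x register. This is an illustrative model written in portable C, and the name `add_w` is hypothetical. The function doesn't touch real registers.

```c
#include <stdint.h>

/* Models a 32-bit ADD whose destination is a w register (for example,
   "add w0, w1, w2"): the sum wraps modulo 2^32, and bits 63:32 of the
   corresponding x register end up as zero. */
uint64_t add_w(uint64_t xn, uint64_t xm) {
    uint32_t wn = (uint32_t)xn;  /* the w view is the low 32 bits of x */
    uint32_t wm = (uint32_t)xm;
    uint32_t sum = wn + wm;      /* unsigned 32-bit arithmetic wraps */
    return (uint64_t)sum;        /* zero-extended, never sign-extended */
}
```

For example, `add_w(0xFFFFFFFF00000005, 2)` yields 7. The upper 32 bits of the source don't survive, and a result such as `0x80000000` isn't sign-extended.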

@@ -86,7 +87,7 @@ The frame pointer (x29) is required for compatibility with fast stack walking us

 ## Floating-point/SIMD registers

-The AArch64 architecture also supports 32 floating-point/SIMD registers, summarized below:
+The AArch64 architecture also supports these 32 floating-point/SIMD registers:

 | Register | Volatility | Role |
 | - | - | - |
@@ -118,7 +119,11 @@ Like AArch32, the AArch64 specification provides three system-controlled "thread

 ## Floating-point exceptions

-Support for IEEE floating-point exceptions on AArch64 systems is optional. This can be verified by writing a value that enables exceptions to the `FPCR` register and then reading it back. The bits corresponding to supported exceptions will remain set, while the bits corresponding to unsupported exceptions will be reset by the CPU.
+To determine whether an ARM CPU supports floating-point exceptions, write a value that enables exceptions to the `FPCR` register and then read it back. If the CPU supports floating-point exceptions, the bits corresponding to supported exceptions remain set, while the CPU resets the bits for unsupported exceptions.
+
+On ARM64, Windows delivers exceptions for processors that support hardware floating-point exceptions.
+
+The [`_set_controlfp`](/cpp/c-runtime-library/reference/controlfp-s) function on ARM platforms correctly changes the FPCR register when unmasking floating-point exceptions. However, instead of raising an unmasked exception, Windows resets the FPCR register to its defaults every time a floating-point exception is about to be raised.

 ## Parameter passing
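
The FPCR probe described under Floating-point exceptions can be sketched as a pure function over the read-back value. The trap-enable bit positions come from the Arm architecture definition of FPCR: IOE at bit 8 through IXE at bit 12, and IDE at bit 15. The `fpcr_write_read_fn` hook stands in for the real MSR/MRS register access, and `simulated_cpu_fpcr` is a made-up CPU. Both are hypothetical, so the logic can run on any machine.

```c
#include <stdint.h>

/* AArch64 FPCR trap-enable bits. */
#define FPCR_IOE (UINT64_C(1) << 8)   /* invalid operation */
#define FPCR_DZE (UINT64_C(1) << 9)   /* divide by zero */
#define FPCR_OFE (UINT64_C(1) << 10)  /* overflow */
#define FPCR_UFE (UINT64_C(1) << 11)  /* underflow */
#define FPCR_IXE (UINT64_C(1) << 12)  /* inexact */
#define FPCR_IDE (UINT64_C(1) << 15)  /* input denormal */
#define FPCR_TRAP_ENABLES \
    (FPCR_IOE | FPCR_DZE | FPCR_OFE | FPCR_UFE | FPCR_IXE | FPCR_IDE)

/* Hypothetical hook: writes value to FPCR and returns the value read back.
   Real code would implement this with the MSR and MRS instructions. */
typedef uint64_t (*fpcr_write_read_fn)(uint64_t value);

/* Enables every trap, reads FPCR back, restores the original value, and
   returns the trap-enable bits the CPU kept set (the supported exceptions). */
uint64_t supported_fp_traps(fpcr_write_read_fn write_read, uint64_t original) {
    uint64_t readback = write_read(original | FPCR_TRAP_ENABLES);
    (void)write_read(original);
    return readback & FPCR_TRAP_ENABLES;
}

/* Simulated CPU for the sketch: only invalid-operation and divide-by-zero
   trapping are implemented; other trap-enable bits read back as zero. */
uint64_t simulated_cpu_fpcr(uint64_t value) {
    const uint64_t implemented = FPCR_IOE | FPCR_DZE;
    return (value & ~FPCR_TRAP_ENABLES) | (value & implemented);
}
```

The probe restores the caller's original FPCR value before returning, so it doesn't leave traps enabled by accident.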

@@ -148,7 +153,7 @@ For each argument in the list, the first matching rule from the following list i

 ### Stage C – Assignment of arguments to registers and stack

-For each argument in the list, the following rules are applied in turn until the argument has been allocated. When an argument is assigned to a register, any unused bits in the register have unspecified value. If an argument is assigned to a stack slot, any unused padding bytes have unspecified value.
+For each argument in the list, the following rules are applied in turn until the argument is allocated. When an argument is assigned to a register, any unused bits in the register have unspecified value. If an argument is assigned to a stack slot, any unused padding bytes have unspecified value.

 1. If the argument is a Half-, Single-, Double- or Quad-precision Floating-point or Short Vector Type, and the NSRN is less than 8, then the argument is allocated to the least significant bits of register v\[NSRN]. The NSRN is incremented by one. The argument has now been allocated.

@@ -158,7 +163,7 @@ For each argument in the list, the following rules are applied in turn until the

 1. If the argument is an HFA, an HVA, a Quad-precision Floating-point or Short Vector Type, then the NSAA is rounded up to the larger of 8 or the Natural Alignment of the argument's type.

-1. If the argument is a Half- or Single-precision Floating Point type, then the size of the argument is set to 8 bytes. The effect is as if the argument had been copied to the least significant bits of a 64-bit register, and the remaining bits filled with unspecified values.
+1. If the argument is a Half- or Single-precision Floating Point type, then the size of the argument is set to 8 bytes. The effect is as if the argument were copied to the least significant bits of a 64-bit register, and the remaining bits filled with unspecified values.

 1. If the argument is an HFA, an HVA, a Half-, Single-, Double-, or Quad-precision Floating-point or Short Vector Type, then the argument is copied to memory at the adjusted NSAA. The NSAA is incremented by the size of the argument. The argument has now been allocated.
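
The following C sketch applies the quoted Stage C rules to scalar `float` and `double` arguments, to show how NSRN and NSAA advance. The sketch makes several assumptions:

- The call isn't variadic.
- Integer arguments are ignored, because they use a separate register counter.
- HFAs, HVAs, and short vectors are left out.

The type and function names are hypothetical.

```c
#include <stddef.h>

/* Scalar floating-point argument kinds modeled by this sketch. */
typedef enum { ARG_FLOAT, ARG_DOUBLE } fp_kind;

/* Where one argument ends up: a v register, or a stacked-argument slot. */
typedef struct {
    int vreg;            /* v-register number, or -1 when passed on the stack */
    size_t stack_offset; /* byte offset from the incoming SP when vreg == -1 */
} fp_location;

void assign_fp_args(const fp_kind *kinds, size_t count, fp_location *out) {
    unsigned nsrn = 0; /* NSRN: next SIMD and floating-point register number */
    size_t nsaa = 0;   /* NSAA: next stacked argument address, relative to SP */

    for (size_t i = 0; i < count; ++i) {
        if (nsrn < 8) {
            /* Use the least significant bits of v[NSRN], then bump NSRN. */
            out[i].vreg = (int)nsrn++;
            out[i].stack_offset = 0;
            continue;
        }
        /* Registers exhausted: a Single-precision argument's size is set to
           8 bytes; a Double is already 8 bytes. */
        size_t natural = (kinds[i] == ARG_FLOAT) ? 4 : 8;
        size_t size = natural < 8 ? 8 : natural;
        out[i].vreg = -1;
        out[i].stack_offset = nsaa; /* copied to memory at the NSAA */
        nsaa += size;               /* NSAA advances by the argument size */
    }
}
```

Take eight `double` arguments followed by a `float` and a `double`. The first eight land in v0-v7, and the last two occupy stack offsets 0 and 8. The `float` takes a full 8-byte slot because its size is set to 8 bytes.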

@@ -199,7 +204,7 @@ Floating-point values are returned in s0, d0, or v0, as appropriate.
 A type is considered to be an HFA or HVA if all of the following hold:

 - It's non-empty,
-- It doesn't have any non-trivial default or copy constructors, destructors, or assignment operators,
+- It doesn't have any nontrivial default or copy constructors, destructors, or assignment operators,
 - All of its members have the same HFA or HVA type, or are float, double, or neon types that match the other members' HFA or HVA types.

 HVA values with four or fewer elements are returned in s0-s3, d0-d3, or v0-v3, as appropriate.
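
The HFA/HVA test above and the return style listed in the next part of this diff can be summarized as a small classifier. That return style is:

- HFAs with four or fewer elements are returned in s0-s3, d0-d3, or v0-v3.
- Types up to 8 bytes are returned in x0.
- Types up to 16 bytes are returned in x0 and x1.
- Anything else is returned in a caller-reserved buffer whose address is passed in x8.

This sketch assumes the type already meets the trivial copy-constructor, copy-assignment, and destructor conditions, and that it's returned by a nonmember or static member function. It also assumes the rules are checked in the order listed. `type_summary` and `classify_return` are hypothetical names.

```c
#include <stddef.h>

typedef enum {
    RET_FP_REGS, /* s0-s3, d0-d3, or v0-v3, as appropriate */
    RET_X0,      /* 8 bytes or less */
    RET_X0_X1,   /* 16 bytes or less; x0 holds the lower-order 8 bytes */
    RET_VIA_X8   /* caller-reserved memory; its address is passed in x8 */
} return_location;

/* Hypothetical summary of a type, filled in by applying the HFA/HVA test. */
typedef struct {
    size_t size;         /* sizeof the type, in bytes */
    int is_hfa;          /* nonzero if the type qualifies as an HFA or HVA */
    size_t hfa_elements; /* element count when is_hfa is nonzero */
} type_summary;

/* Return style for trivially copyable types returned by nonmember or static
   member functions, checking the rules in the order they're listed. */
return_location classify_return(type_summary t) {
    if (t.is_hfa && t.hfa_elements <= 4) return RET_FP_REGS;
    if (t.size <= 8) return RET_X0;
    if (t.size <= 16) return RET_X0_X1;
    return RET_VIA_X8;
}

/* Example types (assuming 4-byte int and float, 8-byte pointers):
   struct Vec4 { float x, y, z, w; };  HFA, 4 elements -> s0-s3
   struct Pair { int a, b; };          8 bytes         -> x0
   struct Span { void *p; size_t n; }; 16 bytes        -> x0 and x1
   struct Blob { char bytes[24]; };    24 bytes        -> buffer via x8 */
```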
@@ -210,22 +215,22 @@ Types returned by value are handled differently depending on whether they have c
 - they have a trivial copy-assignment operator, and
 - they have a trivial destructor,

-and are returned by non-member functions or static member functions, use the following return style:
+and are returned by nonmember functions or static member functions, use the following return style:

 - Types that are HFAs with four or fewer elements are returned in s0-s3, d0-d3, or v0-v3, as appropriate.
 - Types less than or equal to 8 bytes are returned in x0.
 - Types less than or equal to 16 bytes are returned in x0 and x1, with x0 containing the lower-order 8 bytes.
-- For other aggregate types, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine. The callee isn't required to preserve the value stored in x8.
+- For other aggregate types, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as another argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine. The callee isn't required to preserve the value stored in x8.

 All other types use this convention:

 - The caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x0, or x1 if `this` is passed in x0. The callee may modify the result memory block at any point during the execution of the subroutine. The callee returns the address of the memory block in x0.

 ## Stack

-Following the ABI put forth by ARM, the stack must remain 16-byte aligned at all times. AArch64 contains a hardware feature that generates stack alignment faults whenever the SP isn't 16-byte aligned and an SP-relative load or store is done. Windows runs with this feature enabled at all times.
+Following the ABI put forth by ARM, the stack must always remain 16-byte aligned. AArch64 contains a hardware feature that generates stack alignment faults whenever the SP isn't 16-byte aligned and an SP-relative load or store is done. Windows always runs with this feature enabled.

-Functions that allocate 4k or more worth of stack must ensure that each page prior to the final page is touched in order. This action ensures no code can "leap over" the guard pages that Windows uses to expand the stack. Typically the touching is done by the `__chkstk` helper, which has a custom calling convention that passes the total stack allocation divided by 16 in x15.
+Functions that allocate 4 KB or more of stack must ensure that each page before the final page is touched in order. This action ensures no code can "leap over" the guard pages that Windows uses to expand the stack. Typically, the touching is done by the `__chkstk` helper, which has a custom calling convention that passes the total stack allocation divided by 16 in x15.

 ## Red zone
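
The following C sketch is a rough model of the stack rules above. It does three things:

- Rounds an allocation up to the required 16-byte alignment.
- Computes the value `__chkstk` receives in x15, which is the allocation divided by 16.
- Lists the offsets below the incoming SP where one probe per 4 KB page is made before the final page.

The exact probe addresses `__chkstk` uses aren't specified here, so placing probes on 4 KB boundaries is an assumption for illustration. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAGE_BYTES 4096u  /* assumed 4 KB page size */

/* Rounds a requested allocation up to the ABI's 16-byte stack alignment. */
size_t align_stack_allocation(size_t bytes) {
    return (bytes + 15u) & ~(size_t)15u;
}

/* Value passed to __chkstk in x15: the total allocation divided by 16. */
uint64_t chkstk_x15(size_t aligned_bytes) {
    return (uint64_t)(aligned_bytes / 16u);
}

/* Writes, in order, the offsets below the incoming SP that would be probed:
   one per page before the final page. Returns how many offsets were written. */
size_t stack_probe_offsets(size_t aligned_bytes, size_t *offsets, size_t max) {
    size_t n = 0;
    for (size_t off = STACK_PAGE_BYTES; off < aligned_bytes && n < max;
         off += STACK_PAGE_BYTES) {
        offsets[n++] = off;
    }
    return n;
}
```

For a 10,000-byte frame, x15 holds 625. Under this model, probes touch offsets 4096 and 8192 before the function reaches the final page.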

@@ -241,7 +246,7 @@ Code within Windows is compiled with frame pointers enabled ([/Oy-](reference/oy

 ## Exception unwinding

-Unwinding during exception handling is assisted through the use of unwind codes. The unwind codes are a sequence of bytes stored in the .xdata section of the executable. They describe the operation of the prologue and epilogue in an abstract manner, such that the effects of a function's prologue can be undone in preparation for backing up to the caller's stack frame. For more information on the unwind codes, see [ARM64 exception handling](arm64-exception-handling.md).
+Unwinding during exception handling is assisted by using unwind codes. The unwind codes are a sequence of bytes stored in the .xdata section of the executable. They describe the operation of the prologue and epilogue in an abstract manner, such that the effects of a function's prologue can be undone in preparation for backing up to the caller's stack frame. For more information on the unwind codes, see [ARM64 exception handling](arm64-exception-handling.md).

 The ARM EABI also specifies an exception unwinding model that uses unwind codes. However, the specification as presented is insufficient for unwinding in Windows, which must handle cases where the PC is in the middle of a function prologue or epilogue.

@@ -251,7 +256,7 @@ Code that is dynamically generated should be described with dynamic function tab

 All ARMv8 CPUs are required to support a cycle counter register, a 64-bit register that Windows configures to be readable at any exception level, including user mode. It can be read through the special PMCCNTR_EL0 register by using the MRS opcode in assembly code, or the `_ReadStatusReg` intrinsic in C/C++ code.

-The cycle counter here is a true cycle counter, not a wall clock. The counting frequency will vary with the processor frequency. If you feel you must know the frequency of the cycle counter, you shouldn't be using the cycle counter. Instead, you want to measure wall clock time, for which you should use `QueryPerformanceCounter`.
+The cycle counter here is a true cycle counter, not a wall clock. The counting frequency varies with the processor frequency. If you need to know the frequency of the cycle counter, you shouldn't be using the cycle counter. Instead, you want to measure wall-clock time, for which you should use `QueryPerformanceCounter`.

 ## See also
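
The advice above is to measure wall-clock time with `QueryPerformanceCounter` rather than the cycle counter. Here's a minimal C sketch of that approach. It pairs `QueryPerformanceCounter` with `QueryPerformanceFrequency` to convert ticks to seconds. The non-Windows branch substitutes POSIX `clock_gettime(CLOCK_MONOTONIC)` only so the sketch builds elsewhere. The helper names are hypothetical.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime on POSIX systems */
#endif

#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Converts a tick difference into seconds. */
double ticks_to_seconds(int64_t ticks, int64_t ticks_per_second) {
    return (double)ticks / (double)ticks_per_second;
}

/* Returns a monotonic wall-clock reading and stores its tick rate. */
int64_t wall_clock_ticks(int64_t *ticks_per_second) {
#ifdef _WIN32
    LARGE_INTEGER frequency, now;
    QueryPerformanceFrequency(&frequency); /* fixed at system boot */
    QueryPerformanceCounter(&now);
    *ticks_per_second = frequency.QuadPart;
    return now.QuadPart;
#else
    /* Stand-in so the sketch builds outside Windows: nanosecond ticks. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    *ticks_per_second = 1000000000;
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
#endif
}
```

A caller takes two readings around the work being timed and passes the difference to `ticks_to_seconds`.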

docs/build/arm64ec-windows-abi-conventions.md

Lines changed: 6 additions & 0 deletions
@@ -92,6 +92,12 @@ Special helper routines like `__chkstk_arm64ec` use custom calling conventions a

 ARM64EC follows the same struct packing rules used for x64 to ensure interoperability between ARM64EC code and x64 code. For more information and examples of x64 struct packing, see [Overview of x64 ABI conventions](x64-software-conventions.md).

+## Floating-point exceptions
+
+To determine whether an ARM CPU supports floating-point exceptions, write a value that enables exceptions to the `FPCR` register and then read it back. If the CPU supports floating-point exceptions, the bits corresponding to supported exceptions remain set, while the CPU resets the bits for unsupported exceptions.
+
+On ARM64EC, Windows catches processor floating-point exceptions and disables them in the FPCR register. This ensures consistent behavior across different processor variants.
+
 ## Emulation helper ABI routines

 ARM64EC code and [thunks](#thunks) use emulation helper routines to transition between x64 and ARM64EC functions.

docs/c-runtime-library/reference/controlfp-s.md

Lines changed: 12 additions & 5 deletions
@@ -1,7 +1,7 @@
 ---
 description: "Learn more about: _controlfp_s"
 title: "_controlfp_s"
-ms.date: "4/2/2020"
+ms.date: 03/27/2025
 api_name: ["_controlfp_s", "_o__controlfp_s"]
 api_location: ["msvcrt.dll", "msvcr80.dll", "msvcr90.dll", "msvcr100.dll", "msvcr100_clr0400.dll", "msvcr110.dll", "msvcr110_clr0400.dll", "msvcr120.dll", "msvcr120_clr0400.dll", "ucrtbase.dll", "api-ms-win-crt-runtime-l1-1-0.dll"]
 api_type: ["DLLExport"]
@@ -75,16 +75,23 @@ _controlfp_s(&current_word, _DN_FLUSH, _MCW_DN);
 // and x64 processors with SSE2 support. Ignored on other x86 platforms.
 ```

-On ARM platforms, the **`_controlfp_s`** function applies to the FPSCR register. On x64 architectures, only the SSE2 control word that's stored in the MXCSR register is affected. On Intel (x86) platforms, **`_controlfp_s`** affects the control words for both the x87 and the SSE2, if present. It's possible for the two control words to be inconsistent with each other (because of a previous call to [`__control87_2`](control87-controlfp-control87-2.md), for example); if there's an inconsistency between the two control words, **`_controlfp_s`** sets the `EM_AMBIGUOUS` flag in *`currentControl`*. It's a warning that the returned control word might not represent the state of both floating-point control words accurately.
+This function is ignored when you use [`/clr` (Common Language Runtime Compilation)](../../build/reference/clr-common-language-runtime-compilation.md) to compile because the common language runtime (CLR) only supports the default floating-point precision.

-On the ARM and x64 architectures, changing the infinity mode or the floating-point precision isn't supported. If the precision control mask is used on the x64 platform, the function raises an assertion and the invalid parameter handler is invoked, as described in [Parameter validation](../parameter-validation.md).
+On x64, only the SSE2 control word stored in the MXCSR register is affected. Changing the infinity mode or the floating-point precision isn't supported. If the precision control mask is used on the x64 platform, the function raises an assertion and the invalid parameter handler is invoked as described in [Parameter validation](../parameter-validation.md).

-If the mask isn't set correctly, this function generates an invalid parameter exception, as described in [Parameter validation](../parameter-validation.md). If execution is allowed to continue, this function returns `EINVAL` and sets `errno` to `EINVAL`.
+On x86, **`_controlfp_s`** affects the control words for both the x87 and the SSE2, if present. It's possible for the two control words to be inconsistent with each other (because of a previous call to [`__control87_2`](control87-controlfp-control87-2.md), for example); if there's an inconsistency between the two control words, **`_controlfp_s`** sets the `EM_AMBIGUOUS` flag in *`currentControl`*. It's a warning that the returned control word might not represent the state of both floating-point control words accurately.

-This function is ignored when you use [`/clr` (Common Language Runtime Compilation)](../../build/reference/clr-common-language-runtime-compilation.md) to compile because the common language runtime (CLR) only supports the default floating-point precision.
+If the mask isn't set correctly, this function generates an invalid parameter exception, as described in [Parameter validation](../parameter-validation.md). If execution is allowed to continue, this function returns `EINVAL` and sets `errno` to `EINVAL`.

 By default, this function's global state is scoped to the application. To change this behavior, see [Global state in the CRT](../global-state.md).

+### ARM platforms
+
+- Changing the infinity mode or the floating-point precision isn't supported.
+- On ARM32 (discontinued), Windows doesn't support floating-point exceptions.
+- On ARM64, unmasking the whole `_MCW_EM` or any bits from it (`_EM_INEXACT`, `_EM_UNDERFLOW`, `_EM_OVERFLOW`, `_EM_ZERODIVIDE`, and `_EM_INVALID`) correctly changes the FPCR register. Floating-point exceptions raised by standard math functions, such as the invalid-operation exception from `std::acos`, are exempt from this behavior; they're ignored or raised according to the FPCR register. For more information, see [Overview of ARM64 ABI conventions](../../build/arm64-windows-abi-conventions.md#floating-point-exceptions).
+- On ARM64EC, Windows catches processor floating-point exceptions and disables them in the FPCR register. This ensures consistent behavior across different processor variants.
+
 ### Mask constants and values

 For the `_MCW_EM` mask, clearing a bit unmasks the corresponding exception, which allows the hardware exception; setting the bit masks (hides) the exception. If an `_EM_UNDERFLOW` or `_EM_OVERFLOW` occurs, no hardware exception is thrown until the next floating-point instruction is executed. To generate a hardware exception immediately after `_EM_UNDERFLOW` or `_EM_OVERFLOW`, use the MASM `FWAIT` instruction.
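
The masking behavior can be modeled in C. Only the bits selected by *`mask`* take their value from *`newControl`*, and clearing an `_MCW_EM` bit unmasks that exception. In the sketch below, `apply_control_word` is a hypothetical helper that expresses this reading of the documented parameters. The `_controlfp_s` call is compiled only with MSVC, and `unmask_zerodivide` is likewise a hypothetical wrapper.

```c
#include <errno.h>

#ifdef _MSC_VER
#include <float.h>
#endif

/* Models how the control word is updated: bits set in mask take their value
   from new_control; all other bits keep their current value. */
unsigned int apply_control_word(unsigned int current,
                                unsigned int new_control,
                                unsigned int mask) {
    return (current & ~mask) | (new_control & mask);
}

#ifdef _MSC_VER
/* Unmasks only the divide-by-zero exception: a new value of 0 under the
   _EM_ZERODIVIDE mask clears that bit. The updated word goes to *updated. */
errno_t unmask_zerodivide(unsigned int *updated) {
    return _controlfp_s(updated, 0, _EM_ZERODIVIDE);
}
#endif
```

Calling `_controlfp_s` with a *`mask`* of 0 leaves the control word unchanged and only reports it. That matches the model, because `apply_control_word(current, anything, 0)` returns `current`.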
