Commit b874d4b

Float16 (#1120)
* Float16 proposal
* Kick off Float16 review
1 parent 01ab53b commit b874d4b

1 file changed: +125 -0 lines changed


proposals/0000-float16.md

@@ -0,0 +1,125 @@
# Float16

* Proposal: [SE-0276](0276-float16.md)
* Author: [Stephen Canon](https://github.com/stephentyrone)
* Review Manager: [Ben Cohen](https://github.com/airspeedswift)
* Pitch thread: [Add Float16](https://forums.swift.org/t/add-float16/33019)
* Implementation: [Implement Float16](https://github.com/apple/swift/pull/21738)
* Status: **Active review (6–17 February 2020)**

## Introduction

Introduce the `Float16` type conforming to the `BinaryFloatingPoint` and `SIMDScalar`
protocols, binding the IEEE 754 *binary16* format (aka *float16*, *half-precision*, or *half*),
and bridged by the compiler to the C `_Float16` type.

* Old pitch thread: [Add `Float16`](https://forums.swift.org/t/add-float16/19370).

## Motivation

The last decade has seen a dramatic increase in the use of floating-point types smaller
than (32-bit) `Float`. The most widely implemented is `Float16`, which is used
extensively on mobile GPUs for computation, as a pixel format for HDR images, and as
a compressed format for weights in ML applications.

Introducing the type to Swift is especially important for interoperability with shader-language
programs; users frequently need to set up data structures on the CPU to
pass to their GPU programs. Without the type available in Swift, they are forced to use
unsafe mechanisms to create these structures.

In addition, C APIs that use these types simply cannot be imported, making them
unavailable in Swift.
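As a rough illustration of the GPU-interoperability case described above, the sketch below lays out a record with half-precision fields directly in Swift. The `HalfVertex` type and its fields are hypothetical and not part of the proposal, and the snippet assumes a toolchain and platform where `Float16` is available. The last lines show the raw bit-pattern bookkeeping that is otherwise needed to carry such values as `UInt16`.

```swift
// Hypothetical CPU-side record meant to mirror a shader struct that
// stores its fields as half-precision values.
struct HalfVertex {
    var position: SIMD4<Float16>   // 4 lanes x 2 bytes
    var color: SIMD4<Float16>      // 4 lanes x 2 bytes
}

let vertex = HalfVertex(
    position: SIMD4<Float16>(0.5, -1, 0.25, 1),
    color: SIMD4<Float16>(1, 0.5, 0, 1)
)

// Each lane occupies two bytes, so the record holds 16 bytes of payload.
let vertexSize = MemoryLayout<HalfVertex>.size

// Without Float16, the same data has to travel as raw UInt16 bit patterns.
// The binary16 encoding of 1.0 is 0x3C00.
let oneBits = Float16(1).bitPattern
let roundTripped = Float16(bitPattern: oneBits)
```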

## Proposed solution

Add `Float16` to the standard library.

## Detailed design

There is shockingly little to say here. We will add:
```
@frozen
struct Float16: BinaryFloatingPoint, SIMDScalar, CustomStringConvertible { }
```
The entire API falls out from that, with no additional surface outside that provided by those
protocols. `Float16` will provide exactly the operations that `Float`, `Double`, and `Float80`
do for their conformance to these protocols.
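A minimal sketch of what "falls out" of those conformances, assuming a toolchain and platform where `Float16` is available. The `horner` helper is a hypothetical generic function, included only to show that code written against `BinaryFloatingPoint` accepts the new type unchanged.

```swift
// Literals and operators come from the protocol conformances.
let x: Float16 = 1.5
let y = x * 2 + 0.25                                // exactly 3.25

// Format properties required by BinaryFloatingPoint.
let significandBits = Float16.significandBitCount   // 10
let exponentBits = Float16.exponentBitCount         // 5
let largest = Float16.greatestFiniteMagnitude       // 65504

// SIMDScalar makes the type usable as a vector element.
let v = SIMD4<Float16>(1, 2, 3, 4)
let w = v * v + SIMD4<Float16>(repeating: 0.5)      // (1.5, 4.5, 9.5, 16.5)

// Hypothetical generic helper: evaluates a polynomial whose coefficients
// are listed from the constant term upward, using Horner's rule.
func horner<T: BinaryFloatingPoint>(_ coefficients: [T], at x: T) -> T {
    var result: T = 0
    for coefficient in coefficients.reversed() {
        result = result * x + coefficient
    }
    return result
}

let p = horner([1, 2, 3] as [Float16], at: 2)       // 1 + 2*2 + 3*4 = 17
```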

We also need to ensure that the parameter passing conventions followed by the compiler
for `Float16` are what we want; these values should be passed and returned in the
floating-point registers, and vectors should be passed and returned in SIMD registers.

On platforms that do not have native arithmetic support, we will convert `Float16` to
`Float` and use the hardware support for `Float` to perform the operation. The result is
correctly rounded for every operation except fused multiply-add. A software sequence
will be used to emulate fused multiply-add in these cases (the easiest option is to convert
to `Double`, but other options may be more efficient on some architectures, especially
for vectors).
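The widening strategy described above can be sketched in library code. This is an illustration only, not the compiler's actual lowering; `addViaFloat`, `mulViaFloat`, and `fmaViaDouble` are hypothetical names, and the snippet assumes a platform where `Float16` is available. Rounding a `Float` sum or product back to `Float16` is safe because the 24-bit significand of `Float` is at least 2 × 11 + 2 bits wide, the usual sufficient condition for double rounding to be harmless; the fused multiply-add goes through `Double`, following the option the text calls easiest.

```swift
// Emulate a Float16 operation by widening to Float and rounding once.
func addViaFloat(_ a: Float16, _ b: Float16) -> Float16 {
    Float16(Float(a) + Float(b))
}

func mulViaFloat(_ a: Float16, _ b: Float16) -> Float16 {
    Float16(Float(a) * Float(b))
}

// Emulate fused multiply-add (a * b + c with one final rounding to
// Float16) by evaluating the expression in Double.
func fmaViaDouble(_ a: Float16, _ b: Float16, _ c: Float16) -> Float16 {
    Float16(Double(a) * Double(b) + Double(c))
}

let a: Float16 = 0x1.004p0    // 1 + 2^-10
let b: Float16 = 0x1.008p0    // 1 + 2^-9
let c: Float16 = -1

// Rounding the product first discards its low-order 2^-19 term.
let unfused = addViaFloat(mulViaFloat(a, b), c)   // 3 * 2^-10
// The fused form keeps that term until the final rounding.
let fused = fmaViaDouble(a, b, c)                 // 3 * 2^-10 + 2^-19
```

The two results differ in the last bit, which is why fused multiply-add cannot simply reuse the multiply-then-add sequence.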

## Source compatibility

N/A

## Effect on ABI stability

There is no change to existing ABI. We will be introducing a new type, which will have
appropriate availability annotations.

## Effect on API resilience

The `Float16` type would become part of the public API. It will be `@frozen`, so no
further changes will be possible, but its API and layout are entirely constrained by
IEEE 754 and conformance to `BinaryFloatingPoint`, so there are no alternatives
possible anyway.

## Alternatives considered

Q: Why isn't it called `Half`?

A: `FloatN` is the more consistent pattern. Swift already has `Float32`,
`Float64` and `Float80` (with `Float` and `Double` as alternative spellings of `Float32`
and `Float64`, respectively). At some future point we will add `Float128`. Plus, the C
language type that this will bridge to is named `_Float16`.

`Half` is not completely outrageous as an alias, but we shouldn't add aliases unless
there's a really compelling reason.

During the pitch phase, feedback was fairly overwhelmingly in favor of `Float16`, though
there are a few people who would like to have both names available. Unless significantly
more people come forward, however, we should make the "opinionated" choice to have a single
name for the type. An alias could always be added with a subsequent minor proposal if
necessary.

Q: What about ARM's ["alternative half precision"](https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision)?
What about [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)?

A: Alternative half-precision is no longer supported; ARMv8.2-FP16 only uses the IEEE 754
encoding. Bfloat is something that we should *also* support eventually, but it's a separate
proposal, and we should wait a little while before doing it. Conformance to IEEE 754 fully
defines the semantics of `Float16` (and several hardware vendors, including Apple, have
been shipping processors that implement those semantics for a few years). By contrast,
there are a few proposals for hardware that implements bfloat16; ARM and Intel designed
different multiply-add units for it, and haven't shipped yet (and haven't defined any
operations *other* than a non-homogeneous multiply-add). Other companies have HW in
use, but haven't (publicly) formally specified their arithmetic. It's a moving target, and it
would be a mistake to attempt to specify language bindings today.

Q: Do we need conformance to `BinaryFloatingPoint`? What if we made it a storage-only format?

A: We could make it a type that can only be used to convert to/from `Float` and `Double`,
forcing all arithmetic to be performed in another format. However, this would mean that
it would be much harder, in some cases, to get the same numerical result on the CPU and
GPU (when GPU computation is performed in half-precision). It's a very nice convenience
to be able to do a computation in the REPL and see what a GPU is going to do.
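A small example of the kind of discrepancy this answer has in mind, assuming a platform where `Float16` is available. The values are chosen for illustration: 2049 is not representable in binary16, so adding 1 to 2048 in half precision rounds back to 2048, while a storage-only workflow that accumulates in `Float` and rounds once at the end yields 2050.

```swift
let terms: [Float16] = [2048, 1, 1]

// Sequential accumulation carried out entirely in Float16.
var halfSum: Float16 = 0
for term in terms { halfSum += term }          // 2048

// Storage-only style: widen each value, accumulate in Float, round once.
var wideSum: Float = 0
for term in terms { wideSum += Float(term) }
let storageOnlySum = Float16(wideSum)          // 2050
```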

Q: Why not put it in Swift Numerics?

A: The biggest reason to add the type is for interoperability with C-family and GPU programs
that want to use their analogous types. In order to maximize the support for that interoperability,
and to get the calling conventions that we want to have in the long run, it makes more sense to put
this type in the standard library.

Q: What about math library support?

A: If this proposal is approved, I will add conformance to `Real` in Swift Numerics, providing
the math library functions (by using the corresponding `Float` implementations initially).
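The stopgap described in this answer, computing in `Float` and rounding the result, might look like the following sketch. The names `halfExp` and `halfLog2` are hypothetical and are not the Swift Numerics API; the snippet assumes Foundation's `Float` overloads of `exp` and `log2` and a platform where `Float16` is available. Results obtained this way are not guaranteed to be correctly rounded.

```swift
import Foundation

// Hypothetical helpers: evaluate in Float, then round to Float16.
func halfExp(_ x: Float16) -> Float16 {
    Float16(exp(Float(x)))
}

func halfLog2(_ x: Float16) -> Float16 {
    Float16(log2(Float(x)))
}

let one = halfExp(0)       // exp(0) is exactly 1
let three = halfLog2(8)    // 3 after rounding to Float16
```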
