# Float16

* Proposal: [SE-0276](0276-float16.md)
* Author: [Stephen Canon](https://github.com/stephentyrone)
* Review Manager: [Ben Cohen](https://github.com/airspeedswift)
* Pitch thread: [Add Float16](https://forums.swift.org/t/add-float16/33019)
* Implementation: [Implement Float16](https://github.com/apple/swift/pull/21738)
* Status: **Active review (6–17 February 2020)**

## Introduction

Introduce the `Float16` type, which conforms to the `BinaryFloatingPoint` and `SIMDScalar`
protocols, binds the IEEE 754 *binary16* format (aka *float16*, *half-precision*, or *half*),
and is bridged by the compiler to the C `_Float16` type.

* Old pitch thread: [Add `Float16`](https://forums.swift.org/t/add-float16/19370).

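To make the *binary16* layout concrete, the sketch below decodes a raw 16-bit pattern by hand: 1 sign bit, 5 exponent bits with a bias of 15, and 10 stored fraction bits (11 bits of precision once the implicit leading bit is counted). `decodeBinary16` is a hypothetical helper for illustration only; it is not part of the proposed API.

```swift
/// Decodes an IEEE 754 binary16 bit pattern by hand (hypothetical helper,
/// for illustration only). Layout: 1 sign bit, 5 exponent bits with a
/// bias of 15, and 10 fraction bits.
func decodeBinary16(_ bits: UInt16) -> Double {
    let sign: FloatingPointSign = (bits & 0x8000) != 0 ? .minus : .plus
    let exponent = Int((bits >> 10) & 0x1F)
    let fraction = Double(bits & 0x3FF)

    switch exponent {
    case 0:
        // Zero or subnormal: no implicit leading bit; value = fraction × 2^-24.
        return Double(sign: sign, exponent: -24, significand: fraction)
    case 0x1F:
        // All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        if fraction != 0 { return Double.nan }
        return sign == .minus ? -Double.infinity : Double.infinity
    default:
        // Normal: implicit leading 1, value = (1 + fraction / 2^10) × 2^(exponent - 15).
        return Double(sign: sign, exponent: exponent - 15, significand: 1 + fraction / 1024)
    }
}

print(decodeBinary16(0x3C00))  // 1.0
print(decodeBinary16(0xC000))  // -2.0
print(decodeBinary16(0x7BFF))  // 65504.0, the largest finite value
print(decodeBinary16(0x0001))  // 2^-24, the smallest positive subnormal
```
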
## Motivation

The last decade has seen a dramatic increase in the use of floating-point types smaller
than (32-bit) `Float`. The most widely implemented is `Float16`, which is used
extensively on mobile GPUs for computation, as a pixel format for HDR images, and as
a compressed format for weights in ML applications.

Introducing the type to Swift is especially important for interoperability with shader-language
programs; users frequently need to set up data structures on the CPU to
pass to their GPU programs. Without the type available in Swift, they are forced to use
unsafe mechanisms to create these structures.

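As an illustration of the interoperability case, here is a minimal sketch assuming a shader that declares a vertex as two four-component half-precision vectors (for example, `half4` in Metal Shading Language). `HalfVertex` and its field names are hypothetical. With `Float16` available, the CPU-side struct can be declared directly rather than assembled from raw 16-bit patterns through unsafe pointers.

```swift
// CPU-side mirror of a hypothetical shader struct whose two members are
// four-component half-precision vectors.
struct HalfVertex {
    var position: SIMD4<Float16>
    var color: SIMD4<Float16>
}

let vertices: [HalfVertex] = [
    HalfVertex(position: [0, 0.5, 0, 1], color: [1, 0, 0, 1]),
    HalfVertex(position: [-0.5, -0.5, 0, 1], color: [0, 1, 0, 1]),
]

// Each SIMD4<Float16> occupies 8 bytes, so a vertex is 16 bytes with no
// padding, and the array's storage is one contiguous run of bytes.
print(MemoryLayout<HalfVertex>.stride)        // 16
print(vertices.withUnsafeBytes { $0.count })  // 32
```

This assumes the shader-side struct has the same member order and alignment; real code should check that against the shader's own declaration.
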
In addition, C APIs that use half-precision types such as `_Float16` simply cannot be
imported, making them unavailable in Swift.

## Proposed solution

Add `Float16` to the standard library.

## Detailed design

There is shockingly little to say here. We will add:
```swift
// Declaration only; the implementations of the protocol requirements are elided.
@frozen
public struct Float16: BinaryFloatingPoint, SIMDScalar, CustomStringConvertible { }
```
The entire API falls out from that, with no additional surface outside that provided by those
protocols. `Float16` will provide exactly the operations that `Float`, `Double`, and `Float80`
provide through their conformances to these protocols.

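Because the API comes entirely from the protocols, generic code written against `BinaryFloatingPoint` should work with `Float16` unchanged. A small sketch: `midpoint` is a hypothetical helper, while the static properties used are existing `BinaryFloatingPoint` and `FloatingPoint` requirements.

```swift
// A generic helper written once against BinaryFloatingPoint
// (hypothetical, for illustration).
func midpoint<T: BinaryFloatingPoint>(_ a: T, _ b: T) -> T {
    a + (b - a) / 2
}

// Format parameters and constants come straight from the protocol requirements.
print(Float16.significandBitCount)       // 10
print(Float16.exponentBitCount)          // 5
print(Float16.greatestFiniteMagnitude)   // 65504.0

// The same generic function serves Float16 and Float alike.
print(midpoint(Float16(1), Float16(2)))  // 1.5
print(midpoint(Float(1), Float(2)))      // 1.5
```
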
We also need to ensure that the parameter-passing conventions followed by the compiler
for `Float16` are what we want: `Float16` values should be passed and returned in
floating-point registers, and SIMD vectors of `Float16` should be passed and returned in
SIMD registers.

On platforms that do not have native arithmetic support, we will convert `Float16` to
`Float` and use the hardware support for `Float` to perform the operation. Because `Float`'s
24-bit significand is at least 2p + 2 bits wide, where p = 11 is the precision of `Float16`,
rounding the `Float` result back to `Float16` is correctly rounded for every operation except
fused multiply-add. A software sequence will be used to emulate fused multiply-add in these
cases (the easiest option is to convert to `Double`, but other options may be more efficient
on some architectures, especially for vectors).

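The fallback strategy can be sketched in a few lines. `emulatedAdd` and `emulatedFMA` are hypothetical helpers, not the standard library's implementation; they only illustrate the approach described above, with fused multiply-add going through `Double` as the simplest option.

```swift
/// Addition via Float (hypothetical helper): compute in Float, then round
/// the Float result back to Float16 once.
func emulatedAdd(_ a: Float16, _ b: Float16) -> Float16 {
    Float16(Float(a) + Float(b))
}

/// Fused multiply-add via Double (hypothetical helper): the product of two
/// Float16 values is exact in Double, so only the addition (to Double) and
/// the final conversion (to Float16) round.
func emulatedFMA(_ a: Float16, _ b: Float16, _ c: Float16) -> Float16 {
    Float16(Double(a) * Double(b) + Double(c))
}

let a = Float16(1) + .ulpOfOne  // 1 + 2^-10
let b = Float16(1) - .ulpOfOne  // 1 - 2^-10

// The exact product 1 - 2^-20 rounds to 1 in Float16, so the unfused
// expression cancels to zero.
print(a * b - 1)                                  // 0.0
print(Double(emulatedFMA(a, b, -1)) == -0x1p-20)  // true
```

The last two lines show why fused multiply-add needs separate treatment: rounding the product first discards exactly the information that a single rounding of `a * b - 1` preserves.
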
## Source compatibility

N/A

## Effect on ABI stability

There is no change to existing ABI. We will be introducing a new type, which will have
appropriate availability annotations.

## Effect on API resilience

The `Float16` type would become part of the public API. It will be `@frozen`, so its layout
can never change, but its API and layout are entirely constrained by IEEE 754 and
conformance to `BinaryFloatingPoint`, so there are no alternatives possible anyway.

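Because IEEE 754 fixes both the size and the encoding, they can be checked directly through the `BinaryFloatingPoint` requirements `exponentBitPattern`, `significandBitPattern`, and `init(sign:exponentBitPattern:significandBitPattern:)`. A short sketch:

```swift
// Size fixed by the binary16 interchange format.
print(MemoryLayout<Float16>.size)        // 2

// The encoding of 1.0: biased exponent 15, all fraction bits zero.
print(Float16(1).exponentBitPattern)     // 15
print(Float16(1).significandBitPattern)  // 0

// The largest finite value: biased exponent 30, all 10 fraction bits set.
let largest = Float16(sign: .plus, exponentBitPattern: 30, significandBitPattern: 0x3FF)
print(largest)                           // 65504.0
```
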
## Alternatives considered

Q: Why isn't it called `Half`?

A: `FloatN` is the more consistent pattern. Swift already has `Float32`,
`Float64` and `Float80` (with `Float` and `Double` as alternative spellings of `Float32`
and `Float64`, respectively). At some future point we will add `Float128`. Plus, the C
language type that this will bridge to is named `_Float16`.

`Half` is not completely outrageous as an alias, but we shouldn't add aliases unless
there's a really compelling reason.

During the pitch phase, feedback was fairly overwhelmingly in favor of `Float16`, though
there are a few people who would like to have both names available. Unless significantly
more people come forward, however, we should make the "opinionated" choice to have a single
name for the type. An alias could always be added with a subsequent minor proposal if
necessary.

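Code that prefers the name `Half` does not need the standard library to provide it: a one-line alias in the user's own module names the same type. A minimal sketch; the `Half` alias is hypothetical and is not proposed here. The existing `Float32` and `Float64` spellings are likewise aliases of `Float` and `Double`.

```swift
// A user-side alias (hypothetical; this proposal does not add it).
// An alias names the same type; it does not create a new one.
typealias Half = Float16

print(Float32.self == Float.self)   // true
print(Float64.self == Double.self)  // true
print(Half.self == Float16.self)    // true

let x: Half = 0.5
let y: Float16 = x * 2              // same type, so no conversion is needed
```
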
Q: What about ARM's ["alternative half precision"](https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision)?
What about [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)?

A: Alternative half-precision is no longer supported; ARMv8.2-FP16 only uses the IEEE 754
encoding. Bfloat16 is something that we should *also* support eventually, but it's a separate
proposal, and we should wait a little while before doing it. Conformance to IEEE 754 fully
defines the semantics of `Float16` (and several hardware vendors, including Apple, have
been shipping processors that implement those semantics for a few years). By contrast,
there are a few proposals for hardware that implements bfloat16; ARM and Intel have designed
different multiply-add units for it, but neither has shipped yet (and neither has defined any
operations *other* than a non-homogeneous multiply-add). Other companies have hardware in
use, but haven't (publicly) formally specified their arithmetic. It's a moving target, and it
would be a mistake to attempt to specify language bindings today.

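To illustrate why bfloat16 is a different format rather than another spelling of `Float16`, here is a sketch of a hypothetical storage-only bfloat16 type. It is not proposed here, the `BFloat16Bits` name is invented for illustration, and NaN inputs are not handled. bfloat16 keeps `Float`'s 8-bit exponent but only 7 fraction bits, so compared with `Float16` it trades precision for range.

```swift
// Hypothetical storage-only bfloat16, for comparison only. Its bits are the
// high half of a Float's bit pattern.
struct BFloat16Bits {
    var bits: UInt16

    /// Rounds a Float to the nearest bfloat16, ties to even.
    /// NaN inputs are not handled, for brevity.
    init(_ x: Float) {
        let f = x.bitPattern
        let lowestKeptBit = (f >> 16) & 1
        bits = UInt16(truncatingIfNeeded: (f &+ 0x7FFF &+ lowestKeptBit) >> 16)
    }

    /// Widening back to Float is exact: append 16 zero bits.
    var float: Float { Float(bitPattern: UInt32(bits) << 16) }
}

let large: Float = 1e10
print(Float16(large))                    // inf: beyond Float16's range
print(BFloat16Bits(large).float)         // finite, close to 1e10

let nearOne: Float = 1 + 0x1p-8
print(Float16(nearOne) == 1)             // false: Float16 keeps 10 fraction bits
print(BFloat16Bits(nearOne).float == 1)  // true: bfloat16 keeps only 7
```
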
Q: Do we need conformance to `BinaryFloatingPoint`? What if we made it a storage-only format?

A: We could make it a type that can only be used to convert to/from `Float` and `Double`,
forcing all arithmetic to be performed in another format. However, in some cases this would
make it much harder to get the same numerical result on the CPU and GPU (when GPU
computation is performed in half precision). It's a very nice convenience to be able to do a
computation in the REPL and see what a GPU is going to do.

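The difference between a storage-only type and full arithmetic shows up even in a three-term sum. In this sketch (the function names are hypothetical), `sumInHalf` rounds to `Float16` after every addition, as a GPU computing in half precision with round-to-nearest-even would, while `sumViaFloat` does what a storage-only format would force: widen, compute in `Float`, and round once at the end.

```swift
/// Half-precision arithmetic at every step (hypothetical helper).
func sumInHalf(_ xs: [Float16]) -> Float16 {
    xs.reduce(0, +)
}

/// Storage-only style (hypothetical helper): widen to Float, compute there,
/// and round back to Float16 once at the end.
func sumViaFloat(_ xs: [Float16]) -> Float16 {
    Float16(xs.reduce(Float(0)) { $0 + Float($1) })
}

let xs: [Float16] = [2048, 1, 1]
// The spacing of Float16 values at 2048 is 2, so 2048 + 1 is a tie that
// rounds to the even neighbor 2048, and this happens twice.
print(sumInHalf(xs))    // 2048.0
print(sumViaFloat(xs))  // 2050.0
```
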
Q: Why not put it in Swift Numerics?

A: The biggest reason to add the type is for interoperability with C-family and GPU programs
that want to use their analogous types. In order to maximize the support for that interoperability,
and to get the calling conventions that we want to have in the long run, it makes more sense to put
this type in the standard library.

Q: What about math library support?

A: If this proposal is approved, I will add conformance to `Real` in Swift Numerics, providing
the math library functions (by using the corresponding `Float` implementations initially).
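
The "use the `Float` implementation" approach can be sketched for a single function: widen to `Float`, call the `Float` overload of `exp` from the platform C library overlay, and round back. This is a minimal sketch with a hypothetical `exp16` helper; it is not the Swift Numerics API, and because the result is rounded twice it may occasionally differ from a correctly rounded `Float16` result.

```swift
import Foundation

/// Hypothetical Float16 exponential that borrows the Float implementation.
/// Rounding twice (to Float, then to Float16) is not guaranteed to give the
/// correctly rounded Float16 result for every input.
func exp16(_ x: Float16) -> Float16 {
    Float16(exp(Float(x)))
}

print(exp16(0))             // 1.0
print(exp16(1) == 2.71875)  // true: 2.71875 is the Float16 value nearest e
```
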