This patch adds an APFloat type for the unsigned E8M0 format.
This format is used to represent the "scale-format"
in the MX specification:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
This format does not support infinities, denormals, or zeros.
Like FP32, its exponent is 8 bits wide (here, all 8 bits are
exponent bits) and its bias is 127. Unlike IEEE FP32, however,
its minExponent is -127 instead of -126: with no zero or denormal
encodings, the all-zeros bit pattern represents the value 2^-127.
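To make the encoding concrete, here is a minimal standalone sketch that decodes one E8M0 byte into a `double`. `decodeE8M0` is a hypothetical helper, not an APFloat API. It assumes, per the MX spec, that 0xFF is the single NaN encoding; every other code is an exact power of two with bias 127.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of APFloat): decode an E8M0 scale byte.
// All 8 bits are exponent bits; the bias is 127; there is no sign bit.
double decodeE8M0(std::uint8_t bits) {
  // Assumption from the MX spec: 0xFF encodes NaN (there is no Inf).
  if (bits == 0xFF)
    return std::numeric_limits<double>::quiet_NaN();
  // No zero and no denormals: 0x00 is 2^-127 and 0xFE is 2^127.
  return std::ldexp(1.0, static_cast<int>(bits) - 127);
}
```

For example, `decodeE8M0(127)` yields 1.0, and `decodeE8M0(0x00)` yields 2^-127 rather than zero.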
The APFloat utility functions are updated to handle these
constraints for this format:
* The bias calculation differs, and the convertIEEE* APIs
are updated to handle this.
* Since there are no significand bits, the
isSignificandAll{Zeroes/Ones} methods are updated accordingly.
* Although the format has no significand bits, the precision
field in the fltSemantics is set to 1 (the implicit integer bit)
for consistency with APFloat's internal representation.
* Many utility functions are updated to handle the fact that this
format does not support zero.
* A separate initFromAPInt() implementation handles
the quirks of the format.
* Add specific tests to verify the range of values for this format.
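The bias difference shows up when widening E8M0 to IEEE binary32. The following standalone sketch is an illustration, not APFloat's convertIEEE* implementation, and both helper names are hypothetical. Codes 1 through 254 share binary32's bias of 127, so they map directly onto its exponent field. Code 0 (2^-127) lies below binary32's minimum normal exponent of -126, so it has to become a denormal. NaN is again assumed to be 0xFF, as in the MX spec.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: map an E8M0 code to the bit pattern of the
// IEEE binary32 value it represents.
std::uint32_t e8m0ToFloat32Bits(std::uint8_t bits) {
  if (bits == 0xFF)
    return 0x7FC00000u; // a quiet NaN in binary32
  if (bits == 0x00)
    // 2^-127 = 2^-126 * 0.5: a binary32 denormal with only the top
    // mantissa bit set.
    return 0x00400000u;
  // Same bias as binary32: place the code in the exponent field,
  // leaving the significand zero.
  return static_cast<std::uint32_t>(bits) << 23;
}

// Hypothetical helper: reinterpret those bits as a float.
float e8m0ToFloat(std::uint8_t bits) {
  std::uint32_t u = e8m0ToFloat32Bits(bits);
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}
```

For instance, code 127 maps to 0x3F800000 (1.0f). Code 0x00 maps to a nonzero denormal, which is exactly why E8M0's minExponent of -127 needs special handling when converting to an IEEE type.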
Signed-off-by: Durgadoss R <[email protected]>