Floating-point representation

Paul Zimmermann Paul.Zimmermann at inria.fr
Tue May 18 09:34:03 UTC 2021

       Hi David,

> From: "David M. Warme" <David at Warme.net>
> Date: Thu, 13 May 2021 14:44:35 -0400
> Consider the floating-point representation
>      m * b**e
> where m, b and e are integers and b >= 2 is a fixed constant.
> (The mantissa m could be a GMP integer, and e a fixed-precision
> signed integer.)
> The normalization rule is that the mantissa m is either zero
> or not divisible by b.  (For the usual case of b = 2, the
> mantissa m is either zero or an odd integer.)
> This representation supports add, subtract and multiply with
> no rounding error.
> Q1: Does this representation have a well-defined name (especially
>      for the b = 2 case)?
> Q2: If so, does anyone have a reference in the literature?

I don't know of a name for this representation; however, you should have a look
at the concept of "quantum" in the IEEE 754 standard (for decimal formats).
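
To make the "quantum" idea concrete, here is a minimal sketch using Python's
decimal module, which keeps the exponent of a decimal number as part of its
representation. It only illustrates the b = 10 case. Note that normalize()
also rounds to the current context precision (28 digits by default), so
it is not an exact operation on arbitrarily long coefficients.

```python
from decimal import Decimal

a = Decimal("1.50")
b = Decimal("1.5")

# Same numerical value, but two different representations:
# 150 * 10**-2 versus 15 * 10**-1.
same_value = (a == b)
exp_a = a.as_tuple().exponent
exp_b = b.as_tuple().exponent

# normalize() strips trailing zeros, so the coefficient ends up
# either zero or not divisible by 10 (the rule from the question, with b = 10).
n = Decimal("1500").normalize()
coeff_digits = n.as_tuple().digits
coeff_exp = n.as_tuple().exponent

print(same_value, exp_a, exp_b)   # True -2 -1
print(coeff_digits, coeff_exp)    # (1, 5) 2
```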

Note that the IEEE 754 normalization (for binary formats) always makes
the most significant bit of m equal to 1 (for non-zero numbers), whereas the
above representation always has the least significant bit equal to 1. You could
even make this least significant bit implicit: since m = 2k+1, store only k
(of course you would then have to represent zero with a special value of the
exponent, as in IEEE 754).
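
The b = 2 case can be sketched in a few lines of Python, with Python's
unbounded int standing in for a GMP integer. All names here (OddFloat,
normalize, add, sub, mul, pack, unpack, ZERO_EXP) are hypothetical and chosen
for illustration only, and the reserved exponent value assumes a 32-bit signed
exponent field. This is one possible sketch, not an established format.

```python
from dataclasses import dataclass

# Hypothetical reserved exponent marking zero (assumes a 32-bit signed field).
ZERO_EXP = -(2**31)

@dataclass(frozen=True)
class OddFloat:
    m: int  # mantissa: zero or an odd integer
    e: int  # exponent: the value is m * 2**e

def normalize(m, e):
    """Shift out trailing zero bits so that m is zero or odd."""
    if m == 0:
        return OddFloat(0, 0)
    shift = (m & -m).bit_length() - 1  # number of trailing zero bits of m
    return OddFloat(m >> shift, e + shift)

def add(x, y):
    """Exact sum: align both mantissas to the smaller exponent, then add."""
    e = min(x.e, y.e)
    return normalize((x.m << (x.e - e)) + (y.m << (y.e - e)), e)

def sub(x, y):
    return add(x, OddFloat(-y.m, y.e))

def mul(x, y):
    """Exact product: odd * odd is odd, so only zero needs renormalizing."""
    return normalize(x.m * y.m, x.e + y.e)

def pack(x):
    """Store only k where m = 2k + 1; zero uses the reserved exponent."""
    if x.m == 0:
        return (0, ZERO_EXP)
    return ((x.m - 1) >> 1, x.e)

def unpack(k, e):
    if e == ZERO_EXP:
        return OddFloat(0, 0)
    return OddFloat(2 * k + 1, e)

twelve = normalize(12, 0)          # 12 = 3 * 2**2
print(twelve)
print(add(twelve, normalize(4, 0)))
print(mul(twelve, OddFloat(5, -1)))
```

The mantissa can grow without bound under repeated operations, which is the
price of having no rounding error. With GMP, the trailing-zero count could
presumably be obtained with mpz_scan1 rather than the bit trick used above.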

Paul Zimmermann

More information about the gmp-discuss mailing list