# mpn_sqrtrem{1,2} - patch for pure C implem

Marco Bodrato bodrato at mail.dm.unipi.it
Thu May 31 01:11:22 UTC 2018

```Ciao Paul,

Il Mar, 6 Giugno 2017 9:00 am, paul zimmermann ha scritto:
> I'd be happy to see a comparison with the code of .

>  https://members.loria.fr/PZimmermann/papers/mpfr4.pdf

I attach a small program, containing the tables and the initial part of a
macro implementing that code. I took the code from the current MPFR lib.

The code is compared to some new code using a smaller table.
Maximal error is compared, not speed.

Il Gio, 15 Giugno 2017 6:02 pm, Marco Bodrato ha scritto:
> Il Ven, 9 Giugno 2017 1:41 pm, paul zimmermann ha scritto:
>>> Finally we can totally skip the first approximation and directly compute
>>> i' = T1[h(a)] + l(a) * T2[h(a)]
>>> using a piece-wise linear function.

>> Not sure l(a) is better than using the upper (say) 16 bits of a, in
>> case you want a 16-bit approximation i'.

> I'd prefer a 17 or 18-bit approximation ;-)

It seems it is possible :-)

> And I'd like to store both T1[x] and T2[x] into a single 32-bit entry...

The code I took from MPFR uses two tables, with 768 entries each. And
achieves 17 bit of precision using values from the two tables, and a
64x64->64 multiplication.

The new code I propose uses a single table, with only 385 entries, 24 bit
each. And achieves 17 bit of precision using two consecutive values from
the table, and a 16x16->32 multiplication.
I tested it on a 64-bits platform, but it should be easy to port it to 32
bits, or even to 16-bit limbs :-)

>> To compute T1 and T2 you have two choices:
>>
>> 1) either try to minimize the maximal error after the first Newton
iteration
>>    (i.e., on i')
>>
>> 2) or try to minimize the maximal error after k iterations. See the
article
>>    by Jean-Michel Muller and Peter Kornerup:
>>    https://hal.archives-ouvertes.fr/inria-00071899/document

> there is a third choice, the one Niels describe with
> "maintain a known and fixed sign of the error".

I decided for the third choice.

I attach the code.

It will print the range of varibility of the approximation of
inverse_square_root with respect to a reference function. E.g.

dhh = 256 (_i=0) (_d9h=0) gmpfr: [0, 99], gmp: [1, 97]
dhh = 257 (_i=1) (_d9h=0) gmpfr: [0, 100], gmp: [1, 97]
dhh = 258 (_i=2) (_d9h=1) gmpfr: [0, 102], gmp: [1, 97]
dhh = 259 (_i=3) (_d9h=1) gmpfr: [0, 105], gmp: [1, 94]
dhh = 260 (_i=4) (_d9h=2) gmpfr: [0, 109], gmp: [1, 94]
dhh = 261 (_i=5) (_d9h=2) gmpfr: [1, 115], gmp: [1, 92]
dhh = 262 (_i=6) (_d9h=3) gmpfr: [2, 121], gmp: [1, 92]
dhh = 263 (_i=7) (_d9h=3) gmpfr: [3, 130], gmp: [1, 91]
dhh = 264 (_i=8) (_d9h=4) gmpfr: [5, 140], gmp: [1, 91]
dhh = 265 (_i=9) (_d9h=4) gmpfr: [0, 97], gmp: [1, 89]

Here, you can see that for dhh=264, the error of the gmpfr implementation
is maximal and larger than the error by the new (gmp?) implementation. It
is the interval that the paper  cited by Paul cites (page 5) as "the
largest value of e_1...corresponds to d_{10}=265"

With the code, I attach also two pictures: one shows the maximal error of
the function, the other is a detail, where you can again see the peak of
the  error around dhh = 264.

I did not recompute all tables, but some estimates tell me that doubling
the number of intervals, we can get (almost) two more bits of precision.

I.e. with a single table of 769 entries (25 bits each), we should be able
to compute the starting approximation i' for inverse_sqrt, with a single
16x16->32(highest 16 bits) multiplication, gaining a precision of 19 bits.

A table with 1537 entries (26 bits each), and the same strategy with a
single 16x16->32 multiplication, should be enough for a precision of 21
bits.
The code I'm comparing to has two 768-entries tables... and also reading
two consecutive 32-bits entries should be better than reading two 64 bits
entries in two arrays.

comments  are welcome, as usual.

Ĝis,
m

--
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-csrc
Size: 22836 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Eraro.png
Type: image/png
Size: 34947 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Detalo.png
Type: image/png
Size: 56994 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0003.png>
```