mpn_sqrtrem{1,2} - patch for pure C implem

Thu May 31 01:11:22 UTC 2018

Ciao Paul,

Il Mar, 6 Giugno 2017 9:00 am, paul zimmermann ha scritto:
> I'd be happy to see a comparison with the code of [1].

> [1] https://members.loria.fr/PZimmermann/papers/mpfr4.pdf

I attach a small program, containing the tables and the initial part of a
macro implementing that code. I took the code from the current MPFR lib.

The code is compared to some new code using a smaller table.
Maximal error is compared, not speed.

Il Gio, 15 Giugno 2017 6:02 pm, Marco Bodrato ha scritto:
> Il Ven, 9 Giugno 2017 1:41 pm, paul zimmermann ha scritto:
>>> Finally we can totally skip the first approximation and directly compute
>>> i' = T1[h(a)] + l(a) * T2[h(a)]
>>> using a piece-wise linear function.

>> Not sure l(a) is better than using the upper (say) 16 bits of a, in
>> case you want a 16-bit approximation i'.

> I'd prefer a 17 or 18-bit approximation ;-)

It seems it is possible :-)

> And I'd like to store both T1[x] and T2[x] into a single 32-bit entry...

The code I took from MPFR uses two tables, with 768 entries each. And
achieves 17 bit of precision using values from the two tables, and a
64x64->64 multiplication.

The new code I propose uses a single table, with only 385 entries, 24 bit
each. And achieves 17 bit of precision using two consecutive values from
the table, and a 16x16->32 multiplication.
I tested it on a 64-bits platform, but it should be easy to port it to 32
bits, or even to 16-bit limbs :-)

>> To compute T1 and T2 you have two choices:
>>
>> 1) either try to minimize the maximal error after the first Newton
iteration
>>    (i.e., on i')
>>
>> 2) or try to minimize the maximal error after k iterations. See the
article
>>    by Jean-Michel Muller and Peter Kornerup:
>>    https://hal.archives-ouvertes.fr/inria-00071899/document

> there is a third choice, the one Niels describe with
> "maintain a known and fixed sign of the error".

I decided for the third choice.

I attach the code.

It will print the range of varibility of the approximation of
inverse_square_root with respect to a reference function. E.g.

$ gcc InvKvadRad.c -lgmp -o ikr&&./ikr |head
dhh = 256 (_i=0) (_d9h=0) gmpfr: [0, 99], gmp: [1, 97]
dhh = 257 (_i=1) (_d9h=0) gmpfr: [0, 100], gmp: [1, 97]
dhh = 258 (_i=2) (_d9h=1) gmpfr: [0, 102], gmp: [1, 97]
dhh = 259 (_i=3) (_d9h=1) gmpfr: [0, 105], gmp: [1, 94]
dhh = 260 (_i=4) (_d9h=2) gmpfr: [0, 109], gmp: [1, 94]
dhh = 261 (_i=5) (_d9h=2) gmpfr: [1, 115], gmp: [1, 92]
dhh = 262 (_i=6) (_d9h=3) gmpfr: [2, 121], gmp: [1, 92]
dhh = 263 (_i=7) (_d9h=3) gmpfr: [3, 130], gmp: [1, 91]
dhh = 264 (_i=8) (_d9h=4) gmpfr: [5, 140], gmp: [1, 91]
dhh = 265 (_i=9) (_d9h=4) gmpfr: [0, 97], gmp: [1, 89]

Here, you can see that for dhh=264, the error of the gmpfr implementation
is maximal and larger than the error by the new (gmp?) implementation. It
is the interval that the paper [1] cited by Paul cites (page 5) as "the
largest value of e_1...corresponds to d_{10}=265"

With the code, I attach also two pictures: one shows the maximal error of
the function, the other is a detail, where you can again see the peak of
the  error around dhh = 264.

I did not recompute all tables, but some estimates tell me that doubling
the number of intervals, we can get (almost) two more bits of precision.

I.e. with a single table of 769 entries (25 bits each), we should be able
to compute the starting approximation i' for inverse_sqrt, with a single
16x16->32(highest 16 bits) multiplication, gaining a precision of 19 bits.

A table with 1537 entries (26 bits each), and the same strategy with a
single 16x16->32 multiplication, should be enough for a precision of 21
bits.
The code I'm comparing to has two 768-entries tables... and also reading
two consecutive 32-bits entries should be better than reading two 64 bits
entries in two arrays.

comments  are welcome, as usual.

Ĝis,
m

-- 
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InvKvadRad.c
Type: text/x-csrc
Size: 22836 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Eraro.png
Type: image/png
Size: 34947 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Detalo.png
Type: image/png
Size: 56994 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0003.png>