mpn_sqrtrem{1,2} - patch for pure C implem
Marco Bodrato
bodrato at mail.dm.unipi.it
Thu May 31 01:11:22 UTC 2018
Ciao Paul,
Il Mar, 6 Giugno 2017 9:00 am, paul zimmermann ha scritto:
> I'd be happy to see a comparison with the code of [1].
> [1] https://members.loria.fr/PZimmermann/papers/mpfr4.pdf
I attach a small program, containing the tables and the initial part of a
macro implementing that code. I took the code from the current MPFR lib.
The code is compared to some new code using a smaller table.
Maximal error is compared, not speed.
Il Gio, 15 Giugno 2017 6:02 pm, Marco Bodrato ha scritto:
> Il Ven, 9 Giugno 2017 1:41 pm, paul zimmermann ha scritto:
>>> Finally we can totally skip the first approximation and directly compute
>>> i' = T1[h(a)] + l(a) * T2[h(a)]
>>> using a piece-wise linear function.
>> Not sure l(a) is better than using the upper (say) 16 bits of a, in
>> case you want a 16-bit approximation i'.
> I'd prefer a 17 or 18-bit approximation ;-)
It seems it is possible :-)
> And I'd like to store both T1[x] and T2[x] into a single 32-bit entry...
The code I took from MPFR uses two tables, with 768 entries each. And
achieves 17 bit of precision using values from the two tables, and a
64x64->64 multiplication.
The new code I propose uses a single table, with only 385 entries, 24 bit
each. And achieves 17 bit of precision using two consecutive values from
the table, and a 16x16->32 multiplication.
I tested it on a 64-bits platform, but it should be easy to port it to 32
bits, or even to 16-bit limbs :-)
>> To compute T1 and T2 you have two choices:
>>
>> 1) either try to minimize the maximal error after the first Newton
iteration
>> (i.e., on i')
>>
>> 2) or try to minimize the maximal error after k iterations. See the
article
>> by Jean-Michel Muller and Peter Kornerup:
>> https://hal.archives-ouvertes.fr/inria-00071899/document
> there is a third choice, the one Niels describe with
> "maintain a known and fixed sign of the error".
I decided for the third choice.
I attach the code.
It will print the range of varibility of the approximation of
inverse_square_root with respect to a reference function. E.g.
$ gcc InvKvadRad.c -lgmp -o ikr&&./ikr |head
dhh = 256 (_i=0) (_d9h=0) gmpfr: [0, 99], gmp: [1, 97]
dhh = 257 (_i=1) (_d9h=0) gmpfr: [0, 100], gmp: [1, 97]
dhh = 258 (_i=2) (_d9h=1) gmpfr: [0, 102], gmp: [1, 97]
dhh = 259 (_i=3) (_d9h=1) gmpfr: [0, 105], gmp: [1, 94]
dhh = 260 (_i=4) (_d9h=2) gmpfr: [0, 109], gmp: [1, 94]
dhh = 261 (_i=5) (_d9h=2) gmpfr: [1, 115], gmp: [1, 92]
dhh = 262 (_i=6) (_d9h=3) gmpfr: [2, 121], gmp: [1, 92]
dhh = 263 (_i=7) (_d9h=3) gmpfr: [3, 130], gmp: [1, 91]
dhh = 264 (_i=8) (_d9h=4) gmpfr: [5, 140], gmp: [1, 91]
dhh = 265 (_i=9) (_d9h=4) gmpfr: [0, 97], gmp: [1, 89]
Here, you can see that for dhh=264, the error of the gmpfr implementation
is maximal and larger than the error by the new (gmp?) implementation. It
is the interval that the paper [1] cited by Paul cites (page 5) as "the
largest value of e_1...corresponds to d_{10}=265"
With the code, I attach also two pictures: one shows the maximal error of
the function, the other is a detail, where you can again see the peak of
the error around dhh = 264.
I did not recompute all tables, but some estimates tell me that doubling
the number of intervals, we can get (almost) two more bits of precision.
I.e. with a single table of 769 entries (25 bits each), we should be able
to compute the starting approximation i' for inverse_sqrt, with a single
16x16->32(highest 16 bits) multiplication, gaining a precision of 19 bits.
A table with 1537 entries (26 bits each), and the same strategy with a
single 16x16->32 multiplication, should be enough for a precision of 21
bits.
The code I'm comparing to has two 768-entries tables... and also reading
two consecutive 32-bits entries should be better than reading two 64 bits
entries in two arrays.
comments are welcome, as usual.
Ĝis,
m
--
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InvKvadRad.c
Type: text/x-csrc
Size: 22836 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Eraro.png
Type: image/png
Size: 34947 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Detalo.png
Type: image/png
Size: 56994 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180531/49c7656f/attachment-0003.png>
More information about the gmp-devel
mailing list