mpn_sqrtrem{1,2} - patch for pure C implem

Thu Jun 8 22:28:44 UTC 2017

Ciao,

Il Gio, 8 Giugno 2017 9:44 am, paul zimmermann ha scritto:
>        Dear Marco,

> - in the 1-limb routine mpfr_sqrt1(), in the

>   at the end we have u0*2^GMP_NUMB_BITS = r0^2 + rb*2^GMP_NUMB_BITS + sb

I'll look into it.

> - similarly in mpfr_sqrt2() in the else branch we have
>   {np, 4} = {rp, 2}^2 + {tp, 3} thus {rp, 2} is the integer square root

We don't have such a function in GMP :-) I mean, not a special one for the
4-limbs operand.

>> Then I'll try to mix the tricks:
>>  - tabling the cube of the initial approximation to save two
>> multiplications, as the code in MPFR does;
>>  - adding a little perturbation to some iteration to increase precision
>> and keep on using a smaller table.

I didn't have the time, yet, to carefully read MPFR code, but I thought
about this tricks.

I suppose the first i' = i*(3 - a*i*i)/2 iteration is rewritten as

i' = i+i*(1-a*i*i)/2 = i+i/2-a*i*i*i/2 = (i*3/2) - a * (i*i*i/2)

Then (i*3/2) and (i*i*i/2) are tabled (T1 and T2 respectively) and

i' = T1[h(a)] + a * T2[h(a)]

is computed.

The constant (c) I added to the first iteration should be incorporated in
T1, and of course it is not necessary to really use a constant. A
different value can be used for each entry in the table...

Finally we can totally skip the first approximation and directly compute

i' = T1[h(a)] + l(a) * T2[h(a)]

using a piece-wise linear function.

I believe that we can define

h(a) = a >> GMP_NUMB_BITS - 9 /* the highest 9 bits */

l(a) = (a >> GMP_NUMB_BITS - 20) & ((1<<12)-1) /* the next 11 bits */

T1 and T2 two tables of 128+256=384 elements with entries of 16 bits each

Carefully choosing the two table entries (the "linear regression" of our
points in each of the intervals) for each h(a) value, we should be able to
obtain a 17 bit approximation with a single 16x16->32 multiplication.

I should write some script to compute the two tables.

Regards,
m

-- 
http://bodrato.it/