Anomaly in mpn_sqrtrem and mpn_rottrem

bodrato at mail.dm.unipi.it bodrato at mail.dm.unipi.it
Tue Jun 9 21:25:19 UTC 2015


Ciao,

Il Mar, 9 Giugno 2015 7:02 pm, Torbjörn Granlund ha scritto:
> bodrato at mail.dm.unipi.it writes:

> We should probably use sqr whenever possible, perhaps it is worth having
> a condition for this in rootrem.c, (I believe mpn_pow_1 gets it right,
> and uses sqr whenever possible, since I looked into that as an

mpn_pow_1 correctly uses sqr, the problem within rootrem arises only when
k==2. The two lines of code:
	      wn = mpn_pow_1 (wp, sp, sn, k - 1, qp);
	      mpn_mul (qp, wp, wn, sp, sn);
degenerate with k=2. mpn_pow_1 is used to perform an MPN_COPY so that the
two operands passed to mpn_mul contain the same number...

Maybe an initial condition if (k == 2) return mpn_sqrtrem (....), but the
specialised code for a single k should not get inside the generic rootrem
function, in my opinion.

I wrote the hack only to understand the algorithm... I'll clean up rootrem
somehow, but I'll not insert special code.

>   tune/speed -r -p10000000 -s16-10000000 -f1.438  mpn_sqrt mpn_root.2
> I get about 0.97 on average.

This can be healed using in sqrtrem the same strategy that rootrem uses to
skip the last squaring if the reminder is not required: adding a couple of
zero limbs and compute a slightly larger root...

> And
>   tune/speed -r -p10000000 -s16-10000000 -f1.438  mpn_sqrtrem
> mpn_rootrem.2
> gives around 1.25.

This is not bad, a generic code can be slightly slower than a specialised
one.

>   The last functions called by mpn_sqrtrem when computing the root of a
> 2000
>   limb operand are:
>
>   mpn_divrem nn=1000/dn=500 -> qn=500,rn=500
>   mpn_sqr an=500 ^ 2 -> rn=1000
>
>   Similarly, for mpn_rootrem:
>
>   mpn_div_q nn=1000/dn=501 -> qn=499
>   mpn_sqr an=1000 ^ 2 -> rn=2000
>
> Interesting indeed.  I wonder why they need different multiply sizes.

sqrtrem obtain the reminder with divrem and needs only the b^2 part of
(ax+b)^2 to update it.
rootrem rebuild the reminder from scratch every loop... because the
generic (ax+b)^k has too many components.

> I am not sure that div_q vs divrem matters a whole lot for this usage
> (i.e., 2n/n sizes).

The manual says, about mpn_divrem:
This function is obsolete. Please call mpn_tdiv_qr instead for best
performance.
With a little scratch space passed to mpn_dc_sqrtrem, it's easy to stop
using divrem, but... will the performance really benefit of this?

Regards,
m

-- 
http://bodrato.it/toom-cook/



More information about the gmp-devel mailing list