Anomaly in mpn_sqrtrem and mpn_rottrem
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Tue Jun 9 21:25:19 UTC 2015
Ciao,
Il Mar, 9 Giugno 2015 7:02 pm, Torbjörn Granlund ha scritto:
> bodrato at mail.dm.unipi.it writes:
> We should probably use sqr whenever possible, perhaps it is worth having
> a condition for this in rootrem.c, (I believe mpn_pow_1 gets it right,
> and uses sqr whenever possible, since I looked into that as an
mpn_pow_1 correctly uses sqr, the problem within rootrem arises only when
k==2. The two lines of code:
wn = mpn_pow_1 (wp, sp, sn, k - 1, qp);
mpn_mul (qp, wp, wn, sp, sn);
degenerate with k=2. mpn_pow_1 is used to perform an MPN_COPY so that the
two operands passed to mpn_mul contain the same number...
Maybe an initial condition if (k == 2) return mpn_sqrtrem (....), but the
specialised code for a single k should not get inside the generic rootrem
function, in my opinion.
I wrote the hack only to understand the algorithm... I'll clean up rootrem
somehow, but I'll not insert special code.
> tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrt mpn_root.2
> I get about 0.97 on average.
This can be healed using in sqrtrem the same strategy that rootrem uses to
skip the last squaring if the reminder is not required: adding a couple of
zero limbs and compute a slightly larger root...
> And
> tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrtrem
> mpn_rootrem.2
> gives around 1.25.
This is not bad, a generic code can be slightly slower than a specialised
one.
> The last functions called by mpn_sqrtrem when computing the root of a
> 2000
> limb operand are:
>
> mpn_divrem nn=1000/dn=500 -> qn=500,rn=500
> mpn_sqr an=500 ^ 2 -> rn=1000
>
> Similarly, for mpn_rootrem:
>
> mpn_div_q nn=1000/dn=501 -> qn=499
> mpn_sqr an=1000 ^ 2 -> rn=2000
>
> Interesting indeed. I wonder why they need different multiply sizes.
sqrtrem obtain the reminder with divrem and needs only the b^2 part of
(ax+b)^2 to update it.
rootrem rebuild the reminder from scratch every loop... because the
generic (ax+b)^k has too many components.
> I am not sure that div_q vs divrem matters a whole lot for this usage
> (i.e., 2n/n sizes).
The manual says, about mpn_divrem:
This function is obsolete. Please call mpn_tdiv_qr instead for best
performance.
With a little scratch space passed to mpn_dc_sqrtrem, it's easy to stop
using divrem, but... will the performance really benefit of this?
Regards,
m
--
http://bodrato.it/toom-cook/
More information about the gmp-devel
mailing list