Anomaly in mpn_sqrtrem and mpn_rottrem
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Tue Jun 9 12:17:09 UTC 2015
Ciao,
Il Lun, 8 Giugno 2015 10:15 am, Torbjörn Granlund ha scritto:
> nisse at lysator.liu.se (Niels Möller) writes:
>
> I've had a quick look. Both mpn_dc_sqrtrem and mpn_rootrem_internal
> (which seem to be the work horses for larger operands) do a division in
> the loop. The latter is organized as a newton iteration, the former
> isn't (but I guess the computation is more or less equivalent to a
> Newton iteration).
To compare them I wrote a quick and dirty specialization of the rootrem
algorithm for the k==2 case (use sqr instead of mul, lshift instead of
mul_1...)
Current speed (on shell)
$ tune/speed -p 100000000 -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
mpn_root.2 mpn_sqrt mpn_rootrem.2 mpn_sqrtrem
1 0.000000518 0.000000011 0.000000516 #0.000000009
16 0.000001103 0.000000244 0.000001117 #0.000000244
256 0.000009316 #0.000008321 0.000014106 0.000008374
4096 #0.000633967 0.000659180 0.000894513 0.000654562
65536 #0.026071234 0.026756061 0.033005667 0.026762939
1048576 0.732938835 0.734521311 0.962774086 #0.731488979
With the attached specialised function, rootrem.2 gains a 10% in speed and
is faster than sqrt when the reminder is not needed and size grows.
$ tune/speed -p 100000000 -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
mpn_root.2 mpn_sqrt mpn_rootrem.2 mpn_sqrtrem
1 0.000000376 #0.000000010 0.000000383 0.000000010
16 0.000000884 #0.000000241 0.000000894 0.000000243
256 #0.000007743 0.000008262 0.000012138 0.000008282
4096 #0.000563298 0.000657382 0.000816271 0.000650694
65536 #0.023296001 0.026694774 0.030225570 0.026717529
1048576 #0.654117324 0.736489574 0.832002002 0.733319729
I added some TRACE (printf...) in the code to show the primitives with a
super-linear cost used in computation.
The last functions called by mpn_sqrtrem when computing the root of a 2000
limb operand are:
mpn_divrem nn=500/dn=250 -> qn=250,rn=250
mpn_sqr an=250 ^ 2 -> rn=500
mpn_divrem nn=1000/dn=500 -> qn=500,rn=500
mpn_sqr an=500 ^ 2 -> rn=1000
Similarly, for mpn_rootrem:
mpn_div_q nn=500/dn=251 -> qn=249
mpn_sqr an=501 ^ 2 -> rn=1002
mpn_div_q nn=1000/dn=501 -> qn=499
mpn_sqr an=1000 ^ 2 -> rn=2000
It uses div_q instead of divrem, but squares used by rootrem are doubled
in size...
When we do not ask for the reminder mpn_rootrem (say mpn_root) skips the
last squaring:
[mpn_sqr an=251 ^ 2 -> rn=502]
mpn_div_q nn=501/dn=251 -> qn=250
mpn_sqr an=501 ^ 2 -> rn=1002
mpn_div_q nn=1002/dn=501 -> qn=501
The operation are similar if compared to sqrt (the order changes),
probably div_q is faster than divrem. mpn_sqrt wins only when the huge
speed-up given by the specialised code for the first limb is relevant...
Best regards,
mb
--
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rootrem.c
Type: text/x-csrc
Size: 23291 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20150609/40caf3b1/attachment.bin>
More information about the gmp-devel
mailing list