Anomaly in mpn_sqrtrem and mpn_rottrem
Torbjörn Granlund
tg at gmplib.org
Tue Jun 9 17:02:48 UTC 2015
bodrato at mail.dm.unipi.it writes:
To compare them I wrote a quick and dirty specialization of the rootrem
algorithm for the k==2 case (use sqr instead of mul, lshift instead of
mul_1...)
Cool!
We should probably use sqr whenever possible, perhaps it is worth having
a condition for this in rootrem.c, (I believe mpn_pow_1 gets it right,
and uses sqr whenever possible, since I looked into that as an
explanation.)
$ tune/speed -p 100000000 -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
mpn_root.2 mpn_sqrt mpn_rootrem.2 mpn_sqrtrem
1 0.000000518 0.000000011 0.000000516 #0.000000009
16 0.000001103 0.000000244 0.000001117 #0.000000244
256 0.000009316 #0.000008321 0.000014106 0.000008374
4096 #0.000633967 0.000659180 0.000894513 0.000654562
65536 #0.026071234 0.026756061 0.033005667 0.026762939
1048576 0.732938835 0.734521311 0.962774086 #0.731488979
With the attached specialised function, rootrem.2 gains a 10% in speed and
is faster than sqrt when the reminder is not needed and size grows.
$ tune/speed -p 100000000 -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
mpn_root.2 mpn_sqrt mpn_rootrem.2 mpn_sqrtrem
1 0.000000376 #0.000000010 0.000000383 0.000000010
16 0.000000884 #0.000000241 0.000000894 0.000000243
256 #0.000007743 0.000008262 0.000012138 0.000008282
4096 #0.000563298 0.000657382 0.000816271 0.000650694
65536 #0.023296001 0.026694774 0.030225570 0.026717529
1048576 #0.654117324 0.736489574 0.832002002 0.733319729
Your timing leaps hide some of the actual timing characteristics...
Using
tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrt mpn_root.2
I get about 0.97 on average.
And
tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrtrem mpn_rootrem.2
gives around 1.25.
(In both cases without your rootrem hack.)
I added some TRACE (printf...) in the code to show the primitives with a
super-linear cost used in computation.
The last functions called by mpn_sqrtrem when computing the root of a 2000
limb operand are:
mpn_divrem nn=500/dn=250 -> qn=250,rn=250
mpn_sqr an=250 ^ 2 -> rn=500
mpn_divrem nn=1000/dn=500 -> qn=500,rn=500
mpn_sqr an=500 ^ 2 -> rn=1000
Similarly, for mpn_rootrem:
mpn_div_q nn=500/dn=251 -> qn=249
mpn_sqr an=501 ^ 2 -> rn=1002
mpn_div_q nn=1000/dn=501 -> qn=499
mpn_sqr an=1000 ^ 2 -> rn=2000
Interesting indeed. I wonder why they need different multiply sizes.
It uses div_q instead of divrem, but squares used by rootrem are doubled
in size...
The operation are similar if compared to sqrt (the order changes),
probably div_q is faster than divrem. mpn_sqrt wins only when the huge
speed-up given by the specialised code for the first limb is relevant...
I am not sure that div_q vs divrem matters a whole lot for this usage
(i.e., 2n/n sizes).
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list