Anomaly in mpn_sqrtrem and mpn_rootrem
Torbjörn Granlund
tg at gmplib.org
Tue Aug 18 08:51:49 UTC 2015
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
With two 256-chars tables(too much?), it is possible to estimate 9 bits
(seldom the error is +2, previous code always used a single correction).
512 bytes is in total for rootrem tables is OK, IMHO.
Before we started refining mpn_rootrem (changeset 16672:c86f4fc0aafe) the
timing (on shell.gmplib) was:
I miss some rootrem logs from ChangeLog.
$tune/speed -cs 1-1024 -f3 mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332
overhead 5.84 cycles, precision 100000000 units of 2.86e-10 secs, CPU freq
3500.08 MHz
mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332
1 2120.44 424.16 84.71 #84.70
3 2537.38 2255.13 86.57 #86.53
9 3556.60 4661.17 687.10 #99.87
27 4779.44 6667.10 7156.09 #151.71
81 8768.02 13230.91 31229.99 #6717.58
243 40720.39 #27991.11 72860.77 81112.17
729 135971.53 #112360.63 240474.62 1060633.50
The current repo gives:
mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332
1 1985.55 290.12 #27.20 27.21
3 2373.10 2015.57 27.21 #27.21
9 3359.58 4401.62 411.39 #27.21
27 4626.43 6317.53 6857.23 #27.21
81 8583.51 12816.22 30837.67 #5783.87
243 39055.19 #27236.95 71936.69 78244.04
729 134456.72 #111492.59 235518.65 1054410.50
The code I'm experimenting with, reduces timings for small results:
mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332
1 634.94 193.03 #28.22 28.25
3 1289.25 200.82 #28.17 28.18
9 1865.79 1341.31 384.23 #28.24
27 3411.88 3257.38 816.07 #28.24
81 7518.93 9572.27 18564.51 #1991.75
243 37723.17 24465.42 61376.28 #20745.56
729 133654.08 #108648.30 226299.89 768142.04
I'm still testing it, but I think I'll push it soon...
Wow, awesome speedups!
Reasonable starting values has long been on my TODO list. I hadn't
realised that one could get away with tiny tables for all k values. I
had a combination of tables and single-limb iterations in mind.
> Now, for small sizes, computing the 4th root with mpn_root is slower
> than calling mpn_sqrt twice.
Even with the new code, this is true on a wide range (up to 1000 limbs).
Should we write special code for k=4,8,16 that repeatedly calls mpn_sqrt?
That's worth considering, sure. Asking the user to do that (and doing
it ourselves from mpz etc) does not make sense, I think.
How much speed difference is there now, for k = 4 vs sqrt(sqrt())? Is
the difference small enough that we could fix it by running the first
few iterations using plain limb arithmetic?
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list