sqrt algorithm
Torbjörn Granlund
tg at gmplib.org
Fri Aug 7 07:15:23 UTC 2015
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
I pushed an implementation for mpn_sqrlo, it is based on mpn_mullo.
I pushed also a primitive mpn_sqrlo_basecase, based on sqr_basecase.
The range where it is faster than sqrmod is narrow:
tune/speed-cs 1-9000 -f2 mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
overhead 5.84 cycles, units of 2.86e-10 secs, CPU freq 3500.08 MHz
mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
1 #15.58 19.88 36.04
2 #19.48 20.54 39.94
4 #51.43 56.01 81.91
8 139.29 #114.17 174.01
16 453.79 #308.36 469.08
32 1495.95 #974.61 1146.67
64 4805.27 3421.11 #3096.75
128 14815.55 10885.63 #8784.71
256 44312.53 34502.41 #24808.02
512 119390.17 109101.40 #72427.37
1024 310496.06 284290.99 #209454.00
2048 868589.83 784437.11 #467387.69
4096 2268668.51 2162067.98 #1014982.18
8192 4887540.14 4951380.77 #2237012.00
I assume this is with assembly sqr_basecase and C sqrlo_basecase?
The speedup in a range is significant. With an assembly sqrlo_basecase
the range and speedup will improve.
Curious that mpn_sqrlo is slower than both functions for 8192. And
shouldn't mpn_sqrlo and mpn_sqr be within a factor 2+epsilon of
mpn_sqrmod_bnm1?
Anyway sqrlo should be improved, at least by tuning its internal
threshold.
The current powlo needs sqrlo...
Sort of...
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list