sqrt algorithm

Torbjörn Granlund tg at gmplib.org
Fri Aug 7 07:15:23 UTC 2015


"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

  I pushed an implementation for mpn_sqrlo, it is based on mpn_mullo.
  I pushed also a primitive mpn_sqrlo_basecase, based on sqr_basecase.
  
  The range where it is faster than sqrmod is narrow:
  
  tune/speed-cs 1-9000 -f2 mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
  overhead 5.84 cycles, units of 2.86e-10 secs, CPU freq 3500.08 MHz
                mpn_sqr     mpn_sqrlo mpn_sqrmod_bnm1
  1              #15.58         19.88         36.04
  2              #19.48         20.54         39.94
  4              #51.43         56.01         81.91
  8              139.29       #114.17        174.01
  16             453.79       #308.36        469.08
  32            1495.95       #974.61       1146.67
  64            4805.27       3421.11      #3096.75
  128          14815.55      10885.63      #8784.71
  256          44312.53      34502.41     #24808.02
  512         119390.17     109101.40     #72427.37
  1024        310496.06     284290.99    #209454.00
  2048        868589.83     784437.11    #467387.69
  4096       2268668.51    2162067.98   #1014982.18
  8192       4887540.14    4951380.77   #2237012.00
  
I assume this is with assembly sqr_basecase and C sqrlo_basecase?

The speedup in a range is significant.  With an assembly sqrlo_basecase
the range and speedup will improve.

Curious that mpn_sqrlo is slower than both functions for 8192.  And
shouldn't mpn_sqrlo and mpn_sqr be within a factor 2+epsilon of
mpn_sqrmod_bnm1?

  Anyway sqrlo should be improved, at least by tuning its internal
  threshold.
  
  The current powlo needs sqrlo...
  
Sort of...

-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list