sqrt algorithm

Niels Möller nisse at lysator.liu.se
Mon Aug 10 16:06:11 UTC 2015


"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

> I pushed an implementation for mpn_sqrlo, it is based on mpn_mullo.
> I pushed also a primitive mpn_sqrlo_basecase, based on sqr_basecase.

Cool!

> The range where it is faster than sqrmod is narrow:

> tune/speed-cs 1-9000 -f2 mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
> overhead 5.84 cycles, units of 2.86e-10 secs, CPU freq 3500.08 MHz
>               mpn_sqr     mpn_sqrlo mpn_sqrmod_bnm1
> 1              #15.58         19.88         36.04
> 2              #19.48         20.54         39.94
> 4              #51.43         56.01         81.91
> 8              139.29       #114.17        174.01
> 16             453.79       #308.36        469.08
> 32            1495.95       #974.61       1146.67
> 64            4805.27       3421.11      #3096.75

But for the cases where they compete, typically where the high half of
the product is known, one should maybe add the (linear) reconstruction
cost one will have for sqrmod_bnm1. Which is O(n), I think, but which
might push the sqrlo range a bit wider. (And with assembly, I imagine
mpn_sqrlo ought to be faster than mpn_sqr for the smallest sizes).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list