nisse at lysator.liu.se
Mon Aug 10 16:06:11 UTC 2015
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> I pushed an implementation for mpn_sqrlo, it is based on mpn_mullo.
> I pushed also a primitive mpn_sqrlo_basecase, based on sqr_basecase.
> The range where it is faster than sqrmod is narrow:
> tune/speed-cs 1-9000 -f2 mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
> overhead 5.84 cycles, units of 2.86e-10 secs, CPU freq 3500.08 MHz
> mpn_sqr mpn_sqrlo mpn_sqrmod_bnm1
> 1 #15.58 19.88 36.04
> 2 #19.48 20.54 39.94
> 4 #51.43 56.01 81.91
> 8 139.29 #114.17 174.01
> 16 453.79 #308.36 469.08
> 32 1495.95 #974.61 1146.67
> 64 4805.27 3421.11 #3096.75
But for the cases where they compete, typically where the high half of
the product is known, one should maybe add the (linear) reconstruction
cost one will have for sqrmod_bnm1. Which is O(n), I think, but which
might push the sqrlo range a bit wider. (And with assembly, I imagine
mpn_sqrlo ought to be faster than mpn_sqr for the smallest sizes).
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel