middle products
David Harvey
dmharvey at cims.nyu.edu
Wed Jun 17 23:14:37 CEST 2009
On Jun 17, 2009, at 3:36 AM, Paul Zimmermann wrote:
> those timings are quite good! I'm impressed you win already at 16
> limbs,
I think the reason for this is that the K8 mulmid_basecase uses
internally an addmul_2 loop that runs at 2.375 cycles/limb, whereas
the division routines are only using addmul_1, which runs at 2.5
cycles/limb. If the division code did two rows at a time, it would
probably push this threshold somewhat higher (surely at least to the
mulmid karatsuba threshold)
> and already save more than 33% at 55 limbs. Larger timings in the FFT
> range (say 100,000, 1M, 10M, 100M and 1G limbs) would be
> interesting too.
10^4 | 4.09e+07 | 2.53e+07 | 0.619 |
10^5 | 9.68e+08 | 7.99e+08 | 0.825 |
10^6 | 1.90e+10 | 2.70e+10 | 1.421 |
That's the karatsuba asymptotics starting to dominate. At these sizes
surely it is better to use the FFT wraparound than to use my mpn_mulmid.
david
More information about the gmp-devel
mailing list