middle products

David Harvey dmharvey at cims.nyu.edu
Wed Jun 17 23:14:37 CEST 2009


On Jun 17, 2009, at 3:36 AM, Paul Zimmermann wrote:

> those timings are quite good! I'm impressed you win already at 16  
> limbs,

I think the reason for this is that the K8 mulmid_basecase uses  
internally an addmul_2 loop that runs at 2.375 cycles/limb, whereas  
the division routines are only using addmul_1, which runs at 2.5  
cycles/limb. If the division code did two rows at a time, it would  
probably push this threshold somewhat higher (surely at least to the  
mulmid karatsuba threshold)

> and already save more than 33% at 55 limbs. Larger timings in the FFT
> range (say 100,000, 1M, 10M, 100M and 1G limbs) would be  
> interesting too.

10^4 | 4.09e+07 | 2.53e+07 |  0.619  |
10^5 | 9.68e+08 | 7.99e+08 |  0.825  |
10^6 | 1.90e+10 | 2.70e+10 |  1.421  |

That's the karatsuba asymptotics starting to dominate. At these sizes  
surely it is better to use the FFT wraparound than to use my mpn_mulmid.

david



More information about the gmp-devel mailing list