big number arithmetic

Sat Mar 24 09:18:45 CET 2007

       Dear Felix,

> Date: Thu, 22 Mar 2007 21:38:24 +0100
> From: Felix von Leitner <felix-bnis at fefe.de>
> 
> Thus spake Paul Zimmermann (Paul.Zimmermann at loria.fr):
> > I do not agree. On a Pentium M, "make tune" in gmp-4.2.1 gives:
> 
> > #define MUL_KARATSUBA_THRESHOLD          22
> 
> > which means that Karatsuba is faster than the schoolbook method up from
> > 22 words, i.e., 704 bits.
> 
> My hunch is that that's because gmp's schoolbook multiplication is not
> very fast.  I haven't polished mine yet, but it's significantly faster
> than even tomfastmath's one, which in turn beats gmp.

Can you prove those assertions with some figures? Here are some cycles
with gmp-4.2.1 on a Pentium M:

bash-3.00$ ./speed -c -f 1.6 -s 1-50 mpn_mul_basecase mpn_kara_mul_n
overhead 6.13 cycles, precision 10000 units of 5.37e-10 secs, CPU freq 1862.21 MHz
        mpn_mul_basecase mpn_kara_mul_n
1              #16.73           n/a
2              #39.46        282.03
3              #91.49        377.59
4             #149.73        446.44
6             #288.11        685.19
9             #587.47        959.92
14           #1322.67       1567.57
22            3156.00      #2983.75
35            7803.00      #6918.50

This means that a 22x22 word product (i.e., a 704-bit product) takes about 3156
cycles with the schoolbook multiplication, i.e., about 6.5 cycles per word by
word product.

Can you provide cycle numbers for tomfastmath and your code?

Regards,
Paul Zimmermann