big number arithmetic

Paul Zimmermann Paul.Zimmermann at loria.fr
Sat Mar 24 10:44:30 CET 2007


       Felix,

> Date: Sat, 24 Mar 2007 10:21:28 +0100
> From: Felix von Leitner <felix-bnis at fefe.de>
> 
> > Can you prove those assertions with some figures? Here are some cycles
> > with gmp-4.2.1 on a Pentium M:
> 
> I don't have a Pentium M handy right now, but here are my measurements
> for a 1024 bit multiplicate (1024x1024->2045 bits) on my Core 2 Duo
> notebook (all 32-bit mode):
> 
> my code:	7716 cycles
> gmp:		9600 cycles
> tomfastmath:	9156 cycles
> 
> > Can you provide cycle numbers for tomfastmath and your code?
> 
> 1024 bits == 32 words.
> 
> I called mpz_mul for gmp and fp_mul for tomfastmath.  In my code I'm
> cheating a little because I directly call the 1024x1024 bit multiply
> routine.  Once I finish my library, there will be added overhead to
> check that there is enough space allocated.
> 
> I implemented Karatsuba, too, but that implementation is comparatively
> unoptimized so it's not fair to use it for comparison.
> 
> Felix

I cannot reproduce your GMP timings. Using gmp-4.2.1 on a Core 2, I get with
the default installation:

% ./speed.orig -c -f 2 -s 1-64 mpn_mul_basecase mpn_kara_mul_n
overhead 6.15 cycles, precision 10000 units of 3.76e-10 secs, CPU freq 2658.07 MHz
        mpn_mul_basecase mpn_kara_mul_n
1              #22.44           n/a
2              #44.13        260.24
4             #130.12        452.92
8             #451.74        935.00
16           #1698.57       2356.00
32           #6565.00       6920.00
64           25480.00     #23960.00

and with Pierrick Gaudry's assembly code:

% ./speed -c -f 2 -s 1-64 mpn_mul_basecase mpn_kara_mul_n
overhead 6.16 cycles, precision 10000 units of 3.76e-10 secs, CPU freq 2658.07 MHz
        mpn_mul_basecase mpn_kara_mul_n
1               #8.22           n/a
2              #20.97        221.22
4             #123.60        410.77
8             #490.87        691.88
16           #1666.67       1801.67
32            6555.00      #5480.00
64           25450.00     #17110.00

Since GMP uses 64-bit arithmetic (it would be silly to use only 32 bits on a
64-bit computer), multiplying two 1024-bit numbers corresponds to 16 words,
i.e., about 1700 cycles with the original gmp-4.2.1, and 1667 cycles with
Gaudry's patch.

>From your figures, it thus seems that tomfastmath is slower than GMP by a
factor 5.5, and your current code by a factor 4.6, for multiplying two
1024-bit numbers.

Paul Zimmermann















More information about the gmp-discuss mailing list