big number arithmetic
Paul Zimmermann
Paul.Zimmermann at loria.fr
Sat Mar 24 10:44:30 CET 2007
Felix,
> Date: Sat, 24 Mar 2007 10:21:28 +0100
> From: Felix von Leitner <felix-bnis at fefe.de>
>
> > Can you prove those assertions with some figures? Here are some cycles
> > with gmp-4.2.1 on a Pentium M:
>
> I don't have a Pentium M handy right now, but here are my measurements
> for a 1024 bit multiplicate (1024x1024->2045 bits) on my Core 2 Duo
> notebook (all 32-bit mode):
>
> my code: 7716 cycles
> gmp: 9600 cycles
> tomfastmath: 9156 cycles
>
> > Can you provide cycle numbers for tomfastmath and your code?
>
> 1024 bits == 32 words.
>
> I called mpz_mul for gmp and fp_mul for tomfastmath. In my code I'm
> cheating a little because I directly call the 1024x1024 bit multiply
> routine. Once I finish my library, there will be added overhead to
> check that there is enough space allocated.
>
> I implemented Karatsuba, too, but that implementation is comparatively
> unoptimized so it's not fair to use it for comparison.
>
> Felix
I cannot reproduce your GMP timings. Using gmp-4.2.1 on a Core 2, I get with
the default installation:
% ./speed.orig -c -f 2 -s 1-64 mpn_mul_basecase mpn_kara_mul_n
overhead 6.15 cycles, precision 10000 units of 3.76e-10 secs, CPU freq 2658.07 MHz
mpn_mul_basecase mpn_kara_mul_n
1 #22.44 n/a
2 #44.13 260.24
4 #130.12 452.92
8 #451.74 935.00
16 #1698.57 2356.00
32 #6565.00 6920.00
64 25480.00 #23960.00
and with Pierrick Gaudry's assembly code:
% ./speed -c -f 2 -s 1-64 mpn_mul_basecase mpn_kara_mul_n
overhead 6.16 cycles, precision 10000 units of 3.76e-10 secs, CPU freq 2658.07 MHz
mpn_mul_basecase mpn_kara_mul_n
1 #8.22 n/a
2 #20.97 221.22
4 #123.60 410.77
8 #490.87 691.88
16 #1666.67 1801.67
32 6555.00 #5480.00
64 25450.00 #17110.00
Since GMP uses 64-bit arithmetic (it would be silly to use only 32 bits on a
64-bit computer), multiplying two 1024-bit numbers corresponds to 16 words,
i.e., about 1700 cycles with the original gmp-4.2.1, and 1667 cycles with
Gaudry's patch.
>From your figures, it thus seems that tomfastmath is slower than GMP by a
factor 5.5, and your current code by a factor 4.6, for multiplying two
1024-bit numbers.
Paul Zimmermann
More information about the gmp-discuss
mailing list