GMP 4.3 multiplication performance
David Harvey
dmharvey at cims.nyu.edu
Tue Jun 2 16:38:38 CEST 2009
On May 31, 2009, at 1:35 PM, Torbjorn Granlund wrote:
> While working on multiplication improvements for GMP 4.4, I realized
> that we are doing something quite silly for unbalanced operands in GMP
> 4.3. For some machines, GMP 4.3 might actually be substantially
> slower
> than GMP 4.2!
>
> The current multiply code works roughly like this:
>
> if (operands have same size)
> use mpn_sqr_n or mpn_mul_n
>
> if (schoolbook operand range)
> use mpn_mul_basecase
>
> if (fft range)
> use mpn_mul_fft_full
>
> else use toom42, toom32, or toom22, or some combination of these.
>
> The problem is that for any unbalanced operands between the schoolbook
> range and the fft range, we will basically use karatsuba's algorithm,
> not any toom variant. This is pretty awful except for smallish
> operands.
>
> The use of unbalanced toom/karatsuba was introduced with GMP 4.3,
> and is
> a great improvement for many operands. Unfortunately, it is done too
> naively.
>
> Some measurements demonstrating the effects:
>
> mpn_mul.7900 mpn_mul.7901
> 7900 #11635165.00 20355963.00
> mpn_mul.8000 mpn_mul.8001
> 8000 #10745592.00 10750716.00
>
> I.e., a 7900-limb operand times a 7900-lim operand takes 11635165
> cycles
> on an athlon 64. But a I.e., a 7901-limb operand times a 7900-lim
> operand takes 20355963 cycles.
>
> In the FFT range, as for 8000 limbs, things are back to normal.
> The FFT
> code handles arbitrarily unbalanced operands (although its performance
> degrades when the operands are very unbalanced).
>
> (Note that a balanced 7900-limb product is also slower than a 8000-
> limb
> balanced product. That's another flaw.)
One case in which I am particularly interested is the 2n * n
unbalanced multiplication. This should be no more expensive than
twice the cost of an n * n multiplication, plus O(n) overhead that
should be negligible for large n.
But here is what I get for current 4.3 stable (changeset 85a941ada149
checked out this morning, on core 2 duo, mac OS X, after running
tuneup, using the attached python script). Columns are n, cost for
2n*n mult, cost for two n*n mults, ratio.
5 262 300 0.87
6 362 391 0.93
7 489 541 0.90
8 609 673 0.90
9 782 821 0.95
10 937 1016 0.92
11 1141 1200 0.95
12 1338 1386 0.97
14 1814 1905 0.95
15 2098 2179 0.96
17 2718 2748 0.99
18 2976 3096 0.96
20 3672 3804 0.97
22 4436 4517 0.98
25 5820 5550 1.05
27 6672 6042 1.10
30 7386 7096 1.04
33 9024 8864 1.02
37 10776 10944 0.98
40 12000 12240 0.98
44 14124 14376 0.98
49 17040 17040 1.00
54 18624 19284 0.97
59 21816 22704 0.96
65 25620 27960 0.92
72 29628 31704 0.93
79 34488 36240 0.95
87 40884 41448 0.99
95 44904 49368 0.91
105 53244 57600 0.92
116 62220 64464 0.97
127 70224 77112 0.91
140 84048 85896 0.98
154 97248 103296 0.94
170 115380 119640 0.96
187 130536 132984 0.98
205 154992 155856 0.99
226 180684 183744 0.98
248 204144 206400 0.99
273 246600 250224 0.99
301 283752 280392 1.01
331 339936 323400 1.05
364 382116 363000 1.05
400 444612 424992 1.05
440 504624 503184 1.00
485 602100 565200 1.07
533 703584 655944 1.07
586 797592 747144 1.07
645 954792 875832 1.09
710 1137924 1002576 1.14
781 1358592 1169976 1.16
859 1527456 1292976 1.18
945 1768872 1519728 1.16
1039 2061648 1741776 1.18
1143 2374656 2010936 1.18
1258 2616612 2222496 1.18
1384 3050760 2601168 1.17
1522 3431532 3013944 1.14
1674 3977448 3441648 1.16
1842 4593672 3975576 1.16
2026 5278488 4562856 1.16
2228 6131268 5302056 1.16
2451 7414524 5940816 1.25
2697 8650152 6959136 1.24
2966 9065088 7724040 1.17
3263 8020428 9060024 0.89
3589 8907216 10528536 0.85
3948 9791028 11693568 0.84
4343 12739212 13591536 0.94
4777 12952488 15351120 0.84
5255 15334704 18207840 0.84
5781 15569076 20097816 0.77
6359 19425096 20106096 0.97
6995 19531848 24842040 0.79
7694 21518000 29708568 0.72
8464 21552000 31327464 0.69
9310 23510000 39621768 0.59
The numbers around 700-3000 limbs are not good.
david
-------------- next part --------------
A non-text attachment was scrubbed...
Name: script.py
Type: text/x-python-script
Size: 402 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20090602/cad5c505/attachment.bin>
-------------- next part --------------
More information about the gmp-devel
mailing list