GMP 4.3 multiplication performance

David Harvey dmharvey at cims.nyu.edu
Tue Jun 2 16:38:38 CEST 2009


On May 31, 2009, at 1:35 PM, Torbjorn Granlund wrote:

> While working on multiplication improvements for GMP 4.4, I realized
> that we are doing something quite silly for unbalanced operands in GMP
> 4.3.  For some machines, GMP 4.3 might actually be substantially  
> slower
> than GMP 4.2!
>
> The current multiply code works roughly like this:
>
>   if (operands have same size)
>     use mpn_sqr_n or mpn_mul_n
>
>   if (schoolbook operand range)
>     use mpn_mul_basecase
>
>   if (fft range)
>     use mpn_mul_fft_full
>
>   else use toom42, toom32, or toom22, or some combination of these.
>
> The problem is that for any unbalanced operands between the schoolbook
> range and the fft range, we will basically use karatsuba's algorithm,
> not any toom variant.  This is pretty awful except for smallish
> operands.
>
> The use of unbalanced toom/karatsuba was introduced with GMP 4.3,  
> and is
> a great improvement for many operands.  Unfortunately, it is done too
> naively.
>
> Some measurements demonstrating the effects:
>
>          mpn_mul.7900  mpn_mul.7901
> 7900     #11635165.00   20355963.00
>          mpn_mul.8000  mpn_mul.8001
> 8000     #10745592.00   10750716.00
>
> I.e., a 7900-limb operand times a 7900-lim operand takes 11635165  
> cycles
> on an athlon 64.  But a I.e., a 7901-limb operand times a 7900-lim
> operand takes 20355963 cycles.
>
> In the FFT range, as for 8000 limbs, things are back to normal.   
> The FFT
> code handles arbitrarily unbalanced operands (although its performance
> degrades when the operands are very unbalanced).
>
> (Note that a balanced 7900-limb product is also slower than a 8000- 
> limb
> balanced product.  That's another flaw.)

One case in which I am particularly interested is the 2n * n  
unbalanced multiplication. This should be no more expensive than  
twice the cost of an n * n multiplication, plus O(n) overhead that  
should be negligible for large n.

But here is what I get for current 4.3 stable (changeset 85a941ada149  
checked out this morning, on core 2 duo, mac OS X, after running  
tuneup, using the attached python script). Columns are n, cost for  
2n*n mult, cost for two n*n mults, ratio.

   5            262             300      0.87
   6            362             391      0.93
   7            489             541      0.90
   8            609             673      0.90
   9            782             821      0.95
  10            937            1016      0.92
  11           1141            1200      0.95
  12           1338            1386      0.97
  14           1814            1905      0.95
  15           2098            2179      0.96
  17           2718            2748      0.99
  18           2976            3096      0.96
  20           3672            3804      0.97
  22           4436            4517      0.98
  25           5820            5550      1.05
  27           6672            6042      1.10
  30           7386            7096      1.04
  33           9024            8864      1.02
  37          10776           10944      0.98
  40          12000           12240      0.98
  44          14124           14376      0.98
  49          17040           17040      1.00
  54          18624           19284      0.97
  59          21816           22704      0.96
  65          25620           27960      0.92
  72          29628           31704      0.93
  79          34488           36240      0.95
  87          40884           41448      0.99
  95          44904           49368      0.91
105          53244           57600      0.92
116          62220           64464      0.97
127          70224           77112      0.91
140          84048           85896      0.98
154          97248          103296      0.94
170         115380          119640      0.96
187         130536          132984      0.98
205         154992          155856      0.99
226         180684          183744      0.98
248         204144          206400      0.99
273         246600          250224      0.99
301         283752          280392      1.01
331         339936          323400      1.05
364         382116          363000      1.05
400         444612          424992      1.05
440         504624          503184      1.00
485         602100          565200      1.07
533         703584          655944      1.07
586         797592          747144      1.07
645         954792          875832      1.09
710        1137924         1002576      1.14
781        1358592         1169976      1.16
859        1527456         1292976      1.18
945        1768872         1519728      1.16
1039       2061648         1741776      1.18
1143       2374656         2010936      1.18
1258       2616612         2222496      1.18
1384       3050760         2601168      1.17
1522       3431532         3013944      1.14
1674       3977448         3441648      1.16
1842       4593672         3975576      1.16
2026       5278488         4562856      1.16
2228       6131268         5302056      1.16
2451       7414524         5940816      1.25
2697       8650152         6959136      1.24
2966       9065088         7724040      1.17
3263       8020428         9060024      0.89
3589       8907216        10528536      0.85
3948       9791028        11693568      0.84
4343      12739212        13591536      0.94
4777      12952488        15351120      0.84
5255      15334704        18207840      0.84
5781      15569076        20097816      0.77
6359      19425096        20106096      0.97
6995      19531848        24842040      0.79
7694      21518000        29708568      0.72
8464      21552000        31327464      0.69
9310      23510000        39621768      0.59

The numbers around 700-3000 limbs are not good.

david

-------------- next part --------------
A non-text attachment was scrubbed...
Name: script.py
Type: text/x-python-script
Size: 402 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20090602/cad5c505/attachment.bin>
-------------- next part --------------




More information about the gmp-devel mailing list