speed of unbalanced multiplication

bodrato at mail.dm.unipi.it
Thu Feb 7 09:53:59 CET 2013


Ciao Paul,

I have prepared a better patch for the tune/speed program (forget the
experimental one I sent yesterday); it is attached.

Before the patch, mpn_mul seems noticeably slower than mpn_mul_n:

$ tune/speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU freq 3205.77 MHz
            mpn_mul_n       mpn_mul     mpn_mul_n       mpn_mul
800000    0.646153000   0.673501000  #0.643274000   0.686486000


After the patch, which only changes the way tune/speed allocates memory for
the operands, the results are comparable:

$ tune/speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU freq 3200.20 MHz
            mpn_mul_n       mpn_mul     mpn_mul_n       mpn_mul
800000    0.644460000   0.649850000   0.634180000  #0.631246000

There is a side effect: to measure the speed of an unbalanced
multiplication, e.g. ###### x ##, you used to write

tune/speed -s ## mpn_mul.######

Now the roles of the two parameters are swapped, and you have to write

tune/speed -s ###### mpn_mul.##
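To make the swap concrete, here is a small C sketch of how the -s size and the .<r> suffix could be mapped onto the operand lengths of mpn_mul (rp, s1p, s1n, s2p, s2n), which requires s1n >= s2n >= 1. The helper names mul_sizes_old and mul_sizes_new are hypothetical, and long stands in for mp_size_t; the real code is in the attached speed.diff and may differ.

```c
#include <assert.h>

/* Hypothetical helpers, not taken from tune/speed.  They map the size
   given with -s and the <r> of "mpn_mul.<r>" onto the operand lengths
   s1n (long operand) and s2n (short operand) of mpn_mul, which needs
   s1n >= s2n >= 1.  long stands in for mp_size_t. */

/* Old convention: "-s ## mpn_mul.######" measures ###### x ##,
   i.e. -s gives the short operand and <r> the long one. */
static void
mul_sizes_old (long size, long r, long *s1n, long *s2n)
{
  *s1n = r;
  *s2n = size;
  assert (*s1n >= *s2n && *s2n >= 1);  /* mpn_mul precondition */
}

/* New convention after the patch: "-s ###### mpn_mul.##" measures
   ###### x ##, i.e. -s gives the long operand and <r> the short one. */
static void
mul_sizes_new (long size, long r, long *s1n, long *s2n)
{
  *s1n = size;
  *s2n = r;
  assert (*s1n >= *s2n && *s2n >= 1);  /* mpn_mul precondition */
}
```

With the new convention, a size list given with -s sweeps the long operand while <r> stays fixed, which is what the table below relies on.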

The transposed version of the matrix of timings I suggested in my previous
message can now be obtained with:

$ tune/speed -s 800000-1200000 -t 100000 mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000 mpn_mul_n
overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU freq 3200.23 MHz
        mul.400000 mul.500000 mul.600000 mul.700000 mul.800000 mpn_mul_n
800000  #0.430677   0.433753   0.515757   0.535098   0.630629  0.645156
900000  #0.431647   0.521545   0.532850   0.638642   0.644031  0.642488
1000000 #0.522817   0.527930   0.633221   0.646514   0.648290  0.708614
1100000 #0.516791   0.648199   0.640584   0.651306   0.681567  0.857438
1200000  0.647544  #0.640084   0.652030   0.675864   0.690255  0.950390

There are still non-monotonicity problems (1200000 x 500000 is slightly
faster than both the more unbalanced 1200000 x 400000 and the less
unbalanced 1100000 x 500000), but at least we have isolated the issue.
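This kind of dip can be detected mechanically. The C sketch below copies the mpn_mul timings from the table (the mpn_mul_n column is left out) and flags every cell that is faster than both its more unbalanced neighbor (same first operand, shorter second operand) and its less unbalanced neighbor (shorter first operand, same second operand). Both neighbors multiply fewer limbs in total, so such a cell is suspicious. The function name find_dips is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 5
#define COLS 5

/* Timings in seconds, copied from the mpn_mul columns of the table.
   Row i: long operand long_sizes[i]; column j: short operand short_sizes[j]. */
static const long long_sizes[ROWS]  = {800000, 900000, 1000000, 1100000, 1200000};
static const long short_sizes[COLS] = {400000, 500000, 600000, 700000, 800000};
static const double t[ROWS][COLS] = {
  {0.430677, 0.433753, 0.515757, 0.535098, 0.630629},
  {0.431647, 0.521545, 0.532850, 0.638642, 0.644031},
  {0.522817, 0.527930, 0.633221, 0.646514, 0.648290},
  {0.516791, 0.648199, 0.640584, 0.651306, 0.681567},
  {0.647544, 0.640084, 0.652030, 0.675864, 0.690255},
};

/* Flag cells faster than both t[i][j-1] (more unbalanced: same long
   operand, shorter short operand) and t[i-1][j] (less unbalanced: shorter
   long operand, same short operand).  Stores at most max hits in
   hit_i/hit_j and returns the total number of flagged cells. */
static size_t
find_dips (size_t hit_i[], size_t hit_j[], size_t max)
{
  size_t n = 0;
  for (size_t i = 1; i < ROWS; i++)
    for (size_t j = 1; j < COLS; j++)
      if (t[i][j] < t[i][j - 1] && t[i][j] < t[i - 1][j])
        {
          if (n < max)
            {
              hit_i[n] = i;
              hit_j[n] = j;
            }
          n++;
        }
  return n;
}
```

On these numbers it flags a single cell, 1200000 x 500000. Weaker one-directional inversions (for example 1100000 x 400000 being faster than 1000000 x 400000) are not reported by this "faster than both" criterion.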

If the other developers do not object to the changed meaning of the .<r>
parameter to mpn_mul, this patch can be applied to the main repository.

Opinions?

Best regards,
m

-- 
http://bodrato.it/software/combinatorics.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed.diff
Type: text/x-patch
Size: 2169 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130207/97e8fd4c/attachment.bin>

