speed of unbalanced multiplication

Wed Feb 6 17:59:44 CET 2013

Ciao Paul!

Il Dom, 27 Gennaio 2013 10:09 am, Zimmermann Paul ha scritto:
> In the FTT range, multiplying n limbs by m limbs should not be more
> expensive then multiplying two numbers of (n+m)/2 limbs.

Of course. With current implementation, unbalanced multiplications need
some more memory and a few additions/subtractions, but this should not
give a measurable slow-down. The "matrix" obtained with

$ tune/speed -s 400000-800000 -t 100000 mpn_mul.800000 mpn_mul.900000
mpn_mul.1000000 mpn_mul.1100000 mpn_mul.1200000

shows that times are not as monotonic as desired, but "unbalancement" does
not really have an influence.

> In that case, we get a 6.5% slowdown with respect to that strategy.

I think that the culprit is the tune/speed program, but I'm not able to
correct it. I just tested the attached patch. After patching, the results
are:

$ tune/speed -s 800000-1000000 -t 100000 mpn_mul_n mpn_mul mpn_mul_bal
overhead 0.000000000 secs, precision 10000 units of 3.13e-11 secs, CPU
freq 31990.26 MHz
            mpn_mul_n       mpn_mul   mpn_mul_bal
800000    0.646571000   0.682834000  #0.632961000
900000   #0.652178000   0.678564000   0.655979000
1000000   0.710674000   0.740998000  #0.702508000

As you can see mpn_mul_n and mpn_mul_bal are comparable, and mpn_mul is
always slower. All the three functions measure the time to multiply two
numbers of the same size. mpn_mul_bal and mpn_mul measure the same
function, but the first uses the same macro that tune/speed uses for
mpn_mul_n...

Regards,
m

-- 
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed.diff
Type: text/x-patch
Size: 1382 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130206/2c58991a/attachment.bin>