speed of unbalanced multiplication

Thu Feb 7 10:08:56 CET 2013

       Marco,

> After the patch, which only changes the way tune/speed allocates memory for
> the operands, the timings of mpn_mul_n and mpn_mul are comparable:
> 
> $ tune/speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
> overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU freq 3200.20 MHz
>             mpn_mul_n       mpn_mul     mpn_mul_n       mpn_mul
> 800000    0.644460000   0.649850000   0.634180000  #0.631246000

I confirm on my side:

frite% ./speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
overhead 0.000000008 secs, precision 10000 units of 1.25e-09 secs, CPU freq 800.00 MHz
            mpn_mul_n       mpn_mul     mpn_mul_n       mpn_mul
800000    0.660041000  #0.656041000   0.660041000   0.660041000

> There is a side effect: to measure the speed of an unbalanced multiplication,
> e.g. ###### x ##, you previously used
> 
> tune/speed -s ## mpn_mul.######
> 
> now the roles of the two parameters are swapped, and you have to write
> 
> tune/speed -s ###### mpn_mul.##
> 
> The transposed version of the matrix of times I suggested in the previous
> message can now be obtained with the following:
> 
> $ tune/speed -s 800000-1200000 -t 100000 mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000 mpn_mul_n
> overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU freq 3200.23 MHz
>         mul.400000 mul.500000 mul.600000 mul.700000 mul.800000 mpn_mul_n
> 800000  #0.430677   0.433753   0.515757   0.535098   0.630629  0.645156
> 900000  #0.431647   0.521545   0.532850   0.638642   0.644031  0.642488
> 1000000 #0.522817   0.527930   0.633221   0.646514   0.648290  0.708614
> 1100000 #0.516791   0.648199   0.640584   0.651306   0.681567  0.857438
> 1200000  0.647544  #0.640084   0.652030   0.675864   0.690255  0.950390

I confirm too:

frite% ./speed -s 800000-1200000 -t 100000 mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000 mpn_mul_n
overhead 0.000000008 secs, precision 10000 units of 1.25e-09 secs, CPU freq 800.00 MHz
        mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000     mpn_mul_n
800000   #0.432027000   0.448028000   0.524033000   0.528033000   0.664042000   0.668041000
900000   #0.444028000   0.532033000   0.532033000   0.668042000   0.648040000   0.648040000
1000000   #0.524033000   0.528033000   0.660041000   0.648040000   0.656041000   0.704044000
1100000   #0.524033000   0.660042000   0.664041000   0.652041000   0.676042000   0.868054000
1200000    0.656041000   0.656041000  #0.652041000   0.680043000   0.724045000   0.968060000
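
To look for non-monotonic entries without reading the table by eye, something
like the following could be used. It is only a minimal sketch in Python: the
names TABLE, R_SIZES, STEP, parse and non_monotonic are hypothetical (not part
of GMP), the mpn_mul_n column is left out, and the step of 100000 is assumed
from the -t 100000 option and the spacing of the .r values above. It flags each
s x r product that was timed strictly faster than the smaller (s-STEP) x r or
s x (r-STEP) product.

```python
# Rows copied from the table above, without the mpn_mul_n column.
# '#' marks the fastest entry of a row in the speed output.
TABLE = """\
800000   #0.432027000   0.448028000   0.524033000   0.528033000   0.664042000
900000   #0.444028000   0.532033000   0.532033000   0.668042000   0.648040000
1000000  #0.524033000   0.528033000   0.660041000   0.648040000   0.656041000
1100000  #0.524033000   0.660042000   0.664041000   0.652041000   0.676042000
1200000   0.656041000   0.656041000  #0.652041000   0.680043000   0.724045000
"""

# The .r values of the mpn_mul.r columns, in column order.
R_SIZES = [400000, 500000, 600000, 700000, 800000]

# Assumption: both sizes advance in steps of 100000, as in the run above.
STEP = 100000


def parse(table):
    """Return a dict mapping (s, r) to the measured time in seconds."""
    times = {}
    for line in table.splitlines():
        fields = line.replace("#", "").split()
        if not fields:
            continue
        s = int(fields[0])
        for r, t in zip(R_SIZES, fields[1:]):
            times[(s, r)] = float(t)
    return times


def non_monotonic(times):
    """List (bigger, smaller) pairs where the bigger product was strictly faster."""
    found = []
    for (s, r), t in sorted(times.items()):
        for smaller in ((s - STEP, r), (s, r - STEP)):
            if smaller in times and t < times[smaller]:
                found.append(((s, r), smaller))
    return found


if __name__ == "__main__":
    times = parse(TABLE)
    for big, small in non_monotonic(times):
        print("%dx%d (%.6f) is faster than %dx%d (%.6f)"
              % (big[0], big[1], times[big], small[0], small[1], times[small]))
```

Small differences may well be measurement noise, so the output is better read
as a list of candidates to re-measure than as a verdict.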

> There still are problems of non-monotonicity (12..x5.. is slightly faster
> than both the more unbalanced 12..x4.. and the less unbalanced 11..x5..),
> but at least we isolated the issue.
> 
> If the other developers do not dislike the changed meaning of the .<r>
> parameter to mpn_mul, this patch can be applied to the main repo...
> 
> Opinions?

I like your change to the meaning of the .r parameter; I find the new meaning
more natural, with -s setting the largest size.
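
For illustration, the new convention can be written down as a small helper.
This is a minimal sketch in Python; speed_command is a hypothetical name, not
something provided by GMP, and it only builds the argument list in the form
quoted above (-s gets the larger size, .r the smaller one):

```python
def speed_command(n1, n2, program="tune/speed"):
    """Build the speed invocation that times an n1 x n2 mpn_mul.

    Follows the convention after the patch: -s is the larger operand size,
    and the .r suffix of mpn_mul is the smaller one.
    """
    large, small = max(n1, n2), min(n1, n2)
    return [program, "-s", str(large), "mpn_mul.%d" % small]


if __name__ == "__main__":
    # Example: a 1200000 x 400000 multiplication, sizes given in either order.
    print(" ".join(speed_command(400000, 1200000)))
```

The caller does not have to remember which operand goes where, since the
helper sorts the two sizes itself.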

Paul