speed of unbalanced multiplication
Zimmermann Paul
Paul.Zimmermann at loria.fr
Thu Feb 7 10:08:56 CET 2013
Marco,
> After the patch, only changing the way tune/speed allocate memory for the
> operands, their results are comparable:
>
> $ tune/speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
> overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU
> freq 3200.20 MHz
>            mpn_mul_n      mpn_mul    mpn_mul_n      mpn_mul
> 800000  0.644460000  0.649850000  0.634180000 #0.631246000
I confirm on my side:
frite% ./speed -s 800000 mpn_mul_n mpn_mul mpn_mul_n mpn_mul
overhead 0.000000008 secs, precision 10000 units of 1.25e-09 secs, CPU freq 800.00 MHz
          mpn_mul_n      mpn_mul    mpn_mul_n      mpn_mul
800000  0.660041000 #0.656041000  0.660041000  0.660041000
> There is a side effect: to measure the speed of unbalanced multiplication,
> e.g. ###### x ##, you used
>
> tune/speed -s ## mpn_mul.######
>
> now the roles of the two parameters are swapped, and you have to write
>
> tune/speed -s ###### mpn_mul.##
>
> The transposed version of the matrix of times I suggested in the previous
> message, can now be obtained with the following:
>
> $ tune/speed -s 800000-1200000 -t 100000 mpn_mul.400000 mpn_mul.500000
> mpn_mul.600000 mpn_mul.700000 mpn_mul.800000 mpn_mul_n
> overhead 0.000000002 secs, precision 10000 units of 3.12e-10 secs, CPU
> freq 3200.23 MHz
>         mul.400000 mul.500000 mul.600000 mul.700000 mul.800000  mpn_mul_n
>  800000  #0.430677   0.433753   0.515757   0.535098   0.630629   0.645156
>  900000  #0.431647   0.521545   0.532850   0.638642   0.644031   0.642488
> 1000000  #0.522817   0.527930   0.633221   0.646514   0.648290   0.708614
> 1100000  #0.516791   0.648199   0.640584   0.651306   0.681567   0.857438
> 1200000   0.647544  #0.640084   0.652030   0.675864   0.690255   0.950390
I confirm too:
frite% ./speed -s 800000-1200000 -t 100000 mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000 mpn_mul_n
overhead 0.000000008 secs, precision 10000 units of 1.25e-09 secs, CPU freq 800.00 MHz
        mpn_mul.400000 mpn_mul.500000 mpn_mul.600000 mpn_mul.700000 mpn_mul.800000      mpn_mul_n
 800000   #0.432027000    0.448028000    0.524033000    0.528033000    0.664042000    0.668041000
 900000   #0.444028000    0.532033000    0.532033000    0.668042000    0.648040000    0.648040000
1000000   #0.524033000    0.528033000    0.660041000    0.648040000    0.656041000    0.704044000
1100000   #0.524033000    0.660042000    0.664041000    0.652041000    0.676042000    0.868054000
1200000    0.656041000    0.656041000   #0.652041000    0.680043000    0.724045000    0.968060000
> There still are problems of non-monotonicity (12..x5.. is slightly faster
> than both the more unbalanced 12..x4.. and the less unbalanced 11..x5..),
> but at least we isolated the issue.
>
> If other developers do not dislike the changed meaning of the .<r>
> parameter to mpn_mul, this patch can be applied to the main repo...
>
> Opinions?
I like your change to the meaning of the .r parameter. I find the new meaning
more natural, with -s setting the size of the larger operand.
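
The non-monotonicity Marco mentions can be checked mechanically. The
following is a minimal Python sketch (not part of tune/speed; the name
find_violations is made up for illustration) that takes the mpn_mul timings
from his quoted table and lists every pair of adjacent cells where enlarging
one operand made the product faster:

```python
# Timings in seconds copied from the quoted table (mpn_mul columns only).
# Rows: the -s size (larger operand). Columns: the .r size (smaller operand).
LARGE = [800000, 900000, 1000000, 1100000, 1200000]
SMALL = [400000, 500000, 600000, 700000, 800000]
TIMES = [
    [0.430677, 0.433753, 0.515757, 0.535098, 0.630629],
    [0.431647, 0.521545, 0.532850, 0.638642, 0.644031],
    [0.522817, 0.527930, 0.633221, 0.646514, 0.648290],
    [0.516791, 0.648199, 0.640584, 0.651306, 0.681567],
    [0.647544, 0.640084, 0.652030, 0.675864, 0.690255],
]


def find_violations(large, small, times):
    """Return pairs ((an, bn), (an2, bn2)) of adjacent table cells where
    growing exactly one operand gave a strictly smaller time."""
    violations = []
    for i, an in enumerate(large):
        for j, bn in enumerate(small):
            t = times[i][j]
            # Neighbour with a longer small operand (next column).
            if j + 1 < len(small) and times[i][j + 1] < t:
                violations.append(((an, bn), (an, small[j + 1])))
            # Neighbour with a longer large operand (next row).
            if i + 1 < len(large) and times[i + 1][j] < t:
                violations.append(((an, bn), (large[i + 1], bn)))
    return violations


if __name__ == "__main__":
    for (an, bn), (an2, bn2) in find_violations(LARGE, SMALL, TIMES):
        print(f"{an} x {bn} is slower than {an2} x {bn2}")
```

On that table it reports four such pairs, including 1200000 x 400000 versus
1200000 x 500000 and 1100000 x 500000 versus 1200000 x 500000, the two cases
noted above.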
Paul