rth at twiddle.net
Fri Mar 8 18:44:02 CET 2013
On 2013-03-08 03:46, Torbjorn Granlund wrote:
> I assume you mean that the destination ptrs are naturally aligned, while
> the source ptrs are 32-bit aligned?
> My guess for the "jaggyness" is that of two src ptrs, you rarely strike
> a case where they are 256-bit aligned, in particular not when both are
> 256-bit aligned. But that happens much more often for 128-bit
> alignment. My copy was alignment insensitive, perhaps thanks to
> scheduling, or that it stresses the unaligned load logic less, with its
> one load-per-store?
I don't know. I do know there's something bizarre going on that probably
needs some chip knowledge to figure out.
For instance, testing the -128 patch I posted here, and making no other change
except *adding* :128 markers to both source operands, I hoped to determine what
effect source alignment has on the loop. (This change is not generally
correct, but it does work when running the speed program with a specified
alignment.)
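
To spell out why the change is not generally correct: a :128 marker on a NEON
load asserts that the address is a multiple of 16 bytes, and the load can fault
when it is not. The following is only a rough C sketch of the condition both
source pointers would have to meet; the names is_aligned and
sources_allow_128_hint are made up for illustration and are not part of GMP or
of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p is a multiple of `bytes`; `bytes` must be a power of two. */
static bool is_aligned(const void *p, size_t bytes)
{
    return ((uintptr_t)p & (bytes - 1)) == 0;
}

/* Hypothetical guard: both source operands must be 16-byte (128-bit)
   aligned before a loop carrying :128 hints on its loads is safe. */
static bool sources_allow_128_hint(const void *up, const void *vp)
{
    return is_aligned(up, 16) && is_aligned(vp, 16);
}
```

With limb pointers that are only guaranteed 32-bit alignment, this condition
holds for just some inputs, which is why the hinted loop is usable only when
the test harness pins the alignment.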
The peak result is slightly *slower* than before.
               with align              without align
         mpn_and_n  mpn_nand_n    mpn_and_n  mpn_nand_n
   10     #1.7989     1.8987        1.7990     1.8989
   50     #0.9393     1.0693        0.9395     1.0694
  100     #1.2491     1.3891        1.2496     1.3893
  500     #0.8154     0.9753        0.8156     0.9756
 1000      0.8746     1.0642       #0.7787     0.9435
 5000     #1.4067     1.4939        1.5012     1.5577
10000     #1.5454     1.6702        1.5521     1.5926