AMD-64 optimizations, some (new) code
Torbjorn Granlund
tege at swox.com
Mon Sep 26 22:56:23 CEST 2005
Ashod Nakashian <saghmos at xter.net> writes:
> 'speed' dumps the original command line at the top of every .gnuplot
> file. So, for example open any of the attached mul_1.gnuplot files, and
> you'll read the command line I used. Take the following for example:
> # Generated with:
> # ./speed -s 50-50000 -t 10 -C -P mul_1 mpn_mul_1.1
I looked at the raw data.
> I really don't see any -CD. I do use -C (which means "per limb time"). I
> don't know what -D does, or what -CD does (if combining them makes a
> difference). Does this command line agree with your assumptions?
No. But I am positive there is something fishy with the
measurements, after the analysis, after my own measurements, and
after having seen your fluctuating measurements. Sorry, it would
have been more fun if the code actually ran at close to 2 c/l.
:-)
> Should I perform the tests in some other manner (say, write my own code
> to do the math, use 'time' to get the overall time, and then compute the
> per-limb value minus overhead)? I don't know if that would help, but it
> is quite disappointing to have findings that cannot be explained
> rationally.
I am sure it is possible to explain "rationally". You just need
to systematically debug what is wrong with speed. There might be
a bug in speed, or a bug in the compiler with which you compiled
speed.
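
A minimal sketch of the kind of independent cross-check suggested above,
in C. It times repeated calls and converts the total to cycles per limb.
Everything here is an assumption made for illustration: the names
cycles_per_limb, time_calls and noop_mul_1 are hypothetical, the clock
frequency hz has to be supplied by the user, and clock() is only a coarse
timer. It is not how speed itself measures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical function-pointer type with the same shape as mpn_mul_1:
   (result pointer, source pointer, limb count, multiplier) -> carry limb. */
typedef uint64_t (*mul_1_fn)(uint64_t *, const uint64_t *, size_t, uint64_t);

/* Convert a total time into cycles per limb.
   seconds: time for all reps calls, hz: assumed CPU clock frequency. */
double cycles_per_limb(double seconds, double hz, unsigned long reps, size_t n)
{
    assert(reps > 0 && n > 0);
    return seconds * hz / ((double)reps * (double)n);
}

/* Do-nothing stand-in used to estimate call and loop overhead. */
uint64_t noop_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    (void)rp; (void)up; (void)n; (void)v;
    return 0;
}

/* CPU time in seconds spent on reps calls of f over n limbs. */
double time_calls(mul_1_fn f, uint64_t *rp, const uint64_t *up,
                  size_t n, uint64_t v, unsigned long reps)
{
    clock_t t0 = clock();
    for (unsigned long r = 0; r < reps; r++)
        f(rp, up, n, v);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

One would time the real routine and noop_mul_1 with the same arguments,
subtract the two, and pass the difference to cycles_per_limb. The result
is only as good as the hz value, so frequency scaling should be off.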
> On the other hand, I think 3.3 c/l is pretty decent performance,
> wouldn't you agree? (Putting aside that the code is arguably even faster
> than that.)
3.3 is definitely good performance. The GMP development code
(scheduled for GMP 5) has similar speed (3.0). I haven't been
able to get under 3.0, although I have tried hard.
It turns out that it is possible to reach 3.0 with quite simple
code:
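        TEXT
        ALIGN(16)
        .byte   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0   C 15 pad bytes, apparently to put .Loop on a 16-byte boundary
ASM_START()
PROLOGUE(mpn_mul_1)
        movq    %rdx, %r11              C r11 = n (limb count)
        leaq    (%rsi,%rdx,8), %rsi     C rsi = end of source operand
        leaq    (%rdi,%rdx,8), %rdi     C rdi = end of result area
        negq    %r11                    C index runs from -n up to 0
        xorl    %r8d, %r8d              C carry limb = 0
.Loop:  movq    (%rsi,%r11,8), %rax     C load source limb
        mulq    %rcx                    C rdx:rax = limb * multiplier
        addq    %r8, %rax               C add carry limb from previous iteration
        movl    $0, %r8d                C clear r8 without touching the flags
        adcq    %rdx, %r8               C new carry limb = high half + carry flag
        movq    %rax, (%rdi,%r11,8)     C store result limb
        incq    %r11                    C next limb, sets ZF when done
        jne     .Loop
        movq    %r8, %rax               C return the final carry limb
        ret
EPILOGUE()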
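
For readers less used to the assembly, here is a rough C rendering of what
the loop above computes. It is a sketch, not GMP's code: the name ref_mul_1
is hypothetical, limbs are assumed to be 64 bits, and unsigned __int128 is
a GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: multiply the n-limb number at up by the single
   limb v, store the low n limbs at rp, and return the carry-out limb. */
uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    assert(n >= 1);
    uint64_t carry = 0;                 /* plays the role of %r8 */
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product, like mulq leaving %rdx:%rax */
        unsigned __int128 p = (unsigned __int128)up[i] * v;
        /* addq + adcq: the sum always fits in 128 bits */
        p += carry;
        rp[i] = (uint64_t)p;            /* low half is the result limb */
        carry = (uint64_t)(p >> 64);    /* high half feeds the next limb */
    }
    return carry;
}
```

Something like this can serve as a correctness check for a hand-written
loop, by comparing result limbs and the returned carry on random operands.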
--
Torbjörn