Performance of addlsh_n and sublsh_n
tg at gmplib.org
Thu Feb 3 21:32:16 CET 2011
I think I understand your intended algorithm now.
You want to add an up limb to the left-shifted vp part, and
propagate carry to the next higher right-shifted vp part.
Your algorithm makes a lot of sense; it would not use more operations
than my current shl+shr+or code, but it would simplify the cross-iteration carry.
Having said that, AMD K8-K10 will surely be best served by mul (as you
so rightly also said).
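
For reference, here is a minimal C sketch of the two formulations discussed above, as I read them. It is an illustration, not the GMP assembly: the names limb_t, addlsh_n_shift_or and addlsh_n_split are hypothetical, limbs are assumed to be 64 bits, and the shift count s is assumed to satisfy 0 < s < 64. Both functions compute {rp,n} = {up,n} + ({vp,n} << s) and return the carry-out limb.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

/* shl+shr+or form: assemble the fully shifted limb, then add it to up[i]
   with an ordinary one-bit carry chain. Requires 0 < s < 64. */
limb_t addlsh_n_shift_or(limb_t *rp, const limb_t *up, const limb_t *vp,
                         size_t n, unsigned s)
{
    limb_t shifted_in = 0;   /* high bits of the previous vp limb */
    limb_t cy = 0;           /* carry from the previous addition */
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t w = (v << s) | shifted_in;   /* shl + or */
        shifted_in = v >> (64 - s);         /* shr */
        limb_t t = up[i] + w;
        limb_t c1 = t < w;                  /* carry out of up[i] + w */
        limb_t r = t + cy;
        limb_t c2 = r < t;                  /* carry out of adding cy */
        rp[i] = r;
        cy = c1 | c2;                       /* at most one of them is set */
    }
    return shifted_in + cy;
}

/* Split form: add up[i] to the left-shifted part only, and fold the carry
   into the right-shifted part that belongs to the next higher limb.
   Requires 0 < s < 64. */
limb_t addlsh_n_split(limb_t *rp, const limb_t *up, const limb_t *vp,
                      size_t n, unsigned s)
{
    limb_t hi = 0;   /* right-shifted part of previous limb, plus carry */
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t lo = v << s;                 /* left-shifted part */
        limb_t t = up[i] + lo;
        limb_t c1 = t < lo;                 /* carry out of up[i] + lo */
        limb_t r = t + hi;
        limb_t c2 = r < t;                  /* carry out of adding hi */
        rp[i] = r;
        hi = (v >> (64 - s)) + c1 + c2;     /* stays <= 2^s, no overflow */
    }
    return hi;
}
```

In the split form the only value carried across iterations is hi, which already includes the carry. Note also that lo and the right-shifted part are exactly the low and high halves of the product vp[i] * 2^s, which is why a mul-based loop fits the same structure.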
In the meantime, I have loopmixed shrd based code. The numbers are good
for some Intel processors, awful for AMD processors as well as Intel
Atom and VIA Nano. Results:
dnl Core c/l
dnl PNR 2.9
dnl NHM 2.8
dnl SBR 2.7
These are not bad numbers. (Only SBR might get an addmul_1 that
competes with this.)