Sandybridge addmul_N challenge
Niels Möller
nisse at lysator.liu.se
Wed Feb 22 19:16:23 CET 2012
Torbjorn Granlund <tg at gmplib.org> writes:
> I recently played with karatsuba-based addmul_2 for s390. For MUL-
> challenged machines, that might be an option. Sandybridge is OK there,
> but Bull-dozer is not. A disadvantage is that such code would be
> unsuitable for code which wants to be "side-channel silent".
I'm pretty sure one can do Karatsuba branchfree. Not sure one can do
that and do it fast (but if the alternative is branches, they're costly
too).
Evaluation:
mov u0, a
sub u1, a
lea (u0, u1), t
cmovc t, a
sbb mask, mask
Now a = |u0 - u1|, with sign in mask. Can easily be done also without
cmov, but with a longer chain of dependent instructions:
mov u0, a
sub u1, a
sbb mask, mask
xor mask, a
sub mask, a
Get b = |v0 - v1| in the same way, and arrange so that the final mask is
all ones if the term |u0 - u1| * |v0 - v1| should be subtracted, i.e., if
(u0 - u1)(v0 - v1) >= 0.
Interpolation:
Add in the signed term (u0 - u1) * (v0 - v1) to the three limbs
<r3,r2,r1>, using two's complement:
mov a, %rax
mul b
xor mask, %rax
xor mask, %rdx
bt $0, mask C Set carry from mask. Any better way?
C <mask, %rdx, %rax> + c = - (u0 - u1) (v0 - v1), in two's complement.
adc %rax, r1
adc %rdx, r2
adc mask, r3
Alternatively, one could use (u0 + u1) * (v0 + v1), with a couple of
conditional adds instead.
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list