# Sandybridge addmul_N challenge

Niels Möller nisse at lysator.liu.se
Wed Feb 22 19:16:23 CET 2012

```Torbjorn Granlund <tg at gmplib.org> writes:

> I recently played with karatsuba-based addmul_2 for s390.  For MUL-
> challenged machines, that might be an option.  Sandybridge is OK there,
> but Bull-dozer is not.  A disadvantage is that such code would be
> unsuitable for code which wants to be "side-channel silent".

I'm pretty sure one can do Karatsuba branchfree. Not sure one can do
that and do it fast (but if the alternative is branches, they're costly
too).

Evaluation:

mov	u0, a
sub	u1, a
lea	(u0, u1), t
cmovc	t, a
sbb	mask, mask

Now a = |u0 - u1|, with sign in mask. Can easily be done also without
cmov, but with a longer chain of dependent instructions:

mov	u0, a
sub	u1, a
sbb	mask, mask
xor	mask, a
sub	mask, a

Get b = |v0 - v1| in the same way, and arrange so that the final mask is
all ones if the term |u0 - u1| * |v0 - v1| should be subtracted, i.e., if
(u0 - u1)(v0 - v1) >= 0.

Interpolation:

Add in the signed term (u0 - u1) * (v0 - v1) to the three limbs
<r3,r2,r1>, using two's complement:

mov	a, %rax
mul	b
xor	mask, %rax
xor	mask, %rdx
bt	\$0, mask	C Set carry from mask. Any better way?
C <mask, %rdx, %rax> + c = - (u0 - u1) (v0 - v1), in two's complement.
adc	%rax, r1
adc	%rdx, r2
adc	mask, r3

Alternatively, one could use (u0 + u1) * (v0 + v1), with a couple of
conditional adds instead.

/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
```

More information about the gmp-devel mailing list