Better tabselect

Fri Apr 12 19:22:24 CEST 2013

From: Torbjorn Granlund <tg at gmplib.org>
Date: Fri, 12 Apr 2013 10:04:35 +0200

> David Miller <davem at davemloft.net> writes:
> 
>   The existing C code approaches 6 cycles/limb on T4, the best I can do
>   without pipelining with this new approach at 4 way unrolling is ~4.5
>   cycles/limb:
>   
> This function needs to leak no side-channel information of actual operand
> contents.  Conditional execution is a no-no.  Instead, we need to form
> the mask using arithmetic operations.

It isn't really conditional execution on sparc: the resources and
timing required for the "move" instruction are constant whether the
condition matches or not.
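
For reference, the arithmetic mask mentioned in the quoted text could
look roughly like the C sketch below.  This is an illustration only, not
the GMP code: the names eq_mask and tabselect_sketch are made up here,
limbs are assumed to be 64 bits, and the table is assumed to hold nents
entries of n limbs each, stored back to back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: all ones when a == b, all zeros otherwise.
   Uses only arithmetic and logical operations, no comparisons. */
static uint64_t eq_mask(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;                  /* zero iff a == b */
    uint64_t nz = (x | (0 - x)) >> 63;   /* 1 iff x != 0 */
    return nz - 1;                       /* 0 - 1 wraps to all ones */
}

/* Hypothetical sketch: copy entry `which` of `tab` into rp[0..n-1].
   Every entry is read and every limb of rp is written, whatever the
   value of `which`, so the memory access pattern does not depend on it. */
static void tabselect_sketch(uint64_t *rp, const uint64_t *tab,
                             size_t n, size_t nents, size_t which)
{
    size_t i, k;

    for (i = 0; i < n; i++)
        rp[i] = 0;

    for (k = 0; k < nents; k++) {
        uint64_t mask = eq_mask((uint64_t) k, (uint64_t) which);
        for (i = 0; i < n; i++)
            rp[i] |= tab[k * n + i] & mask;  /* keeps only the chosen entry */
    }
}
```

A compiler is still free to turn such source into branches or
conditional moves, so the generated code has to be inspected on each
target before relying on it.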

> I am quite ignorant about VIS, but doesn't that allow 128-bit
> operations?

The largest vector element on sparc VIS is 32 bits, and the largest
vectors are 64 bits.

And even if the elements were large enough, the instructions provided
to do this violate the side-channel restriction.

What you'd do is a VIS comparison to generate a store mask, and
then a partial store instruction using that mask to conditionally
store the elements into the 'rp' array.

	fpcmpXXX	xxx, %g1
	stda		%fN, [rp + %g1] ASI_PSTXXX

But that conditional store would change the timing between the case
where we select and the case where we do not.

> Here is a tabselect test program:

Thanks!