Better tabselect
David Miller
davem at davemloft.net
Fri Apr 12 19:22:24 CEST 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Fri, 12 Apr 2013 10:04:35 +0200
> David Miller <davem at davemloft.net> writes:
>
> The existing C code approaches 6 cycles/limb on T4; the best I can do
> with this new approach, without pipelining and at 4-way unrolling, is
> ~4.5 cycles/limb:
>
> This function needs to leak no side-channel information about actual
> operand contents. Conditional execution is a no-no. Instead, we need to
> form the mask using arithmetic operations.
It isn't really conditional execution on sparc, the resources and
timing required for the "move" instruction are constant whether the
condition matches or not.
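For reference, here is a minimal C sketch of the mask-based selection being discussed. It is only an illustration under stated assumptions: limbs are taken to be 64 bits wide, and the name tabselect_sketch and its argument layout are made up here, not taken from the GMP sources. Whether a given compiler keeps the generated code branch-free is not guaranteed either.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's actual code: copy entry `which` out of a
   table of `nents` entries of `n` limbs each.  Every entry is read, and
   the selection is done with an arithmetic mask, so neither the memory
   access pattern nor the control flow depends on `which`.  */
static void
tabselect_sketch (uint64_t *rp, const uint64_t *tab,
                  size_t n, size_t nents, size_t which)
{
  size_t i, k;

  for (i = 0; i < n; i++)
    rp[i] = 0;

  for (k = 0; k < nents; k++)
    {
      uint64_t diff = (uint64_t) k ^ (uint64_t) which;
      /* (diff | -diff) has its top bit set exactly when diff != 0.  */
      uint64_t nonzero = (diff | (0 - diff)) >> 63;
      /* All ones when k == which, all zeros otherwise.  */
      uint64_t mask = nonzero - 1;

      for (i = 0; i < n; i++)
        rp[i] |= tab[k * n + i] & mask;
    }
}
```

The sketch ORs masked limbs into a zeroed destination, so the same loads, ANDs, ORs and stores are performed for every table entry regardless of which one is selected.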
> I am quite ignorant about VIS, but doesn't that allow 128-bit
> operations?
The largest vector element on sparc VIS is 32 bits, and the largest
vectors are 64 bits.
And even if the elements were large enough, the instructions provided
to do this violate the side-channel restriction.
What you'd do is a VIS comparison to generate a store mask, followed
by a partial store instruction using that mask to conditionally
store the elements into the 'rp' array.
	fpcmpXXX	xxx, %g1
	stda		%fN, [rp + %g1] ASI_PSTXXX
But that conditional store would change the timing between the case
where we select and the case where we do not.
> Here is a tabselect test program:
Thanks!