nisse at lysator.liu.se
Mon Feb 20 11:59:55 CET 2012
Torbjorn Granlund <tg at gmplib.org> writes:
> Why would one want to support such large sizes?
It would be nice to get rid of the SQR_TOOM2_THRESHOLD size restriction
in *all* squaring code. Since that restriction makes things a bit
brittle, and causes additional complexities for the fat case. I see good
good reason besides that.
Switching the use of the allocation areas seems like a simple way to get
rid of that. I've been looking mostly at the C sqr_basecase.
> 2. One could do the shifting differently, applying the shift to the limb
> argument of addmul_1. Something like, when doing the off-diagonal
> products for up[i],
> mpn_addmul_1 (rp + 2*i+1, up + i + 1, n - i - 1,
> (up[i] << 1) + (up[i-1] >> (GMP_LIMB_BITS - 1)));
> Might be cheaper, if we can get this shifting done in parallel with
> other operations, and get a simpler carry propagation recurrency when
> adding diagonal and off-diagonal terms together.
> And then handle carry-out from the last shift a a conditional add_n.
It's not that bad. In this scheme, up[i] (shifted) is multiplied by the
n-1-i limbs up[i+1, ..., n-1], i.e., fewer as i increases. The final
off-diagonal iteration (i = n-2) then adds up[n-2] * up[n-1], so if we
shift up[n-2], we only need a conditional add of the single limb
[On use of addmul_2]:
> I beg to differ about the greatness of this approach.
Care to elaborate on why you expect it to be to slow? I imagine carry
handling for the close-to-diagonal terms up[2k] * up[2k+1] will be slow
without assembler support. Or should we postpone this discussion until
there's some code to compare?
BTW, what do you think about the mpn_addmul_1c entrypoint? Would it make
sense with addmul_2c as well? addmul_1c is declared in gmp-impl.h, and
it seems it's implemented on some x86_32 configurations and on
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel