Cancellation with hgcd / unbalanced mulmod_bnm1
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Sat Nov 12 11:09:32 CET 2011
Ciao Niels!
Il Ven, 11 Novembre 2011 4:21 pm, Niels ha scritto:
> The input sizes to mpn_mulmod_bnm1 is then 3n x n. How well suited is
> current mpn_mulmod_bnm1 to these unbalanced operands?
The cost of current mpn_mulmod_bnm1 implementation does not really depend
on how much balanced the operands are.
It computes two products: mod m'=B^{k/2}+1, and mod m"=B^{k/2}-1. In your
scenario, it will further reduce a, resulting in two 3/2 n x n mulmods; it
will detect that b is short and will not reduce it, this is the only
saving related to unbalancement.
> Another way is to split a as a = a_3 B^{3n} + a_2 B^{2n} + a_1 B^n +
[...]
> (a_1 B^n + a_0) * b // mpn_mul, 2n x n
> + B^2 (a_2 b mod B^n) // mpn_mullo, n x n
> Another option is to use the modulo m = B^{3n} - B^{2n}, and do the
[...]
> (a_1 B^n + a_0) * b // mpn_mul, 2n x n
> + B^2 (a_2 b mod (B^n - 1)) // mpn_mulmod_bnm1, n x n
> Will that be better or worse than an unbalanced 3n x n mpn_mulmod_bnm1?
> In the fft range, it *should* be worse, since the mpn_mul will be done
> as an mpn_mulmod_bnm1 of size 3n. But what about smaller sizes?
Exactly, in the FFT range the cost of the mpn_mul 2n x n and the cost of
the mpn_mulmod_bnm1 3n x n are the same. Near the threshold the difference
is small, i.e. probably smaller than the time required by mullo.
There is a similar choice in invertappr, where you need only 2n limbs from
a 2n x n product. The threshold in this case is INV_MULMOD_BNM1_THRESHOLD.
It varies from 20 to 100 limbs. But I didn't try the mul(nxn)+B*mullo(nxn)
trick.
> Any other tricks?
Directly using FFT modulo m = B^{3n} + 1 should be slower than mulmod_bnm1...
Regards,
m
--
http://bodrato.it/papers/
More information about the gmp-devel
mailing list