Faster integer remainder
Niels Möller
nisse at lysator.liu.se
Tue Oct 15 07:39:32 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> One could attempt to omit q_1 and instead compute a more accurate q_0 as
>
> q_0 = LO(u_1 * v) + u_0 + HI(u_0 * v)
>
> and remainder would then hopefully be
>
> r = HI(f * d)
>
> That's one more multiplication, if I'm not missing something,
Had a closer look, and this won't work. To get close to working, one
needs a more accurate recipropcal, say
{v_1, v_0} = floor [(B^3 - 1)/d] - B^2
and add one more term to the fraction:
q_0 = LO(u_1 * v_1) + u_0 + HI(u_0 * v_1) + HI(u_1 * v_0)
So that's yet another multiplication, and I think it's still not
accurate enough to get a correct remainder. HI(q_0 * d) will now be
close, but I think it can still be one off.
If one adds in the carry from the next lower fraction limb, LO(u0 * v1)
+ LO(u1 * v0) + HI(u0 * v0) it will probably work, but that's yet
another multiply and a couple of additions. *Maybe* one can get away
with only the LO parts.
So with a two-limb reciprocal, the multiplication is
+---+---+---+---+
|u_1|u_0|u_0*v_0|
+---+---+---+---+
|u_1*v_1|
+---+---+---+
|u_0*v_1|
+-------+
|u_1*v_0|
+---+---+---+---+
|q_1|q_0|q_{low}|
+---+---+-------+
The "direct" method only needs q_0, but it needs it to be computed
accurately, with contributions from all four products (*maybe* one could
omit u_0*v0).
While udiv_rnnd_preinv needs q_1 and a crude version of q_0, and gets by
with only the top corner, omitting all products involving u_0 or v_0.
So I doubt the direct method can be faster for 2/1 modulo.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list