Faster integer remainder

Tue Oct 15 07:39:32 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

> One could attempt to omit q_1 and instead compute a more accurate q_0 as
>
>   q_0 = LO(u_1 * v) + u_0 + HI(u_0 * v)
>
> and remainder would then hopefully be
>
>   r = HI(f * d)
>
> That's one more multiplication, if I'm not missing something, 

Had a closer look, and this won't work. To get close to working, one
needs a more accurate recipropcal, say

  {v_1, v_0} = floor [(B^3 - 1)/d] - B^2

and add one more term to the fraction:

  q_0 = LO(u_1 * v_1) + u_0 + HI(u_0 * v_1) + HI(u_1 * v_0)

So that's yet another multiplication, and I think it's still not
accurate enough to get a correct remainder. HI(q_0 * d) will now be
close, but I think it can still be one off.

If one adds in the carry from the next lower fraction limb, LO(u0 * v1)
+ LO(u1 * v0) + HI(u0 * v0) it will probably work, but that's yet
another multiply and a couple of additions. *Maybe* one can get away
with only the LO parts.

So with a two-limb reciprocal, the multiplication is

  +---+---+---+---+
  |u_1|u_0|u_0*v_0|
  +---+---+---+---+
  |u_1*v_1|
  +---+---+---+
      |u_0*v_1|
      +-------+
      |u_1*v_0|
  +---+---+---+---+
  |q_1|q_0|q_{low}|
  +---+---+-------+

The "direct" method only needs q_0, but it needs it to be computed
accurately, with contributions from all four products (*maybe* one could
omit u_0*v0).

While udiv_rnnd_preinv needs q_1 and a crude version of q_0, and gets by
with only the top corner, omitting all products involving u_0 or v_0.

So I doubt the direct method can be faster for 2/1 modulo.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.