# div_qr_1n_pi1

Niels Möller nisse at lysator.liu.se
Thu Jun 3 20:30:25 UTC 2021

```nisse at lysator.liu.se (Niels Möller) writes:

> The current (METHOD == 2) code some somethign similar, it conditionally
> adds in B2 at the right place in the next iteration. It uses the
> name u2 for the carry that lives unti the next iteration,
>
>       umul_ppmm (p1, p0, u1, B2);
>       ADDC_LIMB (cy, u0, u0, u2 & B2);
>       u0 -= (-cy) & d;
>       add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0);

[...]

> The drawback, in the context of div_qr_1n_pi1, is that the u2 input to
> the iteration makes the quotient production much more costly.

And the cost is basically an

add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);

I think I've found a way to get rid of that, at the cost of more
side-channel leakage, by adding it into the multiplication by previous
u1 (which, unfortunately, may give another carry out). The iteration
would become something like

/* u recurrence, via u2, u1, u0 variables. */
umul_ppmm (p1, p0, u1, B2);
ADDC_LIMB (cy, u0, u0, u2 & B2);
u0 -= (-cy) & d;
q_u1 = u1; /* Save u1, for below quotient processing */
add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0);

/* q recurrence, via q0 variable. Input is the sum of "old" u1 and
"new" u2. */
q_u1 -= u2;
if (UNLIKELY(q_u1 == 0) && u2)
{
/* The increment of q_u1 overflowed, handle specially. */
q0 += cy;
add_ssaaaa (q1, q2, CNST_LIMB(1), dinv, q0 < cy, q0);
q0 = 0;
}
else
{
umul_ppmm (p1, p0, q_u1, dinv);
add_ssaaaa (q_2, q_1, q_2, q_1, CNST_LIMB(0), p1 + cy);
q0 = p0;
}
... store q2 and q1 ...

The second multiplication now depends on the first, so may put some more
pressure on out-of-order execution.

> It would be nice if we could stay with this recurrency for u, but keep
> the other way of carry handling for q.

I've given it a quick try, so far without success.

I might play a bit more with microoptimizations of the current algorithm
before doing something more sophisticated. You're idea of conditonally
adding the invariant d * B2 at the right place is also interesting, I
don't think I understood it correctly at first reading.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```