# Multi-limb inverse for mpn_divrem_1

Torbjorn Granlund tege at swox.com
Mon Sep 15 01:41:07 CEST 2003

```We are planning to radically speed mpn_divrem_1, mpn_mod_1, and
friends by allowing for a multi-limb divisor reciprocal.

The problem with division by a single-limb divisor is that the
limb multiply instructions lie on the recurrency path.  Pipelined
multiply units are largely wasted.

By using a n-limb inverse, we can develop n quotient limbs per
iteration, at the cost of O(n^2) multiplies, but without
significantly increased recurrency depth between iterations.  It
will be practical to use n of up to about 4, bigger n result in
too many limb multiplies.

Here is a proof-of-concept implementation for n = 2.
(Karl Hasselström and I wrote this.)

#include "gmp.h"
#include "gmp-impl.h"
#include "longlong.h"

/* # of limbs in inverse	# of full limb mults	# of limb half mults
1                             1+1                     1+2
2                             4+2                     7+3
3                             8+3                    14+5
4                            13+4                    23+7
*/

mp_limb_t
xmpn_divrem_1 (mp_ptr qp, mp_ptr up, mp_size_t un, mp_limb_t vl)
{
mp_limb_t vip;
mp_limb_t tp;
mp_limb_t rp;
mp_limb_t pp;
int i;
mp_limb_t r;

#if GMP_NAIL_BITS != 0
#error not yet for anils
#endif

/* Compute 2-limb inverse.  */
if ((vl << 1) == 0)
{
vip = ~ (mp_limb_t) 0;
vip = ~ (mp_limb_t) 0;
}
else
{
/* For now use mpn_divrem_1.  */
mp_limb_t pow2;
pow2 = 0;  pow2 = 0;  pow2 = -vl;
mpn_divrem_1 (tp, 0, pow2, 3, vl);
vip = tp;
vip = tp;
}

qp += un - 1;
up += un - 1;

if (up >= vl)
{
qp = 1;
rp = up - vl;
}
else
{
qp = 0;
rp = up;
}
qp -= 2;
up -= 2;

for (i = un - 3; i >= 0; i -= 2)
{
rp = rp;
rp = up;
mpn_mul_basecase (tp, rp, 2, vip, 2);
add_ssaaaa (qp, qp, tp, tp, rp, rp);

mpn_mul_1 (pp, qp, 2, vl);
rp = rp;
rp = up;
sub_ddmmss (rp, rp, rp, rp, pp, pp);

if (rp != 0)
{
mpn_incr_u (qp, 1);
mpn_decr_u (rp, vl);
}
if (rp != 0)
{
mpn_incr_u (qp, 1);
mpn_decr_u (rp, vl);
}
if (rp >= vl)
{
mpn_incr_u (qp, 1);
mpn_decr_u (rp, vl);
}
ASSERT (rp == 0);
qp -= 2;
up -= 2;
}
if ((un & 1) == 0)
{
udiv_qrnnd_preinv (qp, r, rp, up, vl, vip);
}
else
{
r = rp;
}
return r;
}
```