GCD project status?

Thu Sep 19 08:14:45 UTC 2019

tg at gmplib.org (Torbjörn Granlund) writes:

> With some thought, I am sure the table size could come down while the
> accuracy could improve.

I see two ways. Either just do an inversion table, which will cost an
additional multiplication when used. Than one could handle quotients up
to 7 or 8 bits bits with just 256 bytes of table.

Or shift both ni and di, and get quotient with something like

  q = tabp[(ni << NBITS) + di] >> (NBITS - (dcnt - ncnt))

Then one could handle quotients up to 5 bits with a table of 1024 bytes
(indexing by 5 significant bits of each of n and d, excluding the most
significant one bit).

>       q0 = tabp[(ni << NBITS) + di];
>       r0 = n0 - q0 * d0;
>       mask = -(mp_limb_t) (r0 >= d0);
>       q0 -= mask;
>       r0 -= d0 & mask;

You round tabulated q down, and adjust up. I guess one can save one
cycle if one instead rounds it up and adjusts down,

        q0 = tabp[(ni << NBITS) + di];
        t = q0 * d0;
        r0 = n0 - t;
        mask = -(mp_limb_t) (n0 < t);
        q0 += mask;
        r0 += d0 & mask;

since computation of mask and initial r0 can then run in parallel (or
the same instruction, if the compiler is clever enough to use use carry
out). Or is there a risk for overflow in q0 * d0?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.