mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

Tue Sep 2 07:50:51 UTC 2014

tg at gmplib.org (Torbjörn Granlund) writes:

>   Vagely related: We chould have an mpn_gcd_11 function, accepting two
>   limb-size arguments.
>   
> Which would then be in assembly.  This would obsolete the many mpn_gcd_1
> implementations, I think.

I've had a look at x86_64/gcd_1.asm, to try to extract an gcd_11.asm
from it.

Before entering the main binary gcd loop, the code needs to:

1. Remove and keep track of common trailing zeros,

2. Remove trailing zeros on one of the arguments (that might be cheaper
   than checking which argument is odd already and swap them, not sure).

But the gcd_1.asm code also has a call to mpn_modexact_1_odd in the case
that u0 >> v0. I'm not sure this makes sense. If something like this is
kept at all, I'd suggest just getting an 8-bit v0-inverse from
binvert_limb_table, and rather than computing a larger inverse, just
loop reducing 8 bits at a time as long as u0 > v0. But one get some
overhead. Before the main loop one needs the following steps:

1. Remove and keep track of common trailing zeros,

2. Remove trailing zeros on *both* arguments (one will be odd, but we
   don't yet know which).

3. Swap u0, v0 if u0 < v0

4. Check if u0 is much larger than v0, if not, enter main loop directly.

Do you think that's worth the effort?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.