[PATCH] mpn/generic/jacbase: Another method

David Sparks sparks05 at proton.me
Mon Feb 16 20:55:40 CET 2026


On Monday, February 16th, 2026 at 15:52, marco.bodrato at tutanota.com <marco.bodrato at tutanota.com> wrote:
> If I understand correctly, the main idea of method 5 is to have just one exit condition for the loop, so that each run has a single branch. Of course this means that method 4 returns earlier in some cases.

Thank you for your review and comments!

To me, it was mostly the loop rotation to share the right shift,
and simplified handling of the sign-flip bit.  In particular,
	  bit ^= 2*c & (int)(b+1);
rather than
	  bit ^= c & (b ^ b >> 1);
A 1-bit left shift can be done by an add or an LEA, while
a right shift must use a right shift instruction.
But yes, it's probably the loop condition.

> This happens, when GCD(a,b)!=1, so that jacobi(a,b)=0, am I right?

You have it backward.  The loop is exited when a == b ==
GCD(orig_a,orig_b).  But if we have GCD == 1, then b will reach 1
and we'll successively cancel out the lsbit of a until a == 1
as well and the loop will exit.

So something like jacbase(1, 0xffffffff) will trigger a worst case.

But of course, that rarely happens.  Typical usage in a Lucas test is
we search for small d which makes jacobi(d, large_n) == -1.
Which in turn reduces to jacbase(large_n % d, d), which has a
reasonably uniformly distributed.

> How much faster is method 5 over method 4 when GCD=1? How the two methods compare when GCD!=1? How the two cases are distributed?

I get about 5.5% faster for GCD == 1.  I'll need to figure out a
good way to generate test data for GCD != 1.  It's easy enough to
generate random (odd) multiples of a particular common divisor, but
how should the divisor be distributed?

GCD == 1 is the common case in the primality tests I'm familiar with.
If GCD != 1, you've stumbled on a factor, which is very unlikely
after even a little trial division.

(My personal non-GMP test driver which I used to develop this code
benchmarks *only* GCD == 1 cases.  One thing I learned is that, on
x86, using an 8-bit "bit" register ends up being a tiny bit faster
for some reason.  I should, if HAVE_STDINT_H, use int_fast8_t.)



More information about the gmp-devel mailing list