[PATCH] mpn/generic/perfsqr: Improve alternate (currently disabled) test.
marco.bodrato at tutanota.com
marco.bodrato at tutanota.com
Mon Mar 2 21:32:47 CET 2026
Ciao David,
14 feb 2026, 19:11 da sparks05 at proton.me:
> If count_trailing_zeros is slow (i.e. not implemented in hardware),
> the mod^2^k test can skip shifting down the input and instead
> shift up the bits to be tested.
>
That's interesting.
We should measure if it's also effective. So, at first, I slightly changed the tune/speed function to measure also the size 1, where the impact is more relevant.
https://gmplib.org/repo/gmp/rev/a15023544812
Anyway, tune/speed measure only the worst case, i.e. when the operand actually is a square.
> + /* Check that we have even multiplicity of 2, and then check that the rest
> + is a possible perfect square mod 8. Leave disabled until we can
> + determine this really is an improvement. This supersedes the mod-256
> + test, throwing out more non-squares (5/6 = 83.3%), but at the expense
> + of somewhat more cycles.
>
Then I looked at the code. The current mod-256 test detects the 82.8% of non-squares,
in O(1) time. The new one promises a better rate: 83.3%. Not much, and...
> - lo = up[0];
> - while (lo == 0)
> - up++, lo = up[0], usize--;
>
...this test is not even O(1)!
But in the rare circumstances when it isn't O(1), the size of the following tests is reduced.
This sounds as a good idea.
My question: can we really skip limbs?
My answer: if GMP_NUMB_BITS is even, yes we can. But if it's odd...
That's why I tried to write a code working for any GMP_NUMB_BITS value.
I attach a proposed patch. GCC seems able to optimize out all the additional computations
needed only when GMP_NUMB_BITS is odd.
I hardly measure any difference, between the current code (#if 1) and the proposed one (#if 0).
That's why I changed tune/speed again, now mpn_perfect_square_p.2 measures the time
spent by the function on operands with the lowest 2 limbs set to zero.
https://gmplib.org/repo/gmp/rev/48b8f3a3c4ce
Here are the results on my laptop.
Current code (#if 1):
build/tune/speed.1 -s1-4 -p100000000 -c mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
overhead 4.89 cycles, precision 100000000 units of 5.28e-10 secs, CPU freq 1895.61 MHz
mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
1 71.47 71.50 #71.46
2 130.36 128.36 #128.27
3 349.57 343.29 #334.26
4 289.18 289.35 #287.18
Proposed code (#if 0):
build/tune/speed.0 -s1-4 -p100000000 -c mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
overhead 4.89 cycles, precision 100000000 units of 5.28e-10 secs, CPU freq 1895.58 MHz
mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
1 74.44 #74.25 74.34
2 133.11 #74.83 74.87
3 344.04 134.13 #75.99
4 290.53 355.27 #137.47
Here are the results on shell at gmp.
Current code (#if 1):
[bodrato at shell ~/gmp-repo]$ ./speed.18498_1 -s 1-4 -p1000000000 -c mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
overhead 5.35 cycles, precision 1000000000 units of 2.86e-10 secs, CPU freq 3500.17 MHz
mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
1 #69.91 69.91 69.92
2 130.71 #126.20 126.20
3 338.19 334.79 #334.32
4 284.02 #279.31 282.25
Proposed code (#if 0):
[bodrato at shell ~/gmp-repo]$ ./speed.18498_0 -s 1-4 -p1000000000 -c mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
overhead 5.35 cycles, precision 1000000000 units of 2.86e-10 secs, CPU freq 3500.17 MHz
mpn_perfect_square_p mpn_perfect_square_p.1 mpn_perfect_square_p.2
1 70.31 #70.30 70.35
2 132.74 #70.94 71.01
3 340.88 133.17 #72.19
4 278.65 337.05 #132.94
It seems slightly slower in general, even if it can be much faster when skipping zero limbs. But is this case relevant? For the mpn_ layer, we may assume that if this case is frequent, the user code can handle it efficently. on the other side, that's not equally easy for the mpz_ layer.
Should we use that version of the code? Or just change it while keeping it disabled :-) ?
Ĝis,mb
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gmp.diff
Type: text/x-patch
Size: 1792 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20260302/a3bae5b3/attachment.bin>
More information about the gmp-devel
mailing list