[PATCH] mpn/generic/perfsqr: Improve alternate (currently disabled) test.

David Sparks sparks05 at proton.me
Wed Mar 4 14:11:09 CET 2026


On Wednesday, March 4th, 2026 at 06:18, marco.bodrato at tutanota.com <marco.bodrato at tutanota.com> wrote:
>> But... something's niggling me. Is it okay to strip off an odd
>> number of lsbits in general?
> 
> Of course it is not.
> That's why the proposed code keeps track of parity of the stripped bits:
>     int off = 0;
>     for (; UNLIKELY ((lo = *up) == CNST_LIMB (0)); ++up, --usize)
>       off ^= GMP_NUMB_BITS & 1;
> 
> off is 0 if an even number of bits was skipped, it is 1 if we skipped an odd number of zeros.
> 
> The two following alternative codes are adapted accordingly,
> on the "count_trailing_zeros" side, adjusting cnt with
>   cnt += cnt+off+1 & 1
> (but I now realize that it might reach GMP_NUMB_BITS,
> should one use something more clever? )

Yes, but shifting by GMP_NUMB_BITS < GMP_LIMB_BITS is safe.
And GMP_LIMB_BITS is always even, so this is always true if
GMP_NUMB_BITS is odd.

But yes, I saw and approve of this code...

> and shifting the mask on the COUNT_TRAILING_ZEROS_SLOW side, with
>   lsbit = lsbit*3 & MP_LIMB_T_MAX/3 << 1-off;
> 
> Moreover, before falling back to the square root computation, a limb is added back, if needed,
> to restore parity.
>     usize += off;
>     up -= off;

Ah!  I missed this part, which fixes the error I was thinking of.

I was imagining we stripped off the zero limbs and then fell straight
through to the square root.  Adding back this limb fixes everything.

My apologies!


>> If count_trailing_zeros(0) is fully nasal-demons Undefined
>> (in practice, might divide by 0 or something), then we have
>> a problem. But if it returns a value, even an unpredictable
>> one, then it can be masked.
>
> The current "plain C" implementation that we have in longlong.h
> loops forever if the operand is zero.
> https://gmplib.org/repo/gmp/file/tip/longlong.h#l2256

Good point.  I was looking at all the asm implementations and
forgot about that one!  GCC's __builtin_ctz might do the same.

>> Just FYI, I get "403 Forbidden" for this, so I'm not sure what the
>> ".1" and ".2" option differences actually are.
> 
> With .1 the argument of the function has a trailing zero limb, with .2 it has two...
> If the code skips them, then the actual computation of the square root is faster.

Thank you!


More information about the gmp-devel mailing list