[PATCH] mpn/generic/perfsqr: Improve alternate (currently disabled) test.
David Sparks
sparks05 at proton.me
Wed Mar 4 14:11:09 CET 2026
On Wednesday, March 4th, 2026 at 06:18, marco.bodrato at tutanota.com <marco.bodrato at tutanota.com> wrote:
>> But... something's niggling me. Is it okay to strip off an odd
>> number of lsbits in general?
>
> Of course it is not.
> That's why the proposed code keeps track of parity of the stripped bits:
> int off = 0;
> for (; UNLIKELY ((lo = *up) == CNST_LIMB (0)); ++up, --usize)
> off ^= GMP_NUMB_BITS & 1;
>
> off is 0 if an even number of bits was skipped, it is 1 if we skipped an odd number of zeros.
>
> The two following alternative codes are adapted accordingly,
> on the "count_trailing_zeros" side, adjusting cnt with
> cnt += cnt+off+1 & 1
> (but I now realize that it might reach GMP_NUMB_BITS,
> should one use something more clever? )
Yes, but shifting by GMP_NUMB_BITS < GMP_LIMB_BITS is safe.
And GMP_LIMB_BITS is always even, so this is always true if
GMP_NUMB_BITS is odd.
But yes, I saw and approve of this code...
> and shifting the mask on the COUNT_TRAILING_ZEROS_SLOW side, with
> lsbit = lsbit*3 & MP_LIMB_T_MAX/3 << 1-off;
>
> Moreover, before falling back to the square root computation, a limb is added back, if needed,
> to restore parity.
> usize += off;
> up -= off;
Ah! I missed this part, which fixes the error I was thinking of.
I was imagining we stripped off the zero limbs and then fell straight
through to the square root. Adding back this limb fixes everything.
My apologies!
>> If count_trailing_zeros(0) is fully nasal-demons Undefined
>> (in practice, might divide by 0 or something), then we have
>> a problem. But if it returns a value, even an unpredictable
>> one, then it can be masked.
>
> The current "plain C" implementation that we have in longlong.h
> loops forever if the operand is zero.
> https://gmplib.org/repo/gmp/file/tip/longlong.h#l2256
Good point. I was looking at all the asm implementations and
forgot about that one! GCC's __builtin_ctz might do the same.
>> Just FYI, I get "403 Forbidden" for this, so I'm not sure what the
>> ".1" and ".2" option differences actually are.
>
> With .1 the argument of the function has a trailing zero limb, with .2 it has two...
> If the code skips them, then the actual computation of the square root is faster.
Thank you!
More information about the gmp-devel
mailing list