GMP 6.1.2 t-count_zeros failure on ARM with assertions

Wed Jan 17 16:23:02 UTC 2018

On Wed, 17 Jan 2018, Vincent Lefevre wrote:

> On 2017-12-27 17:15:43 +0100, Niels Möller wrote:
>> Vincent Lefevre <vincent at vinc17.net> writes:
>>
>>>> diff -r 20cf1131dc94 longlong.h
>>>> --- a/longlong.h	Thu Aug 31 01:00:02 2017 +0200
>>>> +++ b/longlong.h	Tue Dec 26 10:59:24 2017 +0100
>>>> @@ -535,7 +535,6 @@ extern UWtype __MPN(udiv_qrnnd) (UWtype
>>>>  #endif /* defined(__ARM_ARCH_2__) ... */
>>>>  #define count_leading_zeros(count, x)  count_leading_zeros_gcc_clz(count, x)
>>>>  #define count_trailing_zeros(count, x)  count_trailing_zeros_gcc_ctz(count, x)
>>>> -#define COUNT_LEADING_ZEROS_0 32
>>>>  #endif /* __arm__ */
>>>>
>>>>  #if defined (__aarch64__) && W_TYPE_SIZE == 64
>>>> @@ -586,7 +585,6 @@ extern UWtype __MPN(udiv_qrnnd) (UWtype
>>>>  #endif
>>>>  #define count_leading_zeros(count, x)  count_leading_zeros_gcc_clz(count, x)
>>>>  #define count_trailing_zeros(count, x)  count_trailing_zeros_gcc_ctz(count, x)
>>>> -#define COUNT_LEADING_ZEROS_0 64
>>>>  #endif /* __aarch64__ */
>>>>
>>>>  #if defined (__clipper__) && W_TYPE_SIZE == 32
>>>
>>> I confirm that this fixes the problem.
>>
>> Thanks for testing. Pushed now.
>
> This is not sufficient. I get a failure for 32-bit x86 with MinGW.
> This one is incorrect too:

That's a bit different.

> #else /* ! pentiummmx || LONGLONG_STANDALONE */
> /* The following should be a fixed 14 cycles or so.  Some scheduling
>   opportunities should be available between the float load/store too.  This
>   sort of code is used in gcc 3 for __builtin_ffs (with "n&-n") and is
>   apparently suggested by the Intel optimizing manual (don't know exactly
>   where).  gcc 2.95 or up will be best for this, so the "double" is
>   correctly aligned on the stack.  */
> #define count_leading_zeros(c,n)                                        \
>  do {                                                                  \
>    union {                                                             \
>      double    d;                                                      \
>      unsigned  a[2];                                                   \
>    } __u;                                                              \
>    ASSERT ((n) != 0);                                                  \
>    __u.d = (UWtype) (n);                                               \
>    (c) = 0x3FF + 31 - (__u.a[1] >> 20);                                \
>  } while (0)
> #define COUNT_LEADING_ZEROS_0   (0x3FF + 31)
> #endif /* pentiummx */
>
> There's also a typo in the latest comment. Patch attached.

Indeed, it doesn't make sense to have both the assertion and 
COUNT_LEADING_ZEROS_0. Would it work to remove the assertion instead?
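
To illustrate why the two conflict, here is a minimal sketch of the same double-exponent trick, assuming IEEE 754 binary64 doubles. The function name clz32_via_double is hypothetical (it is not in longlong.h), and it reads the exponent through memcpy and a 64-bit shift rather than the union, so it does not depend on which half of the double lands in a[1].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the quoted count_leading_zeros macro.
   Assumes "double" is IEEE 754 binary64 (52 mantissa bits, bias 0x3FF). */
static unsigned clz32_via_double(uint32_t n)
{
    double d = (double) n;      /* exact: any 32-bit value fits in a double */
    uint64_t bits;

    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);

    /* The sign bit is 0 here, so bits >> 52 is the biased exponent.
       On little-endian x86 this is the same value as __u.a[1] >> 20. */
    return 0x3FF + 31 - (unsigned) (bits >> 52);
}
```

For n == 0 the double is 0.0, whose exponent field is 0, so the result is 0x3FF + 31, which is exactly the value given for COUNT_LEADING_ZEROS_0. With the ASSERT in place that input is rejected before the result can be used, so the definition only has an effect if the assertion goes away.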

-- 
Marc Glisse