minor noises made by gcc and -std=iso9899:1999 with -Wall -pedantic -Wextra -pedantic-errors

Wed Jul 3 08:58:58 UTC 2019

On 2019-07-01 23:12:07 +0200, Torbjorn Granlund wrote:
> Vincent Lefevre <vincent at vinc17.net> writes:
> 
>   >     mp_size_t n = 1 + (2 * an >= 3 * bn ? (an - 1) / (size_t) 3 : (bn - 1) >> 1);
>   > 
>   >   What's the point of the cast to size_t?
>   > 
>   > It is a slight performance optimisation.
> 
>   Could you explain?
> 
> The code for division by constants is shorter for unsigned than for
> signed dividends.

This could be enabled by providing range information to the compiler:

  ASSUME (an >= 1);
  ASSUME (bn >= 1);

It makes the source code longer, but IMHO, much easier to understand.
And the compiler has more information (this might make the code even
shorter on some processors).

This seems to be OK with GCC (without the cast): the generated code
is the same for both:

#define ASSUME(x)                                      \
  (! __builtin_constant_p (!!(x) || !(x)) || (x) ?     \
   (void) 0 : __builtin_unreachable())

long sdiv (long i)
{
  ASSUME (i >= 0);
  return i / 3;
}

unsigned long udiv (unsigned long i)
{
  return i / 3;
}

The ASSUME macro is here for GCC; Microsoft's compiler has __assume
and replacing __builtin_unreachable by code with UB might be useful
with some other compilers (to be tested). Anyway, you don't control
what alternate compilers will generate (for instance, with tcc, the
generated code for the signed division is 1-byte *shorter* than for
the unsigned one).

>   I would say that, on the contrary, it may make
>   the code slower, e.g. on platforms where size_t > mp_size_t: the
>   compiler does not know that an > 0, thus it could not reduce the
>   size of the division to the one of mp_size_t in order to support
>   negative arguments implicitly converted to size_t.
> 
> Do you have some real-world platform in mind?

Yes, 64-bit MS Windows (WIN64): I can't test, but according to
MS specifications[*], long (= mp_size_t) is on 32 bits while
size_t is on 64 bits.

[*] https://docs.microsoft.com/en-us/windows/desktop/winprog/windows-data-types

> Please show me the asm code which is slower with that (size_t) cast
> than without it.

If I try similar code with GCC on Linux/x86_64 (matching the type
sizes):

int msdiv1 (int i)
{
  return i / (unsigned long) 3;
}

gives

        .cfi_startproc
        movslq  %edi, %rdx
        movabsq $-6148914691236517205, %rdi
        movq    %rdx, %rax
        mulq    %rdi
        movq    %rdx, %rax
        shrq    %rax
        ret
        .cfi_endproc

while

int msdiv2 (int i)
{
  return i / 3;
}

gives

        .cfi_startproc
        movl    %edi, %eax
        movl    $1431655766, %edx
        sarl    $31, %edi
        imull   %edx
        movl    %edx, %eax
        subl    %edi, %eax
        ret
        .cfi_endproc

It seems that the cast has made the code more complex, i.e.
with a larger constant ($-6148914691236517205 instead of
$1431655766).

>   >   Moreover, GMP favors signed arithmetic with a signed type mp_size_t,
>   >   thus artificially switching to unsigned arithmetic with this cast is
>   >   a bad idea.
>   > 
>   > Why?  We have full control of the ranges of the involved variables.
> 
>   You do, but the compiler does not know the ranges. And if the compiler
>   knew the ranges, the presence of the cast would not change anything
>   for the compiler.
> 
> I thought you argued against casting to size_t, but here you seem to
> prefer additional such casting.

This is not what I've said. I've said "if the compiler knew the
ranges [...]". Actually, I've tested ASSUME (i >= 0) and the cast
in the MS Windows case, and the cast still yields more complex code
just like above. This means that GCC won't try to reduce the type
sizes even when it is known that everything would fit. Thus the
best code for GCC, based on my tests is:

1. Provide range information (ASSUME macro).
2. Do not use a cast; or if you want to use a cast, make sure that
   this is an unsigned type with the same size as mp_size_t, never
   a larger size.

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)