On 2018-02-05 22:46:58 +0100, Torbjorn Granlund wrote:
> As suspected, memset adds quite some overhead.

This depends on what the compiler knows about the size.
If GCC knows the size at compile time, it does not generate
a call to memset:

#include <string.h>

void foo (long *p)
{
  memset (p, 0, 1);
}

gives with gcc -O2 (7.3.0) on x86_64:

        movb    $0, (%rdi)

When the size is not known at compile time, I wonder whether there
is a way to tell the compiler how to optimize. The best code may
differ depending on whether the size is often small, i.e. whether
it is worth inlining a test on the size. But even this is not
obvious, as code size may matter for the instruction cache.
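
To make the idea of an inlined size test concrete, here is a minimal
sketch. The helper name zero_longs and the threshold SMALL_LIMIT are
hypothetical, chosen only for illustration, and __builtin_expect is a
GCC/Clang extension, so this assumes one of those compilers. Whether
it actually beats a plain memset call would have to be measured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold (in words) below which we zero inline. */
#define SMALL_LIMIT 4

/* Hypothetical helper: zero n longs starting at p.  The size test is
   inlined at each call site; only large sizes pay for the call. */
static inline void
zero_longs (long *p, size_t n)
{
  /* Hint (GCC/Clang extension) that small sizes are the common case. */
  if (__builtin_expect (n <= SMALL_LIMIT, 1))
    {
      for (size_t i = 0; i < n; i++)
        p[i] = 0;
    }
  else
    memset (p, 0, n * sizeof (long));
}
```

Note that the compiler remains free to transform the small-size loop
(GCC may recognize it as a memset pattern), and that inlining the test
at every call site grows the code, which is the instruction cache
concern mentioned above.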

