_mp_alloc vs ALLOC

Torbjorn Granlund tg at gmplib.org
Fri Feb 24 10:40:00 CET 2012


nisse at lysator.liu.se (Niels Möller) writes:

  What about the test in
  
  #define TMP_ALLOC(n) \
     (LIKELY ((n) < 65536) ? TMP_SALLOC(n) : TMP_BALLOC(n))
  
  That test will cost a cycle or two for each TMP_ALLOC call (with
  non-constant n), regardless of size, won't it?
  
I think my previous statement "1 cycle" should be amended to "2 cycles".

A correctly predicted compare-and-branch costs 1-2 cycles, with a
throughput of 1 per cycle (on any modern machine).  The allocation code
will run in parallel with the branch (assuming again correct prediction).
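A minimal C sketch of the dispatch under discussion. The names here (DEMO_LIKELY, DEMO_SALLOC_LIMIT, demo_tmp_choose) are hypothetical stand-ins, not GMP's real TMP_SALLOC/TMP_BALLOC internals. It isolates only the compare-and-branch that each TMP_ALLOC call with non-constant n pays for, and returns which path would be taken instead of allocating anything.

```c
#include <stddef.h>

/* Branch hint in the spirit of GMP's LIKELY; a plain condition on
   compilers without __builtin_expect. */
#if defined(__GNUC__)
#define DEMO_LIKELY(cond) __builtin_expect(!!(cond), 1)
#else
#define DEMO_LIKELY(cond) (cond)
#endif

/* Hypothetical threshold, copied from the 65536 in the quoted macro. */
#define DEMO_SALLOC_LIMIT 65536

enum demo_tmp_path { DEMO_PATH_STACK, DEMO_PATH_HEAP };

/* The per-call overhead being discussed: one compare of n against the
   limit and one conditional branch.  For a compile-time constant n the
   compiler can usually fold the test away entirely. */
static inline enum demo_tmp_path demo_tmp_choose(size_t n)
{
  return DEMO_LIKELY(n < DEMO_SALLOC_LIMIT) ? DEMO_PATH_STACK
                                            : DEMO_PATH_HEAP;
}
```

When the hint matches reality the branch is normally predicted correctly, which is the case the 1-2 cycle estimate assumes.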

I cannot see how TMP_ALLOC_LIMBS_2 could save *anything* for small
allocations, since it basically performs the same operations.  I.e., the
net cost of splitting TMP_ALLOC_LIMBS_2 into two TMP_ALLOC_LIMBS is 0.
But it might be ±1 cycle depending on alignment and all sorts of other magic.
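A rough model of the small-allocation case, treating it as a bump allocator over a fixed buffer (similar to what an alloca-based path does). Everything here is hypothetical (demo_limb_t, demo_arena, demo_alloc_limbs, demo_alloc_limbs_2). The sketch also assumes that TMP_ALLOC_LIMBS_2 amounts to one allocation of xn + yn limbs with the second pointer set to xp + xn. That is my understanding of it, but this message does not state it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mp_limb_t; GMP's real limb type is platform dependent. */
typedef uint64_t demo_limb_t;

/* A bump ("stack-like") arena: each allocation is a bounds check plus
   an add, roughly the work of a small stack allocation. */
struct demo_arena {
  unsigned char *base;  /* must be suitably aligned for demo_limb_t */
  size_t size;          /* capacity in bytes */
  size_t used;          /* bytes handed out so far; always <= size */
};

/* Allocate n limbs, or return NULL if the arena is exhausted. */
static demo_limb_t *demo_alloc_limbs(struct demo_arena *a, size_t n)
{
  size_t nbytes = n * sizeof(demo_limb_t);
  if (nbytes > a->size - a->used)
    return NULL;
  demo_limb_t *p = (demo_limb_t *) (a->base + a->used);
  a->used += nbytes;
  return p;
}

/* Combined variant: one allocation of xn + yn limbs, second region
   right after the first.  It saves one bounds check and bump but pays
   two additions (xn + yn and xp + xn), so the operation count is about
   the same as two demo_alloc_limbs calls. */
static demo_limb_t *demo_alloc_limbs_2(struct demo_arena *a, size_t xn,
                                       demo_limb_t **yp, size_t yn)
{
  demo_limb_t *xp = demo_alloc_limbs(a, xn + yn);
  if (xp != NULL)
    *yp = xp + xn;
  return xp;
}
```

Both variants hand out the same layout and consume the same arena space; only a couple of integer operations move around.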

-- 
Torbjörn


More information about the gmp-devel mailing list