_mp_alloc vs ALLOC
tg at gmplib.org
Fri Feb 24 10:40:00 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
> What about the test in
>
>   #define TMP_ALLOC(n) \
>     (LIKELY ((n) < 65536) ? TMP_SALLOC(n) : TMP_BALLOC(n))
>
> That test will cost a cycle or two for each TMP_ALLOC call (with
> non-constant n), regardless of size, won't it?

I think my previous statement "1 cycle" should be amended to "2 cycles".
A correctly predicted compare-and-branch costs 1-2 cycles, with a
throughput of 1 per cycle (on any modern machine). The allocation code
will run in parallel with the branch (assuming again correct prediction).
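
To make the shape of that test concrete, here is a minimal sketch of a size-dispatched temporary allocation. It is not GMP's code: all sketch_* and SKETCH_* names are hypothetical, the bump-pointer arena is only a stand-in for a cheap stack-style small path, and malloc is assumed as the large path. The one thing taken from the quoted macro is the 65536 threshold and the LIKELY-style hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Branch hint in the spirit of LIKELY; plain test on non-GNU compilers. */
#if defined(__GNUC__)
#define SKETCH_LIKELY(c) __builtin_expect((c) != 0, 1)
#else
#define SKETCH_LIKELY(c) (c)
#endif

#define SKETCH_SMALL_LIMIT 65536  /* threshold from the quoted macro */

/* Hypothetical stand-in for the small path: a bump-pointer arena.
   No alignment handling; this only models the cost structure. */
struct sketch_arena {
  unsigned char buf[2 * SKETCH_SMALL_LIMIT];
  size_t used;
};

/* Small path: one pointer bump (the assert is a debug-only bounds check). */
static void *sketch_salloc(struct sketch_arena *a, size_t n) {
  assert(n <= sizeof a->buf - a->used);
  void *p = a->buf + a->used;
  a->used += n;
  return p;
}

/* Large path (assumed): ordinary heap allocation, freed by the caller. */
static void *sketch_balloc(size_t n) {
  return malloc(n);
}

/* The dispatch: a single compare-and-branch that is nearly always taken
   the same way, so a branch predictor should handle it well. */
static void *sketch_tmp_alloc(struct sketch_arena *a, size_t n) {
  return SKETCH_LIKELY(n < SKETCH_SMALL_LIMIT) ? sketch_salloc(a, n)
                                               : sketch_balloc(n);
}
```

The only work the dispatch adds to the small path is the comparison against the limit; the pointer bump itself does not depend on the branch outcome.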
I cannot see how TMP_ALLOC_LIMBS_2 could save *anything* for small
allocations, since it basically performs the same operations. I.e., the
net cost of splitting TMP_ALLOC_LIMBS_2 into two TMP_ALLOC_LIMBS is 0.
But it might be ±1 cycle, depending on alignment and all sorts of magic.
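
As a rough illustration of why the split should be free for small sizes, the sketch below compares a combined two-operand allocation with two separate ones on a stack-style allocator. It assumes the combined form allocates xn + yn limbs once and derives the second pointer by addition; that is an assumption about TMP_ALLOC_LIMBS_2, not something shown in this thread. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long sketch_limb;  /* hypothetical stand-in for a limb */

#define SKETCH_STACK_LIMBS 256

/* Hypothetical bump-pointer arena modelling a stack-style allocation. */
struct sketch_stack {
  sketch_limb buf[SKETCH_STACK_LIMBS];
  size_t top;                       /* limbs handed out so far */
};

/* One allocation: one pointer bump (the assert is a debug-only check). */
static sketch_limb *sketch_alloc_limbs(struct sketch_stack *s, size_t n) {
  assert(n <= SKETCH_STACK_LIMBS - s->top);
  sketch_limb *p = s->buf + s->top;
  s->top += n;
  return p;
}

/* Combined form (assumed): one allocation, then split by pointer add. */
static void sketch_alloc_2(struct sketch_stack *s,
                           sketch_limb **xp, size_t xn,
                           sketch_limb **yp, size_t yn) {
  *xp = sketch_alloc_limbs(s, xn + yn);
  *yp = *xp + xn;
}

/* Split form: two independent allocations. */
static void sketch_alloc_twice(struct sketch_stack *s,
                               sketch_limb **xp, size_t xn,
                               sketch_limb **yp, size_t yn) {
  *xp = sketch_alloc_limbs(s, xn);
  *yp = sketch_alloc_limbs(s, yn);
}
```

On this model both forms do two additions and return the same layout: the combined form adds the sizes and then offsets the pointer, the split form bumps the stack top twice.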