binvert_limb speedup on 64 bit machines with UHWtype
Marco Bodrato
bodrato at mail.dm.unipi.it
Sun Feb 27 16:52:13 CET 2022
Ciao John Gartell,
Il 2022-02-25 17:06 John Gatrell ha scritto:
> Hi everyone. I'm new here so I don't know how to submit a new
> gmp-impl.h
This list is the correct place for such a discussion.
> I tested using UHWtype in the macro for binvert_limb. On a 64 bit
> machine
> one of my programs gained a 3% speedup. On a 32 bit machine, there was
> no
Unfortunately, on different platforms, we have different speeds.
Should we use uint8_fast_t, uint16_fast_t, uint32_fast_t for the
different levels, and let the compiler choose? :-D
Anyway, if you are curious, you can also try what happens with a
table-lookup-free binvert_limb function. You can find it, named
sec_binvert_limb, in the file mpn/generic/sec_powm.c .
There are also architectures where smaller operations are NOT faster,
but, on the other side, more than one mul instruction can run
concurrently. For those architectures I'd suggest the following, any
couple of multiplications beyond the 32-bits limit can be computed in
parallel.
#define binvert_limb(inv,n)
\
do {
\
mp_limb_t __n = (n);
\
mp_limb_t __inv;
\
ASSERT ((__n & 1) == 1);
\
\
__inv = binvert_limb_table[(__n/2) & 0x7F]; /* 8 */
\
if (GMP_NUMB_BITS <= 32)
\
{
\
if (GMP_NUMB_BITS > 8)
\
__inv = 2 * __inv - __inv * __inv * __n;
\
if (GMP_NUMB_BITS > 16)
\
__inv = 2 * __inv - __inv * __inv * __n;
\
}
\
else
\
{
\
mp_limb_t __t, __t2, __p;
\
int __invbits = 32;
\
__t = __n * __inv - 1; /* x */
\
__t2= __t * __t; /* x^2 */
\
__p = __inv * (__t - __t2); /* (1-x^2)x/(x+1) */
\
\
while (__invbits < GMP_NUMB_BITS - 8)
\
{
\
__p *= __t2 + 1; /* (1-x^{2^n})x/(x+1) */
\
__t2 *= __t2; /* x^{2^n} */
\
__invbits <<= 1;
\
}
\
/* __inv * (x^{2^{n+2}+1}-1)/(x+1) */
\
__inv -= __p * (__t2 + 1);
\
}
\
\
ASSERT ((__inv * __n & GMP_NUMB_MASK) == 1);
\
(inv) = __inv & GMP_NUMB_MASK;
\
} while (0)
In any case, for 64-bits machines, all the described functions use 6
multiplications :-) but with very different paths toward the desired
result! :-D
Maybe you can try them on your platform... and tell us exactly which CPU
are you using for your tests, and how do they behave.
Ĝis,
m
More information about the gmp-devel
mailing list