Significant speedup divrem of very short numbers
Andy
borucki.andrzej at gmail.com
Sun Nov 10 23:39:20 UTC 2019
Although GMP is designed for very long numbers, sometimes systems using it
would have task divide amd mod very short numbers : one limb by one limb.
For example if we factor this number.
My suggestion (is has correct values?) is:
void mpn_tdiv_qr (mp_ptr qp, mp_ptr rp, mp_size_t qxn,
mp_srcptr np, mp_size_t nn, mp_srcptr dp, mp_size_t dn)
..........................
.........................
case 1:
{
if (nn = 1) <----------------
{
qp[0] = *np / *dp;
rp[0] = *np % *dp;
}
else
rp[0] = mpn_divrem_1 (qp, (mp_size_t) 0, np, nn, dp[0]);
return;
}
is called twice machine divrem but is I add assignment to limbs, it is
speedup, because doesn't careof memory changes. Better than using div
structure from <stdlib.h> because this structure is signed.
Finally:
if (nn = 1)
{
mp_limb_t n = *np;
mp_limb_t d = *dp;
qp[0] = n / d;
rp[0] = n % d;
}
VisualC++ give
mp_limb_t n = *np;
mp_limb_t d = *dp;
qp[0] = n / d;
00007FF75A0727B0 mov rax,qword ptr [r9]
00007FF75A0727B3 mov r10,rdx
00007FF75A0727B6 mov r8,qword ptr [dp]
00007FF75A0727BB xor edx,edx
00007FF75A0727BD div rax,qword ptr [r8]
00007FF75A0727C0 mov qword ptr [rcx],rax
rp[0] = n % d;
It is correct? These values must return mpn_tdiv_qr? It is possible also
sppedup larger (more complicated)
2limbs/1limb
?
My bench for million (1 to 1000000) loop and 10 measurement, choosing best
on i3 4160, 3.60 GHz
machine: 7.9392 ms
boost cpp_int: 21.7078 ms
gmp: 33.8823 ms
gmp with change 8.7771
Speedup 3.86 x
More information about the gmp-devel
mailing list