Significant speedup divrem of very short numbers

Andy borucki.andrzej at
Sun Nov 10 23:39:20 UTC 2019

Although GMP is designed for very long numbers, sometimes systems using it
would have task divide amd mod very short numbers : one limb by one limb.
For example if we factor this number.
My suggestion (is has correct values?) is:

void mpn_tdiv_qr (mp_ptr qp, mp_ptr rp, mp_size_t qxn,
    mp_srcptr np, mp_size_t nn, mp_srcptr dp, mp_size_t dn)
case 1:
    if (nn = 1) <----------------
             qp[0] = *np / *dp;
             rp[0] = *np % *dp;
        rp[0] = mpn_divrem_1 (qp, (mp_size_t) 0, np, nn, dp[0]);
is called twice machine divrem but is I add assignment to limbs, it is
speedup, because doesn't careof memory changes. Better than using div
structure from <stdlib.h> because this structure is signed.
if (nn = 1)
mp_limb_t n = *np;
mp_limb_t d = *dp;
qp[0] = n / d;
rp[0] = n % d;
VisualC++ give
mp_limb_t n = *np;
mp_limb_t d = *dp;
qp[0] = n / d;
00007FF75A0727B0  mov         rax,qword ptr [r9]
00007FF75A0727B3  mov         r10,rdx
00007FF75A0727B6  mov         r8,qword ptr [dp]
00007FF75A0727BB  xor         edx,edx
00007FF75A0727BD  div         rax,qword ptr [r8]
00007FF75A0727C0  mov         qword ptr [rcx],rax
rp[0] = n % d;

It is correct? These values must return mpn_tdiv_qr? It is possible also
sppedup larger (more complicated)
My bench  for million (1 to 1000000) loop and 10 measurement, choosing best
on i3 4160, 3.60 GHz
machine: 7.9392 ms
boost cpp_int: 21.7078 ms
gmp: 33.8823 ms
gmp with change 8.7771
Speedup 3.86 x

More information about the gmp-devel mailing list