mpf_get_str() is slow for big numbers?

Brian Hurt bhurt at spnz.org
Mon Dec 1 12:27:23 CET 2003

```On 1 Dec 2003, Torbjorn Granlund wrote:

>   In some bases though, like base 2, 4, 16, the process can be accelerated by
>   dropping the divisions.  Only masks are needed.  No need to even use
>   extensive shifts operations.  Just run through each limbs and do a AND mask.
>
>   I wonder if GMP is built with such optimizations.  The speed difference may
>   come from that.  With the optimization, for base 16 for example, the runtime
>   become linear to the number of digits (O = ln(n))
>
> No need to wonder.  A great feature of GMP is that it comes with
> source code.  It is Free Software, remember.  :-)
>
> In GMP 4.1.2, mpz radix conversion is O(n) for bases 2^t, and
> O(n^2) for other bases.  mpf radix conversion is O(n^2) for all
> bases.
>
> (In the next major GMP release, both mpz and mpf will be O(n) for
> bases 2^t and O(n*log(n)) for other bases.  They will be based on
> common mpn_get_str and mpn_set_str with that property.)
>

My apologies if I'm preaching to the choir.

The bases 2, 8, 10, and 16 should get special attention, even to the point
of being special-cased.  Base 10 can be accelerated like this:

static const unsigned short digits[10000] = {
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x0010, 0x0011, ...
..., 0x9997, 0x9998, 0x9999 };

char * mpz_to_decstring(mpz_t n) {
char * retval = malloc(...);
unsigned long r;
int i = 0;
unsigned short t;

assert(mpz_cmp_ui(n, 0ul) >= 0);

do {
r = mpz_fdiv_ui(n, 1000000000ul);
t = digits[r % 10000ul];
r /= 10000ul;
retval[i] = t & 15;
retval[i+1] = (t >> 4) & 15;
retval[i+2] = (t >> 8) & 15;
retval[i+3] = (t >> 12) & 15;
t = digits[r % 10000ul];
r /= 10000ul;
retval[i+4] = t & 15;
retval[i+5] = (t >> 4) & 15;
retval[i+6] = (t >> 8) & 15;
retval[i+7] = (t >> 12) & 16;
retval[i+8] = r;
i += 9;
} while (mpz_cmp_ui(n, 0ul) > 0);

/* We've probably added too many zeros- pull some off */
while ((i > 1) && (retval[i-1] == 0)) {
i -= 1;
}

/* Truncate array to length i */
return retval;
}

With the above code, you're only doing one full-length divide every 9
digits.  Note the 20K table the algorithm requires.  Not exceptional huge,
if there is just one of them kicking around, but I wouldn't apply this
trick to "odd" bases like 7 or 13.

For another possible speed up, instead of using / as above, you could
simply shift left four and multiply by the multiplicative inverse of 625
mod 2^32.  The modulo would then become a multiply and a subtract.
t = digits[r % 10000];
r /= 10000;
You would do:
{
unsigned long newr = ((r >> 4) * 989560465ul) & 0xFFFFFFFFul;
/* newr is now r/10000ul */
t = digits[r - (newr * 10000ul)];
r = newr;
}

Extending this algorithm so that on 64-bit systems you can do 18-digit
chunks instead of just 9-digit chunks is left as an exercise to the reader
:-).

Note that all of this is just linear speedups- the algorithm is still O(N
log N) effectively.  But IMHO, given the importance of base 10, linear
speedups should be gone after.

--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian

```