mpf_get_str() is slow for big numbers?
Brian Hurt
bhurt at spnz.org
Mon Dec 1 12:27:23 CET 2003
On 1 Dec 2003, Torbjorn Granlund wrote:
> In some bases though, like base 2, 4, 16, the process can be accelerated by
> dropping the divisions. Only masks are needed. No need to even use
> extensive shifts operations. Just run through each limbs and do a AND mask.
>
> I wonder if GMP is built with such optimizations. The speed difference may
> come from that. With the optimization, for base 16 for example, the runtime
> become linear to the number of digits (O = ln(n))
>
> No need to wonder. A great feature of GMP is that it comes with
> source code. It is Free Software, remember. :-)
>
> In GMP 4.1.2, mpz radix conversion is O(n) for bases 2^t, and
> O(n^2) for other bases. mpf radix conversion is O(n^2) for all
> bases.
>
> (In the next major GMP release, both mpz and mpf will be O(n) for
> bases 2^t and O(n*log(n)) for other bases. They will be based on
> common mpn_get_str and mpn_set_str with that property.)
>
My apologies if I'm preaching to the choir.
The bases 2, 8, 10, and 16 should get special attention, even to the point
of being special-cased. Base 10 can be accelerated like this:
static const unsigned short digits[10000] = {
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x0010, 0x0011, ...
..., 0x9997, 0x9998, 0x9999 };
char * mpz_to_decstring(mpz_t n) {
char * retval = malloc(...);
unsigned long r;
int i = 0;
unsigned short t;
assert(mpz_cmp_ui(n, 0ul) >= 0);
do {
r = mpz_fdiv_ui(n, 1000000000ul);
t = digits[r % 10000ul];
r /= 10000ul;
retval[i] = t & 15;
retval[i+1] = (t >> 4) & 15;
retval[i+2] = (t >> 8) & 15;
retval[i+3] = (t >> 12) & 15;
t = digits[r % 10000ul];
r /= 10000ul;
retval[i+4] = t & 15;
retval[i+5] = (t >> 4) & 15;
retval[i+6] = (t >> 8) & 15;
retval[i+7] = (t >> 12) & 16;
retval[i+8] = r;
i += 9;
} while (mpz_cmp_ui(n, 0ul) > 0);
/* We've probably added too many zeros- pull some off */
while ((i > 1) && (retval[i-1] == 0)) {
i -= 1;
}
/* Truncate array to length i */
return retval;
}
With the above code, you're only doing one full-length divide every 9
digits. Note the 20K table the algorithm requires. Not exceptional huge,
if there is just one of them kicking around, but I wouldn't apply this
trick to "odd" bases like 7 or 13.
For another possible speed up, instead of using / as above, you could
simply shift left four and multiply by the multiplicative inverse of 625
mod 2^32. The modulo would then become a multiply and a subtract.
Instead of:
t = digits[r % 10000];
r /= 10000;
You would do:
{
unsigned long newr = ((r >> 4) * 989560465ul) & 0xFFFFFFFFul;
/* newr is now r/10000ul */
t = digits[r - (newr * 10000ul)];
r = newr;
}
Extending this algorithm so that on 64-bit systems you can do 18-digit
chunks instead of just 9-digit chunks is left as an exercise to the reader
:-).
Note that all of this is just linear speedups- the algorithm is still O(N
log N) effectively. But IMHO, given the importance of base 10, linear
speedups should be gone after.
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
More information about the gmp-discuss
mailing list