mpf_get_str() is slow for big numbers?

Mon Dec 1 12:27:23 CET 2003

On 1 Dec 2003, Torbjorn Granlund wrote:

>   In some bases though, like base 2, 4, 16, the process can be accelerated by 
>   dropping the divisions.  Only masks are needed.  No need to even use 
>   extensive shifts operations.  Just run through each limbs and do a AND mask.
> 
>   I wonder if GMP is built with such optimizations.  The speed difference may 
>   come from that.  With the optimization, for base 16 for example, the runtime 
>   become linear to the number of digits (O = ln(n))
> 
> No need to wonder.  A great feature of GMP is that it comes with
> source code.  It is Free Software, remember.  :-)
> 
> In GMP 4.1.2, mpz radix conversion is O(n) for bases 2^t, and
> O(n^2) for other bases.  mpf radix conversion is O(n^2) for all
> bases.
> 
> (In the next major GMP release, both mpz and mpf will be O(n) for
> bases 2^t and O(n*log(n)) for other bases.  They will be based on
> common mpn_get_str and mpn_set_str with that property.)
> 

My apologies if I'm preaching to the choir.

The bases 2, 8, 10, and 16 should get special attention, even to the point
of being special-cased.  Base 10 can be accelerated like this:

static const unsigned short digits[10000] = {
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x0010, 0x0011, ...
..., 0x9997, 0x9998, 0x9999 };

char * mpz_to_decstring(mpz_t n) {
    char * retval = malloc(...);
    unsigned long r;
    int i = 0;
    unsigned short t;

    assert(mpz_cmp_ui(n, 0ul) >= 0);

    do {
        r = mpz_fdiv_ui(n, 1000000000ul);
        t = digits[r % 10000ul];
        r /= 10000ul;
        retval[i] = t & 15;
        retval[i+1] = (t >> 4) & 15;
        retval[i+2] = (t >> 8) & 15;
        retval[i+3] = (t >> 12) & 15;
        t = digits[r % 10000ul];
        r /= 10000ul;
        retval[i+4] = t & 15;
        retval[i+5] = (t >> 4) & 15;
        retval[i+6] = (t >> 8) & 15;
        retval[i+7] = (t >> 12) & 16;
        retval[i+8] = r;
        i += 9;
    } while (mpz_cmp_ui(n, 0ul) > 0);

    /* We've probably added too many zeros- pull some off */
    while ((i > 1) && (retval[i-1] == 0)) {
        i -= 1;
    }

    /* Truncate array to length i */
    return retval;
}

With the above code, you're only doing one full-length divide every 9 
digits.  Note the 20K table the algorithm requires.  Not exceptional huge, 
if there is just one of them kicking around, but I wouldn't apply this 
trick to "odd" bases like 7 or 13.

For another possible speed up, instead of using / as above, you could 
simply shift left four and multiply by the multiplicative inverse of 625 
mod 2^32.  The modulo would then become a multiply and a subtract.  
Instead of:
        t = digits[r % 10000];
        r /= 10000;
You would do:
        {
            unsigned long newr = ((r >> 4) * 989560465ul) & 0xFFFFFFFFul;
            /* newr is now r/10000ul */
            t = digits[r - (newr * 10000ul)];
            r = newr;
        }

Extending this algorithm so that on 64-bit systems you can do 18-digit 
chunks instead of just 9-digit chunks is left as an exercise to the reader 
:-).

Note that all of this is just linear speedups- the algorithm is still O(N 
log N) effectively.  But IMHO, given the importance of base 10, linear 
speedups should be gone after.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian