Conversions from binary to a power-of-2 radix use a simple and fast O(N) bit extraction algorithm.
Conversions from binary to other radices use one of two algorithms. Sizes
below GET_STR_PRECOMPUTE_THRESHOLD
use a basic O(N^2) method.
Repeated divisions by b^n are made, where b is the radix and
n is the biggest power that fits in a limb. But instead of simply
using the remainder r from such divisions, an extra divide step is done
to give a fractional limb representing r/b^n. The digits of r
can then be extracted using multiplications by b rather than divisions.
Special case code is provided for decimal, allowing multiplications by 10 to
optimize to shifts and adds.
Above GET_STR_PRECOMPUTE_THRESHOLD
a sub-quadratic algorithm is used.
For an input t, powers b^(n*2^i) of the radix are
calculated, until a power between t and sqrt(t) is
reached. t is then divided by that largest power, giving a quotient
which is the digits above that power, and a remainder which is those below.
These two parts are in turn divided by the second highest power, and so on
recursively. When a piece has been divided down to less than
GET_STR_DC_THRESHOLD
limbs, the basecase algorithm described above is
used.
The advantage of this algorithm is that big divisions can make use of the sub-quadratic divide and conquer division (see Divide and Conquer Division), and big divisions tend to have less overheads than lots of separate single limb divisions anyway. But in any case the cost of calculating the powers b^(n*2^i) must first be overcome.
GET_STR_PRECOMPUTE_THRESHOLD
and GET_STR_DC_THRESHOLD
represent
the same basic thing, the point where it becomes worth doing a big division to
cut the input in half. GET_STR_PRECOMPUTE_THRESHOLD
includes the cost
of calculating the radix power required, whereas GET_STR_DC_THRESHOLD
assumes that’s already available, which is the case when recursing.
Since the base case produces digits from least to most significant but they
want to be stored from most to least, it’s necessary to calculate in advance
how many digits there will be, or at least be sure not to underestimate that.
For GMP the number of input bits is multiplied by chars_per_bit_exactly
from mp_bases
, rounding up. The result is either correct or one too
big.
Examining some of the high bits of the input could increase the chance of getting the exact number of digits, but an exact result every time would not be practical, since in general the difference between numbers 100… and 99… is only in the last few bits and the work to identify 99… might well be almost as much as a full conversion.
The r/b^n scheme described above for using multiplications to bring out digits might be useful for more than a single limb. Some brief experiments with it on the base case when recursing didn’t give a noticeable improvement, but perhaps that was only due to the implementation. Something similar would work for the sub-quadratic divisions too, though there would be the cost of calculating a bigger radix power.
Another possible improvement for the sub-quadratic part would be to arrange for radix powers that balanced the sizes of quotient and remainder produced, i.e. the highest power would be an b^(n*k) approximately equal to sqrt(t), not restricted to a 2^i factor. That ought to smooth out a graph of times against sizes, but may or may not be a net speedup.