itch/scratch interface for mul_n
Niels Möller
nisse at lysator.liu.se
Wed Oct 7 13:45:03 CEST 2009
Hi,
I think the next stage in adoption of itch/scratch interfaces for the
mpn functions should be balanced multiplication, mpn_mul_n. That has
to be done, before even trying to fully do itch/scratch interfaces for
unbalanced multiplication and the advanced division algorithms.
I imagine it might be good for performance to inline the first test,
i.e., do something like
#define MPN_MUL_N_ITCH(n) \
(BELOW_THRESHOLD (n, MUL_KARATSUBA_THRESHOLD) ? 0 : mpn_mul_n_itch(n))
although that's not so nice for binary compatibility... One might
consider some MUL_KARATSUBA_MIN_THRESHOLD which would be part of the
ABI. Or to use the above macro only for internal calls and only export
the real, non-inlined, function mpn_mul_n_itch. (Or maybe always
inline the test for the first threshold, and call either mul_basecase
or mpn_mul_n_itch + mpn_mul_n). There seems to be a lot of ways to do
things.
Anyway, when looking at mul_n.c, I have a couple of comments and
questions:
* itchifying the function should be fairly straight-forward, using the
same sequence of size checks. I guess one should introduce some
wrapper function with the same interface as the current mpn_mul_n,
to be able to update callers more incrementally?
* the code uses its own karatsuba implementation, and doesn't use
mpn_toom22_mul. Is there any real performance gain for special
casing balanced case? Or should mpn_kara_mul_n be deleted, replacing
all uses with calls to mpn_toom22_mul? (And for squaring,
mpn_kara_sqr_n and mpn_toom2_sqr are two implementatiions of exactly
the same thing, so I guess there's no question that mpn_kara_sqr_n
should be deleted?)
Same goes for toom-3, and there the overhead for the general
function, comparerd to a balanced function, should be less
significant than for Karatsuba.
* At the end of mpn_mul_n, there are some conditionals like
else
#if WANT_FFT || TUNE_PROGRAM_BUILD
{
/* The current FFT code allocates its own space. That should probably
change. */
mpn_mul_fft_full (p, a, n, b, n);
}
#else
{
/* Toom4 for large operands. */
mp_ptr ws;
TMP_DECL;
TMP_MARK;
ws = TMP_BALLOC_LIMBS (mpn_toom44_mul_itch (n, n));
mpn_toom44_mul (p, a, n, b, n, ws);
TMP_FREE;
}
#endif
}
There's no threshold between Toom-4 and FFT, it's one or the other
decided at compile time. Why is that? Does toom-4 never beat FFT?
* If all the algorithms called by mpn_mul_n (basecase, toom2, toom3,
toom4, fft), allow unbalanced inputs, does it still make sense to
keep a separate function for balanced multiplication? I think it
still might do, because thresholds and allocation will be lot easier
than the general unbalanced case.
Regards,
/Niels
More information about the gmp-devel
mailing list