Shared toom evaluation functions
Torbjorn Granlund
tg at gmplib.org
Sat Nov 14 19:18:19 CET 2009
nisse at lysator.liu.se (Niels Möller) writes:
bodrato at mail.dm.unipi.it writes:
> My mpn_toom_ev_lsh can be used for your mpn_toom_eval_pm2, but it can
> evaluate also _pm4, or _pm8 as needed by higher degree Toom.
Makes sense! I thought _pm2 only needed mpn_addlsh1_n (or falls back
to separate shift and add), but it actually uses the more general
mpn_addlsh_n. (Don't know which platforms actually have these
functions).
Several platforms have mpn_addlsh1_n, and it runs up to 2x faster than
separate lshift and add_n. (Same goes for sub.) Almost no machine has
mpn_addlsh_n, since it has proven tricky to make fast.
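To make the comparison concrete, here is a minimal portable sketch of the
two ways to compute rp = up + 2*vp. This is not GMP code: limb_t,
ref_lshift1_then_add and ref_addlsh1_n are hypothetical names, limbs are
assumed to be 64 bits, and the speedup quoted above comes from native
assembly loops, not from C like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;        /* hypothetical stand-in for mp_limb_t */
#define LIMB_BITS 64            /* assumption: 64-bit limbs, no nails */

/* Two passes: tp = 2*vp, then rp = up + tp.  Needs n limbs of scratch.
   Returns the carry-out limb, which is 0, 1 or 2.  */
static limb_t
ref_lshift1_then_add (limb_t *rp, const limb_t *up, const limb_t *vp,
                      limb_t *tp, size_t n)
{
  limb_t shift_out = 0, carry = 0;
  size_t i;

  assert (n > 0);
  for (i = 0; i < n; i++)       /* pass 1: shift left by one bit */
    {
      limb_t v = vp[i];
      tp[i] = (v << 1) | shift_out;
      shift_out = v >> (LIMB_BITS - 1);
    }
  for (i = 0; i < n; i++)       /* pass 2: add with carry propagation */
    {
      limb_t s = up[i] + tp[i];
      limb_t c1 = s < tp[i];
      rp[i] = s + carry;
      carry = c1 | (rp[i] < s);
    }
  return shift_out + carry;
}

/* One pass: shift and add fused, no scratch area.  Same return value.  */
static limb_t
ref_addlsh1_n (limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
  limb_t shift_in = 0, carry = 0;
  size_t i;

  assert (n > 0);
  for (i = 0; i < n; i++)
    {
      limb_t v = vp[i];
      limb_t t = (v << 1) | shift_in;   /* limb i of 2*vp */
      limb_t s = up[i] + t;
      limb_t c1 = s < t;
      shift_in = v >> (LIMB_BITS - 1);
      rp[i] = s + carry;
      carry = c1 | (rp[i] < s);
    }
  return shift_in + carry;
}
```

Both functions produce the same limbs and the same carry-out; the fused
one reads each operand once and writes the result once, which is where a
native loop can save time.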
We should use mpn_addlsh1_n in more places, I think, even for the s,t
related computations, such as pm2 in toom72. That will be a bit tricky,
and will require a compare that cannot use mpn_cmp (but typically these
compares need to look at just one limb pair).
(I am enabling a missed trivial case of mpn_addlsh1_n in toom52_mul.)
PS. Speaking of combination functions available only on some
platforms, the mpn_add_n_sub_n code seems not to be well tested; the
toom52 in the tree contains the following:
#if HAVE_NATIVE_mpn_add_n_sub_n
  if (mpn_cmp (a0a2, a1a3, n+1) < 0)
    {
      mpn_add_n_sub_n (as2, asm2, a1a3, a0a2, n+1);
      flags ^= toom6_vm1_neg;
    }
  else
    {
      mpn_add_n_sub_n (as2, asm2, a0a2, a1a3, n+1);
    }
#else
  mpn_add_n (as2, a0a2, a1a3, n+1);
  if (mpn_cmp (a0a2, a1a3, n+1) < 0)
    {
      mpn_sub_n (asm2, a1a3, a0a2, n+1);
      flags ^= toom6_vm2_neg;
    }
  else
    {
      mpn_sub_n (asm2, a0a2, a1a3, n+1);
    }
#endif
This seems to have been forgotten. I fixed it now.
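For readers following along: in the snippet quoted above the two
branches toggle differently named flags (toom6_vm1_neg in the native
branch, toom6_vm2_neg in the fallback). Below is a minimal
self-contained sketch of the sum / absolute-difference pattern in which
both orderings share one flag. It is an illustration only, not the code
in the tree: limb_t, the ref_* helpers and FLAG_VM2_NEG (including its
value) are hypothetical, and the caller is assumed to leave headroom in
the top limb so that the sum cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;        /* hypothetical stand-in for mp_limb_t */

enum { FLAG_VM2_NEG = 2 };      /* hypothetical bit, not GMP's value */

/* Compare {ap,n} with {bp,n}, most significant limb first.  */
static int
ref_cmp (const limb_t *ap, const limb_t *bp, size_t n)
{
  while (n-- > 0)
    if (ap[n] != bp[n])
      return ap[n] > bp[n] ? 1 : -1;
  return 0;
}

/* rp = ap + bp, returns the carry-out (0 or 1).  */
static limb_t
ref_add_n (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t carry = 0;
  size_t i;
  for (i = 0; i < n; i++)
    {
      limb_t s = ap[i] + bp[i];
      limb_t c1 = s < ap[i];
      rp[i] = s + carry;
      carry = c1 | (rp[i] < s);
    }
  return carry;
}

/* rp = ap - bp, returns the borrow-out (0 or 1).  */
static limb_t
ref_sub_n (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t borrow = 0;
  size_t i;
  for (i = 0; i < n; i++)
    {
      limb_t a = ap[i], b = bp[i];
      limb_t d = a - b;
      limb_t b1 = a < b;
      rp[i] = d - borrow;
      borrow = b1 | (d < borrow);
    }
  return borrow;
}

/* sum = even + odd and absdiff = |even - odd|.  Returns flags with
   FLAG_VM2_NEG toggled when even - odd is negative.  The output areas
   must not overlap the inputs.  */
static unsigned
ref_eval_pm (limb_t *sum, limb_t *absdiff, const limb_t *even,
             const limb_t *odd, size_t n, unsigned flags)
{
  assert (n > 0);
  (void) ref_add_n (sum, even, odd, n);
  if (ref_cmp (even, odd, n) < 0)
    {
      (void) ref_sub_n (absdiff, odd, even, n);
      flags ^= FLAG_VM2_NEG;
    }
  else
    (void) ref_sub_n (absdiff, even, odd, n);
  return flags;
}
```

Keeping the flag toggle in one place, whichever primitive computes the
sum and difference, avoids the two #if branches drifting apart.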
--
Torbjörn