Shared toom evaluation functions

Torbjorn Granlund tg at gmplib.org
Sat Nov 14 19:18:19 CET 2009


nisse at lysator.liu.se (Niels Möller) writes:

  bodrato at mail.dm.unipi.it writes:
  
  > My mpn_toom_ev_lsh can be used for your mpn_toom_eval_pm2, but it can
  > evaluate also _pm4, or _pm8 as needed by higher degree Toom.
  
  Makes sense! I thought _pm2 only needed mpn_addlsh1_n (or falls back
  to separate shift and add), but it actually uses the more general
  mpn_addlsh_n. (Don't know which platforms actually have these
  functions).
  
Several have mpn_addlsh1_n, and they run up to 2x faster than separate
lshift and add_n.  (Same goes for sub.)  Almost no machine has
mpn_addlsh_n, since it has proven tricky to make fast.
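
To make the comparison concrete, here is a minimal portable sketch of
what an addlsh1-style primitive computes, next to the separate shift and
add it replaces.  The names ref_lshift1, ref_add_n and ref_addlsh1_n and
the 64-bit limb_t are assumptions made for illustration; this is not
GMP's code, and plain C like this will not show the speedup, which comes
from native assembly loops that touch each operand only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* assumption: 64-bit limbs, no nails */
#define LIMB_BITS 64

/* Two-pass building block: rp[] = vp[] << 1, returns the bit shifted out. */
static limb_t ref_lshift1(limb_t *rp, const limb_t *vp, size_t n)
{
    limb_t in = 0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        rp[i] = (v << 1) | in;
        in = v >> (LIMB_BITS - 1);
    }
    return in;
}

/* Two-pass building block: rp[] = up[] + vp[], returns the carry-out. */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
    limb_t cy = 0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];
        limb_t c1 = s < up[i];      /* carry from up + vp */
        limb_t r = s + cy;
        limb_t c2 = r < s;          /* carry from adding the carry-in */
        rp[i] = r;
        cy = c1 + c2;               /* at most one of c1, c2 is set */
    }
    return cy;
}

/* Combined: rp[] = up[] + 2*vp[] in one pass, returns carry-out in 0..2. */
static limb_t ref_addlsh1_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                            size_t n)
{
    limb_t in = 0;                  /* bit shifted out of the previous limb */
    limb_t cy = 0;                  /* carry from the previous addition */
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t sh = (v << 1) | in;
        in = v >> (LIMB_BITS - 1);
        limb_t s = up[i] + sh;
        limb_t c1 = s < sh;
        limb_t r = s + cy;
        limb_t c2 = r < s;
        rp[i] = r;
        cy = c1 + c2;
    }
    return in + cy;                 /* both have weight 2^(64*n) */
}
```

Calling ref_lshift1 into a temporary and then ref_add_n gives the same
limbs and the same total carry as one call to ref_addlsh1_n; the combined
form just avoids the temporary and the second pass over memory.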

We should use mpn_addlsh1_n in more places I think, even for the s,t
related computations, such as pm2 in toom72.  That will be a bit tricky,
and will require a compare that cannot use mpn_cmp (but typically these
compares need to look at just one limb pair).

(I am enabling a missed trivial case of mpn_addlsh1_n in toom52_mul.)

  PS. Speaking of combination functions available only on some
  platforms, mpn_add_n_sub_n code seems to not be well tested, the
  toom52 in the tree contains the following
  
  #if HAVE_NATIVE_mpn_add_n_sub_n
    if (mpn_cmp (a0a2, a1a3, n+1) < 0)
      {
        mpn_add_n_sub_n (as2, asm2, a1a3, a0a2, n+1);
        flags ^= toom6_vm1_neg;
      }
    else
      {
        mpn_add_n_sub_n (as2, asm2, a0a2, a1a3, n+1);
      }
  #else
    mpn_add_n (as2, a0a2, a1a3, n+1);
    if (mpn_cmp (a0a2, a1a3, n+1) < 0)
      {
        mpn_sub_n (asm2, a1a3, a0a2, n+1);
        flags ^= toom6_vm2_neg;
      }
    else
      {
        mpn_sub_n (asm2, a0a2, a1a3, n+1);
      }
  #endif
  
This seems to have been forgotten.  I fixed it now.  
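
For reference, the pattern in the quoted snippet (form the sum, form the
absolute difference, and record the sign of the difference in a flag) can
be sketched in self-contained C as below.  Note that the two branches of
the quoted code name different flags (toom6_vm1_neg and toom6_vm2_neg),
which is presumably the discrepancy being pointed out.  Everything here
(limb_t, the ref_ helpers, EX_VM2_NEG) is a hypothetical stand-in, not
the code in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* assumption: 64-bit limbs, no nails */

enum { EX_VM2_NEG = 1 };            /* hypothetical sign flag bit */

/* Compare from the most significant limb down; returns <0, 0 or >0. */
static int ref_cmp(const limb_t *ap, const limb_t *bp, size_t n)
{
    while (n-- > 0)
        if (ap[n] != bp[n])
            return ap[n] > bp[n] ? 1 : -1;
    return 0;
}

/* rp[] = ap[] + bp[], returns the carry-out. */
static limb_t ref_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp,
                        size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = ap[i] + bp[i];
        limb_t c1 = s < ap[i];
        limb_t r = s + cy;
        limb_t c2 = r < s;
        rp[i] = r;
        cy = c1 + c2;
    }
    return cy;
}

/* rp[] = ap[] - bp[], returns the borrow-out. */
static limb_t ref_sub_n(limb_t *rp, const limb_t *ap, const limb_t *bp,
                        size_t n)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t d = ap[i] - bp[i];
        limb_t b1 = ap[i] < bp[i];
        limb_t r = d - bw;
        limb_t b2 = d < bw;
        rp[i] = r;
        bw = b1 + b2;
    }
    return bw;
}

/* sum = a + b, diff = |a - b|.  Returns EX_VM2_NEG when a < b, else 0.
   Assumes the operands leave headroom in the top limb, so the sum
   does not carry out. */
static int ref_eval_pm(limb_t *sum, limb_t *diff,
                       const limb_t *a, const limb_t *b, size_t n)
{
    int flags = 0;
    limb_t cy = ref_add_n(sum, a, b, n);
    assert(cy == 0);
    (void) cy;                      /* silence unused warning under NDEBUG */
    if (ref_cmp(a, b, n) < 0) {
        ref_sub_n(diff, b, a, n);   /* swap operands so diff stays >= 0 */
        flags ^= EX_VM2_NEG;
    } else {
        ref_sub_n(diff, a, b, n);
    }
    return flags;
}
```

Whichever primitive does the arithmetic, a fused add/sub or separate
calls, the flag toggled on the a < b path has to be the same one, since
later interpolation steps only see the flag.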

-- 
Torbjörn

