Wraparound multiplicaton vs mullo

Torbjorn Granlund tg at gmplib.org
Sat Oct 17 21:59:49 CEST 2009


nisse at lysator.liu.se (Niels Möller) writes:

  nisse at lysator.liu.se (Niels "=?iso-8859-1?Q?M=F6ller?=") writes:
  
  > Lets see what this means in Schoolbook, Karatsuba and Toom ranges,
  > ignoring the O(n) term:
  >
  >         Schoolbook  Karatsuba     Toom-3     Toom-4
  > exponent         2       1.59       1.46       1.40           
  >            
  > x^2 - 1  0.5  M(n)  0.66 M(n)  0.72 M(n)  0.76 M(n)
  > x^4 - 1  0.31 M(n)  0.56 M(n)  0.65 M(n)  0.71 M(n)
  
  I've tried the x^2 - 1 algorithm now, and benchmarked on an AMD
  x86_32. A first implementation is attached.
  
  It's faster than full multiplication, mpn_mul_n, for n >= 20, and
  faster than mpn_mullow_n for n >= 24 (current mpn_mullow_n uses
  quadratic algorithms in this range, while for larger sizes it uses the
  obvious divide-and-conquer strategy, not Mulders' enhancement).
  
  I've benchmarked up to n = 500, and in the range 40 <= n <= 500, it
  consistently saves approximately 25% compared to mpn_mul_n. Since this
  is mostly toom3 and toom4 range, it's quite close to the above
  analysis.

Exellent!
  
  /* mul_2nm1.c
   *
   * Experimental code for multiplication mod 2^n-1.
   */
  
Should this be B^n-1?

  /* Inputs and outputs are n limbs, computation is mod B^n - 1, and
   * values are seminormalized; zero is represented as either 0 or B^n -
   * 1. Needs 2n limbs at rp. */
  static void
  mul_2nm1_ref (mp_ptr rp, mp_srcptr ap, mp_srcptr bp, mp_size_t n)
  {
    mp_limb_t cy;
    mpn_mul_n (rp, ap, bp, n);
    cy = mpn_add_n (rp, rp, rp+n, n);
    /* If cy == 1, then the value of rp is at most B^n - 2, so there can
     * be no overflow when adding in the carry. */
    MPN_INCR_U (rp, n, cy);
  }
  
Do you represent 0 canonically as B^n-1?
(I suppose one could preincrement, then decrement if not carry,
to canonicalize to 0 instead.)

  mul_2nm1_2 (mp_ptr rp, mp_srcptr ap, mp_srcptr bp, mp_size_t size,

2nm1 is for 2^n minus 1?  Again, isn't B the correct base?
What is the _2 for?

You might want to add a LGPL 3 header to the code, without any header it
defaults to "all rights reserved".

-- 
Torbjörn


More information about the gmp-devel mailing list