Wraparound multiplicaton vs mullo

Thu Oct 22 15:25:39 CEST 2009

nisse at lysator.liu.se (Niels Möller) writes:

  When n has a single trailing zero, performance is about the same as
  x^2-1 (not surprising), saving around 25% compared to a full
  multiplication. I'm still benchmarking over range 2-512 limbs, on an
  x86_32.

  When n has two trailing zerros, the dc algorithm beats both x^2-1 and
  x^4-1, with typical savings of 25%, 32% and 35%.

  When n has three trailing zeros, the dc algorithm gives a saving of
  39%. Increasing the number of trailing zeros further gives pretty
  small additional improvements in this range.

I suppose some maths on your table has the answer, but I ask anyway:
Does size 2^t m - k ever run slower than size 2^t m?  Than it could help
to pad out one's operands (for the presumed main usages of this new
trick Newton, ab-cd with cancellation, etc).  An interface would be
needed for that.

  So for numbers of a couple of hundred limbs, multiplication mod 2^n -
  1 is 67% faster (40% time saved) compared to full multiplication.

Wow!  Which are the largest n for which it is still faster than full
multiplication?

-- 
Torbjörn