Wraparound multiplicaton vs mullo
    Torbjorn Granlund 
    tg at gmplib.org
       
    Thu Oct 22 15:25:39 CEST 2009
    
    
  
nisse at lysator.liu.se (Niels Möller) writes:
  When n has a single trailing zero, performance is about the same as
  x^2-1 (not surprising), saving around 25% compared to a full
  multiplication. I'm still benchmarking over range 2-512 limbs, on an
  x86_32.
  
  When n has two trailing zerros, the dc algorithm beats both x^2-1 and
  x^4-1, with typical savings of 25%, 32% and 35%.
  
  When n has three trailing zeros, the dc algorithm gives a saving of
  39%. Increasing the number of trailing zeros further gives pretty
  small additional improvements in this range.
  
I suppose some maths on your table has the answer, but I ask anyway:
Does size 2^t m - k ever run slower than size 2^t m?  Than it could help
to pad out one's operands (for the presumed main usages of this new
trick Newton, ab-cd with cancellation, etc).  An interface would be
needed for that.
  So for numbers of a couple of hundred limbs, multiplication mod 2^n -
  1 is 67% faster (40% time saved) compared to full multiplication.
  
Wow!  Which are the largest n for which it is still faster than full
multiplication?
-- 
Torbjörn
    
    
More information about the gmp-devel
mailing list