Replacement mpn/generic/mul.c

Torbjorn Granlund tg at gmplib.org
Fri Dec 18 20:32:59 CET 2009


bodrato at mail.dm.unipi.it writes:

  I can understand why it calls mpn_mul when the final un < vn, but the else
  branch should look exactly like the else branch of the previous "if (2 *
  un >= 5 * vn)", with ws replacing prodp; saving a new allocation of the
  scratch.
  
This is a flaw.

The idea was that the toomX3 block should be used for larger operands.
The  BELOW_THRESHOLD (vn, MUL_TOOM44_THRESHOLD) should go away.

Then the mpn_mul call makes more sense--we can have operands in the
toom44 range.

Do you agree?

  About the scratch, in both the toomX2 and the toomX3 variants you
  #define ITCH ((un + vn) * 4 + 100)
  but assume un=70*vn... you are allocating a lot of space, but you will
  call all the lower-level toom with at most un=3*vn (or 5/2 vn for toomX3)
  
  Maybe
  #define ITCH ((3*vn + vn) * 4 + 100)
  and
  #define ITCH ((5*vn + 2*vn) * 2 + 100)
  are enough?

Yes, the ITCH definition is not good (it was inherited from the old code).
  
I am not sure about your exact expressions, though.  Perhaps toom53
might need as much 15vn?  You're suggesting 14vn?

  > * The toomX4 code now consists of just a call to toom44_mul.  The looping
  > is gone.
  
  I have some doubt about using FFT when the operand are very unbalanced.
  
  Let us assume FFT threshold is 2000. If Toom-4 threshold is less than 500,
  then we will compute a 1500x500 product with FFT. This may be a good idea.
  
  But what about greater unbalancements? 3000x500, 5000x500.... let us
  examine the general (500k)x500 case.
  
  Complexity of the school-book algorithm (or any toomXn iteration) grows
  linearly with the growing k. FFT grows at least with O(k log(k)).
  Iteration will eventually will be faster than FFT...
  Even for (un+vn) above the FFT line, it is maybe wise to iterate if the
  product is very unbalanced e.g. for un>10vn. The iterated algorithm can be
  FFT itself...
  
I completely agree.

Question is: What condition should we add as a disjunction for entering
the toomX3 block?

Your analysis implies that is should be a function if both un and vn, of
course...

Thanks much for your quick feedback!  I have attached an updated
version.  The main change is that the "X3" and "X4" blocks are now
entered with a common condition, a condition that just excludes FFT
operands.  They are then separated with a cimple MUL_TOOM44_THRESHOLD +
TOOM44_OK condition.

(I have used coverage analysis to make sure all code is executed.)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul.c
Type: application/octet-stream
Size: 9579 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091218/fad4ac81/attachment.obj>
-------------- next part --------------

-- 
Torbj?rn


More information about the gmp-devel mailing list