Replacement mpn/generic/mul.c
Torbjorn Granlund
tg at gmplib.org
Fri Dec 18 20:32:59 CET 2009
bodrato at mail.dm.unipi.it writes:
I can understand why it calls mpn_mul when the final un < vn, but the else
branch should look exactly like the else branch of the previous "if (2 *
un >= 5 * vn)", with ws replacing prodp; saving a new allocation of the
scratch.
This is a flaw.
The idea was that the toomX3 block should be used for larger operands.
The BELOW_THRESHOLD (vn, MUL_TOOM44_THRESHOLD) should go away.
Then the mpn_mul call makes more sense--we can have operands in the
toom44 range.
Do you agree?
About the scratch, in both the toomX2 and the toomX3 variants you
#define ITCH ((un + vn) * 4 + 100)
but assume un=70*vn... you are allocating a lot of space, but you will
call all the lower-level toom with at most un=3*vn (or 5/2 vn for toomX3)
Maybe
#define ITCH ((3*vn + vn) * 4 + 100)
and
#define ITCH ((5*vn + 2*vn) * 2 + 100)
are enough?
Yes, the ITCH definition is not good (it was inherited from the old code).
I am not sure about your exact expressions, though. Perhaps toom53
might need as much 15vn? You're suggesting 14vn?
> * The toomX4 code now consists of just a call to toom44_mul. The looping
> is gone.
I have some doubt about using FFT when the operand are very unbalanced.
Let us assume FFT threshold is 2000. If Toom-4 threshold is less than 500,
then we will compute a 1500x500 product with FFT. This may be a good idea.
But what about greater unbalancements? 3000x500, 5000x500.... let us
examine the general (500k)x500 case.
Complexity of the school-book algorithm (or any toomXn iteration) grows
linearly with the growing k. FFT grows at least with O(k log(k)).
Iteration will eventually will be faster than FFT...
Even for (un+vn) above the FFT line, it is maybe wise to iterate if the
product is very unbalanced e.g. for un>10vn. The iterated algorithm can be
FFT itself...
I completely agree.
Question is: What condition should we add as a disjunction for entering
the toomX3 block?
Your analysis implies that is should be a function if both un and vn, of
course...
Thanks much for your quick feedback! I have attached an updated
version. The main change is that the "X3" and "X4" blocks are now
entered with a common condition, a condition that just excludes FFT
operands. They are then separated with a cimple MUL_TOOM44_THRESHOLD +
TOOM44_OK condition.
(I have used coverage analysis to make sure all code is executed.)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul.c
Type: application/octet-stream
Size: 9579 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091218/fad4ac81/attachment.obj>
-------------- next part --------------
--
Torbj?rn
More information about the gmp-devel
mailing list