middle products
David Harvey
dmharvey at cims.nyu.edu
Wed Jun 17 23:54:58 CEST 2009
On Jun 17, 2009, at 5:31 PM, Torbjorn Granlund wrote:
> David Harvey <dmharvey at cims.nyu.edu> writes:
>
> I think the reason for this is that the K8 mulmid_basecase uses
> internally an addmul_2 loop that runs at 2.375 cycles/limb, whereas
> the division routines are only using addmul_1, which runs at 2.5
> cycles/limb. If the division code did two rows at a time, it would
> probably push this threshold somewhat higher (surely at least to the
> mulmid karatsuba threshold)
>
> Maybe, maybe not.
Actually, what I said before was nonsense. There is no direct
comparison to be made.
Suppose that n * n multiplication costs n^2. (I ignore linear terms
in this email).
Then schoolbook 2n * n division with remainder also costs n^2.
Now what about middle-product-based reciprocal, assuming one
recursion layer before switching to mpn_tdiv_qr?
It should be:
-- one schoolbook n * n/2 division = n^2/4
-- one n * n/2 middle product = n^2/4
-- one n/2 * n/2 product = n^2/4
so the total is only (3/4)*n^2, less than the n^2 incurred by
mpn_tdiv_qr. So maybe this is the reason it wins so early, regardless
of the 2.375 vs 2.5 issue.
Now here is a best case scenario. Suppose the reciprocal is done
using quotient-only-division. Then it's reasonable to assume the cost
is n^2/2. The middle-product-based version will incur:
-- one schoolbook n * n/2 quotient-only-division = n^2/8
-- one n * n/2 middle product = n^2/4
-- one n/2 * n/2 truncated product (mulhi) = n^2/8
and of course the total is still n^2/4. It's pretty much doing the
same products in a different order.
I guess my point is: if everything was implemented "properly", then
the middle-product-based algorithm shouldn't be helpful until much
larger n; the only reason it should help is that it is more efficient
than truncated products for larger n.
david
More information about the gmp-devel
mailing list