Some secondary asm T3,T4,T5 functions
Torbjorn Granlund
tg at gmplib.org
Tue Apr 2 21:31:25 CEST 2013
David Miller <davem at davemloft.net> writes:
> This turned out to be easy: you were using %o5 as a register for
> 'dinv', but this gets clobbered elsewhere in the code. Using %o4
> instead fixes the problems.
Well, I suppose that was another of my "safe" last-minute fixes. :-)
> Attached is a dive_1.asm that works for me on real hardware as
> well as T4 timings from:
>
>   tune/speed -p10000000 -s1-1000 -f1.1 -C mpn_divexact_1.3
Terrible speed, as expected on these machines for code that relies on
mul *latency*.
We will need to compute d^(-1) mod B^2 (or B^k, k > 2) where B is the
limb base.
With such an inverse, we will develop k quotient limbs at a time, using
several *independent* limb multiplies.
There is a bdiv_qr_1_pi2 lurking, which does this for k = 2.
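
To make the idea above concrete, here is a rough sketch in Python (not the
SPARC code, and not the actual bdiv_qr_1_pi2). It assumes an odd divisor d
and 64-bit limbs. The names inverse_mod_2k and divexact_chunks are made up
for this illustration. It only shows the arithmetic: with an inverse
mod B^k, each step of the dependent chain yields k quotient limbs instead
of one.

```python
LIMB_BITS = 64          # assumed limb size for this sketch
B = 1 << LIMB_BITS      # limb base


def inverse_mod_2k(d, bits):
    """Return d^(-1) mod 2**bits for odd d, using Newton iteration."""
    assert d & 1, "d must be odd"
    mod = 1 << bits
    inv = d             # d*d == 1 (mod 8), so d is its own inverse to 3 bits
    prec = 3
    while prec < bits:
        # Each step doubles the number of correct low bits.
        inv = inv * (2 - d * inv) % mod
        prec *= 2
    return inv


def divexact_chunks(u, d, n_limbs, k):
    """Exact division u / d (d odd, d divides u), k quotient limbs per step.

    u is taken as n_limbs limbs, processed from the least significant end.
    """
    Bk = B ** k
    dinv = inverse_mod_2k(d, LIMB_BITS * k)   # d^(-1) mod B^k
    q = 0
    carry = 0
    for i in range(0, n_limbs, k):
        chunk = (u >> (LIMB_BITS * i)) % Bk   # next k limbs of u
        borrow = 1 if chunk < carry else 0
        t = (chunk - carry) % Bk
        qk = (t * dinv) % Bk                  # k quotient limbs at once
        # High part of qk*d, plus the borrow, feeds the next step.
        carry = (qk * d) // Bk + borrow
        q |= qk << (LIMB_BITS * i)
    return q
```

With k = 1 the loop is the usual one-limb-per-step chain, where every
multiply waits for the previous one. With k = 2 the product t * dinv mod B^2
splits, in limb terms, into t0*i0, t0*i1 and t1*i0 (writing t = t0 + t1*B
and dinv = i0 + i1*B), and those limb multiplies do not depend on each
other, so the chain through the carry is walked half as often.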
--
Torbjörn