Some secondary asm T3,T4,T5 functions

Torbjorn Granlund tg at
Tue Apr 2 21:31:25 CEST 2013

David Miller <davem at> writes:

  This turned out to be easy, you were using %o5 as a register for
  'dinv' but this gets clobbered elsewhere in the code, using %o4
  instead fixes the problems.
Well, I suppose that was another of my "safe" last-minute fixes.  :-)

  Attached is a dive_1.asm that works for me on real hardware as
  well as T4 timings from:
  tune/speed -p10000000 -s1-1000 -f1.1 -C mpn_divexact_1.3
Terrible speed, as expected on these machines for code that relies on
mul *latency*.

We will need to compute d^(-1) mod B^2 (or B^k, k > 2) where B is the
limb base.

With such an inverse, we will develop k quotient limbs at a time, using
several *independent* limb multiplies.

There is a bdiv_qr_1_pi2 lurking, which does this for k = 2.
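
As a rough illustration of the k = 2 idea, here is a minimal C sketch. It is not GMP code and not the bdiv_qr_1_pi2 mentioned above. It assumes a toy limb base B = 2^32 so that a pair of limbs fits in a uint64_t, and the names inv_mod_b2 and divexact_by_pairs are made up for the example. The inverse d^(-1) mod B^2 comes from Newton iteration, and each loop step produces two quotient limbs with one multiply by that inverse; the only value carried between steps is the high limb of q*d plus a borrow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inverse of an odd d modulo 2^64, i.e. modulo B^2
   for the assumed toy limb base B = 2^32. */
static uint64_t inv_mod_b2(uint32_t d)
{
    assert(d & 1u);              /* only odd d is invertible mod 2^64 */
    uint64_t x = d;              /* d*d == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 5; i++)  /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        x *= 2 - (uint64_t)d * x;
    return x;
}

/* Hypothetical helper: q = u / d, where d is odd and divides u exactly.
   u and q each hold n limb pairs, least significant first, with every
   pair of 32-bit limbs packed into one uint64_t. */
static void divexact_by_pairs(uint64_t *q, const uint64_t *u, size_t n,
                              uint32_t d)
{
    uint64_t dinv = inv_mod_b2(d);
    uint64_t c = 0;              /* carry from the previous pair, < 2^32 */

    for (size_t i = 0; i < n; i++) {
        uint64_t s = u[i] - c;           /* subtract carry, mod 2^64 */
        uint64_t borrow = u[i] < c;

        uint64_t qi = s * dinv;          /* two quotient limbs at once */
        q[i] = qi;

        /* High limb of the three-limb product qi * d, built from two
           32x32-bit products that do not depend on each other. */
        uint64_t lo = (qi & 0xffffffffu) * d;
        uint64_t hi = (qi >> 32) * d + (lo >> 32);
        c = (hi >> 32) + borrow;
    }
}
```

With real 64-bit limbs, the multiply by the two-limb inverse splits into separate limb products that can be issued back to back, so the loop depends less on the latency of a single mul.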

