mulmid

Wed Oct 5 09:02:09 CEST 2011

David Harvey <d.harvey at unsw.edu.au> writes:

  If we have enough trust in refmpn_mul, and if refmpn_mul is
  "asymptotically fast", then that is the simplest solution.

It is asymptotically fast, actually.  I made it like that to allow us to
test the full range of multiply functions; if refmpn_mul had been O(n^2)
we could not have tested the FFT multiply code.

  > 	shr	$1, %al		C restore carry
  > 
  >    can be replaced by
  > 
  >        shr	%al		C restore carry
  > 
  >    for a bit more compact object code.

  Hmmm. Next time I will keep that in mind, I didn't realise the object
  code is different. I'm slightly worried that such a change might
  affect the speed, these processors can be quite fickle. For now I
  suggest you change all the occurrences that don't occur in the inner
  loops. Unless you have time to check the inner loop performance, I
  suggest leaving the ones in the inner loops for now (perhaps add a
  comment to this effect).

Different binutils (the package with the GNU assembler) seem to handle
the two operand shift instructions differently.  It seems newer binutils
misassembles the two operand form into the one operand form.

This means that we might get different code for different systems.  If
we need the two operand form for alignment, then we will unfortunately
need to use the .byte form.

  Definitely I agree invert_appr() is one of the first things we should
  try as an application of the mulmid code.

Another thing to try is mpn_sqrt.  It is currently division based.  An
iteration computing B = 1/sqrt(A) should be tried; it uses just
multiplication.  (We have sqrt(A) = B * A.)

-- 
Torbjörn