Some basic questions on the invert_limb code

Tue Nov 19 13:37:05 UTC 2013

Anil Singhar <anil.singhar at linaro.org> writes:

  Yes, I started off with 5.1.3 code and have already spent more than a month now
  coding the MPN functions in aarch64 assembly along the lines of ARM. I did so
  assuming the advisory mentioned in the gmp manual that these functions are
  generally implemented in assembly for speed. Sorry, I didn't understand the
  full scope of your first reply to my query in October. I got stuck with
  invert_limb since it was a bit non-trivial to implement and hence decided to ask.

You should compare your code to the code we already have.  (Did I
mention the GMP code repository...?)

Perhaps you have implemented something better than us, or have a more
complete set of functions.

We only implemented a basic set, where a speedup on real hardware seems
likely, no matter how its pipeline works.

The most critical operation for any CPU-specific GMP optimisation
project is limb product accumulation.  Since A57 presumably can only
form a 64 x 64 -> 128 bit product every 7th cycle, at least when using
"core register" operations, one needs to look into alternative, SIMD
formulations.  Such code is quite tricky, but should give at least a
2-fold speedup on A57.  I don't think such code should be written
without hardware (or cycle accurate simulation).
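
For reference, "limb product accumulation" means the loop that computes
{rp,n} += {up,n} * v and returns the carry-out limb.  Below is a minimal
C sketch of that operation, assuming 64-bit limbs and a compiler that
provides unsigned __int128 (GCC and Clang do).  The name addmul_1_sketch
is made up for illustration; this is not the GMP source, only the plain
"core register" formulation that a SIMD variant would have to beat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's own code: computes
   {rp,n} += {up,n} * v and returns the carry-out limb.
   Assumes 64-bit limbs and the unsigned __int128 compiler extension. */
static uint64_t
addmul_1_sketch (uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
  uint64_t carry = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* 64 x 64 -> 128 bit product plus two 64-bit addends.
         This cannot overflow 128 bits:
         (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1.  */
      unsigned __int128 t = (unsigned __int128) up[i] * v + rp[i] + carry;

      rp[i] = (uint64_t) t;            /* low limb of the sum */
      carry = (uint64_t) (t >> 64);    /* high limb feeds the next step */
    }
  return carry;
}
```

On AArch64 a compiler will typically turn the 128-bit product into a
MUL/UMULH pair, so every iteration depends on the full 64 x 64 -> 128
bit multiply discussed above.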

-- 
Torbjörn