Some arm cortex-a8 improvements

Torbjorn Granlund tg at gmplib.org
Sun Apr 22 22:06:08 CEST 2012


Richard Henderson <rth at twiddle.net> writes:

  I used the following, almost certainly not appropriate for general application.
  
[snip]

Thanks.  It would be very useful to make GMP timing work with the Linux
kernel running on ARM.  Do you know if there are similar problems with,
say, NetBSD?

I have checked in major ARM improvements over the last few days, inspired by
your patches.  I have an A9 but no A8, so I have optimised just for the
former.  I have left the top-level mpn/arm code largely unmodified in
order to keep supporting older v4 and v5 arch CPUs; new multiply code
using umaal resides in the directory mpn/arm/v6.  (Some new division
code still in the forge will appear in mpn/arm/v5, due to its use of
clz.)
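
For readers unfamiliar with umaal: the instruction computes a 32x32-bit
product and adds two further 32-bit values into the 64-bit result, which
is exactly the inner step of mul_1 and addmul_1.  The following is only a
C model of that arithmetic, not the assembly in mpn/arm/v6; the names
umaal_model and ref_addmul_1 are hypothetical, and 32-bit limbs are
assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of what ARMv6 UMAAL computes: a*b + c + d as a 64-bit value.
   This cannot overflow, since (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static uint64_t umaal_model(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (uint64_t)a * b + c + d;
}

/* Reference addmul_1 on 32-bit limbs (hypothetical helper):
   {rp,n} += {up,n} * v, returning the carry-out limb. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n,
                             uint32_t v)
{
    uint32_t carry = 0;
    size_t i;

    assert(n >= 1);               /* mpn functions expect at least one limb */
    for (i = 0; i < n; i++) {
        /* One umaal per limb: product + old result limb + incoming carry. */
        uint64_t t = umaal_model(up[i], v, rp[i], carry);
        rp[i] = (uint32_t)t;          /* low half is the new result limb */
        carry = (uint32_t)(t >> 32);  /* high half is carried to the next limb */
    }
    return carry;
}
```

Because the sum always fits in 64 bits, no separate carry handling is
needed in the loop, which is what makes the instruction attractive here.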

The new code is carefully software pipelined, and mul_1 and addmul_1 run
faster than both the old code and your patched code, at least on A9.
Could you please try it on A8 and see if it is at least as fast as your
code there?  If it is slower, we need to try to make innocent
modifications that don't hurt A9, or if that turns out to be hard,
provide several functions and choose asm code based not only on
architecture, but also on the exact core.
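
A rough sketch of what choosing by exact core could look like, assuming
the core's part number is available from somewhere.  The example takes it
from text in the format of the "CPU part" line of /proc/cpuinfo on ARM
Linux, which is Linux-specific and not a portable mechanism.  All names
here (arm_core, parse_cpu_part, classify_core) are hypothetical and not
part of GMP; the part numbers 0xc08 and 0xc09 are those ARM assigns to
Cortex-A8 and Cortex-A9.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum arm_core { CORE_UNKNOWN, CORE_CORTEX_A8, CORE_CORTEX_A9 };

/* Extract the "CPU part" value from text formatted like /proc/cpuinfo
   on ARM Linux.  Returns -1 if no such line is present. */
static long parse_cpu_part(const char *cpuinfo)
{
    const char *p;

    assert(cpuinfo != NULL);
    p = strstr(cpuinfo, "CPU part");
    if (p == NULL)
        return -1;
    p = strchr(p, ':');
    if (p == NULL)
        return -1;
    /* Base 0 lets strtol accept the "0x" prefix; it skips leading blanks. */
    return strtol(p + 1, NULL, 0);
}

/* Map a part number to the cores discussed in this thread. */
static enum arm_core classify_core(long part)
{
    switch (part) {
    case 0xc08: return CORE_CORTEX_A8;
    case 0xc09: return CORE_CORTEX_A9;
    default:    return CORE_UNKNOWN;   /* fall back to generic code */
    }
}
```

A caller would read /proc/cpuinfo into a buffer, pass it to
parse_cpu_part, and select a function variant from the classify_core
result, falling back to the generic code for CORE_UNKNOWN.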

I would appreciate it if you timed all new and modified functions on A8.

If you have ARMv4 (e.g., StrongARM) and/or ARMv5 (e.g., XScale) I would
appreciate it if you could check whether they still work after the latest
changes.

Do you know if there is a portable mechanism for recognising an ARM
core, akin to x86's cpuid?

-- 
Torbjörn

