Some arm cortex-a8 improvements

Fri Apr 20 18:34:04 CEST 2012

Richard Henderson <rth at twiddle.net> writes:

  Three patches herein.  If there's a better way to submit patches,
  please advise; I've never used hg before.

  The first patch gives gcc control over ctz/clz, particularly for
  armv6t2 and later, which have rbit that can be used for ctz.
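
For readers unfamiliar with the rbit trick mentioned above: counting
trailing zeros is the same as counting leading zeros of the bit-reversed
word, i.e. ctz(x) = clz(rbit(x)). The following is only a minimal
portable sketch of that identity, not the code from the patch. The
function names are hypothetical, and the two helpers are software
stand-ins for the rbit and clz instructions.

```c
#include <stdint.h>

/* Software stand-in for the ARM rbit instruction (hypothetical helper):
   reverse the order of the 32 bits in x. */
static uint32_t bit_reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* Software stand-in for clz (hypothetical helper): number of leading
   zero bits.  Returns 32 for x == 0, as the ARM clz instruction does. */
static unsigned count_leading_zeros32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz expressed as clz of the bit-reversed word. */
static unsigned count_trailing_zeros32(uint32_t x)
{
    return count_leading_zeros32(bit_reverse32(x));
}
```

On real hardware each helper would collapse to a single instruction,
which is presumably why leaving the choice to the compiler pays off.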

  The second patch improves multiplication a bit.  I'm still playing
  with addmul_2, but this is a start for addmul_1/mul_1.  I couldn't
  do better than the existing submul_1.  Unfortunately the XScale
  machines in the gcc build farm are turned off, so I can't test to
  see if I've regressed on that platform.

  The third patch tidies up add_n/sub_n, and provides for the carry-in
  entry points.
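
As an illustration of what a carry-in entry point means for add_n, here
is a rough C sketch. It is not the assembly from the patch: the limb
type, the function names and the 32-bit limb size are assumptions made
for the example. The plain entry point is simply the carry-in variant
called with a carry of zero.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumed 32-bit limb, for illustration only */

/* Add the n-limb numbers ap[] and bp[] plus an incoming carry (0 or 1),
   store the n low limbs in rp[] and return the outgoing carry. */
static limb_t add_nc_sketch(limb_t *rp, const limb_t *ap, const limb_t *bp,
                            size_t n, limb_t carry_in)
{
    limb_t carry = carry_in;
    for (size_t i = 0; i < n; i++) {
        limb_t s  = ap[i] + bp[i];     /* may wrap around */
        limb_t c1 = s < ap[i];         /* 1 if the first add wrapped */
        limb_t r  = s + carry;         /* may wrap around */
        limb_t c2 = r < s;             /* 1 if the second add wrapped */
        rp[i] = r;
        carry = c1 | c2;               /* at most one of them is set */
    }
    return carry;
}

/* The ordinary entry point: same loop, carry-in fixed to zero. */
static limb_t add_n_sketch(limb_t *rp, const limb_t *ap, const limb_t *bp,
                           size_t n)
{
    return add_nc_sketch(rp, ap, bp, n, 0);
}
```

A caller that splits a long addition into chunks can feed the carry
returned by one chunk into the next call, which is the usual reason for
exposing the carry-in variant.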

The GMP project finally has an ARM system in its test environment, so
we will now implement ARM improvements.  I have taken a brief look at
your work, and it provides nice improvements.

I suppose we should make a few subdirs such as arm/a8 and arm/a9, to
make sure we don't optimise for one CPU and pessimise for another.

  It's a bit touchy speed testing these.  There's no cycle counter
  available in userspace, and Hz is depressingly low.  So I've had
  to bump the minimum iterations way way up in order to get semi-
  reliable results.  Which causes the speed testing to take quite
  a long time.

What did you do to make it work?

I always get "Fatal error: too many (11) failed measurements (0.0)"
on any arm system.
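
To make the measurement problem discussed above concrete: with a coarse
clock, a single call (or a small number of calls) can start and finish
within the same tick and so measure as zero. One common workaround is to
keep doubling the repetition count until the measured span covers many
ticks. The sketch below shows that idea under stated assumptions; it is
not GMP's speed program, and every name in it is hypothetical. The clock
is simulated (one tick per 1000 calls) so the behaviour is deterministic.

```c
#include <stdint.h>

typedef uint64_t (*tick_fn)(void);   /* returns the current time in ticks */
typedef void (*work_fn)(void);       /* the routine being measured */

/* Double the repetition count until the run spans at least min_ticks
   clock ticks, then return the average ticks per call.  The repetition
   count finally used is stored in *reps_out. */
static double time_per_call(work_fn work, tick_fn now, uint64_t min_ticks,
                            unsigned long *reps_out)
{
    unsigned long reps = 1;
    for (;;) {
        uint64_t t0 = now();
        for (unsigned long i = 0; i < reps; i++)
            work();
        uint64_t elapsed = now() - t0;
        if (elapsed >= min_ticks) {
            *reps_out = reps;
            return (double)elapsed / (double)reps;
        }
        reps *= 2;   /* span too short to trust; try twice as many calls */
    }
}

/* Simulated environment: each call costs 1/1000 of a clock tick. */
static uint64_t sim_calls;
static void sim_work(void) { sim_calls++; }
static uint64_t sim_coarse_clock(void) { return sim_calls / 1000; }
```

Calling time_per_call(sim_work, sim_coarse_clock, 100, &reps) keeps
doubling until the run covers at least 100 ticks, at which point the
rounding error of the coarse clock is small relative to the total. The
cost is the one described above: the run takes much longer.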

-- 
Torbjörn