Some ARM Cortex-A8 improvements
tg at gmplib.org
Fri Apr 20 18:34:04 CEST 2012
Richard Henderson <rth at twiddle.net> writes:
> Three patches herein. If there's a better way to submit patches,
> please advise; I've never used hg before.
>
> The first patch gives gcc control over ctz/clz, particularly for
> armv6t2 and later, which have rbit that can be used for ctz.
>
> The second patch improves multiplication a bit. I'm still playing
> with addmul_2, but this is a start for addmul_1/mul_1. I couldn't
> do better than the existing submul_1. Unfortunately the XScale
> machines in the gcc build farm are turned off, so I can't test to
> see if I've regressed on that platform.
>
> The third patch tidies up add_n/sub_n, and provides for the carry-in

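The rbit trick works because reversing the bits of a word turns its lowest set bit into its highest, so ctz(x) = clz(rbit(x)). Below is a minimal portable C sketch of that identity. The helper names are hypothetical, and the loops only model what the single rbit and clz instructions compute. They are not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of ARM's rbit: reverse the bit order of a 32-bit
   word (bit 0 becomes bit 31, bit 1 becomes bit 30, and so on). */
static uint32_t reverse_bits32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

/* Portable model of ARM's clz: count leading zero bits, 32 for x == 0. */
static int clz32(uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Count trailing zeros via ctz(x) == clz(rbit(x)).  Like GCC's
   __builtin_ctz, this sketch treats x == 0 as a caller error. */
static int ctz32(uint32_t x)
{
  assert(x != 0);
  return clz32(reverse_bits32(x));
}
```

With GCC, __builtin_ctz is the portable way to request this operation. On a target that has both rbit and clz, the compiler can lower it to those two instructions.
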
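For reference, addmul_1 multiplies an n-limb operand by a single limb, adds the low n limbs of the product into the destination, and returns the carry-out limb. mul_1 stores the product instead of adding it. The C below sketches those semantics, assuming 32-bit limbs as on ARM. ref_addmul_1, ref_mul_1 and limb_t are hypothetical names, and this is not the assembly from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit limbs, as on ARM; a 64-bit temporary holds one full product. */
typedef uint32_t limb_t;

/* {rp,n} += {up,n} * v; returns the carry-out limb.  This sketch
   requires n >= 1. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  uint64_t carry = 0;
  assert(n >= 1);
  for (size_t i = 0; i < n; i++)
    {
      /* up[i]*v + rp[i] + carry <= (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1,
         so this sum can never overflow 64 bits. */
      uint64_t t = (uint64_t) up[i] * v + rp[i] + carry;
      rp[i] = (limb_t) t;
      carry = t >> 32;
    }
  return (limb_t) carry;
}

/* Same loop without the rp[i] addend: {rp,n} = {up,n} * v. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  uint64_t carry = 0;
  assert(n >= 1);
  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = (uint64_t) up[i] * v + carry;
      rp[i] = (limb_t) t;
      carry = t >> 32;
    }
  return (limb_t) carry;
}
```

The overflow bound in the comment is the invariant an assembly version relies on. One 32x32->64 multiply plus two 32-bit addends always fits in 64 bits, which is also what makes ARMv6's umaal instruction a natural fit for loops of this shape.
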
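A carry-in variant lets one loop serve both plain add_n (carry-in 0) and callers that chain several additions. The sketch below shows only the semantics, assuming 32-bit limbs. ref_add_nc, ref_sub_nc and limb_t are hypothetical names, not the patch's actual entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

/* {rp,n} = {up,n} + {vp,n} + cy, with cy in {0,1}; returns carry-out.
   With cy == 0 this is exactly plain add_n, so add_n can be a thin
   entry point that clears the carry and shares the loop. */
static limb_t ref_add_nc(limb_t *rp, const limb_t *up, const limb_t *vp,
                         size_t n, limb_t cy)
{
  assert(n >= 1 && cy <= 1);
  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = (uint64_t) up[i] + vp[i] + cy;
      rp[i] = (limb_t) t;
      cy = (limb_t) (t >> 32);
    }
  return cy;
}

/* {rp,n} = {up,n} - {vp,n} - bw, with bw in {0,1}; returns borrow-out.
   A borrow makes the 64-bit difference wrap, setting bit 32. */
static limb_t ref_sub_nc(limb_t *rp, const limb_t *up, const limb_t *vp,
                         size_t n, limb_t bw)
{
  assert(n >= 1 && bw <= 1);
  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = (uint64_t) up[i] - vp[i] - bw;
      rp[i] = (limb_t) t;
      bw = (limb_t) ((t >> 32) & 1);
    }
  return bw;
}
```

On ARM, a subtract leaves the carry flag set when there is no borrow (sbc subtracts NOT C). An assembly sub_nc therefore typically has to invert the incoming borrow before entering its loop, a detail the C above hides.
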
The GMP project now finally has an ARM system in the test environment,
so we will now work on ARM improvements. I have taken a brief look at
your work, and it provides nice improvements.

I suppose we should make a few subdirs such as arm/a8 and arm/a9, to
make sure we don't optimise for one CPU and pessimise for another.

> It's a bit touchy speed testing these. There's no cycle counter
> available in userspace, and Hz is depressingly low. So I've had
> to bump the minimum iterations way, way up in order to get
> semi-reliable results, which makes the speed testing take quite
> a long time.

What did you do to make it work?
I always get "Fatal error: too many (11) failed measurements (0.0)"
on any ARM system.
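
Without a userspace cycle counter, the usual workaround is the one described in the quoted text: repeat the operation until the elapsed time is large compared with the clock's granularity. Below is a generic sketch of that idea. It assumes POSIX clock_gettime/clock_getres. time_per_call, spin_work and min_ticks are hypothetical, and this is not how GMP's speed program decides that a measurement failed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Nanoseconds between two timespec readings. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
  return (double) (b->tv_sec - a->tv_sec) * 1e9
         + (double) (b->tv_nsec - a->tv_nsec);
}

/* Hypothetical workload for demonstration: bump a counter 100 times. */
static void spin_work(void *arg)
{
  volatile unsigned long *counter = arg;
  for (int i = 0; i < 100; i++)
    (*counter)++;
}

/* Estimate nanoseconds per call of fn(arg) without a cycle counter:
   double the repetition count until one timed batch spans at least
   min_ticks clock-resolution ticks, so timer granularity contributes
   a relative error on the order of 1/min_ticks.  Returns -1.0 if the
   clock resolution cannot be queried. */
static double time_per_call(void (*fn)(void *), void *arg, long min_ticks)
{
  struct timespec res, t0, t1;
  unsigned long reps = 1;
  double tick_ns, target, ns;

  assert(min_ticks > 0);
  if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
    return -1.0;
  tick_ns = (double) res.tv_sec * 1e9 + (double) res.tv_nsec;
  if (tick_ns < 1.0)
    tick_ns = 1.0;              /* guard against a reported 0 resolution */
  target = tick_ns * (double) min_ticks;

  for (;;)
    {
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (unsigned long i = 0; i < reps; i++)
        fn(arg);
      clock_gettime(CLOCK_MONOTONIC, &t1);
      ns = elapsed_ns(&t0, &t1);
      if (ns >= target)
        return ns / (double) reps;
      reps *= 2;                /* batch too short to trust: double it */
    }
}
```

Doubling keeps the number of calibration rounds logarithmic in the final repetition count. The price is the long run times mentioned above once min_ticks is set high on a coarse clock.
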