[PATCH v3 1/2] MIPS r6 support
tg at gmplib.org
Tue Apr 16 12:21:18 UTC 2019
I took a look at the new assembly code.
GMP's assembly code is optimised for many CPU implementations, not just
for specific ISAs. In some cases, we provide generic, less complex
assembly functions. The latter happens when we don't have access to
actual CPUs where we could run timing tests.
It is important to keep complexity down, except when the complexity can
be justified by performance gains.
When I look at your assembly code, it looks a lot like the pre-existing
code which was optimised for some old MIPS CPUs. IIRC, it ran well on
the in-order R4000, R5000, and OK also on R10000 and R12000. Yes, these
are very old CPUs!
The code uses an old style which we no longer use, with an unrolled main
loop and a separate loop handling the first few iterations. It was
adequate for the time, and it gave a nice speedup. Unfortunately, it
adds a large constant overhead.
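To illustrate the structure described above, here is a minimal C sketch of that style, not the actual GMP code. It assumes a 4-way unrolling factor, 32-bit limbs, and the hypothetical names limb_t and mul_1_old_style. The separate leading loop and the setup around it are where the constant overhead comes from, which hurts most for small operand sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb.
   Old-style layout: a leading loop for n mod 4 limbs, then a main
   loop unrolled 4 ways (the factor 4 is an assumption). */
limb_t mul_1_old_style(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);

    limb_t cy = 0;
    size_t i = 0;
    size_t lead = n % 4;

    /* Separate loop handling the first few limbs one at a time. */
    for (; i < lead; i++) {
        uint64_t p = (uint64_t)up[i] * v + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }

    /* Unrolled main loop: four limbs per iteration. */
    for (; i < n; i += 4) {
        uint64_t p0 = (uint64_t)up[i]     * v + cy;
        uint64_t p1 = (uint64_t)up[i + 1] * v + (p0 >> 32);
        uint64_t p2 = (uint64_t)up[i + 2] * v + (p1 >> 32);
        uint64_t p3 = (uint64_t)up[i + 3] * v + (p2 >> 32);
        rp[i]     = (limb_t)p0;
        rp[i + 1] = (limb_t)p1;
        rp[i + 2] = (limb_t)p2;
        rp[i + 3] = (limb_t)p3;
        cy = (limb_t)(p3 >> 32);
    }
    return cy;
}
```

For n below the unrolling factor, the main loop never runs and all work happens in the slow leading loop.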
The code you now contribute looks like our old code with the multiply
instructions replaced. Where the old code had no multiply instructions,
the new code looks like a plain copy of it, just under a new path (a new
r6 subdirectory).
Creating copies of our existing code does not make much sense. An r6
subdirectory should contain code which takes advantage of new features.
And generic r6 code should not be optimised for old in-order pipelines
like R4000, which it seems to be now, due to how it was produced. (As a
matter of fact, that old code is surely bad also for R4000 with today's
GMP coding standards!)
It would be useful to have plain r6 code for r6 CPUs, i.e. we should
provide an alternative to the code which uses the hi and lo registers. Such
plain code might not do any unrolling, and only limited software
pipelining. It would also be useful to have code optimised for any r6
CPU, which I very much doubt looks like respun overscheduled R4000 code.
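As a rough illustration of what such plain code could look like, here is a minimal C sketch, not a proposed implementation. It assumes 32-bit limbs and models the r6 multiplies that write the low and high product halves directly to general registers (mulu and muhu in the 32-bit case) with the hypothetical helpers mul_lo and mul_hi. The names limb_t and mul_1_plain are also made up for this example. There is no unrolling and no software pipelining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* Low half of the 64-bit product, as a stand-in for r6 mulu. */
static inline limb_t mul_lo(limb_t a, limb_t b)
{
    return (limb_t)((uint64_t)a * b);
}

/* High half of the 64-bit product, as a stand-in for r6 muhu. */
static inline limb_t mul_hi(limb_t a, limb_t b)
{
    return (limb_t)(((uint64_t)a * b) >> 32);
}

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb.
   One limb per iteration; no hi/lo register pair is involved. */
limb_t mul_1_plain(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);

    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t lo = mul_lo(up[i], v);
        limb_t hi = mul_hi(up[i], v);
        lo += cy;
        hi += (lo < cy);  /* carry from the low add; hi <= 2^32 - 2, so no overflow */
        rp[i] = lo;
        cy = hi;
    }
    return cy;
}
```

Whether a loop this simple runs well on a given r6 CPU can only be settled by timing tests on real hardware.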
Please encrypt, key id 0xC8601622