[PATCH v3 1/2] MIPS r6 support
syq at debian.org
Tue Apr 16 13:06:04 UTC 2019
Torbjörn Granlund <tg at gmplib.org> wrote on Tue, Apr 16, 2019 at 8:21 PM:
> I took a look at the new assembly code.
> GMP's assembly code is optimised for many CPU implementations, not just
> for specific ISAs. In some cases, we provide generic, less complex
> assembly functions. The latter happens when we don't have access to
> actual CPUs where we could run timing tests.
> It is important to keep complexity down except when complexity can be
> motivated by performance gains.
> When I look at your assembly code, it looks a lot like the pre-existing
> code which was optimised for some old MIPS CPUs. IIRC, it ran well on
> the in-order R4000, R5000, and OK also on R10000 and R12000. Yes, these
> are very old CPUs!
> The code uses an old style which we no longer use, with an unrolled main
> loop and a separate loop handling the first few iterations. It was
> adequate for the time, and it gave a nice speedup. Unfortunately, it
> adds a large constant overhead.
The situation is just as you said: it is bad.
However, we have to do it this way, because the goal of this patch is not
to optimise for r6 but just to make GMP work on it.
r6 is not compatible with the earlier revisions: it drops some instructions....
> The code you now contribute looks like our old code with replaced
> multiply instructions, except that when there were no multiply
> instructions, the new code looks like copies of the old code but now
> with a new path (a new r6 subdirectory).
> Creating copies of our existing code does not make much sense. An r6
> subdirectory should contain code which takes advantage of new features.
> And generic r6 code should not be optimised for old in-order pipelines
> like R4000, which it seems to be now, due to how it was produced. (As a
> matter of fact, that old code is surely bad also for R4000 with today's
> GMP coding standards!)
> It would be useful to have plain r6 code for r6 CPUs, i.e. we should
> provide an alternative to code which uses the hi and lo registers. Such
> plain code might not do any unrolling, and only limited software
> pipelining. It would also be useful to have code optimised for any r6
> CPU, which I very much doubt looks like respun overscheduled R4000 code.
We have to do this because r6 removes the HI/LO registers and the
multu/dmultu instructions entirely.
Instead, r6 writes the product to GPRs using mulu/muhu.
So you could even argue that r6 is a different architecture from pre-r6.
This path is used only on r6; pre-r6 CPUs, such as the R4000,
continue to use the old code.
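
To make the difference concrete, here is a rough C model of what a plain,
non-unrolled r6 loop could compute. This is only a sketch, not the code in
the patch: mulu32, muhu32 and mul_1_sketch are hypothetical names, and 32-bit
limbs are assumed. mulu returns the low half of the unsigned product and
muhu the high half, each directly in a GPR, so no HI/LO reads are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of r6 mulu: low 32 bits of the unsigned product, in a GPR. */
static uint32_t mulu32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Model of r6 muhu: high 32 bits of the unsigned product, in a GPR. */
static uint32_t muhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical plain loop (no unrolling, no software pipelining):
   {rp, n} = {up, n} * v, returning the most significant carry limb. */
static uint32_t mul_1_sketch(uint32_t *rp, const uint32_t *up,
                             size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = mulu32(up[i], v);
        uint32_t hi = muhu32(up[i], v);
        lo += carry;
        hi += (lo < carry);  /* cannot overflow: hi is at most 2^32 - 2 */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

On pre-r6 the same step would be one multu followed by mflo/mfhi; on r6 it
is two independent instructions with ordinary register destinations.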
> Please encrypt, key id 0xC8601622