[PATCH v3 1/2] MIPS r6 support
YunQiang Su
syq at debian.org
Tue Apr 16 13:06:04 UTC 2019
Torbjörn Granlund <tg at gmplib.org> wrote on Tue, Apr 16, 2019 at 8:21 PM:
>
> I took a look at the new assembly code.
>
> GMP's assembly code is optimised for many CPU implementations, not just
> for specific ISAs. In some cases, we provide generic, less complex
> assembly functions. The latter happens when we don't have access to
> actual CPUs where we could run timing tests.
>
> It is important to keep complexity down except when complexity can be
> motivated by performance gains.
>
> When I look at your assembly code, it looks a lot like the pre-existing
> code which was optimised for some old MIPS CPUs. IIRC, it ran well on
> the in-order R4000, R5000, and OK also on R10000 and R12000. Yes, these
> are very old CPUs!
>
> The code uses an old style which we no longer use, with an unrolled main
> loop and a separate loop handling the first few iterations. It was
> adequate for the time, and it gave a nice speedup. Unfortunately, it
> adds a large constant overhead.
>
The situation is just as you said: the code is bad.
But we have to do it this way, because the goal is not to make r6
perform better,
but just to make it work.
r6 is not compatible with the earlier revisions: it drops some instructions....
> The code you now contribute looks like our old code with replaced
> multiply instructions, except that when there were no multiply
> instructions, the new code looks like copies of the old code but now
> with a new path (a new r6 subdirectory).
>
> Creating copies of our existing code does not make much sense. An r6
> subdirectory should contain code which takes advantage of new features.
>
> And generic r6 code should not be optimised for old in-order pipelines
> like R4000, which it seems to be now, due to how it was produced. (As a
> matter of fact, that old code is surely bad also for R4000 with today's
> GMP coding standards!)
>
> It would be useful to have plain r6 code for r6 CPUs, i.e. we should
> provide an alternative to code which uses the hi and lo registers. Such
> plain code might not do any unrolling, and only limited software
> pipelining. It would also be useful to have code optimised for any r6
> CPU, which I very much doubt looks like respun overscheduled R4000 code.
We have to do it this way because r6 removes the HI/LO registers and
multu/dmultu altogether.
Instead, r6 uses GPRs with mulu/muhu.
So you could even argue that r6 is a different architecture from the pre-r6 ones.
This path is only used on new r6 CPUs; pre-r6 CPUs, such as the R4000,
continue to use the old code.
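
To illustrate the difference, here is a rough C model of the r6 multiply
semantics. It is only a sketch, not the code in the patch: mul_lo and
mul_hi are hypothetical helper names standing in for the 32-bit mulu and
muhu instructions, and mul_1_sketch is a made-up, non-unrolled loop in the
style of a multiply-by-one-limb routine. Each half of the product lands
directly in a general-purpose register, with no HI/LO and no mfhi/mflo.

```c
#include <stddef.h>
#include <stdint.h>

/* Models r6 "mulu rd, rs, rt": low 32 bits of the unsigned product. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Models r6 "muhu rd, rs, rt": high 32 bits of the unsigned product. */
static uint32_t mul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical plain loop: {rp, n} = {up, n} * v, returns the carry-out limb.
   No unrolling and no software pipelining; purely illustrative. */
static uint32_t mul_1_sketch(uint32_t *rp, const uint32_t *up, size_t n,
                             uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = mul_lo(up[i], v);
        uint32_t hi = mul_hi(up[i], v);
        lo += carry;
        hi += (lo < carry);   /* cannot overflow: hi is at most 2^32 - 2 */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

On pre-r6 the same two halves would come from one multu followed by mflo
and mfhi, which is why the old code cannot simply be reused on r6.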
>
> --
> Torbjörn
> Please encrypt, key id 0xC8601622