  Yes, I am working on it, and will send a new version of the patch soon.

There is no shame in discussing approaches too, and it might even save a
lot of time.

The existing mips assembly code is completely outdated.  According to
mpn/mips64/README it was written with MIPS R4x00 and R8000 in mind.
These are 20+ year old CPUs.

Besides targeting long-gone CPUs, it is written in a style we no longer
use:

0. We unroll critical loops.  Only for CPUs where the multiply
instructions are extremely slow do we use plain loops for things like
mul_1.asm, addmul_1.asm, etc.

1. We no longer use two loops for unrolled code (an unrolled main loop
plus a separate loop for the leftover limbs); instead we jump into
different places in a single unrolled loop.

2. We tend to make the code run very close to optimal on each hardware
implementation.

We should never forget that asm code comes with a maintenance cost and
a testing cost.  We could carefully evaluate whether asm is needed at all.

