longlong.h and cpu type vectoring...

Mon Apr 8 17:40:55 CEST 2013

From: Torbjorn Granlund <tg at gmplib.org>
Date: Mon, 08 Apr 2013 17:17:47 +0200

> David Miller <davem at davemloft.net> writes:
> 
>   Through a twisty passage I came across this issue which we'll
>   need to address in some way in the near future.
>   
>   I noticed that the longlong.h included in gmp lacks a lot of
>   sparc optimizations, and in particular umul_ppmm doesn't use
>   mulx/umulxhi when it could.
>   
> I added such code, but since I couldn't find any cpp symbol which is
> triggered by -mvis2, I invented a name we need to set ourselves.

CPP will define __VIS__ >= 0x200 in that case (and likewise >= 0x300
for -mvis3).  But traditionally GMP has used local defines from
config.h to trigger these kinds of cpu-specific optimized routines in
longlong.h, e.g. alpha does this.
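
To make the two triggering styles concrete, here is a rough sketch of
how a umul_ppmm-like macro could be selected either from the compiler's
__VIS__ value or from a locally defined symbol.  SKETCH_HAVE_VIS3 and
all sketch_* names are made up for illustration; this is not the code
in GMP's longlong.h.  The sparc branch assumes a 64-bit build and uses
the 0x300 threshold because umulxhi is a VIS3 instruction; the portable
branch is only there so the sketch builds on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* SKETCH_HAVE_VIS3 is a hypothetical config.h-style symbol, not a real
   GMP define.  __VIS__ is the compiler-provided value discussed above. */
#if defined(__sparc__) && defined(__arch64__) && \
    (defined(SKETCH_HAVE_VIS3) || (defined(__VIS__) && __VIS__ >= 0x300))
/* Low half from a plain 64-bit multiply, high half from umulxhi. */
#define sketch_umul_ppmm(ph, pl, m0, m1)                                \
  do {                                                                  \
    uint64_t m0_ = (m0), m1_ = (m1);                                    \
    (pl) = m0_ * m1_;                                                   \
    __asm__ ("umulxhi\t%1, %2, %0"                                      \
             : "=r" (ph) : "r" (m0_), "r" (m1_));                       \
  } while (0)
#else
/* Portable fallback: build the 128-bit product from 32-bit halves. */
#define sketch_umul_ppmm(ph, pl, m0, m1)                                \
  do {                                                                  \
    uint64_t m0_ = (m0), m1_ = (m1);                                    \
    uint64_t ll_ = (m0_ & 0xffffffffu) * (m1_ & 0xffffffffu);           \
    uint64_t lh_ = (m0_ & 0xffffffffu) * (m1_ >> 32);                   \
    uint64_t hl_ = (m0_ >> 32) * (m1_ & 0xffffffffu);                   \
    uint64_t hh_ = (m0_ >> 32) * (m1_ >> 32);                           \
    /* Carry out of the middle column; cannot overflow 64 bits. */      \
    uint64_t mid_ = (ll_ >> 32) + (lh_ & 0xffffffffu)                   \
                    + (hl_ & 0xffffffffu);                              \
    (ph) = hh_ + (lh_ >> 32) + (hl_ >> 32) + (mid_ >> 32);              \
    (pl) = m0_ * m1_;                                                   \
  } while (0)
#endif

/* Function wrapper so the macro can be exercised from ordinary code. */
void sketch_mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h, l;
    sketch_umul_ppmm(h, l, a, b);
    *hi = h;
    *lo = l;
}

/* Tiny self-check: (2^32) * (2^32) = 2^64, i.e. high = 1, low = 0. */
void sketch_selfcheck(void)
{
    uint64_t hi, lo;
    sketch_mul_64x64((uint64_t)1 << 32, (uint64_t)1 << 32, &hi, &lo);
    assert(hi == 1 && lo == 0);
}
```

With the local-define style, configure decides whether the fast branch
is used; with the __VIS__ style, the compiler flags decide.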

> Incidentally, we don't set any -mvis options in configure.ac.

Yes, and we use -mcpu=ultrasparc for all cpu types.

> I am somewhat aware of this.  This is both a correctness issue (i.e.,
> not using unavailable instructions) and efficiency issue (use available
> instructions for critical code).
> 
> We can counter the efficiency reason by concentrating critical code to
> few loops in C or assembly, which we then put into __gmpn_cpuvec.  Then,
> we could put more slightly higher-level routines in __gmpn_cpuvec, which
> can then without overhead call other truly low-level functions.

That's my argument: leave the most generic code in longlong.h, and if
something is critical, then we have an assembler variant.
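
As a minimal sketch of that split, the table below holds one function
pointer that is resolved on first use, so callers always go through
the table and never care which variant they got.  Everything here is
invented for illustration (struct sketch_cpuvec, the sketch_* functions,
the always-false cpu probe); it is not the layout of __gmpn_cpuvec.
Limbs are 32 bits only to keep the generic C loop short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* 32-bit limbs keep the C loop simple */

typedef limb_t (*sketch_mul_1_fn)(limb_t *rp, const limb_t *up,
                                  size_t n, limb_t v);

/* Hypothetical dispatch table, standing in for a cpuvec-style struct. */
struct sketch_cpuvec {
    sketch_mul_1_fn mul_1;
};

/* Generic C loop: rp[] = up[] * v, returns the carry-out limb. */
static limb_t sketch_mul_1_generic(limb_t *rp, const limb_t *up,
                                   size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + (2^32-1) still fits in 64 bits. */
        uint64_t p = (uint64_t)up[i] * v + carry;
        rp[i] = (limb_t)p;
        carry = (limb_t)(p >> 32);
    }
    return carry;
}

/* Placeholder for a cpu-specific assembler routine; here it simply
   forwards to the generic loop. */
static limb_t sketch_mul_1_asm_placeholder(limb_t *rp, const limb_t *up,
                                           size_t n, limb_t v)
{
    return sketch_mul_1_generic(rp, up, n, v);
}

/* Placeholder cpu probe; a real one would inspect the running cpu. */
static int sketch_cpu_has_fast_mul(void)
{
    return 0;
}

static limb_t sketch_mul_1_init(limb_t *rp, const limb_t *up,
                                size_t n, limb_t v);

/* The table starts out pointing at the initializer. */
struct sketch_cpuvec sketch_vec = { sketch_mul_1_init };

/* First call: pick a variant, patch the table, then forward. */
static limb_t sketch_mul_1_init(limb_t *rp, const limb_t *up,
                                size_t n, limb_t v)
{
    assert(sketch_vec.mul_1 == sketch_mul_1_init);
    sketch_vec.mul_1 = sketch_cpu_has_fast_mul()
                           ? sketch_mul_1_asm_placeholder
                           : sketch_mul_1_generic;
    return sketch_vec.mul_1(rp, up, n, v);
}

/* Public entry point: always one indirect call through the table. */
limb_t sketch_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    return sketch_vec.mul_1(rp, up, n, v);
}
```

After the first call the table points straight at the chosen variant,
so the selection cost is paid once.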

Sparc poses more difficulties.  Unfortunately, Sun, in their infinite
wisdom, marks every ELF object with the cpu features used by that ELF
object.

And the Solaris kernel exec() implementation as well as the dynamic
linker will refuse to load an object that claims to use features the
current cpu doesn't have.

(I don't do such checks on Linux, because I know they are stupid and
pointless and block real work)

A fat binary would obviously create such a conflict, so when doing fat
binaries we would have to do one of:

1) Use elfedit to edit out the feature bits of the final linked object.

2) Use macros to emit ".word OPCODE" type things for v9a and later
   instructions.  This is what we do in openssl.

I think the former is easier, but with the latter we can build on
systems that lack assembler support for the newer instructions.
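
To illustrate option 2, here is a small sketch that computes the
32-bit word one would hand to ".word" for a VIS3-style three-register
instruction.  The base value 0x81b00000 (op = 2, op3 = 0x36) and the
umulxhi opf value 0x016 are written from memory of the openssl perlasm
helper and are assumptions; check them against the architecture manual
before relying on them.  The sketch_* names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding constants; verify before use. */
#define SKETCH_VIS3_BASE    0x81b00000u  /* op = 2, op3 = 0x36 */
#define SKETCH_OPF_UMULXHI  0x016u       /* assumed opf for umulxhi */

/* Register numbers: %g0-%g7 = 0-7, %o0-%o7 = 8-15,
   %l0-%l7 = 16-23, %i0-%i7 = 24-31. */
enum { SKETCH_G0 = 0, SKETCH_O0 = 8, SKETCH_L0 = 16, SKETCH_I0 = 24 };

/* Pack rd, rs1, the 9-bit opf field and rs2 into one instruction word. */
uint32_t sketch_encode_vis3(uint32_t opf, unsigned rs1, unsigned rs2,
                            unsigned rd)
{
    assert(opf < 0x200 && rs1 < 32 && rs2 < 32 && rd < 32);
    return SKETCH_VIS3_BASE
           | ((uint32_t)rd << 25)
           | ((uint32_t)rs1 << 14)
           | (opf << 5)
           | (uint32_t)rs2;
}

/* Word for "umulxhi %o0, %o1, %o2" under the assumptions above. */
uint32_t sketch_umulxhi_o0_o1_o2(void)
{
    return sketch_encode_vis3(SKETCH_OPF_UMULXHI,
                              SKETCH_O0 + 0, SKETCH_O0 + 1, SKETCH_O0 + 2);
}
```

In an assembler source the result would be emitted as a plain data
word, so the assembler never needs to know the mnemonic.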