longlong.h and cpu type vectoring...

Tue Apr 9 21:24:59 CEST 2013

David Miller <davem at davemloft.net> writes:

  > I added such code, but since I couldn't find any cpp symbol which is
  > triggered by -mvis2, I invented a name we need to set ourselves.

  CPP will define __VIS__ >= 0x200 in that case (and likewise >= 0x300
  for -mvis3).

Thanks, I missed that one in my experimentation.
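
A minimal sketch of how such a test could look in C, assuming the
compiler sets __VIS__ as described above.  The names HAVE_VIS2_SKETCH,
HAVE_VIS3_SKETCH and vis_level_sketch are hypothetical and only serve
the illustration; they are not symbols from longlong.h or config.h.

```c
/* Sketch only.  HAVE_VIS2_SKETCH, HAVE_VIS3_SKETCH and vis_level_sketch
   are hypothetical names, not symbols from GMP.  */
#if defined (__VIS__) && __VIS__ >= 0x300
#define HAVE_VIS3_SKETCH 1
#else
#define HAVE_VIS3_SKETCH 0
#endif

#if defined (__VIS__) && __VIS__ >= 0x200
#define HAVE_VIS2_SKETCH 1
#else
#define HAVE_VIS2_SKETCH 0
#endif

/* Report which VIS level the compiler was asked to target:
   3 for -mvis3, 2 for -mvis2, 0 if neither is in effect.  */
static int
vis_level_sketch (void)
{
  if (HAVE_VIS3_SKETCH)
    return 3;
  if (HAVE_VIS2_SKETCH)
    return 2;
  return 0;
}
```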

  But traditionally GMP has used local defines from
  config.h to trigger these kinds of cpu specific optimized routines in
  longlong.h, f.e. alpha does this.

I think we only do that when compilers have been non-cooperative, since
we'd like to avoid GMPisms in longlong.h if we can.

  > Incidentally, we don't set any -mvis options in configure.ac.

  Yes, and we use -mcpu=ultrasparc for all cpu types.

This ought to be fixed.

  > I am somewhat aware of this.  This is both a correctness issue (i.e.,
  > not using unavailable instructions) and efficiency issue (use available
  > instructions for critical code).
  > 
  > We can counter the efficiency reason by concentrating critical code to
  > few loops in C or assembly, which we then put into __gmpn_cpuvec.  Then,
  > we could put more slightly higher-level routines in __gmpn_cpuvec, which
  > can then without overhead call other truly low-level functions.

  That's my argument: leave the most generic code in longlong.h, and if
  something is critical, then we have an assembler variant.

This is not how we usually do things, and there are arguments against
that as GMP is currently organised.  Look for all usages of umul_ppmm,
spread all over GMP.  For a fat binary, we will need to cope with its
absence, if it is not generally available.
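
For reference, the general shape of the function-pointer dispatch
discussed above can be sketched as follows.  This is a rough
illustration under stated assumptions, not the actual __gmpn_cpuvec
code: limb_sketch_t, cpuvec_sketch, add_n_sketch and the other names are
hypothetical, and the "cpu detection" step simply selects the generic C
loop.

```c
#include <stddef.h>

/* All names here are hypothetical; this is not GMP's real cpuvec.  */
typedef unsigned long limb_sketch_t;
typedef limb_sketch_t (*add_n_fn) (limb_sketch_t *, const limb_sketch_t *,
                                   const limb_sketch_t *, size_t);

/* Generic fallback: r = a + b over n limbs, returns the carry out.  */
static limb_sketch_t
add_n_generic (limb_sketch_t *r, const limb_sketch_t *a,
               const limb_sketch_t *b, size_t n)
{
  limb_sketch_t cy = 0;
  size_t i;
  for (i = 0; i < n; i++)
    {
      limb_sketch_t s = a[i] + cy;
      limb_sketch_t c1 = s < cy;        /* carry from adding cy */
      limb_sketch_t t = s + b[i];
      limb_sketch_t c2 = t < b[i];      /* carry from adding b[i] */
      r[i] = t;
      cy = c1 | c2;
    }
  return cy;
}

static limb_sketch_t add_n_init (limb_sketch_t *, const limb_sketch_t *,
                                 const limb_sketch_t *, size_t);

/* The table starts out pointing at an initialiser stub.  */
static struct { add_n_fn add_n; } cpuvec_sketch = { add_n_init };

/* First call lands here: pick an implementation, then forward.  A real
   fat binary would probe the cpu at this point; the sketch always
   chooses the generic loop.  */
static limb_sketch_t
add_n_init (limb_sketch_t *r, const limb_sketch_t *a,
            const limb_sketch_t *b, size_t n)
{
  cpuvec_sketch.add_n = add_n_generic;
  return cpuvec_sketch.add_n (r, a, b, n);
}

/* Public entry point: one indirect call per use.  */
limb_sketch_t
add_n_sketch (limb_sketch_t *r, const limb_sketch_t *a,
              const limb_sketch_t *b, size_t n)
{
  return cpuvec_sketch.add_n (r, a, b, n);
}
```

The indirect call is cheap for a loop over many limbs, but a primitive
like umul_ppmm is expanded inline at each use, which is why it does not
fit this scheme directly.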

I'd like to work on supporting more fat binaries and also work on making
those as close as possible to "lean" binaries in performance.  We should
not do that by missing out on optimisation opportunities for lean
binaries.

  Sparc poses more difficulties.  Unfortunately, Sun, in their infinite
  wisdom, marks every ELF object with the cpu features used by that ELF
  object.

  And the Solaris kernel exec() implementation as well as the dynamic
  linker will refuse to load an object that claims to use features the
  current cpu doesn't have.

How very silly of them.

  (I don't do such checks on Linux, because I know they are stupid and
  pointless and block real work)

  A fat binary would obviously create such a conflict, so when doing fat
  binaries we would have to do one of:

  1) Use elfedit to edit out the feature bits of the final linked object.

  2) Use macros to emit ".word OPCODE" type things for v9a and later
     instructions.  This is what we do in openssl.

  I think the former is easier, but with the latter we can build on
  systems that lack assembler support for the newer instructions.

It is not too hard to do these types of tricks in m4; we in fact do that
for some x86 instructions already.
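
To illustrate option 2 above, here is a small sketch in C of how the
32-bit word for a ".word OPCODE" line might be computed.  It assumes the
usual SPARC V9 three-register format with op = 2 and op3 = 0x36
(IMPDEP1), which is where the VIS instructions live as far as I know.
The function name is hypothetical, and the opf value in the usage note
is an arbitrary placeholder, not the encoding of any particular
instruction.

```c
#include <stdint.h>

/* Hypothetical helper: build the instruction word for an IMPDEP1-class
   instruction from its register numbers and opf field.  Assumed field
   layout: op[31:30] rd[29:25] op3[24:19] rs1[18:14] opf[13:5] rs2[4:0].  */
static uint32_t
vis_word_sketch (unsigned rd, unsigned rs1, unsigned opf, unsigned rs2)
{
  uint32_t w = (uint32_t) 2 << 30;          /* op = 2 */
  w |= (uint32_t) 0x36 << 19;               /* op3 = 0x36 (IMPDEP1) */
  w |= (uint32_t) (rd & 0x1f) << 25;
  w |= (uint32_t) (rs1 & 0x1f) << 14;
  w |= (uint32_t) (opf & 0x1ff) << 5;
  w |= (uint32_t) (rs2 & 0x1f);
  return w;
}
```

A macro (in m4 or cpp) would then emit ".word" followed by the value of
something like vis_word_sketch (rd, rs1, 0x123, rs2), with 0x123
replaced by the real opf of the wanted instruction, so that the
assembler never needs to know the mnemonic.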

-- 
Torbjörn