arm "neon"

Torbjorn Granlund tg at gmplib.org
Mon Jan 14 17:16:42 CET 2013


nisse at lysator.liu.se (Niels Möller) writes:

  The idea was that u0, u1 is the loop-invariant operand, and the above is
  for one iteration processing only a single limb from v.
  
Ehum.  Perhaps we should change to that convention, but until we've done
that, sticking to the current one will improve my understanding...

  A sum of 32-bit values can be accumulated into 64-bit register. But if
  we want to accumulate 64-bit values, i.e., limb products, it gets
  tricky.
  
It cannot be done, except with lots of contortions.

One can add 32-bit things to a 64-bit product without problems; at least
one may add two such things, since ((2^32-1)^2 + (2^32-1) + (2^32-1)) =
B^2 - 1, with B = 2^32, just fits a two-word accumulator.
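
The bound can be checked with a few lines of C.  This is only a sketch
for illustration: mul_add2 is a hypothetical helper name, not anything
in GMP, and it assumes 32-bit limbs so that B = 2^32.

```c
#include <stdint.h>

/* Hypothetical helper (not a GMP function): returns u*v + a + b.
   With 32-bit inputs the largest possible value is
   (B-1)^2 + 2*(B-1) = B^2 - 1 where B = 2^32, so the 64-bit
   result can never wrap around. */
static uint64_t mul_add2(uint32_t u, uint32_t v, uint32_t a, uint32_t b)
{
    return (uint64_t)u * v + a + b;
}
```

Calling mul_add2 with all four arguments equal to UINT32_MAX gives
exactly UINT64_MAX, i.e. B^2 - 1.  A third 32-bit addend would no
longer be safe.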

  > having a non-zero operand in the high part wouldn't work unless we use
  > nails, since else it would overflow.
  
  Maybe it's a poor way to think about addmul_2 to collect the two
  products involving a single v limb. I'm not really familiar with how
  current assembly loops are organized (if I ever looked into it, I'm
  afraid I've forgotten...).
  
There are lots of variations...

  > Neat with just umaal and ld/st...
  
  Definitely neat. I had a quick look, but I'll need a bit more time to
  digest it.
  
Note that there are two *parallel* recurrence paths, one over cya
and one over cyb.  Pairwise adjacent umaal have a dependency, but that's
of the benign, non-recurrent type.
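
To make the two carry chains concrete, here is a rough C model.  It is
not the assembly loop discussed in this thread, just a sketch under
assumptions: 32-bit limbs, n >= 1, and hypothetical names (umaal_model,
addmul_2_model).  umaal_model mimics the ARM umaal instruction,
RdHi:RdLo = Rn*Rm + RdLo + RdHi.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of ARM umaal: (*hi:*lo) = a*b + *lo + *hi.  Cannot overflow,
   since (B-1)^2 + 2*(B-1) = B^2 - 1 with B = 2^32. */
static void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Hypothetical sketch: {rp,n} += {up,n} * (v0 + v1*B), n >= 1.
   The low n limbs go to rp, the two carry-out limbs to hi[0], hi[1]. */
static void addmul_2_model(uint32_t *rp, const uint32_t *up, size_t n,
                           uint32_t v0, uint32_t v1, uint32_t hi[2])
{
    uint32_t cya = 0, cyb = 0;   /* the two independent carry chains */
    uint32_t t;
    size_t i;

    t = rp[0];
    umaal_model(&t, &cya, up[0], v0);
    rp[0] = t;

    for (i = 1; i < n; i++) {
        t = rp[i];
        umaal_model(&t, &cya, up[i], v0);      /* recurrence over cya */
        umaal_model(&t, &cyb, up[i - 1], v1);  /* recurrence over cyb */
        rp[i] = t;   /* t links the adjacent pair only, not iterations */
    }

    t = cya;
    umaal_model(&t, &cyb, up[n - 1], v1);
    hi[0] = t;
    hi[1] = cyb;
}
```

In each iteration only cya and cyb are carried over to the next one; t
is reloaded from rp[i] every time, so the dependency between the two
adjacent umaal operations stays within the iteration.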

-- 
Torbjörn

