arm "neon"

Niels Möller nisse at lysator.liu.se
Mon Jan 14 18:45:54 CET 2013


Torbjorn Granlund <tg at gmplib.org> writes:

> Note that there are two *parallel* recurrency paths, one over cya
> and one over cyb.  Pairwise adjacent umaal have a dependency, but that's
> of the benign, non-recurrent type.

I don't fully understand it, but at a closer look it appears that there
*are* indeed independent umaal operations.

E.g., the first two in the loop

	umaal	r4, cya, u1, v0
	.. store and reload r4 ...
	umaal	r5, cyb, u1, v1

If we use registers like

  d0:  v0, v1
  d1:  u1, u1
  d2:  r4, r5
  d3:  cya, cyb

precisely the same operations could be done with neon instructions as

  vmull.u32	q3, d0, d1
  vaddl.u32	q4, d2, d3
  vadd.i64	q4, q4, q3

Do you agree? It would be 4 cycles on a9, 3 on a15. And then there will
be some data movements needed as well.
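
To convince myself of the arithmetic, here is a small portable C model of the two variants. It is only a sketch: the function names (umaal_model, neon_model, lanes_match_umaal, self_check) are made up for illustration, the d registers are modelled as arrays of two 32-bit lanes, and nothing here says anything about timing or about the data movements.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of "umaal RdLo, RdHi, Rn, Rm":
   RdHi:RdLo = Rn * Rm + RdLo + RdHi.  The sum always fits in 64 bits,
   since (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t t = (uint64_t)n * m + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Lane-wise model of the three NEON instructions.  Each d register is
   two 32-bit lanes, each q register is two 64-bit lanes. */
void neon_model(const uint32_t d0[2], const uint32_t d1[2],
                const uint32_t d2[2], const uint32_t d3[2],
                uint64_t q4[2])
{
    uint64_t q3[2];
    for (int i = 0; i < 2; i++) {
        q3[i] = (uint64_t)d0[i] * d1[i];   /* vmull.u32 q3, d0, d1 */
        q4[i] = (uint64_t)d2[i] + d3[i];   /* vaddl.u32 q4, d2, d3 */
        q4[i] += q3[i];                    /* vadd.i64  q4, q4, q3 */
    }
}

/* Returns 1 if the NEON model gives the same two 64-bit results as the
   two scalar umaal operations, using the register layout
   d0 = {v0, v1}, d1 = {u1, u1}, d2 = {r4, r5}, d3 = {cya, cyb}. */
int lanes_match_umaal(uint32_t u1, uint32_t v0, uint32_t v1,
                      uint32_t r4, uint32_t r5,
                      uint32_t cya, uint32_t cyb)
{
    uint32_t d0[2] = { v0, v1 };
    uint32_t d1[2] = { u1, u1 };
    uint32_t d2[2] = { r4, r5 };
    uint32_t d3[2] = { cya, cyb };
    uint64_t q4[2];

    neon_model(d0, d1, d2, d3, q4);

    umaal_model(&r4, &cya, u1, v0);
    umaal_model(&r5, &cyb, u1, v1);

    return q4[0] == (((uint64_t)cya << 32) | r4)
        && q4[1] == (((uint64_t)cyb << 32) | r5);
}

/* Worst case: all inputs are 2^32 - 1, so both lanes reach 2^64 - 1. */
void self_check(void)
{
    uint32_t m = 0xffffffffu;
    assert(lanes_match_umaal(m, m, m, m, m, m, m));
}
```

In this model each 64-bit lane of q4 ends up holding the new low word (r4 or r5) in its low half and the new carry (cya or cyb) in its high half, so the results come out interleaved rather than in the d2/d3 layout the next step would want as input.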

Regards,
/Niels


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

