Performance of addlsh_n and sublsh_n

Torbjorn Granlund tg at
Sat Feb 5 17:02:39 CET 2011

Torbjorn Granlund <tg at> writes:

  For AMD, Niels shows that we have little chance of beating an addmul_1
  loop.  I have not worked on that; starting with the existing 2.5 c/l
  code, adding a new pointer that uses the already set-up indexing should
  not be hard.

Well, it is not that easy.

The 2.5 c/l code uses an add-to-memory instruction, and we cannot
possibly stay at 2.5 c/l without it.  With a separate destination
pointer, we cannot use add-to-memory.

An addmul_1 with separate destination will run at >= 2.833 c/l.
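
To make the distinction concrete, here is a rough portable C sketch of the
two loop shapes being compared.  It is only an illustration, not the GMP
assembly code, and it says nothing about cycle counts.  The names limb_t,
ref_addmul_1, ref_addmul_1_sep and ref_addlsh_n are hypothetical, and 32-bit
limbs are assumed so that plain 64-bit arithmetic can hold the intermediate
products.  The in-place loop reads and writes rp[i], which is what an
add-to-memory instruction can do in one step; the separate-destination loop
must load up[i] and store to rp[i] separately.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* assumption: 32-bit limbs, for illustration only */

/* In-place form: rp[] += vp[] * v.  Source and destination of the addition
   are the same memory, so an add-to-memory instruction applies.
   Returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *vp, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t t = (uint64_t)vp[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Separate-destination form: rp[] = up[] + vp[] * v.  The addend is loaded
   from up[] and the result stored to rp[], so add-to-memory cannot be used. */
limb_t ref_addmul_1_sep(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)vp[i] * v + up[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* addlsh_n expressed through the separate-destination loop:
   rp[] = up[] + (vp[] << k), assuming 0 < k < 32. */
limb_t ref_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                    size_t n, unsigned k)
{
    return ref_addmul_1_sep(rp, up, vp, n, (limb_t)1 << k);
}
```

When rp and up are the same operand, ref_addlsh_n(rp, rp, vp, n, k) computes
the same thing as ref_addmul_1(rp, vp, n, 1 << k), which is why the addmul_1
loop is the natural starting point.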


More information about the gmp-devel mailing list