Some div_qr_2 assembler
Niels Möller
nisse at lysator.liu.se
Tue Mar 22 13:25:59 CET 2011
I've written an x86_64 loop for mpn_div_qr_2_pi1_norm, using 3/2
division. I had difficulty understanding the related assembler
implementation of divrem_2, so I wrote the new function from scratch.
Currently the new function is roughly one c/l slower than divrem_2 (36
c/l vs. 35). I haven't analyzed it in depth, but I hope it can be
optimized to gain one or a few cycles. It ought to be latency-limited.
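For readers unfamiliar with 3/2 division: each step divides a three-limb
partial remainder <n2, n1, n0> by the normalized two-limb divisor <d1, d0>
(high bit of d1 set), using a precomputed reciprocal
v = floor((B^3 - 1)/(d1*B + d0)) - B, where B is the limb base. The C
sketch below mirrors the udiv_qr_3by2 macro in GMP's gmp-impl.h, but it is
only an illustration under stated assumptions: it uses 32-bit limbs so that
every double-limb product fits in a portable uint64_t (the attached loop
works on 64-bit x86_64 limbs), the names reciprocal_3by2 and
div_qr_2_norm_sketch are hypothetical, and the reciprocal is computed by
slow bit-by-bit long division rather than by GMP's invert_pi1. The loop
also shows why the code should be latency-limited: every iteration needs
the remainder (r1, r0) produced by the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit limbs, so a limb product fits in uint64_t. */
typedef uint32_t limb;

/* Reciprocal v = floor((B^3 - 1) / D) - B for D = <d1,d0>, B = 2^32.
   Hypothetical helper: plain restoring division, one bit at a time,
   not GMP's invert_pi1. Requires the high bit of d1 to be set. */
static limb reciprocal_3by2(limb d1, limb d0)
{
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    uint64_t r = 0, q = 0;
    assert(d1 & 0x80000000u);
    for (int i = 0; i < 96; i++) {      /* B^3 - 1 is 96 one bits */
        int overflow = (r >> 63) != 0;  /* 2r + 1 would not fit in 64 bits */
        r = (r << 1) | 1;
        q <<= 1;
        if (overflow || r >= d) {       /* wraparound gives the right value */
            r -= d;
            q |= 1;
        }
    }
    return (limb)(q - ((uint64_t)1 << 32));
}

/* One 3/2 step: divide <n2,n1,n0> by <d1,d0>, given <n2,n1> < <d1,d0>.
   Returns the quotient limb and stores the remainder in <*r1p,*r0p>. */
static limb udiv_qr_3by2(limb *r1p, limb *r0p, limb n2, limb n1, limb n0,
                         limb d1, limb d0, limb v)
{
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    /* Candidate quotient <q,q0> = v*n2 + <n2,n1>, mod B^2. */
    uint64_t p = (uint64_t)n2 * v + (((uint64_t)n2 << 32) | n1);
    limb q = (limb)(p >> 32), q0 = (limb)p;
    /* Remainder candidate <n1,n0> - q*<d1,d0> - <d1,d0>, mod B^2. */
    limb r1 = (limb)(n1 - (limb)(d1 * q));
    uint64_t r = (((uint64_t)r1 << 32) | n0) - d - (uint64_t)d0 * q;
    q++;
    /* First adjustment: the incremented candidate was one too large. */
    if ((limb)(r >> 32) >= q0) {
        q--;
        r += d;
    }
    /* Second, rare adjustment: the quotient was still one too small. */
    if (r >= d) {
        q++;
        r -= d;
    }
    *r1p = (limb)(r >> 32);
    *r0p = (limb)r;
    return q;
}

/* Divide np[0..nn-1] (least significant limb first, nn >= 2) by <d1,d0>.
   Writes nn-2 quotient limbs to qp and the two remainder limbs to rp,
   and returns the high quotient limb (0 or 1). Hypothetical name. */
static limb div_qr_2_norm_sketch(limb *qp, limb *rp, const limb *np,
                                 size_t nn, limb d1, limb d0, limb v)
{
    assert(nn >= 2 && (d1 & 0x80000000u));
    uint64_t d = ((uint64_t)d1 << 32) | d0;
    uint64_t top = ((uint64_t)np[nn - 1] << 32) | np[nn - 2];
    limb qh = 0, r1, r0;
    if (top >= d) {
        top -= d;
        qh = 1;
    }
    r1 = (limb)(top >> 32);
    r0 = (limb)top;
    /* Each step needs the previous remainder: a serial dependency chain. */
    for (size_t i = nn - 2; i-- > 0; )
        qp[i] = udiv_qr_3by2(&r1, &r0, r1, r0, np[i], d1, d0, v);
    rp[0] = r0;
    rp[1] = r1;
    return qh;
}
```

For example, with nn = 3 the numerator limbs {5, 0, 0x80000000} (that is,
2^95 + 5) divided by <0x80000000, 0> (2^63) give a high quotient limb of 1,
qp[0] = 0 and remainder 5, i.e. a quotient of 2^32.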
/nisse
Attachment: div_qr_2_pi1_norm.asm
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20110322/c20185ee/attachment.ksh>
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.