Some div_qr_2 assembler

Niels Möller nisse at lysator.liu.se
Tue Mar 22 13:25:59 CET 2011


I've written an x86_64 loop for mpn_div_qr_2_pi1_norm, using 3/2
division. I had difficulty understanding the related assembler
implementation of divrem_2, so I wrote the new function from scratch.
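
For readers who have not met the term: 3/2 division usually means dividing
a three-limb number by a normalized two-limb divisor with the help of a
precomputed one-limb inverse, giving a one-limb quotient and a two-limb
remainder. The C sketch below illustrates that step under loud assumptions.
It is not the attached assembler. It uses 32-bit limbs instead of GMP's
64-bit limbs so that a reference result fits in unsigned __int128 (a
GCC/Clang extension), and the names qr_3by2, invert_3by2 and div_3by2 are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: 32-bit limbs (B = 2^32), not GMP's 64-bit limbs. */
typedef struct {
    uint32_t q;   /* one-limb quotient */
    uint64_t r;   /* two-limb remainder */
} qr_3by2;

/* Hypothetical helper: v = floor((B^3 - 1) / d) - B for a normalized
   two-limb divisor d (top bit set). Computed by brute force with
   unsigned __int128, which is a compiler extension. */
static uint32_t invert_3by2(uint64_t d)
{
    assert(d >> 63);
    unsigned __int128 b3m1 = ((unsigned __int128)1 << 96) - 1;
    return (uint32_t)(b3m1 / d - ((unsigned __int128)1 << 32));
}

/* Divide <u2,u1,u0> by d using the inverse v. Requires d normalized
   and <u2,u1> < d, so that the quotient fits in one limb. */
static qr_3by2 div_3by2(uint32_t u2, uint32_t u1, uint32_t u0,
                        uint64_t d, uint32_t v)
{
    uint32_t d1 = (uint32_t)(d >> 32), d0 = (uint32_t)d;
    uint64_t u21 = ((uint64_t)u2 << 32) | u1;
    assert(d >> 63);
    assert(u21 < d);

    /* Candidate quotient: <q1,q0> = v*u2 + <u2,u1>, wrapping mod B^2. */
    uint64_t q = (uint64_t)v * u2 + u21;
    uint32_t q1 = (uint32_t)(q >> 32), q0 = (uint32_t)q;

    /* Low two limbs of u - (q1 + 1)*d; all arithmetic wraps. */
    uint32_t r1 = (uint32_t)(u1 - q1 * d1);
    uint64_t r = (((uint64_t)r1 << 32) | u0) - (uint64_t)d0 * q1 - d;
    q1++;

    /* First adjustment: the candidate was one too large. */
    if ((uint32_t)(r >> 32) >= q0) {
        q1--;
        r += d;
    }
    /* Second adjustment: rarely taken. */
    if (r >= d) {
        q1++;
        r -= d;
    }

    qr_3by2 out = { q1, r };
    return out;
}
```

A loop over a longer dividend would call div_3by2 once per quotient limb,
feeding the two-limb remainder back in as <u2,u1> for the next step, which
is one reason such a loop tends to be bound by the latency of this
dependency chain.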

Currently the new function is roughly one cycle per limb slower than
divrem_2 (36 c/l vs 35). I haven't done any deep analysis yet, but I hope
it can be optimized to gain one or a few cycles. It ought to be
latency-limited.

/nisse

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: div_qr_2_pi1_norm.asm
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20110322/c20185ee/attachment.ksh>
-------------- next part --------------

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.