Some div_qr_2 assembler

Niels Möller nisse at
Tue Mar 22 13:25:59 CET 2011

I've written an x86_64 loop for mpn_div_qr_2_pi1_norm, using 3/2
division. I had difficulty understanding the related assembler
implementation of divrem_2, so I wrote the new function from scratch.
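
For readers unfamiliar with the term, here is a minimal C sketch of one
3/2 division step: a three-limb numerator divided by a normalized
two-limb divisor, using a precomputed reciprocal. It is not the attached
assembler. It assumes toy 16-bit limbs so that everything fits in
standard integer types, and the names invert_3by2 and div_3by2 are
hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy limb size (an assumption for this sketch): 16 bits, so B = 2^16.
   Real GMP limbs are 64 bits on x86_64. Limb values are held in uint32_t. */
#define LIMB_BITS 16
#define LIMB_MASK 0xFFFFu

/* Hypothetical helper: reciprocal v = floor((B^3 - 1) / d) - B
   for a normalized two-limb divisor d (top bit set). */
static uint32_t invert_3by2(uint32_t d)
{
    assert(d >> 31);
    return (uint32_t)(((UINT64_C(1) << 48) - 1) / d)
           - (UINT32_C(1) << LIMB_BITS);
}

/* Divide the three-limb number (n2, n1, n0) by the two-limb d.
   Requires (n2, n1) < d, so the quotient fits in one limb.
   Returns the quotient; stores the two-limb remainder in *rem. */
static uint32_t div_3by2(uint32_t n2, uint32_t n1, uint32_t n0,
                         uint32_t d, uint32_t v, uint32_t *rem)
{
    uint32_t d1 = d >> LIMB_BITS, d0 = d & LIMB_MASK;
    uint32_t n21 = (n2 << LIMB_BITS) | n1;
    assert(n21 < d);

    /* Candidate quotient: (q1, q0) = v * n2 + (n2, n1), mod B^2. */
    uint32_t q = v * n2 + n21;
    uint32_t q1 = q >> LIMB_BITS, q0 = q & LIMB_MASK;

    /* Candidate remainder, mod B^2: n - (q1 + 1) * d. */
    uint32_t r1 = (n1 - q1 * d1) & LIMB_MASK;
    uint32_t r = ((r1 << LIMB_BITS) | n0) - d0 * q1 - d;
    q1 = (q1 + 1) & LIMB_MASK;

    /* First adjustment: the candidate was one too large. */
    if ((r >> LIMB_BITS) >= q0) {
        q1 = (q1 - 1) & LIMB_MASK;
        r += d;
    }
    /* Second, rare adjustment: the candidate was one too small. */
    if (r >= d) {
        q1++;
        r -= d;
    }
    *rem = r;
    return q1;
}
```

A division loop would call such a step once per quotient limb, feeding
the two-limb remainder back in as the high limbs of the next numerator.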

Currently the new function is roughly one cycle per limb (c/l) slower
than divrem_2 (36 c/l vs 35). I haven't done any deep analysis yet, but
I hope it can be optimized to gain one or a few cycles. It ought to be
latency-limited.


[Attachment: div_qr_2_pi1_norm.asm (scrubbed by the list archive)]

Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list