sqrt algorithm

Torbjörn Granlund tg at gmplib.org
Mon Aug 3 10:02:36 UTC 2015

"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

  > And for small sizes, one ought to sqrlo rather than mulmod_bnm1 to
  > exploit the cancellation. So the tuning would be between sqrlo and
  > sqrmod_bnm1. But it seems we only have mullo, not sqrlo?
  Right, we currently don't have sqrlo...
I don't think I ever considered a sqrlo.

All the *_basecase assembly code is a major PITA to write for all
relevant CPUs.  The effort behind what we have today is considerable,
much more work is needed.  E.g., we only have 4 variant of
mullo_basecase, while we have around 20 variant of mul_basecase and

To improve the situation, we ought to look into semi-automatic
*_basecase generation.  The input would be annotated innerloops of
addmul_(k), mul_(k), and sqr_addlsh1.  A special compiler would take
such pieces and generate outter loops, cleverly reusing pointers between

This is not easy, but it would save a lot of time.

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list