sqrt algorithm
    Torbjörn Granlund 
    tg at gmplib.org
       
    Mon Aug  3 10:02:36 UTC 2015
    
    
  
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
  > And for small sizes, one ought to sqrlo rather than mulmod_bnm1 to
  > exploit the cancellation. So the tuning would be between sqrlo and
  > sqrmod_bnm1. But it seems we only have mullo, not sqrlo?
  
  Right, we currently don't have sqrlo...
  
I don't think I ever considered a sqrlo.
All the *_basecase assembly code is a major PITA to write for all
relevant CPUs.  The effort behind what we have today is considerable,
much more work is needed.  E.g., we only have 4 variant of
mullo_basecase, while we have around 20 variant of mul_basecase and
sqr_basecase.
To improve the situation, we ought to look into semi-automatic
*_basecase generation.  The input would be annotated innerloops of
addmul_(k), mul_(k), and sqr_addlsh1.  A special compiler would take
such pieces and generate outter loops, cleverly reusing pointers between
iterations.
This is not easy, but it would save a lot of time.
-- 
Torbjörn
Please encrypt, key id 0xC8601622
    
    
More information about the gmp-devel
mailing list