GCD via Stein's algoriithm
R. Clint Whaley
rcwhaley at lsu.edu
Sun Nov 6 22:38:50 UTC 2016
GMP folks,
Hi, I do work on the computational library ATLAS:
http://math-atlas.sourceforge.net/
I mainly do floating point optimization, and so am pretty ignorant of
integral computation, and am entirely innocent of number theory.
However, I found myself needing a relatively fast least common multiple
for small integer numbers. I did some websearches, and ya'll's
documentation was about all I found before I gave up and just used
Stein's Algorithm for GCD:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
Anyway, I produced something that's about 50% faster than straight
Stein's algorithm (optimized using some bitlevel ops, but no inline
assembly) on my haswell-E and around 25% faster on my ARM64 a53 for
small problems. The main improvement over my earlier algorithm is
probably using some inline assembly and a loop peel allowing me to skip
some main-loop checks.
If I remember the doc of yours I read in my searches, you guys are using
some modified version of this algorithm for your *small* GCD
computations as well, is that correct?
If it is, I thought you might want to see how badly your implementation
crushes mine. I'm happy to donate the code with GPL or BSD license if
you are interested, or if you don't already have the inline assembly,
feel free to steal for your better algorithm.
Anyway, I feel sure you already have something a lot better than I could
come up with in a day and half effort, but I thought I'd check since as
far as I can tell your main emphasis is precision, not performance.
Feel free to ignore if this is more hassle than it is worth.
I attach the atlas_iopt.h that defines ATL_iGCD as an inline function,
as well as the rather embarrassingly primitive tester, that includes the
more normal Stein's (that one was written with my colleague Antoine
Petitet back in the 90s) I'm comparing against.
To get the inline assembly, you need to define ATL_GAS_x8664 or
ATL_GAS_x8632 on x86, and ATL_GAS_ARM64 for any 64-bit ARM (my configure
autodefines these guys: just define them to anything on compilation).
I'm not a member of this mailing list, so you'll have to send to me
directly for me to see any replies. If you want to suggest I RTFM about
how to do this less stupidly, I might learn something, assuming the
number theory doesn't defeat me at the beginning . . .
If this is not the right place to post, sorry. Feel free to post to
right place, or to not approve the submission. Just sending this on the
off chance it might be helpful, not because I need a reply.
Regards,
Clint
--
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
*******************************************************************
More information about the gmp-discuss
mailing list