fast gcd

Torbjorn Granlund tg at
Wed Oct 14 11:37:08 CEST 2009

  > this is what I did with bgcd. I found experimentally the threshold to be about
  > 32768 bits, i.e., 512 limbs.
  To be compared with the HGCD_THRESHOLD, which can be a few times lower
  than that. Also, below the HGCD_THRESHOLD, hgcd is computed
  essentially using Lehmer's algorithm, and then the corresponding
  remainders are always produced as a side product. So I'm afraid we're
  going to need both interfaces and some thresholding.
Here are the tuned values from last night's builds:

ultrasparc2-sun-solaris2.7          HGCD    45
ultrasparc2-unknown-linux-gnu      HGCD    54
powerpc630-ibm-aix5.1.0.0    HGCD    60
pentium4-pc-linux-gnu              HGCD    74
pentium4-pc-linux-gnu              HGCD    74
pentium4-pc-solaris2.10         HGCD    77
power5-ibm-aix5.3.0.0       HGCD    91
powerpc970-unknown-linux-gnu      HGCD    95
hppa2.0-unknown-linux-gnu      HGCD    96
pentium4-unknown-freebsd7.2          HGCD   102
armv5tel-unknown-linux-gnueabi      HGCD   107
pentium3-pc-linux-gnu            HGCD   115
itanium2-unknown-linux-gnu       HGCD   118
core2-pc-linux-gnu              HGCD   119
core2-unknown-freebsd8.0           HGCD   120
powerpc7455-unknown-linux-gnu      HGCD   120
powerpc7447-unknown-linux-gnu         HGCD   123
core2-unknown-freebsd7.0     HGCD   128
pentium2-pc-linux-gnu           login.medicis.polytechni HGCD   131
pentium3-pc-linux-gnu            HGCD   131
athlon-pc-linux-gnu            HGCD   133
athlon64-unknown-linux-gnu           HGCD   137
core2-unknown-freebsd7.1         HGCD   138
athlon64-pc-linux-gnu            HGCD   139
athlon64-pc-linux-gnu          HGCD   139
athlon64-unknown-linux-gnu      marie.medicis.polytechni HGCD   141
athlon64-unknown-freebsd7.1       HGCD   143
athlon64-unknown-linux-gnu      HGCD   143
athlon64-pc-solaris2.11         HGCD   144
athlon64-unknown-linux-gnu      HGCD   144
core2-unknown-freebsd8.0          HGCD   145
athlon-unknown-freebsd7.2       HGCD   165
armv5tel-unknown-linux-gnueabi      HGCD   199
alphaev6-unknown-linux-gnu       HGCD   276

  It's possible to combine Strassen-multiplication with FFT-invariance,
  to get down to 7 pointwise multiplies, at the cost of a bunch of
  additions and subtractions on transformed values. I don't think I
  tried that, but I don't expect any significant gain since both
  multiplication and additions are linear, and with a fairly small
  difference in cost. The FFT interface I designed should support it.

I think FFT-invariance + Strassen would be slower (in the FFT small ring
case), since one would need to split the integers into slightly smaller
polynomial coefficients, giving higher polynomial degree.  Also, it
performs more linear operations.


More information about the gmp-devel mailing list