fast gcd

Torbjorn Granlund tg at gmplib.org
Wed Oct 14 11:37:08 CEST 2009


  > this is what I did with bgcd. I found experimentally the threshold to be about
  > 32768 bits, i.e., 512 limbs.
  
  To be compared with the HGCD_THRESHOLD, which can be a few times lower
  than that. Also, below the HGCD_THRESHOLD, hgcd is computed
  essentially using Lehmer's algorithm, and then the corresponding
  remainders are always produced as a side product. So I'm afraid we're
  going to need both interfaces and some thresholding.
  
Here are the tuned values from last night's builds:

ultrasparc2-sun-solaris2.7      scheutz.sics.se          HGCD    45
ultrasparc2-unknown-linux-gnu   gcc54.fsffrance.org      HGCD    54
powerpc630-ibm-aix5.1.0.0       louis.sics.se.sics.se    HGCD    60
pentium4-pc-linux-gnu           bonk.sics.se             HGCD    74
pentium4-pc-linux-gnu           r2d2.sics.se             HGCD    74
pentium4-pc-solaris2.10         firewood.sics.se         HGCD    77
power5-ibm-aix5.3.0.0           friggms.hpc.ntnu.no      HGCD    91
powerpc970-unknown-linux-gnu    gcc40.fsffrance.org      HGCD    95
hppa2.0-unknown-linux-gnu       dingo.fsffrance.org      HGCD    96
pentium4-unknown-freebsd7.2     spis.gmplib.org          HGCD   102
armv5tel-unknown-linux-gnueabi  gcc50.fsffrance.org      HGCD   107
pentium3-pc-linux-gnu           doping.sics.se           HGCD   115
itanium2-unknown-linux-gnu      babe.fsffrance.org       HGCD   118
core2-pc-linux-gnu              lattice.sics.se          HGCD   119
core2-unknown-freebsd8.0        tom.gmplib.org           HGCD   120
powerpc7455-unknown-linux-gnu   gcc53.fsffrance.org      HGCD   120
powerpc7447-unknown-linux-gnu   spigg.gmplib.org         HGCD   123
core2-unknown-freebsd7.0        repentium.gmplib.org     HGCD   128
pentium2-pc-linux-gnu           login.medicis.polytechni HGCD   131
pentium3-pc-linux-gnu           matrix.sics.se           HGCD   131
athlon-pc-linux-gnu             shell.math.su.se         HGCD   133
athlon64-unknown-linux-gnu      abakus.sics.se           HGCD   137
core2-unknown-freebsd7.1        shell.gmplib.org         HGCD   138
athlon64-pc-linux-gnu           ior.math.su.se           HGCD   139
athlon64-pc-linux-gnu           tiger.math.su.se         HGCD   139
athlon64-unknown-linux-gnu      marie.medicis.polytechni HGCD   141
athlon64-unknown-freebsd7.1     panther.gmplib.org       HGCD   143
athlon64-unknown-linux-gnu      gcc11.fsffrance.org      HGCD   143
athlon64-pc-solaris2.11         tray.nada.kth.se         HGCD   144
athlon64-unknown-linux-gnu      gcc13.fsffrance.org      HGCD   144
core2-unknown-freebsd8.0        jail21.jails.se          HGCD   145
athlon-unknown-freebsd7.2       mandela.gmplib.org       HGCD   165
armv5tel-unknown-linux-gnueabi  gcc55.fsffrance.org      HGCD   199
alphaev6-unknown-linux-gnu      moufang.math.su.se       HGCD   276

  It's possible to combine Strassen-multiplication with FFT-invariance,
  to get down to 7 pointwise multiplies, at the cost of a bunch of
  additions and subtractions on transformed values. I don't think I
  tried that, but I don't expect any significant gain since both
  multiplication and additions are linear, and with a fairly small
  difference in cost. The FFT interface I designed should support it.

I think FFT-invariance + Strassen would be slower (in the FFT small ring
case), since one would need to split the integers into slightly smaller
polynomial coefficients, giving higher polynomial degree.  Also, it
performs more linear operations.

-- 
Torbjörn


More information about the gmp-devel mailing list