fast gcd
Torbjorn Granlund
tg at gmplib.org
Wed Oct 14 11:37:08 CEST 2009
> this is what I did with bgcd. I found experimentally the threshold to be about
> 32768 bits, i.e., 512 limbs.
To be compared with the HGCD_THRESHOLD, which can be a few times lower
than that. Also, below the HGCD_THRESHOLD, hgcd is computed
essentially using Lehmer's algorithm, and then the corresponding
remainders are always produced as a side product. So I'm afraid we're
going to need both interfaces and some thresholding.
Here are the tuned values from last night's builds:
ultrasparc2-sun-solaris2.7 scheutz.sics.se HGCD 45
ultrasparc2-unknown-linux-gnu gcc54.fsffrance.org HGCD 54
powerpc630-ibm-aix5.1.0.0 louis.sics.se.sics.se HGCD 60
pentium4-pc-linux-gnu bonk.sics.se HGCD 74
pentium4-pc-linux-gnu r2d2.sics.se HGCD 74
pentium4-pc-solaris2.10 firewood.sics.se HGCD 77
power5-ibm-aix5.3.0.0 friggms.hpc.ntnu.no HGCD 91
powerpc970-unknown-linux-gnu gcc40.fsffrance.org HGCD 95
hppa2.0-unknown-linux-gnu dingo.fsffrance.org HGCD 96
pentium4-unknown-freebsd7.2 spis.gmplib.org HGCD 102
armv5tel-unknown-linux-gnueabi gcc50.fsffrance.org HGCD 107
pentium3-pc-linux-gnu doping.sics.se HGCD 115
itanium2-unknown-linux-gnu babe.fsffrance.org HGCD 118
core2-pc-linux-gnu lattice.sics.se HGCD 119
core2-unknown-freebsd8.0 tom.gmplib.org HGCD 120
powerpc7455-unknown-linux-gnu gcc53.fsffrance.org HGCD 120
powerpc7447-unknown-linux-gnu spigg.gmplib.org HGCD 123
core2-unknown-freebsd7.0 repentium.gmplib.org HGCD 128
pentium2-pc-linux-gnu login.medicis.polytechni HGCD 131
pentium3-pc-linux-gnu matrix.sics.se HGCD 131
athlon-pc-linux-gnu shell.math.su.se HGCD 133
athlon64-unknown-linux-gnu abakus.sics.se HGCD 137
core2-unknown-freebsd7.1 shell.gmplib.org HGCD 138
athlon64-pc-linux-gnu ior.math.su.se HGCD 139
athlon64-pc-linux-gnu tiger.math.su.se HGCD 139
athlon64-unknown-linux-gnu marie.medicis.polytechni HGCD 141
athlon64-unknown-freebsd7.1 panther.gmplib.org HGCD 143
athlon64-unknown-linux-gnu gcc11.fsffrance.org HGCD 143
athlon64-pc-solaris2.11 tray.nada.kth.se HGCD 144
athlon64-unknown-linux-gnu gcc13.fsffrance.org HGCD 144
core2-unknown-freebsd8.0 jail21.jails.se HGCD 145
athlon-unknown-freebsd7.2 mandela.gmplib.org HGCD 165
armv5tel-unknown-linux-gnueabi gcc55.fsffrance.org HGCD 199
alphaev6-unknown-linux-gnu moufang.math.su.se HGCD 276
It's possible to combine Strassen-multiplication with FFT-invariance,
to get down to 7 pointwise multiplies, at the cost of a bunch of
additions and subtractions on transformed values. I don't think I
tried that, but I don't expect any significant gain since both
multiplication and additions are linear, and with a fairly small
difference in cost. The FFT interface I designed should support it.
I think FFT-invariance + Strassen would be slower (in the FFT small ring
case), since one would need to split the integers into slightly smaller
polynomial coefficients, giving higher polynomial degree. Also, it
performs more linear operations.
--
Torbjörn
More information about the gmp-devel
mailing list