hgcd1/2
Niels Möller
nisse at lysator.liu.se
Tue Sep 3 09:29:26 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> Intel ark doesn't seem to know any processor called "U4100" so I cannot
> figure out what generation it belongs to.
gmp's config.guess classifies it as core2. cat /proc/cpuinfo says
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Genuine Intel(R) CPU U4100 @ 1.30GHz
I suspect there's a letter missing, and the marketing name really is
"SU4100". The machine likely dates from 2010, based on timestamps in my
home directory. Anyway, *far* from the latest and greatest.
> Again, if IIRC, small quotients may result in 16 cycle latency. That's
> the lowest possible timing.
In that case, not so surprising that the div1 function loses. Do other
architectures also have decent performance for small-quotient division?
Do you think table lookup on high bits should beat 16 cycles? It needs
to give good enough accuracy (possibly with an adjustment step) to not
result in a large penalty in iteration count.
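To make the table-lookup idea concrete, here is a minimal sketch, not
code from GMP. The names q_tab and div_by_table are hypothetical. It
assumes, for simplicity, that a has its most significant bit set and
that a >= b > 0. It looks up a quotient estimate from the top three bits
of each operand, then corrects it with an adjustment loop. The estimate
never exceeds the true quotient, so the adjustment only ever adds.

```c
#include <assert.h>
#include <stdint.h>

/* q_tab[ah - 4][bh] = max (1, ah / (bh + 1)), where ah and bh are the
   top three bits of a and b.  This is a lower bound on a / b, because
   a >= ah * 2^61, b < (bh + 1) * 2^61, and a >= b.  The bh == 0 column
   is never read; that case falls back to plain division. */
static const unsigned char q_tab[4][8] = {
  {4, 2, 1, 1, 1, 1, 1, 1},
  {5, 2, 1, 1, 1, 1, 1, 1},
  {6, 3, 2, 1, 1, 1, 1, 1},
  {7, 3, 2, 1, 1, 1, 1, 1},
};

/* Hypothetical helper.  Requires the top bit of a set and a >= b > 0.
   Returns a / b and stores a mod b in *r. */
static uint64_t
div_by_table (uint64_t *r, uint64_t a, uint64_t b)
{
  unsigned ah, bh;
  uint64_t q, t;

  assert ((a >> 63) == 1 && b > 0 && a >= b);

  ah = (unsigned) (a >> 61);    /* 4..7 */
  bh = (unsigned) (b >> 61);    /* 0..7 */

  if (bh == 0)
    {
      /* b is small relative to a, so the quotient may be large:
         use the hardware division. */
      *r = a % b;
      return a / b;
    }

  q = q_tab[ah - 4][bh];
  t = a - q * b;                /* q <= a / b, so no underflow */

  /* Adjustment step.  With bh >= 1 the true quotient is below 8, so
     this loop runs at most a few times. */
  while (t >= b)
    {
      t -= b;
      q++;
    }

  *r = t;
  return q;
}
```

A three-bit table is clearly too coarse to be competitive. More index
bits give a tighter estimate and fewer adjustment iterations, at the
cost of a larger table.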
Even with div1 deleted, the code handles q == 1 specially, and only
divides when q >= 2.
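Roughly, the q == 1 special case has the following shape. This is a
simplified sketch on single words, not the actual hgcd2 code, and the
name div_step is hypothetical. It assumes a >= b > 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper.  Requires a >= b > 0.
   Returns a / b and stores a mod b in *r. */
static uint64_t
div_step (uint64_t *r, uint64_t a, uint64_t b)
{
  uint64_t t;

  assert (b > 0 && a >= b);

  t = a - b;                    /* a >= b, so no underflow */
  if (t < b)
    {
      /* q == 1: one subtraction, no division needed. */
      *r = t;
      return 1;
    }

  /* q >= 2: only now pay for the division. */
  *r = a % b;
  return a / b;
}
```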
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.