Niels Möller nisse at lysator.liu.se
Tue Sep 3 09:29:26 UTC 2019

tg at gmplib.org (Torbjörn Granlund) writes:

> Intel ark doesn't seem to know any processor called "U4100" so I cannot
> figure out what generation it belongs to.

GMP's config.guess classifies it as core2, and cat /proc/cpuinfo says:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Genuine Intel(R) CPU           U4100  @ 1.30GHz

I suspect there's a letter missing, and the marketing name really is
"SU4100". The machine probably dates from 2010, judging by timestamps in
my home directory. Anyway, *far* from the latest and greatest.

> Again, if IIRC, small quotients may result in 16 cycle latency.  That's
> the lowest possible timing.

In that case, not so surprising that the div1 function loses. Do other
architectures also have decent performance for small-quotient division?

Do you think a table lookup on the high bits could beat 16 cycles? It
would need to be accurate enough (possibly with an adjustment step) that
it doesn't cost a large penalty in iteration count.
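
To make the question concrete, here is a rough sketch of what I mean by
table lookup on high bits. It is not GMP code: the name div1_table, the
4-bit index width and the table layout are all made up for illustration,
and it assumes 64-bit words and GCC/Clang's __builtin_clzll. Both
operands are shifted so that a is normalized, the top 4 bits of each
index a table holding floor(ah / (bh + 1)), which never exceeds the true
quotient, and a correction loop fixes up the estimate. With bh >= 4 the
estimate should be short by at most one; smaller bh (larger quotients)
falls back to plain division.

```c
#include <assert.h>
#include <stdint.h>

/* q_tab[ah - 8][bh - 4] = floor (ah / (bh + 1)), where ah in [8, 15] and
   bh in [4, 15] are the top 4 bits of the shifted operands.  This is an
   underestimate of floor (a / b).  Layout is an assumption of this sketch. */
static const unsigned char q_tab[8][12] = {
  {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0},   /* ah =  8 */
  {1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0},   /* ah =  9 */
  {2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0},   /* ah = 10 */
  {2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},   /* ah = 11 */
  {2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0},   /* ah = 12 */
  {2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0},   /* ah = 13 */
  {2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0},   /* ah = 14 */
  {3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0},   /* ah = 15 */
};

/* Hypothetical helper, not GMP's div1.  Requires a >= b > 0.
   Returns floor (a / b) and stores a mod b in *rp. */
static uint64_t
div1_table (uint64_t a, uint64_t b, uint64_t *rp)
{
  assert (a >= b && b > 0);

  /* Normalize a; b <= a, so shifting b by the same count cannot overflow. */
  int s = __builtin_clzll (a);
  unsigned ah = (unsigned) ((a << s) >> 60);
  unsigned bh = (unsigned) ((b << s) >> 60);
  uint64_t q, r;

  if (bh < 4)
    {
      /* Quotient may be large; use the hardware divide. */
      q = a / b;
      r = a % b;
    }
  else
    {
      /* Small quotient: table estimate plus adjustment. */
      q = q_tab[ah - 8][bh - 4];
      r = a - q * b;
      while (r >= b)
        {
          r -= b;
          q++;
        }
    }
  *rp = r;
  return q;
}
```

Whether the shift, two extractions, the load and the multiply really fit
under 16 cycles is exactly what would have to be measured.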

Even with div1 deleted, the code handles q == 1 specially, and only
divides when q >= 2.
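
Roughly, the shape of that special case is as follows. This is a
simplified single-word illustration with a made-up name (euclid_step),
not the actual GMP code: one subtraction and one comparison decide
whether q == 1, and the division instruction is only reached when
q >= 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration.  One Euclid step on a >= b > 0:
   stores the quotient in *qp and returns a mod b. */
static uint64_t
euclid_step (uint64_t a, uint64_t b, uint64_t *qp)
{
  assert (a >= b && b > 0);

  uint64_t r = a - b;
  if (r < b)
    {
      /* q == 1: no division needed. */
      *qp = 1;
      return r;
    }

  /* q >= 2: fall back to the divide instruction. */
  *qp = a / b;
  return a % b;
}
```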


Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
