The new code is faster on most x86-64 machines, see

I suppose we should replace generic/mod_1_1.c with it?

Have you looked into a mod_1_2 using the same ideas?  It may be tricky
to make that as fast as possible without further restricting the
divisor range.


