> mp_size_t is a signed type, for various reasons.  size_t is an unsigned
> type.  Unsigned division by constants is a few cycles faster.

How are these divisions by constants implemented? I guess it depends
on the compiler, but this is precisely the kind of application that the
paper by you and Montgomery is about, right? IIRC, it should be a
multiply, possibly followed by some shifting, and unlike
udiv_qrnnd_preinv, one may even get away without any adjustment steps.
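
To make concrete what I have in mind, here is a minimal sketch for
32-bit unsigned operands. It is not taken from any particular compiler
or from GMP; the function names are made up for illustration, and the
constants are the usual ceil(2^k / d) reciprocals for d = 10 and d = 3.
For these two divisors the rounded-up reciprocal is accurate enough that
no adjustment step is needed; I believe other divisors (7, for example)
need an extra add and shift.

```c
#include <stdint.h>

/* Hypothetical helper: n / 10 for any 32-bit unsigned n.
   0xCCCCCCCD = (2^35 + 2) / 10, so the quotient is the high part
   of the 64-bit product, shifted right by 35 bits in total. */
static inline uint32_t div10_u32(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical helper: n / 3 for any 32-bit unsigned n.
   0xAAAAAAAB = (2^33 + 1) / 3, total shift is 33 bits. */
static inline uint32_t div3_u32(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xAAAAAAABu) >> 33);
}
```

In both cases the error introduced by rounding the reciprocal up stays
below 1/d over the whole 32-bit range, which is why the truncated
product already equals the exact quotient.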


