# Better mpn_divexact_by3

Torbjorn Granlund tg at swox.com
Sat Aug 9 00:36:51 CEST 2008

```It was pointed out to me by David Harvey that fast division by 3 can
be performed by first multiplying by (2^64-1)/3 and then divide out
2^64-1.  If the original number was divisable by 3, the latter
division will also have no remainder, and can thus be done using
Hensel division.  That division can be done at cache speed for several
machines, but the best speed will be achieved by combining the
multiplication and the Hensel division.

The same trick forks for all divisors of the limbs base minus 1, such
as 5, 15, and 17 for the base 2^64.

Why is division by 3 important?  Because Toom multiplication, used in
GMP, requires such division.  The development code (gmplib.org/devel/)
has more Toom code, that relies even more on division by small
constants.

GMP already has a function for division by 3, mpn_divexact_by3.  Using
the previously known algorithms, it could be made to run at the
following cycle counts:

AMD     AMD   Pentium Pentium Pentium  Core    PPC     PPC     Alpha  Itanic
K7/32   K8/64  Nor/32  Pres/64  P6/13   2/64  74x7/32  970/64  EV67/64  2/64
8       5      18      20.5     9.6     8       6      13       6.3     4

With the new algorithm, it can be made to run much, much faster:

AMD     AMD   Pentium Pentium Pentium  Core    PPC     PPC     Alpha  Itanic
K7/32   K8/64  Nor/32  Pres/64  P6/13   2/64  74x7/32  970/64  EV67/64  2/64
4.5     2      13      12.5             4       3       7       2.5    2

Note that these divisions can be made at 2 c/l on K8, which is
actually faster than multiplication by the same constants (2.5 c/l).

--
Torbjörn
```