Toom-4.5 (aka Toom-5x4, Toom-6x3, Toom-7x2)
David Harvey
dmharvey at cims.nyu.edu
Mon Oct 12 15:19:36 CEST 2009
On Oct 12, 2009, at 9:04 AM, bodrato at mail.dm.unipi.it wrote:
> Is the division by 3 (a divisor of B-1) twice as fast as the division
> by 9 (not a divisor)? If it is not, then I'll prefer one slow division.
> If it is, then I'll #define div_by9() {div_by3(); div_by3();} ...
Depends on the chip. For example, looking at the headers of
mpn/x86_64/bdiv_dbm1c.asm and mpn/x86_64/dive_1.asm (are these the right
files?), the cycle counts for K8 (Opteron) are 2.25 cycles per limb and
10 cycles per limb respectively. That is more than a factor of four.
Note also that bdiv_dbm1 is even slightly faster than mul_1 (2.5 c/l)!
david
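
To make the comparison concrete, here is a rough C sketch of the "multiply by (B-1)/3" style of exact division by 3, with division by 9 done as two such passes as in the quoted #define. It is an illustration only, not GMP's code: the names limb_t, DBM1_3, bdiv_dbm1c_sketch, divexact_by3_sketch and divexact_by9_sketch are made up here, B = 2^64 is assumed, and unsigned __int128 is a GCC/Clang extension. By the figures quoted above, two passes of this kind would cost about 4.5 c/l on K8 against 10 c/l for one general exact division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* assumption: one limb is 64 bits, B = 2^64 */
#define DBM1_3 (UINT64_MAX / 3)     /* (B-1)/3 = 0x5555555555555555 */

/*
 * Hypothetical sketch of a "divide by a divisor of B-1" loop.
 * bd must equal (B-1)/d. Since d * bd = B - 1, which is -1 modulo B,
 * multiplying by bd acts like dividing by -d; the running subtraction
 * from h supplies the negation and carries the borrow to the next limb.
 * q and a may point to the same array. Returns the final value of h.
 */
static limb_t bdiv_dbm1c_sketch(limb_t *q, const limb_t *a, size_t n,
                                limb_t bd, limb_t h)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 p = (unsigned __int128)a[i] * bd;
        limb_t p0 = (limb_t)p;          /* low limb of the product  */
        limb_t p1 = (limb_t)(p >> 64);  /* high limb of the product */
        limb_t cy = h < p0;             /* borrow out of h - p0     */
        h -= p0;
        q[i] = h;
        h = h - p1 - cy;
    }
    return h;
}

/* Exact division by 3: one multiply-style pass. */
static limb_t divexact_by3_sketch(limb_t *q, const limb_t *a, size_t n)
{
    return bdiv_dbm1c_sketch(q, a, n, DBM1_3, 0);
}

/* Exact division by 9 as two passes of division by 3 (second pass in place). */
static limb_t divexact_by9_sketch(limb_t *q, const limb_t *a, size_t n)
{
    limb_t r = divexact_by3_sketch(q, a, n);
    return r | divexact_by3_sketch(q, q, n);
}
```

For example, with a = {0xFFFFFFFFFFFFFFFD, 5} (least significant limb first, i.e. 6B - 3), divexact_by3_sketch gives q = {0xFFFFFFFFFFFFFFFF, 1} (i.e. 2B - 1) and returns 0. Each limb costs one widening multiply and a couple of subtractions, with no dependent multiply chain through the quotient, which is presumably why this kind of loop can run at close to mul_1 speed.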