Toom-4.5 (aka Toom-5x4, Toom-6x3, Toom-7x2)

David Harvey dmharvey at
Mon Oct 12 15:19:36 CEST 2009

On Oct 12, 2009, at 9:04 AM, bodrato at wrote:

> Is the division by 3 (a divisor of B-1) twice as fast as the division
> by 9 (not a divisor)? If it is not, then I'll prefer one slow division.
> If it is, then I'll #define div_by9() {div_by3(); div_by3();} ...

Depends on the chip. For example, looking at the headers for
mpn/x86_64/bdiv_dbm1c.asm and mpn/x86_64/dive_1.asm (are these the right
files?), the cycle counts for K8 (Opteron) are 2.25 and 10 cycles per
limb, respectively. That is more than a factor of four, so two passes of
division by 3 would still beat one general exact division by 9 on that
chip. Note also that bdiv_dbm1 is even slightly faster than mul_1
(2.5 c/l)!
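
To make the comparison concrete, here is a rough C sketch of the
"divisor of B-1" trick. It is not GMP's code: it assumes 32-bit limbs
(B = 2^32) stored least significant first, and the names divexact_dbm1,
div_by3 and div_by9 are made up for illustration. The idea is that if d
divides B-1 and m = (B-1)/d, then the exact quotient Q of X by d satisfies
Q = B*Q - m*X, so each quotient limb needs one multiplication by m plus a
borrow chain, with no multiplication by a modular inverse on the critical
path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's implementation): exact division of the
   n-limb number {ap, n} by d, where d must divide B-1 with B = 2^32.
   Limbs are least significant first; qp may be the same array as ap.
   The result is only meaningful when d divides the input exactly.  */
static uint32_t divexact_dbm1(uint32_t *qp, const uint32_t *ap, size_t n,
                              uint32_t d)
{
    assert(d != 0 && UINT32_MAX % d == 0);  /* d must divide B-1 */
    const uint32_t m = UINT32_MAX / d;      /* m = (B-1)/d */
    uint32_t h = 0;                         /* running state, mod B */

    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)ap[i] * m;   /* two-limb product */
        uint32_t p0 = (uint32_t)p;          /* low limb */
        uint32_t p1 = (uint32_t)(p >> 32);  /* high limb */
        uint32_t cy = h < p0;               /* borrow from h - p0 */
        h -= p0;
        qp[i] = h;                          /* next quotient limb */
        h = h - p1 - cy;                    /* carry state forward */
    }
    return h;
}

/* Division by 3: 3 divides 2^32 - 1, so the trick applies. */
static uint32_t div_by3(uint32_t *qp, const uint32_t *ap, size_t n)
{
    return divexact_dbm1(qp, ap, n, 3);
}

/* Division by 9 as two passes of division by 3, the second one in place.
   9 does not divide 2^32 - 1, so a single pass is not possible here.  */
static uint32_t div_by9(uint32_t *qp, const uint32_t *ap, size_t n)
{
    div_by3(qp, ap, n);
    return div_by3(qp, qp, n);
}
```

With the K8 figures above, two such passes would cost roughly 4.5 c/l
against 10 c/l for one dive_1-style division, assuming the asm timings
carry over to both passes.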

