GMP on Pentium 2

delta trinity deltatrinity at
Sat Nov 8 17:49:52 CET 2003

Humm, nice work!

I remember a book I bought a few years ago about code optimization.  I 
think this is just the kind of 'practical demonstration' that proves 
the importance of checking every single detail of the processor in tight 
loops.

I wonder, though, whether there are other places in the code that could be 
optimized the same way, by aligning to quad words.  Who knows, this may give 
us a 'free' speed boost (without modifying the code, just the alignment).
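
To make the "0 mod 16" versus "8 mod 16" distinction from the quoted message concrete, here is a minimal C sketch.  It is only an illustration, not code from GMP: the helper names addr_mod and pad_to_align are made up, and the addresses in the note below are hypothetical.  It computes where an address falls inside an aligned block and how many padding bytes would move it to the next boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of `addr` within an `align`-byte block.
   `align` must be a nonzero power of two. */
static size_t addr_mod(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)(addr & (align - 1));
}

/* Hypothetical helper: number of padding bytes to place before `addr`
   so that the padded address is 0 mod `align`. */
static size_t pad_to_align(uintptr_t addr, size_t align)
{
    return (align - addr_mod(addr, align)) & (align - 1);
}
```

For example, a loop start at the made-up address 0x1008 satisfies 0 mod 8 but is 8 mod 16, so an 8-byte-only alignment request can leave it in the slower position.  Eight bytes of padding would bring it to 0 mod 16.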


>From: Torbjorn Granlund <tg at>
>To: Patrick Pelissier <Patrick.Pelissier at>,gmp-discuss at
>Subject: Re: GMP on Pentium 2
>Date: 08 Nov 2003 03:26:56 +0100
>
>I found the reason for the 3.7 vs 3.2 cycles/limb performance for
>mpn/x86/aors_n.asm on p6.  Alignment.  If the loop start is at an
>address 8 mod 16, the loop needs 3.7 cycles/limb, but if it is
>aligned 0 mod 16, it needs only 3.2 cycles/limb.  Since the code forces
>just 0 mod 8 alignment, both timing results happen depending on
>where the code ends up being put by the linker.
