Performance of addlsh_n and sublsh_n

Rick Hodgin foxmuldrster at
Thu Feb 3 08:10:17 CET 2011

Sandy Bridge has a flaw.  No wonder it's sometimes faster.  It's taking short-cuts. :-),12123.html

- Rick C. Hodgin

--- On Wed, 2/2/11, Torbjorn Granlund <tg at> wrote:

From: Torbjorn Granlund <tg at>
Subject: Performance of addlsh_n and sublsh_n
To: gmp-devel at
Date: Wednesday, February 2, 2011, 2:14 PM

On AMD K8, K9, K10, and Intel Sandy Bridge, addlsh_n and sublsh_n are
slower than addmul_1 and submul_1.  The latter pair completely covers
the functionality of the former pair, except that addmul_1/submul_1 do
not allow separate lsh source operand and destination operand.
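
To make the overlap concrete, here is a minimal sketch in C.  It is not
GMP's code: the 32-bit limb_t type and the names ref_addlsh_n,
ref_addmul_1 and addlsh_matches_addmul are hypothetical, chosen only for
illustration.  It assumes addlsh_n computes r = u + (v << cnt) and
addmul_1 computes r += v * m, each returning the carry-out limb, so that
addmul_1 with m = 2^cnt reproduces addlsh_n whenever r and u are the
same operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical limb type; real limbs are usually wider */
#define LIMB_BITS 32

/* r[] = u[] + (v[] << cnt); returns the carry-out limb. Needs 1 <= cnt < LIMB_BITS. */
limb_t ref_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                    size_t n, unsigned cnt)
{
    assert(cnt >= 1 && cnt < LIMB_BITS);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64-bit intermediate holds the shifted limb, the addend and the carry */
        uint64_t t = ((uint64_t)vp[i] << cnt) + up[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return carry;
}

/* r[] += v[] * m; returns the carry-out limb. The addend is always r itself. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *vp, size_t n, limb_t m)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)vp[i] * m + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return carry;
}

/* Returns 1 if in-place addlsh_n and addmul_1 by 2^cnt agree on a sample input. */
int addlsh_matches_addmul(void)
{
    const limb_t v[3] = {0xDEADBEEFu, 0xFFFFFFFFu, 0x0000FFFFu};
    limb_t a[3] = {0xFFFFFFFFu, 0x80000000u, 0x12345678u};
    limb_t b[3] = {0xFFFFFFFFu, 0x80000000u, 0x12345678u};
    unsigned cnt = 3;

    limb_t ca = ref_addlsh_n(a, a, v, 3, cnt);            /* destination == addend */
    limb_t cb = ref_addmul_1(b, v, 3, (limb_t)1 << cnt);  /* multiply by 2^cnt */

    if (ca != cb)
        return 0;
    for (size_t i = 0; i < 3; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In this sketch the one thing ref_addmul_1 cannot express is a call such
as ref_addlsh_n(r, u, v, n, cnt) with r distinct from u, since it always
accumulates into its destination.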

Furthermore, on Intel Core2 and Nehalem, addlsh_n is slower than add_n
plus lshift, but the former presumably becomes faster when operands are
too large to fit in L1 cache.
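
The "add_n plus lshift" alternative can be sketched as two separate
passes.  Again this is an illustration only, not GMP's code: the 32-bit
limb_t type, the sketch_* names and the explicit scratch buffer t are
assumptions.  The point is that the shifted operand is written to memory
by the first pass and read back by the second, which is cheap while
everything stays in L1 cache but adds memory traffic that a fused
single-pass routine avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical limb type for illustration */
#define LIMB_BITS 32

/* Pass 1: t[] = v[] << cnt; returns the bits shifted out of the top limb. */
limb_t sketch_lshift(limb_t *tp, const limb_t *vp, size_t n, unsigned cnt)
{
    assert(cnt >= 1 && cnt < LIMB_BITS);
    limb_t in = 0;   /* bits carried in from the limb below */
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        tp[i] = (limb_t)(v << cnt) | in;
        in = v >> (LIMB_BITS - cnt);
    }
    return in;
}

/* Pass 2: r[] = u[] + t[]; returns the carry-out (0 or 1). */
limb_t sketch_add_n(limb_t *rp, const limb_t *up, const limb_t *tp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t)up[i] + tp[i] + carry;
        rp[i] = (limb_t)s;
        carry = (limb_t)(s >> LIMB_BITS);
    }
    return carry;
}

/* r[] = u[] + (v[] << cnt) in two passes over memory, using scratch t[]. */
limb_t sketch_addlsh_two_pass(limb_t *rp, const limb_t *up, const limb_t *vp,
                              limb_t *tp, size_t n, unsigned cnt)
{
    limb_t hi = sketch_lshift(tp, vp, n, cnt);   /* writes n limbs to t */
    limb_t cy = sketch_add_n(rp, up, tp, n);     /* reads those n limbs back */
    return hi + cy;
}
```

Which variant wins on a given CPU is an empirical question; the sketch
only shows where the extra pass over the operands comes from.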

We need to speed up addlsh_n and sublsh_n, or disable them for several

gmp-devel mailing list
gmp-devel at
