Performance of addlsh_n and sublsh_n

Thu Feb 3 08:10:17 CET 2011

Sandy Bridge has a flaw.  No wonder it's sometimes faster.  It's taking short-cuts. :-)

http://www.tomshardware.com/news/sandy-bridge-cougar-point-chipset-sandy-bridge-recall,12123.html

- Rick C. Hodgin

--- On Wed, 2/2/11, Torbjorn Granlund <tg at gmplib.org> wrote:

From: Torbjorn Granlund <tg at gmplib.org>
Subject: Performance of addlsh_n and sublsh_n
To: gmp-devel at gmplib.org
Date: Wednesday, February 2, 2011, 2:14 PM

On AMD K8, K9, K10, and Intel Sandy Bridge, addlsh_n and sublsh_n are
slower than addmul_1 and submul_1.  The functionality of the latter
pair completely covers that of the former, except that
addmul_1/submul_1 do not allow separate lsh source operand and
destination operand.
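
To make the overlap concrete, here is a minimal sketch in portable C. It
assumes 32-bit limbs stored least significant first and uses hypothetical
reference routines (ref_addlsh_n, ref_addmul_1), not GMP's own code. It
illustrates that adding a shifted operand in place gives the same limbs and
carry as an addmul_1-style call with the multiplier 2^cnt. The in-place case
(destination doubling as the addend) appears to be the restriction mentioned
above. The sketch says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, least significant first */

/* rp[] = up[] + (vp[] << cnt), 0 < cnt < 32; returns the limb carried out.
   rp may alias up, since up[i] is read before rp[i] is written. */
limb_t ref_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                    size_t n, unsigned cnt)
{
    assert(cnt >= 1 && cnt < 32);
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = ((uint64_t)vp[i] << cnt) + up[i] + carry;
        rp[i] = (limb_t)t;   /* low 32 bits */
        carry = t >> 32;     /* high bits propagate to the next limb */
    }
    return (limb_t)carry;
}

/* rp[] += vp[] * m; returns the carry limb.  The destination is also the
   addend, so there is no separate addend operand. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *vp, size_t n, limb_t m)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)vp[i] * m + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = t >> 32;
    }
    return (limb_t)carry;
}

/* Returns 1 if both routes agree on limbs and carry for a sample input. */
int addlsh_matches_addmul(unsigned cnt)
{
    limb_t v[3] = {0xFFFFFFFFu, 0x80000001u, 0x12345678u};
    limb_t a[3] = {0xDEADBEEFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    limb_t b[3] = {0xDEADBEEFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    limb_t ca = ref_addlsh_n(a, a, v, 3, cnt);            /* in place */
    limb_t cb = ref_addmul_1(b, v, 3, (limb_t)1 << cnt);  /* times 2^cnt */
    return ca == cb && a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}
```

Calling addlsh_matches_addmul with any cnt from 1 to 31 exercises both
routes on the same operands.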

Furthermore, on Intel Core2 and Nehalem, addlsh_n is slower than add_n
plus lshift, but the former presumably becomes faster when operands are
too large to fit in L1 cache.
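
The two routes compared above can be sketched as follows. This is a
reference model in portable C under assumed 32-bit limbs, with hypothetical
names (sketch_lshift, sketch_add_n, sketch_addlsh_n), not GMP's assembly.
The two-pass route writes a temporary and reads it back, while the fused
route touches each operand once. That may be why the fused form is expected
to win once operands spill out of L1, but the sketch only checks that both
routes compute the same result.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, least significant first */

/* Pass 1: tp[] = vp[] << cnt, 0 < cnt < 32; returns the bits shifted out. */
limb_t sketch_lshift(limb_t *tp, const limb_t *vp, size_t n, unsigned cnt)
{
    limb_t out = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        tp[i] = (v << cnt) | out;
        out = v >> (32 - cnt);
    }
    return out;
}

/* Pass 2: rp[] = up[] + tp[]; returns the carry (0 or 1). */
limb_t sketch_add_n(limb_t *rp, const limb_t *up, const limb_t *tp, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] + tp[i] + carry;
        rp[i] = (limb_t)t;
        carry = t >> 32;
    }
    return (limb_t)carry;
}

/* Fused: rp[] = up[] + (vp[] << cnt) in a single pass, no temporary. */
limb_t sketch_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                       size_t n, unsigned cnt)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = ((uint64_t)vp[i] << cnt) + up[i] + carry;
        rp[i] = (limb_t)t;
        carry = t >> 32;
    }
    return (limb_t)carry;
}

/* Returns 1 if the fused and two-pass routes agree for a sample input. */
int fused_matches_two_pass(unsigned cnt)
{
    limb_t u[3] = {0xDEADBEEFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    limb_t v[3] = {0xFFFFFFFFu, 0x80000001u, 0x12345678u};
    limb_t tmp[3], r2[3], r1[3];
    limb_t c2 = sketch_lshift(tmp, v, 3, cnt);   /* shifted-out bits */
    c2 += sketch_add_n(r2, u, tmp, 3);           /* plus the add carry */
    limb_t c1 = sketch_addlsh_n(r1, u, v, 3, cnt);
    return c1 == c2 && r1[0] == r2[0] && r1[1] == r2[1] && r1[2] == r2[2];
}
```

The combined carry of the two-pass route is the sum of the bits shifted out
and the addition carry, which is what the fused routine returns directly.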

We need to speed up addlsh_n and sublsh_n, or disable them for several
processors.

-- 
Torbjörn