Performance of addlsh_n and sublsh_n

Wed Feb 2 20:14:56 CET 2011

On AMD K8, K9, K10, and Intel Sandy Bridge, addlsh_n and sublsh_n are
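slower than addmul_1 and submul_1.  The latter functions cover the
former's functionality completely (addmul_1 with multiplier 2^s
computes what addlsh_n computes with shift count s), except that
addmul_1/submul_1 accumulate into their destination, so they do not
allow a destination operand separate from the unshifted source operand.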
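
The coverage argument can be sketched in portable C.  This is not
GMP's mpn code: ref_addlsh_n, ref_addmul_1 and addlsh_via_addmul are
hypothetical reference functions, and the 32-bit limb type is an
assumption chosen so that a uint64_t holds every partial result.  The
wrapper shows the one restriction: a separate destination first needs
a copy of the unshifted operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;  /* assumed 32-bit limbs; uint64_t holds partial sums */

/* Hypothetical reference: {rp,n} = {up,n} + ({vp,n} << s), 0 < s < 32.
   Returns the bits shifted out plus the final carry (at most 2^s). */
static limb_t ref_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                           size_t n, unsigned s)
{
    assert(s > 0 && s < 32);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)up[i] + ((uint64_t)vp[i] << s);
        rp[i] = (limb_t)acc;
        acc >>= 32;
    }
    return (limb_t)acc;
}

/* Hypothetical reference: {rp,n} += {up,n} * v; returns the carry limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)rp[i] + (uint64_t)up[i] * v;
        rp[i] = (limb_t)acc;
        acc >>= 32;
    }
    return (limb_t)acc;
}

/* addlsh_n expressed via addmul_1 with multiplier 2^s.  Since addmul_1
   accumulates into rp, a destination distinct from up must first receive
   a copy of up.  Assumes vp does not overlap rp. */
static limb_t addlsh_via_addmul(limb_t *rp, const limb_t *up, const limb_t *vp,
                                size_t n, unsigned s)
{
    assert(s > 0 && s < 32);
    if (rp != up)
        memmove(rp, up, n * sizeof *rp);
    return ref_addmul_1(rp, vp, n, (limb_t)1 << s);
}
```

When rp == up the copy disappears and addmul_1 alone does the job; the
sublsh_n/submul_1 case is analogous with subtraction and a borrow.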

Furthermore, on Intel Core2 and Nehalem, addlsh_n is slower than add_n
plus lshift, although the former presumably becomes faster when the
operands are too large to fit in L1 cache, since the add_n plus lshift
sequence makes an extra pass over the data.
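
The add_n plus lshift alternative can be sketched as two passes
through a scratch area.  As above, ref_lshift, ref_add_n and
addlsh_two_pass are hypothetical portable reference functions with
assumed 32-bit limbs, not GMP's assembly.  The sketch makes the memory
traffic visible: the scratch operand tp is written by one pass and read
back by the other, traffic a fused addlsh_n loop never generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* assumed 32-bit limbs */

/* Hypothetical reference: {rp,n} = {up,n} << s, n >= 1, 0 < s < 32.
   Works from the top down; returns the bits shifted out of the top limb. */
static limb_t ref_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned s)
{
    assert(n >= 1 && s > 0 && s < 32);
    limb_t out = up[n - 1] >> (32 - s);
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (limb_t)(up[i] << s) | (up[i - 1] >> (32 - s));
    rp[0] = (limb_t)(up[0] << s);
    return out;
}

/* Hypothetical reference: {rp,n} = {up,n} + {vp,n}; returns carry 0 or 1. */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)up[i] + vp[i];
        rp[i] = (limb_t)acc;
        acc >>= 32;
    }
    return (limb_t)acc;
}

/* addlsh_n as lshift into scratch {tp,n}, then add_n.  The shifted-out
   bits (below 2^s) plus the add carry (0 or 1) give a result <= 2^s. */
static limb_t addlsh_two_pass(limb_t *rp, const limb_t *up, const limb_t *vp,
                              limb_t *tp, size_t n, unsigned s)
{
    limb_t hi = ref_lshift(tp, vp, n, s);
    return hi + ref_add_n(rp, up, tp, n);
}
```

While all operands stay in L1 the extra stream is cheap, which is
consistent with the two-call sequence winning there; once they spill
out of L1, the fused loop's single pass should matter more.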

We need to speed up addlsh_n and sublsh_n, or disable them for several
processors.

-- 
Torbjörn