Performance of addlsh_n and sublsh_n

Mon Feb 7 08:34:49 CET 2011

Hi,

On Wed, February 2, 2011 8:14 pm, Torbjorn Granlund wrote:
> On AMD K8, K9, K10, and Intel Sandy Bridge, addlsh_n and sublsh_n are
> slower than addmul_1 and submul_1.  The latter's functionality
> completely covers the former's functionality, except that
> addmul_1/submul_1 do not allow separate lsh source operand and
> destination operand.
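The "completely cover" claim can be made concrete: when the first source is also the destination, addlsh_n by k bits computes {dst,n} += {src,n} * 2^k, which is exactly addmul_1 with multiplier 2^k. The sketch below checks this with portable reference loops. It assumes a hypothetical 32-bit `limb_t` and the hypothetical names `ref_addlsh_n` / `ref_addmul_1`; these are illustrative C loops, not GMP's mpn functions or their assembly implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */
#define LIMB_BITS 32

/* Reference addlsh_n: {rp,n} = {up,n} + ({vp,n} << k), 0 < k < LIMB_BITS.
   Returns the bits shifted out of vp plus the final carry (at most 2^k).
   Each limb is read before rp[i] is written, so rp may alias up or vp. */
static limb_t ref_addlsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                           size_t n, unsigned k)
{
    assert(k > 0 && k < LIMB_BITS);
    limb_t shifted_in = 0;  /* high bits carried over from the previous vp limb */
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t s = (limb_t)(v << k) | shifted_in;
        shifted_in = v >> (LIMB_BITS - k);
        uint64_t t = (uint64_t)up[i] + s + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return shifted_in + carry;
}

/* Reference addmul_1: {rp,n} += {up,n} * v; returns the high carry limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this cannot overflow */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return carry;
}
```

With dst == up, `ref_addlsh_n(dst, dst, src, n, k)` and `ref_addmul_1(dst, src, n, (limb_t)1 << k)` leave the same limbs and return the same carry; the only thing addmul_1 cannot express is a result written to a third operand.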

sublsh_n is not implemented on any system, but it would be very useful
in toom_interpolate*. I checked every call to the mpn_sublsh{,1,2}_n
functions: they are used _only_ in toom_interpolate*, and _always_ in the
in-place form sublsh*_n (dst, dst, src, ...), so each call could be
replaced by submul_1 with a multiplier of 2^k. Should we define an
inplace_sublsh (dst, src, ...)?
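A minimal sketch of the proposed in-place operation, assuming a hypothetical 32-bit `limb_t` and the hypothetical names `ref_sublsh_n`, `ref_submul_1` and `inplace_sublsh_n` (the last standing in for the suggested inplace_sublsh). These are portable reference loops, not GMP code; they only illustrate that the in-place sublsh needed by toom_interpolate* is a submul_1 by 2^k, including the returned borrow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb */
#define LIMB_BITS 32

/* Reference sublsh_n: {rp,n} = {up,n} - ({vp,n} << k), 0 < k < LIMB_BITS.
   Returns the borrow plus the bits shifted out of vp (at most 2^k). */
static limb_t ref_sublsh_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                           size_t n, unsigned k)
{
    assert(k > 0 && k < LIMB_BITS);
    limb_t shifted_in = 0, borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t s = (limb_t)(v << k) | shifted_in;
        shifted_in = v >> (LIMB_BITS - k);
        uint64_t t = (uint64_t)up[i] - s - borrow;   /* wraps mod 2^64 */
        rp[i] = (limb_t)t;
        borrow = (limb_t)(t >> LIMB_BITS) & 1;       /* 1 iff it wrapped */
    }
    return shifted_in + borrow;
}

/* Reference submul_1: {rp,n} -= {up,n} * v; returns the borrow limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * v + borrow;   /* < 2^64 */
        limb_t lo = (limb_t)p, hi = (limb_t)(p >> LIMB_BITS);
        limb_t r = rp[i];
        rp[i] = r - lo;
        borrow = hi + (r < lo);
    }
    return borrow;
}

/* Hypothetical in-place interface: {dst,n} -= {src,n} << k. */
static limb_t inplace_sublsh_n(limb_t *dst, const limb_t *src, size_t n,
                               unsigned k)
{
    assert(k > 0 && k < LIMB_BITS);
    return ref_submul_1(dst, src, n, (limb_t)1 << k);
}
```

Dropping the separate first source is what lets the operation map onto submul_1; whether a dedicated shift-based loop would still beat submul_1 on a given CPU is the performance question in this thread and is not settled by the sketch.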

addlsh*_n are used differently: there are calls of the form
addlsh*_n(dst,dst,src,...), of the form addlsh*_n(dst,src,dst,...), and
of the form addlsh*_n(dst,src1,src2,...), in many places in the code.
There is also a single use of mpn_addlsh2_n (lp, lp, lp, lsize); in
mpz/lucnum_ui.c, which computes {lp,lsize} + 4*{lp,lsize} = 5*{lp,lsize}:
should it be replaced by a mul_1(...,5)?
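Because all three operands of that call are lp, the shift-and-add is just an in-place multiplication by 1 + 2^2 = 5. The sketch below compares the two with portable reference loops; the 32-bit `limb_t` and the names `ref_addlsh2_n` and `ref_mul_1` are hypothetical and are not GMP's implementations. It only shows that the results and carry-out agree, not which one is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb */
#define LIMB_BITS 32

/* Reference addlsh2_n: {rp,n} = {up,n} + ({vp,n} << 2); returns the carry.
   Safe for rp == up == vp because each limb is read before it is written. */
static limb_t ref_addlsh2_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                            size_t n)
{
    limb_t shifted_in = 0, carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i];
        limb_t s = (limb_t)(v << 2) | shifted_in;
        shifted_in = v >> (LIMB_BITS - 2);
        uint64_t t = (uint64_t)up[i] + s + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return shifted_in + carry;
}

/* Reference mul_1: {rp,n} = {up,n} * v; returns the high limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> LIMB_BITS);
    }
    return carry;
}
```

For example, with lp = {0xFFFFFFFF, 1} both `ref_addlsh2_n(lp, lp, lp, 2)` and `ref_mul_1(lp, lp, 2, 5)` produce {0xFFFFFFFB, 9} with a zero carry.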

Regards,
Marco

-- 
http://bodrato.it/papers/