speed of unbalanced multiplication

Fri Feb 8 20:29:18 CET 2013

Hi,

On Fri, 8 February 2013 11:42 am, Torbjorn Granlund wrote:
> bodrato at mail.dm.unipi.it writes:
>   I agree, but ... the only difference I could see on my netbook is not
>   memory alignment, but "position".
>
> Was this reproduced on any non-Linux system?  Perhaps Linux somehow
> messes up caching and/or TLD for certain address ranges?

The timings I posted on this list were measured on shell, a FreeBSD system.
I tested my patch on my netbook (atom-linux) and on shell (K10-fbsd); on both,
mul results were worse than mul_n results before the patch, and equivalent
after it.

Removing the (now pushed) patch, on shell I obtain:

$ tune/speed -o addrs -s 800000 mpn_mul_n
            mpn_mul_n
dst 801E00040 src 800E00040 801600040   (cf sp approx 7FFFFFFFC37C)
800000    0.646598000

$ tune/speed -o addrs -s 800000 mpn_mul
              mpn_mul
dst 801E00040 src 802C00040 801600040   (cf sp approx 7FFFFFFFC36C)
800000    0.680945000

Different addresses, different speed.

With the patch, again on shell, I get:

$ tune/speed -o addrs -s 800000 mpn_mul_n
            mpn_mul_n
dst 801E00040 src 800E00040 801600040   (cf sp approx 7FFFFFFFC37C)
800000    0.644599000

$ tune/speed -o addrs -s 800000 mpn_mul
              mpn_mul
dst 801E00040 src 800E00040 801600040   (cf sp approx 7FFFFFFFC36C)
800000    0.645062000

Same addresses, same speed.
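
As a rough cross-check of the figures above, here is a minimal Python sketch
that compares the operand addresses and the relative timing difference for
each pair of runs. The numbers are copied from the speed output in this
message; the dictionary layout and the compare() helper are only illustrative
and are not part of GMP or of tune/speed.

```python
# Addresses (dst, src1, src2) and timings copied from the speed runs above
# (shell, K10-fbsd, size 800000). The data layout is an illustrative choice.
runs = {
    "without patch": {
        "mpn_mul_n": {"addrs": (0x801E00040, 0x800E00040, 0x801600040),
                      "time": 0.646598000},
        "mpn_mul":   {"addrs": (0x801E00040, 0x802C00040, 0x801600040),
                      "time": 0.680945000},
    },
    "with patch": {
        "mpn_mul_n": {"addrs": (0x801E00040, 0x800E00040, 0x801600040),
                      "time": 0.644599000},
        "mpn_mul":   {"addrs": (0x801E00040, 0x800E00040, 0x801600040),
                      "time": 0.645062000},
    },
}


def compare(pair):
    """Return (same_addresses, slowdown of mpn_mul vs mpn_mul_n in percent)."""
    ref, other = pair["mpn_mul_n"], pair["mpn_mul"]
    same_addresses = ref["addrs"] == other["addrs"]
    slowdown = (other["time"] - ref["time"]) / ref["time"] * 100.0
    return same_addresses, slowdown


for label, pair in runs.items():
    same_addresses, slowdown = compare(pair)
    print(f"{label}: same addresses = {same_addresses}, "
          f"mpn_mul slower by {slowdown:.2f}%")
```

On these figures, mpn_mul is about 5.3% slower than mpn_mul_n when the first
source operand sits at a different address, and about 0.07% slower when all
three addresses coincide.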

-- 
http://bodrato.it/papers/