Sandybridge addmul_N challenge

Niels Möller nisse at
Thu Feb 23 21:09:32 CET 2012

nisse at (Niels Möller) writes:

> So the recurrence, for one iteration, seems to be just 3 cycles. But the
> loop mixer doesn't find anything faster than 6.36 cycles for one
> iteration, or 3.18 per limb product. Which isn't too bad (a slight
> improvement over 3.24, which I think is the best reported earlier), but
> stubbornly above 3 c/l.

One update: I have now tried unrolling four times. With that, I've seen
one sequence running at 6.16 cycles per iteration, or 3.08 c/l (cycles
per limb product).

See shell:~nisse/hack/loopmix/lms/addmul_2-nisse-2.nlms.
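
For context on the figures above: an addmul_2 loop does two limb
products per iteration (one limb of u times two limbs of v), which is
why 6.36 cycles per iteration corresponds to 3.18 c/l and 6.16 to
3.08 c/l. The following is only a minimal portable C sketch of that
operation, not the assembly sequence found by the loop mixer. The names
ref_addmul_2 and cycles_per_limb_product, the exact operand convention,
64-bit limbs, and the use of unsigned __int128 (a GCC/Clang extension)
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* assumed 64-bit limb */
typedef unsigned __int128 dlimb_t;  /* double limb, GCC/Clang extension */

/* Hypothetical reference routine (assumed convention, for illustration):
   add {up,n} * {vp,2} to {rp,n}, store the low n+1 limbs of the result
   in rp[0..n], and return the most significant limb. */
limb_t ref_addmul_2(limb_t *rp, const limb_t *up, size_t n, const limb_t *vp)
{
    assert(n >= 1);
    limb_t v0 = vp[0], v1 = vp[1];
    limb_t c0 = 0;  /* pending carry at limb position i */
    limb_t c1 = 0;  /* pending carry at limb position i+1 */

    for (size_t i = 0; i < n; i++) {
        /* First limb product of the iteration: u[i] * v0.
           The sum is at most B^2 - 1, so it fits in a double limb. */
        dlimb_t p0 = (dlimb_t)up[i] * v0 + rp[i] + c0;
        /* Second limb product: u[i] * v1, one position higher.
           It absorbs the high half of p0 and the older carry c1. */
        dlimb_t p1 = (dlimb_t)up[i] * v1 + c1 + (limb_t)(p0 >> 64);

        rp[i] = (limb_t)p0;
        c0 = (limb_t)p1;          /* becomes the carry at position i+1 */
        c1 = (limb_t)(p1 >> 64);  /* becomes the carry at position i+2 */
    }
    rp[n] = c0;
    return c1;
}

/* Two limb products per iteration, so c/l is half the iteration time. */
double cycles_per_limb_product(double cycles_per_iteration)
{
    return cycles_per_iteration / 2.0;
}
```

The carry pair (c0, c1) is carried from one iteration to the next, so
this sketch also shows where a loop-carried dependency sits in such a
loop; how that maps onto actual Sandy Bridge instructions is outside
what the sketch covers.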


