Alexander Kruppa <akruppa at> writes:

> Maybe GMP could have configure look for a function to align limb
> arrays at multiples of 16 (or higher) on systems where SIMD wants to
> operate on aligned data, to avoid a speed penalty from working on
> partial data at loop start.

That won't quite work, unless you go all the way and define a "limb" as
a 128-bit quantity. Internal code passes around lots of pointers into
the middle of the limb arrays, so aligning the start of an array says
nothing about the alignment of the operands a low-level loop actually
gets. In particular the various divide-and-conquer algorithms and the
Toom multiply functions do that all the time.
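
To make that concrete, here is a minimal sketch, not GMP code. It
assumes 64-bit limbs and C11 aligned_alloc; limb_t, alloc_limbs16 and
high_half are made-up names for illustration only. The base pointer is
16-byte aligned, but the high-half pointer that a divide-and-conquer
split would hand to the next level is only limb-aligned:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for GMP's limb type; assumes 64-bit limbs. */
typedef uint64_t limb_t;

/* Returns 1 if p sits on a 16-byte boundary, else 0. */
static int is_aligned16(const limb_t *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Allocate room for n limbs with the base on a 16-byte boundary.
   aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
static limb_t *alloc_limbs16(size_t n)
{
    size_t bytes = n * sizeof(limb_t);
    bytes = (bytes + 15) / 16 * 16;
    limb_t *p = aligned_alloc(16, bytes);
    assert(p != NULL);
    return p;
}

/* Split {p, n} the way a divide-and-conquer routine might: the low
   half is the first n/2 limbs, and the returned pointer is the start
   of the high half, somewhere in the middle of the array. */
static const limb_t *high_half(const limb_t *p, size_t n)
{
    return p + n / 2;
}
```

With a 6-limb operand the high half starts 3 limbs (24 bytes) past the
aligned base, so it is not on a 16-byte boundary even though the
allocation was. Any SIMD loop called on that half still has to cope
with a misaligned start.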


