386 optimized bitblit code
Kevin Ryde
user42 at zip.com.au
Tue Feb 10 07:46:47 CET 2004
Brian Hurt <bhurt at spnz.org> writes:
>
> - memcpy the low part of the number into a temporary array
Rotates occur in mul_fft.c; I think we use a separate source and
destination there to avoid an initial memcpy.
> - memmove the high part down to the low part
> - lshift the high part so the bits are in the right places
That should be an mpn_rshift, which can do the shift and the move together.
> - rshift the temporary array so the bits are in the right places
> - memcpy all but one word of the temporary array back into the
> number
No need for a separate memcpy there, if you handle the end limbs
separately, like Torbjorn said.
> - or the last word together
Yep.
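
To make the steps above concrete, here is a rough sketch of a right rotate with a separate source and destination. It is not the mul_fft.c code. The names rshift_limbs and rotate_right are made up for illustration, limbs are assumed to be 64 bits, and rshift_limbs is only modelled on the documented behaviour of mpn_rshift (shift and move in one pass, with the bits shifted out returned in the high bits of the result). Unlike mpn_rshift, the helper here also accepts n == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mpn_rshift, assuming 64-bit limbs.
   Shifts {sp,n} right by cnt bits (1 <= cnt <= 63) into {rp,n} and
   returns the bits shifted out, placed in the high cnt bits. */
static uint64_t rshift_limbs(uint64_t *rp, const uint64_t *sp,
                             size_t n, unsigned cnt)
{
    assert(cnt >= 1 && cnt <= 63);
    if (n == 0)
        return 0;
    uint64_t out = sp[0] << (64 - cnt);
    for (size_t i = 0; i + 1 < n; i++)
        rp[i] = (sp[i] >> cnt) | (sp[i + 1] << (64 - cnt));
    rp[n - 1] = sp[n - 1] >> cnt;
    return out;
}

/* Rotate the n-limb number {src,n} right by cnt bits into {dst,n}.
   dst and src must not overlap, so no temporary copy is needed. */
static void rotate_right(uint64_t *dst, const uint64_t *src,
                         size_t n, size_t cnt)
{
    if (n == 0)
        return;
    cnt %= 64 * n;
    size_t q = cnt / 64;              /* whole limbs to rotate by */
    unsigned r = (unsigned)(cnt % 64); /* remaining bits */

    if (r == 0) {
        /* Pure limb rotation: two plain copies. */
        memcpy(dst, src + q, (n - q) * sizeof *dst);
        memcpy(dst + (n - q), src, q * sizeof *dst);
        return;
    }

    /* High part of src moves down to the low end of dst, shifted. */
    uint64_t out_hi = rshift_limbs(dst, src + q, n - q, r);
    /* Low part of src wraps around to the high end of dst, shifted. */
    uint64_t out_lo = rshift_limbs(dst + (n - q), src, q, r);

    /* OR the shifted-out bits into the two boundary limbs. */
    dst[n - q - 1] |= out_lo;
    dst[n - 1] |= out_hi;
}
```

Calling rotate_right(dst, src, 2, 68) on a two-limb number, for instance, swaps the limbs and rotates a further 4 bits. The two boundary limbs are the only places where bits from the two parts get combined, which is the "or the last word together" step.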
> SSE better yet.
I'm not aware of SSE helping (in theory it might, but for instance the
Pentium 4 xmm stuff has poor throughput).