386 optimized bitblit code

Kevin Ryde user42 at zip.com.au
Tue Feb 10 07:46:47 CET 2004


Brian Hurt <bhurt at spnz.org> writes:
>
> 	- memcpy the low part of the number into a temporary array

Rotates occur in mul_fft.c. I think we use a separate source and
destination there to avoid an initial memcpy.

> 	- memmove the high part down to the low part
> 	- lshift the high part so the bits are in the right places

That should be an mpn_rshift, which can do the shift and move together.

> 	- rshift the temporary array so the bits are in the right places
> 	- memcpy all but one word of the temporary array back into the
> 	  number

No need for a separate memcpy there if you handle the end limbs
separately, like Torbjorn said.

> 	- or the last word together

Yep.
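
As a rough sketch of the steps above (not the mul_fft.c code): a rotate
right with a separate source and destination, two shifts that move as
they go, and the end limbs ORed in afterwards. It assumes 32-bit limbs
stored least significant first. limb_t, LIMB_BITS, rshift_limbs and
rotate_right are names made up for illustration, and rshift_limbs is
only a stand-in that follows the documented behaviour of mpn_rshift.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, as on a 386 */
#define LIMB_BITS 32

/* Hypothetical stand-in for mpn_rshift: shift {src,n} right by cnt bits
   (1 <= cnt < LIMB_BITS, n >= 1) into {dst,n}, least significant limb
   first.  The bits shifted out come back in the high cnt bits of the
   return value. */
static limb_t
rshift_limbs (limb_t *dst, const limb_t *src, size_t n, unsigned cnt)
{
  limb_t out = src[0] << (LIMB_BITS - cnt);
  size_t i;
  for (i = 0; i + 1 < n; i++)
    dst[i] = (src[i] >> cnt) | (src[i + 1] << (LIMB_BITS - cnt));
  dst[n - 1] = src[n - 1] >> cnt;
  return out;
}

/* Rotate the n-limb number {src,n} right by bits (0 <= bits < n*LIMB_BITS)
   into {dst,n}.  dst and src must not overlap. */
static void
rotate_right (limb_t *dst, const limb_t *src, size_t n, size_t bits)
{
  size_t q = bits / LIMB_BITS;      /* whole limbs */
  unsigned r = bits % LIMB_BITS;    /* leftover bits */
  limb_t out_hi, out_lo;

  if (r == 0)
    {
      /* Limb-aligned: two plain copies, no shifting. */
      memcpy (dst, src + q, (n - q) * sizeof (limb_t));
      memcpy (dst + (n - q), src, q * sizeof (limb_t));
      return;
    }

  /* High part of src moves down to the bottom of dst, shifted as it goes. */
  out_hi = rshift_limbs (dst, src + q, n - q, r);

  if (q != 0)
    {
      /* Low part of src goes to the top of dst, with the same shift. */
      out_lo = rshift_limbs (dst + (n - q), src, q, r);
      /* Its shifted-out bits land in the limb just below it. */
      dst[n - q - 1] |= out_lo;
    }

  /* The bits that fell off the first shift wrap round to the top limb. */
  dst[n - 1] |= out_hi;
}
```

With GMP itself the two rshift_limbs calls would presumably become
mpn_rshift calls on mp_limb_t data, which take their arguments in the
same order.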

> SSE better yet.

I'm not aware of SSE helping (in theory it might, but the Pentium 4 xmm
stuff, for instance, has poor throughput).

