386 optimized bitblit code
Brian Hurt
bhurt at spnz.org
Tue Feb 10 14:38:21 CET 2004
On Tue, 10 Feb 2004, Bowers, Rickey N,,DMDCWEST wrote:
> Brian,
>
> An arbitrary rotate would consist of only two lshift or rshift, and special
> handling of end and center limb. Memory copy is needed when the destination
> is the same as the source. Bitblit does not avoid this problem in any way.
>
> As Kevin has said and I stated in emails, mpn_lshift/mpn_rshift are three
> times faster than your present implementation. The bitblit and rotate code
> should use this to its advantage. Looking over the MMX x86 code should be
> convincing enough - the MMX speedup already exists.
1) Do not assume rotates are less than one word in length.
2) It's no surprise to me that the mmx versions are faster than non-mmx
versions. The core loops are identical, however, so when I finish
debugging my code and post it, bitblit should be about the same speed
(bitblit may take a constant extra overhead). The 128-bit xmm registers
were introduced with SSE-1, and so long as they aren't 2x as slow as the
same mmx isntructions, should provide some further speedup.
3) lshift and rshift, since they determine both the direction of the shift
and the direction of the copy, fail on (some) overlapping source and
destinations. I dislike routines that fail when it's not absolutely
necessary for them to do so. lshift and rshift together are approximately
the complexity of bitblit by itself. But by seperating them you've lost
flexibility.
4) Long term, I'd like to see gmp support not just big number uses, but
also big bit set uses. Rather like C treats integers. C can get away
with requiring that more complex bitwise operations be built up out of
several primitives, as the primitives are fast. For larger bitset
operations, this shouldn't be assumed.
5) The core loop of GF2 multiplication in an ONB looks like this:
for (int j = 1; j < m; ++j) {
rot_right(b, 1);
zidx = lambda[0][j];
oidx = lambda[1][j];
for (i = 0; i < n; ++i) {
c[i] ^= b[i] & (amatrix[zidx][i] ^ amatrix[oidx][i]);
}
}
The more I think about it, the more I think that being able to do bitwise
operations from arbitrary bit sequences to arbitrary bit sequences is even
more usefull than simple fast rotates. The question is where to draw the
line.
6) If people are attempting to subtly hint that it'll be a cold day in
hell before my code is accepted, I wish they'd come out and say it, and
save me the work.
--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian
More information about the gmp-devel
mailing list