Division by 9 (and other factors of 2^60 - 1)
Torbjorn Granlund
tg at gmplib.org
Tue Oct 13 11:30:09 CEST 2009
nisse at lysator.liu.se (Niels Möller) writes:
And to correct myself: 15 can be handled by current bdiv_dbm1c. Doing
it twice, new interesting divisors include 9, 25, 45, 75 and 225 (and
more, involving the larger factors of 2^64 - 1).
Do you think there are any interesting machines where division via
2^60 - 1 is useful (or 2^24 - 1, for 32-bit machines)?
There are machines where multiplication is horribly slow, of course. It
also seems likely that we could shave an instruction or two from the
critical path of your code.
bdiv_dbm1c's inner loop element looks like this:
mov (%rsi,%r9,8), %rax
mul %rcx
sub %rax, %r8
mov %r8, (%rdi,%r9,8)
sbb %rdx, %r8
I suppose we can simply do the double variant thus:
mov (%rsi,%r9,8), %rax
mul %rcx
sub %rax, %r8
sbb %rdx, %r8
mov %r8, %rax
mul %rcx
sub %rax, %r8
mov %r8, (%rdi,%r9,8)
sbb %rdx, %r8
Core i7's extreme rdx latency might ask for a different sequence.
Perhaps this is worth doing, but it might not be the lowest hanging
fruit, though. We shouldn't forget that we're bashing cycles from
linear work of Toom functions that handle rather large operands.
--
Torbjörn
More information about the gmp-devel
mailing list