Division by 9 (and other factors of 2^60 - 1)

Torbjorn Granlund tg at gmplib.org
Tue Oct 13 11:30:09 CEST 2009


nisse at lysator.liu.se (Niels Möller) writes:

  And to correct myself: 15 can be handled by current bdiv_dbm1c. Doing
  it twice, new interesting divisors include 9, 25, 45, 75 and 225 (and
  more, involving the larger factors of 2^64 - 1).
  
  Do you think there are any interesting machines where division via
  2^60 - 1 is useful (or 2^24 - 1, for 32-bit machines)?
  
There are machines where multiplication is horribly slow, of course.  It
also seems likely that we could shave an instruction or two from the
critical path of your code.

bdiv_dbm1c's inner loop element looks like this:

	mov	(%rsi,%r9,8), %rax
	mul	%rcx
	sub	%rax, %r8
	mov	%r8, (%rdi,%r9,8)
	sbb	%rdx, %r8

I suppose we can simply do the double variant thus:

	mov	(%rsi,%r9,8), %rax
	mul	%rcx
	sub	%rax, %r8
	sbb	%rdx, %r8
        mov     %r8, %rax
	mul	%rcx
	sub	%rax, %r8
	mov	%r8, (%rdi,%r9,8)
	sbb	%rdx, %r8

Core i7's extreme rdx latency might ask for a different sequence.

Perhaps this is worth doing, but it might not be the lowest hanging
fruit, though.  We shouldn't forget that we're bashing cycles from
linear work of Toom functions that handle rather large operands.

-- 
Torbjörn


More information about the gmp-devel mailing list