Alternative div_qr_1

Thu Jun 17 16:28:50 CEST 2010

Torbjorn Granlund <tg at gmplib.org> writes:

> I see that your C code needs n >= 3, which might make sense when one
> does not have divisor invariance beyond a single dividend.

For n = 2, the computation is just an udiv_qrnnd(_preinv), the B2
constant is not used.

n = 2 and unnormalized d might make sense, though. (At the moment, my
thinking on unnormalized d is that one should not use B1, instead left
shift {r1, r0} after the loop to match the normalization of d. This
yields a three-limb number {r2, r1, r0}. Use B2 to eliminate r2, do
needed subtractins to get r_1 < d, then call udiv_qrnnd_preinv).

> **  	mov	 r1, %rax  	# 0  9 16 24 31   8,7

> I see.  Perhaps a cycle could be saved by not copying r1 to rax (marked
> with **) but instead copy b3 to rax, and replace 'mul b3' by 'mul r1'?

It seems I'm too used to having the constant factor as the explicit src
argument to mul. In the code I've been writing recently there are a lot
of moves of a recurrency-related variable to %rax, costing latency. I
should really consider moving the invariant constants to %rax instead.

In particular in loops where a variable such as r1 is used multiple
times, it seems it would be better to leave it out of %rax.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.