Niels Möller nisse at lysator.liu.se
Mon Dec 7 12:35:20 CET 2009

bodrato at mail.dm.unipi.it writes:

>> Here A_k is the most significant part of the input, of size appropriate
>> for the iteration. A is assumed to be "strictly" normalized with high
>> limb >= B/2, to relax that I guess one would need one more limb in A.
>
> One more limb in the result, yes... because now a virtual limb containing
> one is assumed, but inverses of numbers smaller than one half needs
> inverses bigger than one-point-<whatever> (the function actually computes
> <whatever>).

What I was trying to say is that if we use the less strict normalization
requirement that the high limb of A is non-zero, then I suspect we need
A_k (the truncated input used in iteration k) to include one more limb
than in the algorithm described in MCA.

> The "adjustment step" I'm using now... is needed only for the last
> iteration (the intermediate ones are adjusted by the next iteration), but
> it is _not_ cheap.

>From the work on single-limb inverses (invert_limb, see Alg. 2 and 3 in
http://www.lysator.liu.se/~nisse/archive/draft-division-paper.pdf), I
suspect that it is very difficult to get an "exact" reciprocal
floor((B^{2n} - 1) / X) without an expensive multiplication at the end.

I think the value returned by the MCA algorithm equals either this
"exact" reciprocal or is one unit too small. Not sure if it being too
small is common, or if it can happen only in borderline cases where
B^{2n} mod X is close to zero (but I'm fairly sure that with some tweaks
it can be reduced to only such borderline cases). It would be desirable
to make Barrett-style division work with such an approximative
reciprocal, to avoid the expensive final adjustment step.

> Regards,
> Marco

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.