sqrt algorithm

Wed Jul 29 19:24:35 UTC 2015

bodrato at student.dm.unipi.it writes:

> I think that the currently implemented algorithm wins on yours because it
> splits evenly: when a 2n-limbs root is needed it starts computing the k
> higher limbs then updates the lower k. Your code splits in k1+ and k-1.
> Current: 10 <- 5 <- 3 <- 2; your 10 <- 6 <- 4 <- 3 <- 2, one more iteration.

The reason I do it that way is that if I split 5+5, and the high "half"
is short, I'm afraid I don't get enough accuracy from one newton
iteration. Say the high limb of the root is only 10 bits, so the full
root is 586 bits, split as 586 = 266 (high half) + 320 (low half). And a
single Newton iteration, as far as I understand it, can't get all those
320 new bits close to correct. Am I missing something?

It could likely be made to work if the input is properly normalized,
does the current code do that?

I kind-of dislike having to normalize numbers up-front, but it might
well be a good thing for sqrt. First, it provides better accuracy for a
k+k split. And further, we get normalized divisors for all the division
calls.

> But I like the idea that wraparound must be used. [...] The cost of
> the current sqr(,,n) should be comparable with the needed sqrmod_bnm1
> (,n+1,,2n,). For big enough sizes at least... This threshold would
> need tuning...

And for small sizes, one ought to sqrlo rather than mulmod_bnm1 to
exploit the cancellation. So the tuning would be between sqrlo and
sqrmod_bnm1. But it seems we only have mullo, not sqrlo?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.