# Anomaly in mpn_sqrtrem and mpn_rottrem

Niels Möller nisse at lysator.liu.se
Wed Jun 10 19:54:58 UTC 2015

```tg at gmplib.org (Torbjörn Granlund) writes:

> I haven't looked enough at your propposal to have an informed opinion.

Here's a sketch with some more details. I've tried to work out both
sqrt(B^{n-1} A) and sqrt(B^{n-2}). To my surprise, they seem
independent, not mutually recursive.

sqrt_nm1 (n, A):

Input: A = {a_{n-1}, ..., a_0}, n >= 2
Output: X = {x_{n-2}, ..., x_0} \approx sqrt(B^{n-2} A)

if (n == 2)
x_0 <-- floor (sqrt ({a_1, a_0}))
return X = {x_0}

k <-- floor (n/2)
H <-- sqrt_nm1 (k+1, {a_{n-1},...,a_{n-1-k}}) // k limbs

if (n odd)	// n = 2k+1
E <-- B H^2 - A
X <-- B^k H - floor (B^{k-1} E / 2H)

else		// n = 2k
E <-- H^2 - A
X <-- B^{k-1} H - floor (B^{k-1} E / 2H)

return X

sqrt_n (n, A):

Input: A = {a_{n-1}, ..., a_0}, n >= 1
Output: X = {x_{n-1}, ..., x_0} \approx sqrt(B^{n-1} A)

if (n == 1)
x_0 <-- floor (sqrt (a_0))
return X = {x_0}

k <-- floor (n/2)
H <-- sqrt_n(k+1, {a_{n-1},...,a_{n-1-k}}) // k+1 limbs

if (n odd)	// n = 2k+1
E <-- H^2 - A
X <-- H B^k - floor (B^k E / 2 H)

else		// n = 2k
E <-- H^2 - B A
X <-- B^{k-1} H - floor (B^{k-1} E / 2H)

return X

Note that all the computations of the error/residual E is subject to a
lot of cancellation, so one can use mullo (for small sizes) or
wraparound arithmetic using mulmod_bnm1 for larger sizes.

And the divisions really ought to use divappr, rather than exact
truncating division. I'm not sure how that interface works, but one
ought be able to pass in numerators like, e.g., B^{k-1} E, without
actually padding the numerator with zeros.

I don't know how large errors one gets, from the above algorithms, but I
would expect errors to be at most 2 or 3.

For proper error analysis, one should also think about desired sign of
the error. The newton iteration wants to converge from above, and on the
other hand the inclusion of more bits of A wants X to converge from
below. So maybe one should add some fudge factor into the rounding
somewhere force monotonous convergence? Also sign and magnitude of the
divappr errors must be taken into account.

If we end up with an error E and corresponding X update of varying sign,
that's no fundamental problem, but it will make the wraparound
arithmetic a little more complex.

Finally, I'm not sure if one can expect tricks to eliminate division to
be a big win. Except for the smallest sizes, division, and in particular
divappr, should only be a small constant slower than multiplication,
right? The division free iteration for 1/sqrt(A) needs additional
multiplications, and the trick mensioned the other day to compute an
inverse incrementally, will also introduce additional multiplications.

I'm thinking that an exact sqrt/sqrtrem for even size n will call
sqrt_nm1 with n/2+1 input limbs producing an n-limb root. And for odd n,
call sqrt_n with (n+1)/2 input limbs. In either case, we'll get at least
one bit of margin, with the worst case being odd n and a[n-1] == 1.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
```