sqrt algorithm

Niels Möller nisse at lysator.liu.se
Thu Jul 30 21:30:24 UTC 2015

"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

> Assume we need the square root of a 7-limb number. Is it better to shift
> it to 8 limbs and compute the higher 2 limbs of the result followed by the
> lowest 2 limbs

I think "expanding" to 8 limbs make sense. It's not really an expansion,
since the bulk of the work involves only the top 4 limbs in
either case.

I've tried doing a different sqrt_n function. It computes an
approximation of sqrt(B^n A), where A is an n-limb number, normalized so
that the highest input limb is >= B/4.

I'm having some concerns about accuracy, though. An example, with B = 2^64:

Input:
A = 7fffffffffff8001 fffffffffffff800 (hex)
a_1              a_0

Ideal output, floor(sqrt(B^2 A)):

X = b504f333f9de0a03 49ed732c20256bdd

The algorithm computes the first half of the square root as sqrt(B a_1),

x_1 <-- b504f333f9de0a02

Next, the residual E = A - x_1^2 = 1 d2968726c023cffc

We get the correction

B e / (2 x_1) = 1 49ed732c20256bdf (rounded downwards)

Adding it to B x_1 gives the return value

b504f333f9de0a03 49ed732c20256bdf

Which is *two* units larger than the correct X. I would have hoped for
an "appr" algorithm with a return value at most one off, and with
intermediate adjustment steps kept at a minimum.

But it may be better to add a few unlikely intermediate adjustment
steps, than to go back from the 2k = k+k split to 2k = (k+1) + (k-1)
split.

I would also like to understand if, for larger sizes, unadjusted errors
can grow from one iteration to the next, or if we can maintain a limit
of, say, at most 2 off in either direction without intermediate