sqrt algorithm

bodrato at student.dm.unipi.it bodrato at student.dm.unipi.it
Tue Jul 28 18:53:24 UTC 2015


Ciao,

> nisse at lysator.liu.se (Niels Möller) writes:

>> For the divisions in the sqrt algorithms, I'm not sure of exactly how
>> the sizes of the numerator E, the denominator H, and the quotient,
>> relate, but they ought to all be pretty close to k.

> Below is an updated version of the code, which seems to work correctly.

> I'll look into how these cases fit approximate division. I suspect that
> in a few of the cases we use one more limb of E and H than we really
> need.

Sizes are very important.

I added some TRACE(printf(...)) to your code, and I pushed some for the
current implementation of the library. Here are the results:

99999, size 20, A = 49...67

Current library:
tdiv_qr(,,,,2,,1) -> 2
sqr(,,1)
tdiv_qr(,,,,3,,2) -> 2
sqr(,,1)
tdiv_qr(,,,,5,,3) -> 3
sqr(,,2)
tdiv_qr(,,,,10,,5) -> 6
sqr(,,5)

The code you sent:
divmod_1 (,,3,) -> 2
sqr(,,2)
tdiv_qr(,,,,4,,2) -> 3
sqr(,,3)
tdiv_qr(,,,,5,,3) -> 3
sqr(,,4)
tdiv_qr(,,,,7,,4) -> 4
sqr(,,6)
tdiv_qr(,,,,11,,6) -> 6
sqr(,,10)

I think that the currently implemented algorithm wins on yours because it
splits evenly: when a 2n-limbs root is needed it starts computing the k
higher limbs then updates the lower k. Your code splits in k1+ and k-1.
Current: 10 <- 5 <- 3 <- 2; your 10 <- 6 <- 4 <- 3 <- 2, one more iteration.

But I like the idea that wraparound must be used. Even with the current
splitting, your proposal to use div(appr)_q instead of tdiv_qr (saving the
cost of the division remainder) and compute the sqrt remainder by a sqrmod
should give a faster code. The cost of the current sqr(,,n) should be
comparable with the needed sqrmod_bnm1 (,n+1,,2n,). For big enough sizes
at least... This threshold would need tuning...

Regards,
m



More information about the gmp-devel mailing list