sqrt algorithm
bodrato at student.dm.unipi.it
bodrato at student.dm.unipi.it
Tue Jul 28 18:53:24 UTC 2015
Ciao,
> nisse at lysator.liu.se (Niels Möller) writes:
>> For the divisions in the sqrt algorithms, I'm not sure of exactly how
>> the sizes of the numerator E, the denominator H, and the quotient,
>> relate, but they ought to all be pretty close to k.
> Below is an updated version of the code, which seems to work correctly.
> I'll look into how these cases fit approximate division. I suspect that
> in a few of the cases we use one more limb of E and H than we really
> need.
Sizes are very important.
I added some TRACE(printf(...)) to your code, and I pushed some for the
current implementation of the library. Here are the results:
99999, size 20, A = 49...67
Current library:
tdiv_qr(,,,,2,,1) -> 2
sqr(,,1)
tdiv_qr(,,,,3,,2) -> 2
sqr(,,1)
tdiv_qr(,,,,5,,3) -> 3
sqr(,,2)
tdiv_qr(,,,,10,,5) -> 6
sqr(,,5)
The code you sent:
divmod_1 (,,3,) -> 2
sqr(,,2)
tdiv_qr(,,,,4,,2) -> 3
sqr(,,3)
tdiv_qr(,,,,5,,3) -> 3
sqr(,,4)
tdiv_qr(,,,,7,,4) -> 4
sqr(,,6)
tdiv_qr(,,,,11,,6) -> 6
sqr(,,10)
I think that the currently implemented algorithm wins on yours because it
splits evenly: when a 2n-limbs root is needed it starts computing the k
higher limbs then updates the lower k. Your code splits in k1+ and k-1.
Current: 10 <- 5 <- 3 <- 2; your 10 <- 6 <- 4 <- 3 <- 2, one more iteration.
But I like the idea that wraparound must be used. Even with the current
splitting, your proposal to use div(appr)_q instead of tdiv_qr (saving the
cost of the division remainder) and compute the sqrt remainder by a sqrmod
should give a faster code. The cost of the current sqr(,,n) should be
comparable with the needed sqrmod_bnm1 (,n+1,,2n,). For big enough sizes
at least... This threshold would need tuning...
Regards,
m
More information about the gmp-devel
mailing list