mpn_sqrtrem1

Tue Dec 20 18:29:05 UTC 2016

Hi,

> Is there a reason why you defined three different invsqrt8_ arrays?
> Doesn't invsqrttab contain suitable values?

Unlike current GMP code which has only one table,
I couldn't find one unique table for all three uses.

Note that the tables invsqrt8_32b and invsqrt8_64b are different for sure,
but I didn't check whether invsqrt8_64x2 was identical to one of the other tables.

If you want to dig in the depths of the algorithms that generates these tables,
see these functions:
- sqrt32_inv_tests
- sqrt64_inv_tests
- sqrt64x2_inv_tests
There are corresponding commands:
./sqrt inv32-tests 10000
etc with 64 and 64x2

These functions implement the sqrt algorithm with some measurement points inside,
to count number of left bits at zero, observe precision etc.
Manually change values in calls to functions sqrt32_inv_getlz() to do some more specific investigation.
In the last pass, these functions dump the table values that give the most accurate sqrt results.

By the way, the initial test: if(val < 256) return sqrt8_arr[val];
could be replaced by : if(val <= 1) return val;
This would save a table and one fetch operation.
I didn't investigate.

> Reducing the number of multiplications is possible... but I bet a
> Karatsuba umul_ppmm() is not faster than the plain version (at least not
> on current 64-bits CPUs ;-)

Thank you for the analysis, I was curious :-)

Adrien