sqrt algorithm

Wed Aug 12 16:20:35 UTC 2015

Ciao,

On Wed, August 12, 2015 2:03 pm, Torbjörn Granlund wrote:
>   I tested this approach for sqrlo_basecase too, you can find the code
>   enclosed by
>   #ifdef SQRLO_SHORTCUT_MULTIPLICATIONS
>
>   But I'm not sure it is faster, so it is currently disabled.
>
> It will obviously be faster for machines where widening multiplication
> is expensive, and in some other hardware cases.

Ok, I can enable it for sqrlo. Are you working on an improved
mullo_basecase? Otherwise I can change that code using the same criteria I
used for sqrlo...
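
To illustrate the point about widening multiplication, here is a minimal sketch of a low-half basecase product. It is not the code enclosed by SQRLO_SHORTCUT_MULTIPLICATIONS; the 32-bit limb_t type and the name mullo_basecase_sketch are assumptions made for the example. The terms x[i]*y[j] with i + j == n - 1 contribute only their low limb to the result, so a plain (non-widening) multiply is enough for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* rp[0..n-1] = low n limbs of {xp,n} * {yp,n}.
   rp must not overlap xp or yp. */
static void mullo_basecase_sketch(limb_t *rp, const limb_t *xp,
                                  const limb_t *yp, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t carry = 0;
        size_t j;
        /* Terms with i + j < n - 1: both halves of the product are needed,
           so a widening multiplication is used. */
        for (j = 0; i + j + 1 < n; j++) {
            uint64_t t = (uint64_t)xp[i] * yp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = (limb_t)(t >> 32);
        }
        /* Term with i + j == n - 1: the high half would land in limb n,
           which is discarded, so a non-widening multiply suffices. */
        rp[n - 1] += xp[i] * yp[j] + carry;
    }
}
```

With n limbs this uses n non-widening multiplications (one per row) in place of widening ones; whether that pays off depends on the relative cost of the two instructions on the target machine.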

> I don't recall how we defined mullo/sqrlo wrt the size of the target
> operand.  Can we write above the defined result, i.e., can we write the
> full 2n product if it is convenient?

The current implementations of both mullo and sqrlo write n limbs only,
possibly by computing the full 2n-limb product in a temporary area and
then copying the low n limbs with MPN_COPY.

IIRC someone proposed the interface mullo(res, x, y, n, tmp), with 2n
limbs in tmp and res == tmp supported, but we never switched to it.
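
A rough sketch of how such an interface could behave, assuming 32-bit limbs and a plain schoolbook product. The names limb_t, mul_full_sketch and mullo_tmp_sketch are invented for the example and are not GMP functions. The full 2n-limb product goes to tmp, and the copy is skipped when the caller passes res == tmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* tp[0..2n-1] = {xp,n} * {yp,n}, schoolbook. tp must not overlap xp or yp. */
static void mul_full_sketch(limb_t *tp, const limb_t *xp,
                            const limb_t *yp, size_t n)
{
    memset(tp, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        limb_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            uint64_t t = (uint64_t)xp[i] * yp[j] + tp[i + j] + carry;
            tp[i + j] = (limb_t)t;
            carry = (limb_t)(t >> 32);
        }
        tp[i + n] = carry;   /* this limb has not been written by earlier rows */
    }
}

/* res[0..n-1] = low n limbs of the product; tmp provides 2n limbs of scratch.
   res == tmp is allowed: the result is then left in the low half of tmp. */
static void mullo_tmp_sketch(limb_t *res, const limb_t *xp,
                             const limb_t *yp, size_t n, limb_t *tmp)
{
    assert(n > 0);
    mul_full_sketch(tmp, xp, yp, n);
    if (res != tmp)
        memcpy(res, tmp, n * sizeof(limb_t));   /* plays the role of MPN_COPY */
}
```

A caller that can spare 2n limbs at the destination would pass the same pointer as res and tmp and avoid the copy; other callers get exactly n limbs written to res.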

Regards,
m

-- 
http://bodrato.it/