[patrick.pelissier@gmail.com: Fwd: Optimizing 1/x]

Tue Sep 13 23:30:42 CEST 2005

Paul Zimmermann wrote:
>    Well, I assume then that the benefits of such a method does *at least* 
>    pay-off. The next question would be how slower is the next fastest 
>    non-patented algorithm compared to HP's? Also, although I'm anything but 
>    a lawyer, I'd be surprised if law forbids using a patented algorithm for 
>    educational/non-profit purposes. So I assume that one could argue that 
>    making patch (with the said HP algorithm) available would benefit many 
>    and the burden of following the patent laws would be on the user, not 
>    the provider of the code.
> 
> Sorry, the HP patent only concerns division (y/x), not inversion (1/x).
> For division, I expect the next fastest non-patented algorithm to be
> about 10% to 20% slower. Also, it will use more memory.

Ok, I undertand that it's a division algorithm, but I though you had 
also experimented with inversion, no? If you have, I'm interested in 
knowing if there are substantial speed improvements over the current 
approach of creating an mpn from the the ui value of 1, and then 
dividing it with the divisor. May be even theoretically, you can give me 
some advise as if it is worth implementing an inversion algorithm. I 
expect to find some benefits from such an approach (a specialized path 
for inversion) but in practice there might be some hidden overhead that 
trashes the benefits, that's why I need advice before I dive in.

10-20% is not an astronomical difference. 10-20% variations are not 
unusual between CPU models and optimization parammeters. It's probably 
not even worth chasing a license. But how much more memory? Also 10-20% 
or much more?

> 
>    Have you considered making contributions with your implementation to GMP 
>    or seperate from GMP in form of patches to GMP?
> 
> When it will be finished, I'll submit my implementation to GMP.
> 

Great! Looking forward to it. Let me know if I can contribute with some 
of my idle cycles, mmm, time to the same cause. I can assembly-optimize 
pretty well. I'm actually currently playing with the AMD-64 port of the 
assembly modules. Will post a patch when my efforts start paying off.

>    Out of curiosity, how much faster is the HP algorithm compared to the 
>    current GMP division implementation. I assume it's sub linear (or is it 
>    not?), but in practice is it close to log2(n) + C? Its complexity is 
>    probably a function in the multiplication algorithm's complexity.
> 
>    -Ash
> 
> The current GMP division uses a recursive algorithm for large operands,
> with complexity O(M(n) log(n)), whereas Newton's division has complexity
> O(M(n)), so we can expect a more-than-constant speedup once Newton's division
> is in GMP.
> 

A substantial gain, I'd say. So basically we will be left with the Mul 
complexity, which floats at about (IIRC) O(n^1.125) for numbers with 
larger-than-fft-threshold bits.

> Paul
> 
>