Anomaly in mpn_sqrtrem and mpn_rottrem

Tue Jun 9 12:17:09 UTC 2015

Ciao,

Il Lun, 8 Giugno 2015 10:15 am, Torbjörn Granlund ha scritto:
> nisse at lysator.liu.se (Niels Möller) writes:
>
>   I've had a quick look. Both mpn_dc_sqrtrem and mpn_rootrem_internal
>   (which seem to be the work horses for larger operands) do a division in
>   the loop. The latter is organized as a newton iteration, the former
>   isn't (but I guess the computation is more or less equivalent to a
>   Newton iteration).

To compare them I wrote a quick and dirty specialization of the rootrem
algorithm for the k==2 case (use sqr instead of mul, lshift instead of
mul_1...)

Current speed (on shell)

$ tune/speed -p 100000000  -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
           mpn_root.2      mpn_sqrt mpn_rootrem.2   mpn_sqrtrem
1         0.000000518   0.000000011   0.000000516  #0.000000009
16        0.000001103   0.000000244   0.000001117  #0.000000244
256       0.000009316  #0.000008321   0.000014106   0.000008374
4096     #0.000633967   0.000659180   0.000894513   0.000654562
65536    #0.026071234   0.026756061   0.033005667   0.026762939
1048576   0.732938835   0.734521311   0.962774086  #0.731488979

With the attached specialised function, rootrem.2 gains a 10% in speed and
is faster than sqrt when the reminder is not needed and size grows.

$ tune/speed -p 100000000  -s 1-1100000 -f16 mpn_root.2 mpn_sqrt
mpn_rootrem.2 mpn_sqrtrem
overhead 0.000000002 secs, precision 100000000 units of 2.86e-10 secs, CPU
freq 3500.08 MHz
           mpn_root.2      mpn_sqrt mpn_rootrem.2   mpn_sqrtrem
1         0.000000376  #0.000000010   0.000000383   0.000000010
16        0.000000884  #0.000000241   0.000000894   0.000000243
256      #0.000007743   0.000008262   0.000012138   0.000008282
4096     #0.000563298   0.000657382   0.000816271   0.000650694
65536    #0.023296001   0.026694774   0.030225570   0.026717529
1048576  #0.654117324   0.736489574   0.832002002   0.733319729

I added some TRACE (printf...) in the code to show the primitives with a
super-linear cost used in computation.
The last functions called by mpn_sqrtrem when computing the root of a 2000
limb operand are:

mpn_divrem nn=500/dn=250 -> qn=250,rn=250
mpn_sqr an=250 ^ 2 -> rn=500
mpn_divrem nn=1000/dn=500 -> qn=500,rn=500
mpn_sqr an=500 ^ 2 -> rn=1000

Similarly, for mpn_rootrem:

mpn_div_q nn=500/dn=251 -> qn=249
mpn_sqr an=501 ^ 2 -> rn=1002
mpn_div_q nn=1000/dn=501 -> qn=499
mpn_sqr an=1000 ^ 2 -> rn=2000

It uses div_q instead of divrem, but squares used by rootrem are doubled
in size...

When we do not ask for the reminder mpn_rootrem (say mpn_root) skips the
last squaring:

[mpn_sqr an=251 ^ 2 -> rn=502]
mpn_div_q nn=501/dn=251 -> qn=250
mpn_sqr an=501 ^ 2 -> rn=1002
mpn_div_q nn=1002/dn=501 -> qn=501

The operation are similar if compared to sqrt (the order changes),
probably div_q is faster than divrem. mpn_sqrt wins only when the huge
speed-up given by the specialised code for the first limb is relevant...

Best regards,
mb

-- 
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rootrem.c
Type: text/x-csrc
Size: 23291 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20150609/40caf3b1/attachment.bin>