[Gmp-commit] /var/hg/gmp: 2 new changesets

Thu Mar 3 15:28:48 CET 2011

details:   /var/hg/gmp/rev/6940a00bb3e7
changeset: 13991:6940a00bb3e7
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Thu Mar 03 15:25:47 2011 +0100
description:
Update/remove several itemised tasks.

details:   /var/hg/gmp/rev/8ef60993a0bc
changeset: 13992:8ef60993a0bc
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Thu Mar 03 15:28:18 2011 +0100
description:
(mpn_dc_sqrtrem): Use mpn_addlsh1_n.

diffstat:

 ChangeLog             |   4 ++++
 doc/tasks.html        |  43 +++++++++++++------------------------------
 mpn/generic/sqrtrem.c |   4 ++++
 3 files changed, 21 insertions(+), 30 deletions(-)

diffs (116 lines):

diff -r 14cd74efb9de -r 8ef60993a0bc ChangeLog

--- a/ChangeLog	Thu Mar 03 12:36:39 2011 +0100
+++ b/ChangeLog	Thu Mar 03 15:28:18 2011 +0100
@@ -1,3 +1,7 @@
+2011-03-03  Torbjorn Granlund  <tege at gmplib.org>
+
+	* mpn/generic/sqrtrem.c (mpn_dc_sqrtrem): Use mpn_addlsh1_n.
+
 2011-03-03  Niels Möller  <nisse at lysator.liu.se>
 
 	* mpn/generic/mod_1_1.c (add_mssaaaa): For x86 and x86_64, treat m
diff -r 14cd74efb9de -r 8ef60993a0bc doc/tasks.html
--- a/doc/tasks.html	Thu Mar 03 12:36:39 2011 +0100
+++ b/doc/tasks.html	Thu Mar 03 15:28:18 2011 +0100
@@ -37,7 +37,7 @@
 
 <hr>
 <!-- NB. timestamp updated automatically by emacs -->
-  This file current as of 28 Dec 2009.  An up-to-date version is available at
+  This file current as of 3 Mar 2011.  An up-to-date version is available at
   <a href="http://gmplib.org/tasks.html">http://gmplib.org/tasks.html</a>.
   Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
 
@@ -122,9 +122,6 @@
      subsequent operations, especially if the value is otherwise only small.
      If low bits of the low limb are zero, use <code>mpn_rshift</code> so as
      to not increase the size.
-<li> <code>mpn_dc_sqrtrem</code>: Don't use <code>mpn_addmul_1</code> with
-     multiplier==2, instead either <code>mpn_addlsh1_n</code> when available,
-     or <code>mpn_lshift</code>+<code>mpn_add_n</code> if not.
 <li> <code>mpn_dc_sqrtrem</code>, <code>mpn_sqrtrem2</code>: Don't use
      <code>mpn_add_1</code> and <code>mpn_sub_1</code> for 1 limb operations,
      instead <code>ADDC_LIMB</code> and <code>SUBC_LIMB</code>.
@@ -133,20 +130,12 @@
      aliasing between <code>sp</code> and <code>rp</code>.
 <li> <code>mpn_sqrtrem</code>: Some work can be saved in the last step when
      the remainder is not required, as noted in Paul's paper.
-<li> <code>mpq_add</code>, <code>mpq_add</code>: The division "op1.den / gcd"
-     is done twice, where of course only once is necessary.  Reported by Larry
-     Lambe.
 <li> <code>mpq_add</code>, <code>mpq_sub</code>: The gcd fits a single limb
-     with high probability and in this case <code>modlimb_invert</code> could
+     with high probability and in this case <code>binvert_limb</code> could
      be used to calculate the inverse just once for the two exact divisions
      "op1.den / gcd" and "op2.den / gcd", rather than letting
-     <code>mpn_divexact_1</code> do it each time.  This would require a new
-     <code>mpn_preinv_divexact_1</code> interface.  Not sure if it'd be worth
-     the trouble.
-<li> <code>mpq_add</code>, <code>mpq_sub</code>: The use of
-     <code>mpz_mul(x,y,x)</code> causes temp allocation or copying in
-     <code>mpz_mul</code> which can probably be avoided.  A rewrite using
-     <code>mpn</code> might be best.
+     <code>mpn_bdiv_q_1</code> do it each time.  This would require calling
+     <code>mpn_pi1_bdiv_q_1</code>.
 <li> <code>mpn_gcdext</code>: Don't test <code>count_leading_zeros</code> for
      zero, instead check the high bit of the operand and avoid invoking
      <code>count_leading_zeros</code>.  This is an optimization on all
@@ -173,26 +162,20 @@
      since there's no apparent way to get <code>SHRT_MAX</code> with an
      expression (since <code>short</code> and <code>unsigned short</code> can
      be different sizes).
-<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> aren't very
-     fast on one or two limb moduli, due to a lot of function call
-     overheads.  These could perhaps be handled as special cases.
-<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> want better
-     algorithm selection, and the latter should use REDC.  Both could
-     change to use an <code>mpn_powm</code> and <code>mpn_redc</code>.
+<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> aren't very fast on one
+     or two limb moduli, due to a lot of function call overheads.  These could
+     perhaps be handled as special cases.
+<li> Make sure <code>mpz_powm_ui</code> is never slower than the corresponding
+     computation using <code>mpz_powm</code>.
 <li> <code>mpz_powm</code> REDC should do multiplications by <code>g[]</code>
      using the division method when they're small, since the REDC form of a
      small multiplier is normally a full size product.  Probably would need a
      new tuned parameter to say what size multiplier is "small", as a function
      of the size of the modulus.
-<li> <code>mpz_powm</code> REDC should handle even moduli if possible.  Maybe
-     this would mean for m=n*2^k doing mod n using REDC and an auxiliary
-     calculation mod 2^k, then putting them together at the end.
-<li> <code>mpn_gcd</code> might be able to be sped up on small to
-     moderate sizes by improving <code>find_a</code>, possibly just by
-     providing an alternate implementation for CPUs with slowish
+<li> <code>mpn_gcd</code> might be able to be sped up on small to moderate
+     sizes by improving <code>find_a</code>, possibly just by providing an
+     alternate implementation for CPUs with slowish
      <code>count_leading_zeros</code>.
-<li> Toom3 could use a low to high cache localized evaluate and interpolate.
-     The necessary <code>mpn_divexact_by3c</code> exists.
 <li> <code>mpf_set_str</code> produces low zero limbs when a string has a
      fraction but is exactly representable, eg. 0.5 in decimal.  These could be
      stripped to save work in later operations.
@@ -371,7 +354,7 @@
 <li> UltraSPARC/32: <code>mpn_divexact_by3c</code> can work 64-bits at a time
      using <code>mulx</code>, in assembler.  This would be the same as for
      sparc64.
-<li> UltraSPARC: <code>modlimb_invert</code> might save a few cycles from
+<li> UltraSPARC: <code>binvert_limb</code> might save a few cycles from
      masking down to just the useful bits at each point in the calculation,
      since <code>mulx</code> speed depends on the highest bit set.  Either
      explicit masks or small types like <code>short</code> and
diff -r 14cd74efb9de -r 8ef60993a0bc mpn/generic/sqrtrem.c
--- a/mpn/generic/sqrtrem.c	Thu Mar 03 12:36:39 2011 +0100
+++ b/mpn/generic/sqrtrem.c	Thu Mar 03 15:28:18 2011 +0100
@@ -246,7 +246,11 @@
 
       if (c < 0)
 	{
+#if HAVE_NATIVE_mpn_addlsh1_n
+	  c += mpn_addlsh1_n (np, np, sp, n) + 2 * q;
+#else
 	  c += mpn_addmul_1 (np, sp, n, CNST_LIMB(2)) + 2 * q;
+#endif
 	  c -= mpn_sub_1 (np, np, n, CNST_LIMB(1));
 	  q -= mpn_sub_1 (sp, sp, n, CNST_LIMB(1));
 	}