[Gmp-commit] /var/hg/gmp: 2 new changesets
mercurial at gmplib.org
mercurial at gmplib.org
Thu Mar 3 15:28:48 CET 2011
details: /var/hg/gmp/rev/6940a00bb3e7
changeset: 13991:6940a00bb3e7
user: Torbjorn Granlund <tege at gmplib.org>
date: Thu Mar 03 15:25:47 2011 +0100
description:
Update/remove several itemised tasks.
details: /var/hg/gmp/rev/8ef60993a0bc
changeset: 13992:8ef60993a0bc
user: Torbjorn Granlund <tege at gmplib.org>
date: Thu Mar 03 15:28:18 2011 +0100
description:
(mpn_dc_sqrtrem): Use mpn_addlsh1_n.
diffstat:
ChangeLog | 4 ++++
doc/tasks.html | 43 +++++++++++++------------------------------
mpn/generic/sqrtrem.c | 4 ++++
3 files changed, 21 insertions(+), 30 deletions(-)
diffs (116 lines):
diff -r 14cd74efb9de -r 8ef60993a0bc ChangeLog
--- a/ChangeLog Thu Mar 03 12:36:39 2011 +0100
+++ b/ChangeLog Thu Mar 03 15:28:18 2011 +0100
@@ -1,3 +1,7 @@
+2011-03-03 Torbjorn Granlund <tege at gmplib.org>
+
+ * mpn/generic/sqrtrem.c (mpn_dc_sqrtrem): Use mpn_addlsh1_n.
+
2011-03-03 Niels Möller <nisse at lysator.liu.se>
* mpn/generic/mod_1_1.c (add_mssaaaa): For x86 and x86_64, treat m
diff -r 14cd74efb9de -r 8ef60993a0bc doc/tasks.html
--- a/doc/tasks.html Thu Mar 03 12:36:39 2011 +0100
+++ b/doc/tasks.html Thu Mar 03 15:28:18 2011 +0100
@@ -37,7 +37,7 @@
<hr>
<!-- NB. timestamp updated automatically by emacs -->
- This file current as of 28 Dec 2009. An up-to-date version is available at
+ This file current as of 3 Mar 2011. An up-to-date version is available at
<a href="http://gmplib.org/tasks.html">http://gmplib.org/tasks.html</a>.
Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
@@ -122,9 +122,6 @@
subsequent operations, especially if the value is otherwise only small.
If low bits of the low limb are zero, use <code>mpn_rshift</code> so as
to not increase the size.
-<li> <code>mpn_dc_sqrtrem</code>: Don't use <code>mpn_addmul_1</code> with
- multiplier==2, instead either <code>mpn_addlsh1_n</code> when available,
- or <code>mpn_lshift</code>+<code>mpn_add_n</code> if not.
<li> <code>mpn_dc_sqrtrem</code>, <code>mpn_sqrtrem2</code>: Don't use
<code>mpn_add_1</code> and <code>mpn_sub_1</code> for 1 limb operations,
instead <code>ADDC_LIMB</code> and <code>SUBC_LIMB</code>.
@@ -133,20 +130,12 @@
aliasing between <code>sp</code> and <code>rp</code>.
<li> <code>mpn_sqrtrem</code>: Some work can be saved in the last step when
the remainder is not required, as noted in Paul's paper.
-<li> <code>mpq_add</code>, <code>mpq_add</code>: The division "op1.den / gcd"
- is done twice, where of course only once is necessary. Reported by Larry
- Lambe.
<li> <code>mpq_add</code>, <code>mpq_sub</code>: The gcd fits a single limb
- with high probability and in this case <code>modlimb_invert</code> could
+ with high probability and in this case <code>binvert_limb</code> could
be used to calculate the inverse just once for the two exact divisions
"op1.den / gcd" and "op2.den / gcd", rather than letting
- <code>mpn_divexact_1</code> do it each time. This would require a new
- <code>mpn_preinv_divexact_1</code> interface. Not sure if it'd be worth
- the trouble.
-<li> <code>mpq_add</code>, <code>mpq_sub</code>: The use of
- <code>mpz_mul(x,y,x)</code> causes temp allocation or copying in
- <code>mpz_mul</code> which can probably be avoided. A rewrite using
- <code>mpn</code> might be best.
+ <code>mpn_bdiv_q_1</code> do it each time. This would require calling
+ <code>mpn_pi1_bdiv_q_1</code>.
<li> <code>mpn_gcdext</code>: Don't test <code>count_leading_zeros</code> for
zero, instead check the high bit of the operand and avoid invoking
<code>count_leading_zeros</code>. This is an optimization on all
@@ -173,26 +162,20 @@
since there's no apparent way to get <code>SHRT_MAX</code> with an
expression (since <code>short</code> and <code>unsigned short</code> can
be different sizes).
-<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> aren't very
- fast on one or two limb moduli, due to a lot of function call
- overheads. These could perhaps be handled as special cases.
-<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> want better
- algorithm selection, and the latter should use REDC. Both could
- change to use an <code>mpn_powm</code> and <code>mpn_redc</code>.
+<li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> aren't very fast on one
+ or two limb moduli, due to a lot of function call overheads. These could
+ perhaps be handled as special cases.
+<li> Make sure <code>mpz_powm_ui</code> is never slower than the corresponding
+ computation using <code>mpz_powm</code>.
<li> <code>mpz_powm</code> REDC should do multiplications by <code>g[]</code>
using the division method when they're small, since the REDC form of a
small multiplier is normally a full size product. Probably would need a
new tuned parameter to say what size multiplier is "small", as a function
of the size of the modulus.
-<li> <code>mpz_powm</code> REDC should handle even moduli if possible. Maybe
- this would mean for m=n*2^k doing mod n using REDC and an auxiliary
- calculation mod 2^k, then putting them together at the end.
-<li> <code>mpn_gcd</code> might be able to be sped up on small to
- moderate sizes by improving <code>find_a</code>, possibly just by
- providing an alternate implementation for CPUs with slowish
+<li> <code>mpn_gcd</code> might be able to be sped up on small to moderate
+ sizes by improving <code>find_a</code>, possibly just by providing an
+ alternate implementation for CPUs with slowish
<code>count_leading_zeros</code>.
-<li> Toom3 could use a low to high cache localized evaluate and interpolate.
- The necessary <code>mpn_divexact_by3c</code> exists.
<li> <code>mpf_set_str</code> produces low zero limbs when a string has a
fraction but is exactly representable, eg. 0.5 in decimal. These could be
stripped to save work in later operations.
@@ -371,7 +354,7 @@
<li> UltraSPARC/32: <code>mpn_divexact_by3c</code> can work 64-bits at a time
using <code>mulx</code>, in assembler. This would be the same as for
sparc64.
-<li> UltraSPARC: <code>modlimb_invert</code> might save a few cycles from
+<li> UltraSPARC: <code>binvert_limb</code> might save a few cycles from
masking down to just the useful bits at each point in the calculation,
since <code>mulx</code> speed depends on the highest bit set. Either
explicit masks or small types like <code>short</code> and
diff -r 14cd74efb9de -r 8ef60993a0bc mpn/generic/sqrtrem.c
--- a/mpn/generic/sqrtrem.c Thu Mar 03 12:36:39 2011 +0100
+++ b/mpn/generic/sqrtrem.c Thu Mar 03 15:28:18 2011 +0100
@@ -246,7 +246,11 @@
if (c < 0)
{
+#if HAVE_NATIVE_mpn_addlsh1_n
+ c += mpn_addlsh1_n (np, np, sp, n) + 2 * q;
+#else
c += mpn_addmul_1 (np, sp, n, CNST_LIMB(2)) + 2 * q;
+#endif
c -= mpn_sub_1 (np, np, n, CNST_LIMB(1));
q -= mpn_sub_1 (sp, sp, n, CNST_LIMB(1));
}
More information about the gmp-commit
mailing list