[Gmp-commit] /var/hg/gmp: Clean up TODO list.
mercurial at gmplib.org
Tue Mar 8 20:15:25 CET 2011
details: /var/hg/gmp/rev/d8752cfdaf47
changeset: 14013:d8752cfdaf47
user: Torbjorn Granlund <tege at gmplib.org>
date: Tue Mar 08 20:15:18 2011 +0100
description:
Clean up TODO list.
diffstat:
mpn/x86/atom/sse2/sqr_basecase.asm | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diffs (25 lines):
diff -r 7a815982e058 -r d8752cfdaf47 mpn/x86/atom/sse2/sqr_basecase.asm
--- a/mpn/x86/atom/sse2/sqr_basecase.asm Tue Mar 08 19:40:32 2011 +0100
+++ b/mpn/x86/atom/sse2/sqr_basecase.asm Tue Mar 08 20:15:18 2011 +0100
@@ -25,7 +25,6 @@
C * Check if 'jmp N(%esp)' is well-predicted enough to allow us to combine the
C 4 large loops into one; we could use it for the outer loop branch.
C * Optimise code outside of inner loops.
-C * Combine rp and up updates in outer loop to save a bunch of lea insns.
C * Write combined addmul_1 feed-in a wind-down code, and use when iterating
C outer each loop. ("Overlapping software pipelining")
C * Perhaps use caller-saves regs for inlined mul_1, allowing us to postpone
@@ -33,11 +32,8 @@
C * Perhaps write special code for n < M, for some small M.
C * Replace inlined addmul_1 with smaller code from aorsmul_1.asm, or perhaps
C with even less pipelined code.
-C * Fix function header code.
-C * We run the outer loop too long, until we perform a 1-limb by 1-limb
-C multiply. The main problem with this is that the decreasing inner loop
-C trip counts will cause poor exit branch prediction; this hurts short loops
-C VERY much.
+C * We run the outer loop until we have a 2-limb by 1-limb addmul_1 left.
+C Consider breaking out earlier, saving high the cost of short loops.
C void mpn_sqr_basecase (mp_ptr wp,
C mp_srcptr xp, mp_size_t xn);
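For readers unfamiliar with the loop structure these TODO items refer to, here is a rough portable C sketch of a schoolbook squaring basecase. It is not the Atom assembly touched by this changeset. It assumes 32-bit limbs, and the names `limb_t`, `sketch_mul_1`, `sketch_addmul_1` and `sketch_sqr_basecase` are hypothetical. It only illustrates how the outer loop issues addmul_1 passes with shrinking inner trip counts; in this sketch the last pass is 1 limb by 1 limb, which is the kind of short loop whose cost the TODO entry is concerned with.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, as on 32-bit x86 */

/* wp[0..n-1] = up[0..n-1] * v; returns the carry-out limb. */
static limb_t sketch_mul_1(limb_t *wp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + cy;
        wp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* wp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
static limb_t sketch_addmul_1(limb_t *wp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + wp[i] + cy;   /* cannot overflow 64 bits */
        wp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* wp[0..2n-1] = xp[0..n-1]^2, for n >= 1; wp must not overlap xp. */
static void sketch_sqr_basecase(limb_t *wp, const limb_t *xp, size_t n)
{
    /* Off-diagonal triangle: sum of x[i]*x[j]*B^(i+j) over i < j. */
    wp[0] = 0;
    wp[n] = sketch_mul_1(wp + 1, xp + 1, n - 1, xp[0]);
    for (size_t i = 1; i + 1 < n; i++) {
        /* Inner trip count n-1-i shrinks by one on each outer iteration. */
        wp[n + i] = sketch_addmul_1(wp + 2 * i + 1, xp + i + 1, n - 1 - i, xp[i]);
    }
    wp[2 * n - 1] = 0;

    /* Double the triangle (shift left by one bit). */
    limb_t top = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        limb_t w = wp[k];
        wp[k] = (limb_t)(w << 1) | top;
        top = w >> 31;
    }

    /* Add the diagonal squares x[i]^2 * B^(2i). */
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)xp[i] * xp[i];
        uint64_t lo = (uint64_t)wp[2 * i] + (limb_t)sq + cy;
        uint64_t hi = (uint64_t)wp[2 * i + 1] + (limb_t)(sq >> 32) + (lo >> 32);
        wp[2 * i] = (limb_t)lo;
        wp[2 * i + 1] = (limb_t)hi;
        cy = (limb_t)(hi >> 32);
    }
}
```

Breaking out of the outer loop earlier, as the TODO entry suggests, would correspond to stopping the `for` loop over `i` a few iterations sooner and handling the remaining small corner of the triangle with straight-line code.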