GMP 4.3 multiplication performance
nisse at lysator.liu.se
Wed Jun 3 13:19:55 CEST 2009
Torbjorn Granlund <tg at gmplib.org> writes:
> This particular macro is not currently problem free. Let's compare
> lshift+add_n and submul_1 on some machines:
> lshift+sub_n submul_1
> athlon64 3.87 2.5
> core2 3.3 4.5
> pentium4/64 7.3 14.9
> ultrasparc 3 7.75 23
> power4/ppc970 5.0 10
So of these, submul_1 is a decent alternative only on athlon64. I was
just about to ask. Is submul_1 limited by the multiplier or the carry
recurrency? In the latter case, I imagine there's no point in
implementing a separate sub_lshift there.
> My solution is that we write mpn_sublsh_n for machines we care about, or
> that we define a USE_SUBMUL_1_FOR_SUB_LSHIFT in tune/tuneup.c.
And then one would put
# define mpn_sub_lshift(...) mpn_submul(...)
# define HAVE_NATIVE_mpn_lshift
in gmp-impl.h, so that code using it need only check for
> These might be tricky to get fast on x86, though. One problem is that a
> non-constant general register shift count must reside in the
> rcx/ecx/cx/cl register, which means we need to alternate it between cnt
> and 64-cnt in the loop. (Alternatively use could use MMX/SSE shift
Annoying restriction, that. But since top bits are ignored, two
negation instructions in the loop should be sufficient to switch back
and forth between those values.
> It should be renamed add_sub_n or perhaps add_n_sub_n, to avoid
> breaching the naming scheme.
Or "butterfly", unless you see a need to have a large family of
functions that compute two things in parallel.
More information about the gmp-devel