GMP 4.3 multiplication performance
nisse at lysator.liu.se
Wed Jun 3 15:55:47 CEST 2009
Torbjorn Granlund <tg at gmplib.org> writes:
> My thinking was that if mpn_submul_1 is the best way to compute
> sub_lshift on a particular machine, then mpn_submul_1 on that machine
> can be considered as a decent native implementation of *both* submul_1
> and sub_lshift.
Actually, in this case it should be trivial to add a new assembler
entrypoint to submul_1, which converts an input shift count to a
multiplier for submul_1, and then jumps to submul_1. But why do that
in machine-specific assembler, if you can just as well do it as a cpp
macro, and make it work generally and automatically based on
parameters from tuneup?
But sure, it could be done locally in each file that uses sub_lshift.
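To make the macro idea concrete, here is a rough, self-contained sketch. Everything in it is made up for illustration: limb_t is a 32-bit stand-in for mp_limb_t, ref_submul_1 and ref_sub_lshift are plain C stand-ins for the real mpn routines, and USE_SUBMUL_1_FOR_SUB_LSHIFT is a hypothetical parameter of the kind tuneup might emit. It is not how gmp-impl.h does it.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, for illustration only */

/* Stand-in for submul_1: {rp,n} -= {up,n} * v, returns the borrow-out limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)up[i] * v + carry;  /* cannot overflow 64 bits */
        limb_t lo = (limb_t)prod;
        limb_t r = rp[i];
        carry = (limb_t)(prod >> 32);
        rp[i] = r - lo;
        carry += (r < lo);                            /* borrow from this limb */
    }
    return carry;
}

/* Stand-in for a dedicated sub_lshift: {rp,n} -= {up,n} << s, with 0 < s < 32. */
static limb_t ref_sub_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned s)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sh = ((uint64_t)up[i] << s) + carry;
        limb_t lo = (limb_t)sh;
        limb_t r = rp[i];
        carry = (limb_t)(sh >> 32);
        rp[i] = r - lo;
        carry += (r < lo);
    }
    return carry;
}

/* Hypothetical tuneup-style parameter: nonzero if submul_1 is the faster way. */
#ifndef USE_SUBMUL_1_FOR_SUB_LSHIFT
#define USE_SUBMUL_1_FOR_SUB_LSHIFT 1
#endif

#if USE_SUBMUL_1_FOR_SUB_LSHIFT
/* Convert the shift count to a multiplier and reuse submul_1. */
#define SUB_LSHIFT(rp, up, n, s) \
  ref_submul_1((rp), (up), (n), (limb_t)1 << (s))
#else
#define SUB_LSHIFT(rp, up, n, s) \
  ref_sub_lshift((rp), (up), (n), (s))
#endif
```

Callers would only ever write SUB_LSHIFT (rp, up, n, s); which routine runs underneath is decided at compile time by the parameter.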
Talking about code cleanup, I think there's a lot of duplication in
the toom evaluation code. Would the overhead be acceptable if one
writes a function for, e.g., evaluating a degree four polynomial in
the points +1 and -1? Or would it make sense with macros (say,
collected into toom-macros.h, if we don't want to put it all into
gmp-impl.h)? This particular helper function would be used by toom43
(once), toom44 (twice), toom54 (once), assuming all these variants use
this point pair.
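As a sketch of what such a helper could look like (the name eval_dgr4_pm1, the argument layout and the 32-bit limb_t are all hypothetical here, and the vec_* routines are plain C stand-ins for mpn_add_n, mpn_sub_n and mpn_cmp): sum the even and odd coefficients separately, then p(+1) is their sum and p(-1) is their difference, returned as an absolute value plus a sign flag.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, for illustration only */

/* {rp,n} = {ap,n} + {bp,n}; returns carry-out. rp may alias ap. */
static limb_t vec_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)ap[i] + bp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* {rp,n} = {ap,n} - {bp,n}; returns borrow-out. */
static limb_t vec_sub_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)ap[i] - bp[i] - bw;
        rp[i] = (limb_t)t;
        bw = (limb_t)(t >> 63);   /* top bit is set exactly on underflow */
    }
    return bw;
}

/* Compare {ap,n} with {bp,n}; returns <0, 0 or >0. */
static int vec_cmp(const limb_t *ap, const limb_t *bp, size_t n)
{
    while (n-- > 0)
        if (ap[n] != bp[n])
            return ap[n] > bp[n] ? 1 : -1;
    return 0;
}

/* Evaluate p(x) = a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4 at x = +1 and x = -1.
   The five coefficients are n limbs each, stored contiguously at ap.
   xp1 and xm1 receive p(+1) and |p(-1)| in n+1 limbs each; tp is n+1 limbs
   of scratch. Returns 1 if p(-1) is negative, 0 otherwise. */
static int eval_dgr4_pm1(limb_t *xp1, limb_t *xm1,
                         const limb_t *ap, size_t n, limb_t *tp)
{
    int neg;

    /* Even part: xp1 = a0 + a2 + a4 */
    xp1[n] = vec_add_n(xp1, ap, ap + 2 * n, n);
    xp1[n] += vec_add_n(xp1, xp1, ap + 4 * n, n);

    /* Odd part: tp = a1 + a3 */
    tp[n] = vec_add_n(tp, ap + n, ap + 3 * n, n);

    /* p(-1) = even - odd, stored as an absolute value plus a sign flag */
    neg = vec_cmp(xp1, tp, n + 1) < 0;
    if (neg)
        vec_sub_n(xm1, tp, xp1, n + 1);
    else
        vec_sub_n(xm1, xp1, tp, n + 1);

    /* p(+1) = even + odd; always fits in n+1 limbs */
    vec_add_n(xp1, xp1, tp, n + 1);
    return neg;
}
```

Whether this is better as a function or as a macro is exactly the overhead question above; the sketch only shows that the shared part is small and self-contained.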
I'd expect evaluation at +1 and -1 to be quite independent of
evaluation at +2 and -2, even with an optimal evaluation sequence.
Besides the size and readability of the source code, size of the
resulting object code also has a cost, in the form of memory and cache
pressure, even if I don't fully understand the tradeoffs in that area.