GMP 4.3 multiplication performance

Wed Jun 3 15:55:47 CEST 2009

Torbjorn Granlund <tg at gmplib.org> writes:

>   My thinking was that if mpn_submul_1 is the best way to compute
>   sub_lshift on a particular machine, then mpn_submul_1 on that machine
>   can be considered as a decent native implementation of *both* submul_1
>   and sub_lshift.
>   
> OK.  

Actually, in this case it should be trivial to add a new assembler
entrypoint to submul_1, which converts an input shift count to a
multiplier for submul_1, and then jumps to submul_1. But why do that
in machine-specific assembler, if you can just as well do it as a cpp
macro, and make it work generally and automatically based on
parameters from tuneup?

But sure, it could be done locally in each file that uses sub_lshift.
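
To make the macro idea concrete, here is a minimal sketch. It is not
GMP code: it uses a toy 32-bit limb type, and the names limb_t,
toy_submul_1, toy_sub_lshift_ref, TOY_SUB_LSHIFT and the switch
TOY_USE_SUBMUL_FOR_SUB_LSHIFT are all made up for illustration. The
switch stands in for whatever parameter tuneup would provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy limb type, stands in for mp_limb_t */

/* {rp,n} -= {up,n} * v.  Returns the limb to subtract at position n. */
limb_t
toy_submul_1 (limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  limb_t borrow = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Cannot overflow: (2^32-1)^2 + (2^32-1) < 2^64. */
      uint64_t prod = (uint64_t) up[i] * v + borrow;
      limb_t lo = (limb_t) prod;
      borrow = (limb_t) (prod >> 32) + (rp[i] < lo);
      rp[i] -= lo;
    }
  return borrow;
}

/* Shift-based variant: {rp,n} -= {up,n} << cnt, for 0 < cnt < 32.
   Returns the shifted-out bits plus the final borrow. */
limb_t
toy_sub_lshift_ref (limb_t *rp, const limb_t *up, size_t n, unsigned cnt)
{
  limb_t high = 0, borrow = 0;
  assert (cnt > 0 && cnt < 32);
  for (size_t i = 0; i < n; i++)
    {
      limb_t s = (up[i] << cnt) | high;  /* shifted limb plus bits from below */
      limb_t d = rp[i] - s;
      limb_t b = (rp[i] < s) | (d < borrow);
      high = up[i] >> (32 - cnt);
      rp[i] = d - borrow;
      borrow = b;
    }
  return high + borrow;
}

/* Hypothetical tuneup-style switch: nonzero means submul_1 is the
   fastest way to do sub_lshift on this machine. */
#ifndef TOY_USE_SUBMUL_FOR_SUB_LSHIFT
#define TOY_USE_SUBMUL_FOR_SUB_LSHIFT 1
#endif

#if TOY_USE_SUBMUL_FOR_SUB_LSHIFT
/* Convert the shift count to a multiplier and reuse submul_1. */
#define TOY_SUB_LSHIFT(rp, up, n, cnt) \
  toy_submul_1 ((rp), (up), (n), ((limb_t) 1 << (cnt)))
#else
#define TOY_SUB_LSHIFT(rp, up, n, cnt) \
  toy_sub_lshift_ref ((rp), (up), (n), (cnt))
#endif
```

Callers only ever write TOY_SUB_LSHIFT (rp, up, n, cnt), so which
variant is used can change per machine without touching them. Both
variants return the same value, the amount still to be subtracted at
limb position n.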

Talking about code cleanup, I think there's a lot of duplication in
the toom evaluation code. Would the overhead be acceptable if one
writes a function for, e.g., evaluating a degree four polynomial in
the points +1 and -1? Or would it make sense with macros (say,
collected into toom-macros.h, if we don't want to put it all into
gmp-impl.h)? This particular helper function would be used by toom43
(once), toom44 (twice), toom54 (once), assuming all these variants use
this point pair.
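
A rough sketch of what such a shared helper could look like, again
with a toy 32-bit limb type rather than GMP's own types. All names
here (limb_t, toy_add_n, toy_sub_n, toy_cmp, toy_eval_dgr4_pm1) are
invented for the example, and the interface (five coefficients stored
contiguously, absolute value plus a sign flag for the -1 point) is
only an assumption about what the toom functions would want. The
point is that the even and odd parts are summed once and shared
between the two evaluations.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy limb type, stands in for mp_limb_t */

/* {rp,n} = {ap,n} + {bp,n}, returns carry out.  rp may equal ap. */
limb_t
toy_add_n (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t s = (uint64_t) ap[i] + bp[i] + cy;
      rp[i] = (limb_t) s;
      cy = (limb_t) (s >> 32);
    }
  return cy;
}

/* {rp,n} = {ap,n} - {bp,n}, returns borrow out. */
limb_t
toy_sub_n (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t bw = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t d = (uint64_t) ap[i] - bp[i] - bw;
      rp[i] = (limb_t) d;
      bw = (limb_t) (d >> 63);   /* top bit set means the result wrapped */
    }
  return bw;
}

/* Compare {ap,n} and {bp,n}, most significant limb first. */
int
toy_cmp (const limb_t *ap, const limb_t *bp, size_t n)
{
  while (n-- > 0)
    if (ap[n] != bp[n])
      return ap[n] > bp[n] ? 1 : -1;
  return 0;
}

/* Evaluate p(x) = c0 + c1 x + c2 x^2 + c3 x^3 + c4 x^4 at +1 and -1.
   cp holds c0..c4, n limbs each, contiguous.  pp1 and pm1 get n+1
   limbs each; tp is n+1 limbs of scratch.  pp1 = p(1), pm1 = |p(-1)|.
   Returns 1 if p(-1) is negative, else 0. */
int
toy_eval_dgr4_pm1 (limb_t *pp1, limb_t *pm1, const limb_t *cp, size_t n,
                   limb_t *tp)
{
  int neg;

  /* Even part c0 + c2 + c4 in pp1. */
  pp1[n] = toy_add_n (pp1, cp, cp + 2 * n, n);
  pp1[n] += toy_add_n (pp1, pp1, cp + 4 * n, n);

  /* Odd part c1 + c3 in tp. */
  tp[n] = toy_add_n (tp, cp + n, cp + 3 * n, n);

  /* p(-1) = even - odd, stored as absolute value and sign. */
  neg = toy_cmp (pp1, tp, n + 1) < 0;
  if (neg)
    toy_sub_n (pm1, tp, pp1, n + 1);
  else
    toy_sub_n (pm1, pp1, tp, n + 1);

  /* p(1) = even + odd; at most 5 * (2^(32n) - 1), so it fits n+1 limbs. */
  toy_add_n (pp1, pp1, tp, n + 1);
  return neg;
}
```

For instance, with n = 1 and coefficients 1, 5, 1, 5, 1 the even part
is 3 and the odd part is 10, so the helper gives p(1) = 13 and
|p(-1)| = 7 with the sign flag set.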

I'd expect evaluation at +1 and -1 to be quite independent of
evaluation at +2 and -2, even with an optimal evaluation sequence.

Besides the size and readability of the source code, size of the
resulting object code also has a cost, in the form of memory and cache
pressure, even if I don't fully understand the tradeoffs in that area.

/Niels