[PATCH] Optimize 32-bit sparc T1 multiply routines.

Fri Jan 4 02:57:50 CET 2013

Hi,

On Fri, 4 January 2013 1:49 am, David Miller wrote:
> Just FYI, I'm also working on an mpn_mul_basecase that makes use of
> the T4 'mpmul' instruction which can do NxN 64-bit limb multiplies
> for values of N from 1 to 32.

Great! Maybe it can also be useful for mul_2 or higher.

> It's an instruction that seems like it was designed specifically for
> libgmp :-)

If it supports only balanced multiplication (NxN and not NxM), its target
is probably 2048-bit public-key crypto (32 limbs x 64 bits = 2048 bits).

> I guess the ideal implementation would be to have gmp-mparam.h setup
> so that basecase only gets invoked for N <= 32.

With the current code, we cannot impose such a restriction.

mpn_sqr_basecase is allowed to support only sizes smaller than the TOOM2
threshold, but mpn_mul_basecase must be able to handle unbalanced operands
and large sizes for the longer operand (the first one).
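The following is a minimal sketch, not GMP code, of what that requirement means for a balanced-only kernel. It builds an unbalanced un x vn product by cutting the longer operand into vn-limb chunks and feeding each chunk to an NxN multiply. It makes these assumptions:

- It uses 32-bit limbs with 64-bit intermediates for portability.
- MAX_N = 32 is a hypothetical limit that mirrors the mpmul range.
- The names mul_nxn and mul_unbalanced are hypothetical.
- The schoolbook mul_nxn only stands in for the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;  /* assumption: 32-bit limbs keep the sketch portable */
#define MAX_N 32          /* hypothetical limit, mirroring an NxN unit with N <= 32 */

/* Balanced kernel: rp[0..2n-1] = up[0..n-1] * vp[0..n-1].
   Plain schoolbook here; it only stands in for a hardware NxN multiply. */
static void mul_nxn(limb_t *rp, const limb_t *up, const limb_t *vp, int n)
{
    assert(n >= 1 && n <= MAX_N);
    memset(rp, 0, 2 * (size_t)n * sizeof(limb_t));
    for (int i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < n; j++) {
            /* (2^32-1)^2 + 2*(2^32-1) fits exactly in 64 bits */
            uint64_t t = (uint64_t)up[j] * vp[i] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t)carry;
    }
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1], with un >= vn, 1 <= vn <= MAX_N.
   The long operand is cut into vn-limb chunks; each chunk is multiplied by
   the balanced kernel and the result is added into rp at the chunk offset. */
static void mul_unbalanced(limb_t *rp, const limb_t *up, int un,
                           const limb_t *vp, int vn)
{
    limb_t chunk[MAX_N];      /* zero-padded copy of a short final chunk */
    limb_t prod[2 * MAX_N];

    assert(vn >= 1 && vn <= MAX_N && un >= vn);
    memset(rp, 0, (size_t)(un + vn) * sizeof(limb_t));

    for (int off = 0; off < un; off += vn) {
        int len = (un - off < vn) ? un - off : vn;
        const limb_t *src = up + off;
        if (len < vn) {       /* tail shorter than vn: pad it to vn limbs */
            memset(chunk, 0, sizeof chunk);
            memcpy(chunk, src, (size_t)len * sizeof(limb_t));
            src = chunk;
        }
        mul_nxn(prod, src, vp, vn);

        /* Only the low len + vn limbs of prod can be nonzero. */
        uint64_t carry = 0;
        for (int k = 0; k < len + vn; k++) {
            uint64_t t = (uint64_t)rp[off + k] + prod[k] + carry;
            rp[off + k] = (limb_t)t;
            carry = t >> 32;
        }
        /* The partial product fits in off + len + vn limbs, so no carry escapes. */
        assert(carry == 0);
    }
}
```

Each chunk costs one balanced call, so a long first operand just means more calls. The zero-padded last chunk wastes part of one kernel invocation but keeps the kernel strictly NxN.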

Should we add a balanced-only mul_basecase_n function, to be used by
mul_n, to fully exploit such an instruction? Modular arithmetic (crypto,
ECM, etc.) could benefit from such an approach. How much faster than a
fully flexible mul_basecase would it be?

Best regards,
Marco

-- 
http://bodrato.it/