[PATCH] Optimize 32-bit sparc T1 multiply routines.
bodrato at mail.dm.unipi.it
Fri Jan 4 02:57:50 CET 2013
Hi,
On Fri, 4 January 2013 1:49 am, David Miller wrote:
> Just FYI, I'm also working on an mpn_mul_basecase that makes use of
> the T4 'mpmul' instruction which can do NxN 64-bit limb multiplies
> for values of N from 1 to 32.
Great! Maybe it could also be useful for mul_2 or higher.
> It's an instruction that seems like it was designed specifically for
> libgmp :-)
If it supports only balanced multiplication (NxN and not NxM), its target
is probably 2048-bit public-key crypto (32 limbs of 64 bits each).
> I guess the ideal implementation would be to have gmp-mparam.h setup
> so that basecase only gets invoked for N <= 32.
With the current code we cannot impose such a restriction.
mpn_sqr_basecase is allowed to support only sizes smaller than the TOOM2
threshold, but mpn_mul_basecase must be able to handle unbalanced operands
and large sizes for the longer operand (the first one).
Should we add a balanced-only mul_basecase_n function, to be used by
mul_n, to fully exploit such an instruction? Modular arithmetic (crypto,
ECM, etc.) can benefit from such an approach. How much faster than a
fully flexible mul_basecase would it be?
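
To make the idea concrete, here is a rough sketch in portable C of how the
dispatch could look. It is only an illustration under several assumptions:
the names (limb_t, sketch_mul_basecase, sketch_mul_basecase_n, sketch_mul_n,
BALANCED_MAX_LIMBS) are hypothetical and not part of GMP, the limbs are
32-bit so the code needs no compiler extensions, and the balanced routine
simply reuses the schoolbook loop where a real one would use the hardware
instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;        /* illustrative limb type, not GMP's mp_limb_t */
#define BALANCED_MAX_LIMBS 32   /* hypothetical cap, mirroring "N from 1 to 32" */

/* Fully flexible schoolbook product:
   rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   Accepts unbalanced operands; requires un >= vn >= 1 and that rp does
   not overlap the inputs. */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    memset(rp, 0, (un + vn) * sizeof(limb_t));
    for (size_t j = 0; j < vn; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < un; i++) {
            /* 32x32 product plus two 32-bit addends always fits in 64 bits */
            uint64_t t = (uint64_t)up[i] * vp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        rp[j + un] = (limb_t)carry;
    }
}

/* Balanced-only entry point (both operands have n limbs, n within the cap).
   Stand-in body: a real version would be built around the NxN instruction. */
static void sketch_mul_basecase_n(limb_t *rp, const limb_t *up,
                                  const limb_t *vp, size_t n)
{
    assert(n >= 1 && n <= BALANCED_MAX_LIMBS);
    sketch_mul_basecase(rp, up, n, vp, n);
}

/* Balanced multiply: take the balanced-only path when the size allows it,
   otherwise fall back to the flexible routine. Returns 1 if the balanced
   path was used, 0 otherwise (the return value exists only for testing). */
static int sketch_mul_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
    if (n <= BALANCED_MAX_LIMBS) {
        sketch_mul_basecase_n(rp, up, vp, n);
        return 1;
    }
    sketch_mul_basecase(rp, up, n, vp, n);
    return 0;
}
```

In this sketch the unbalanced callers keep using the flexible routine, so
the restriction on N applies only to the path reached through the balanced
multiply. In the real library, sizes above the TOOM2 threshold would
presumably go to the Toom code rather than back to the basecase.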
Best regards,
Marco
--
http://bodrato.it/
More information about the gmp-devel mailing list