VPMADD52

Victor Shoup shoup at cs.nyu.edu
Mon Oct 12 12:39:04 UTC 2015


There is a complement of "blend" instructions, which should do the job.
For FFTs, it might also make sense to stick to 30 bit primes...
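
To make the compare-and-blend idea concrete, here is a minimal scalar
sketch in C of branch-free modular add and subtract for a prime below 2^30.
The names select_u32, addmod and submod, and the choice of the prime
998244353 = 119*2^23 + 1, are illustrative assumptions and not taken from
NTL or GMP. The mask-and-select step is what a SIMD compare followed by a
blend would do in each lane. With p < 2^30 the sum a + b stays below 2^31,
so nothing overflows a 32-bit lane.

```c
#include <stdint.h>

/* Example FFT-friendly prime below 2^30 (illustrative choice). */
#define EXAMPLE_P 998244353u

/* Per-lane "blend": returns a where mask is all ones, b where it is zero. */
static inline uint32_t select_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* (a + b) mod p for 0 <= a, b < p < 2^30, without branches. */
static inline uint32_t addmod(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t s = a + b;                      /* s < 2^31, no wraparound */
    uint32_t mask = 0u - (uint32_t)(s >= p); /* compare -> all-ones or zero */
    return select_u32(mask, s - p, s);       /* subtract p only if s >= p */
}

/* (a - b) mod p for 0 <= a, b < p < 2^30, without branches. */
static inline uint32_t submod(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t d = a - b;                      /* wraps around when a < b */
    uint32_t mask = 0u - (uint32_t)(a < b);  /* compare -> all-ones or zero */
    return select_u32(mask, d + p, d);       /* add p back only if a < b */
}
```

A vector version would presumably keep the same shape, with the comparison
producing a lane mask and the select replaced by a blend.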


> On Oct 12, 2015, at 3:26 AM, Niels Möller <nisse at lysator.liu.se> wrote:
> 
> Victor Shoup <shoup at cs.nyu.edu> writes:
> 
>> I'm also interested, because of potential applications 
>> to my NTL library for faster multi-modular FFTs.
> 
> I'm also thinking of small-prime FFT. I guess it's going to be a bit
> challenging to do efficient modulo p arithmetic. Besides efficient simd
> multiplication, I think one really needs reasonable simd compare and
> conditional move. I'm not sure what's available there.
> 
>> One concrete issue: if one wanted to fully exploit VPMADD52 instructions,
>> then perhaps that would be a good reason to enable the "nails" feature
>> in GMP.
> 
> 12 nail bits (19% of a full word) is maybe a bit excessive.
> 
> Regards,
> /Niels
> 
> -- 
> Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
> Internet email is subject to wholesale government surveillance.

