VPMADD52
Victor Shoup
shoup at cs.nyu.edu
Mon Oct 12 12:39:04 UTC 2015
There is a complement of "blend" instructions which should do the job.
For FFTs, it might also make sense to stick to 30 bit primes...
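
To make the compare-and-blend idea concrete, here is a minimal scalar sketch in C of a modular addition done without branches. It is only an illustration, not code from NTL or GMP: the names LANES, blend_u32 and addmod_lanes are hypothetical, and the loop merely emulates what a vector unit would do per lane. It assumes p < 2^30 and inputs already reduced mod p, so the sum of two residues stays below 2^31 and cannot overflow a 32-bit lane.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* hypothetical vector width: eight 32-bit lanes */

/* Per-lane blend: take x where mask is all-ones, y where mask is zero. */
static inline uint32_t blend_u32(uint32_t mask, uint32_t x, uint32_t y)
{
    return (x & mask) | (y & ~mask);
}

/* r[i] = (a[i] + b[i]) mod p, assuming p < 2^30 and a[i], b[i] < p. */
static void addmod_lanes(uint32_t r[LANES], const uint32_t a[LANES],
                         const uint32_t b[LANES], uint32_t p)
{
    assert(p < (UINT32_C(1) << 30));
    for (int i = 0; i < LANES; i++) {
        assert(a[i] < p && b[i] < p);
        uint32_t s = a[i] + b[i];  /* below 2^31, so no wraparound */
        /* Compare step: all-ones if s >= p, zero otherwise. */
        uint32_t mask = (uint32_t)0 - (uint32_t)(s >= p);
        /* Blend step: select the corrected value without branching. */
        r[i] = blend_u32(mask, s - p, s);
    }
}
```

With real SIMD instructions the loop body would presumably become one add, one compare, one subtract and one blend per vector, but which instructions are actually available depends on the target.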
> On Oct 12, 2015, at 3:26 AM, Niels Möller <nisse at lysator.liu.se> wrote:
>
> Victor Shoup <shoup at cs.nyu.edu> writes:
>
>> I'm also interested, because of potential applications
>> to my NTL library for faster multi-modular FFTs.
>
> I'm also thinking of small-prime FFT. I guess it's going to be a bit
> challenging to do efficient modulo p arithmetic. Besides efficient simd
> multiplication, I think one really needs reasonable simd compare and
> conditional move. I'm not sure what's available there.
>
>> One concrete issue: if one wanted to fully exploit VPMADD52 instructions,
>> then perhaps that would be a good reason to enable the "nails" feature
>> in GMP.
>
> 12 nail bits (19% of a full word) is maybe a bit excessive.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
> Internet email is subject to wholesale government surveillance.