From bodrato at mail.dm.unipi.it Thu Jan 2 15:43:31 2020 From: bodrato at mail.dm.unipi.it (Marco Bodrato) Date: Thu, 2 Jan 2020 16:43:31 +0100 Subject: Saving a significant multiplication in odd order Toom-Cook negacyclic convolution In-Reply-To: References: <86o92aywo9.fsf@shell.gmplib.org> <86sgrkrewe.fsf@shell.gmplib.org> Message-ID: <41ecd45ccdf386f30676b66c834265f6.squirrel@student-web1.dm.unipi.it> Ciao Brian, sorry for the long delay, I did not read this message carefully enough when you sent it. And I happened to reread it in the last few days... Il Ven, 5 Luglio 2019 11:08 am, Brian Quach ha scritto: > Currently I am just trying to produce some 'evaluation' and > 'interpolation' matrices (not similar to Toom-Cook matrices). > Here is an example for N = 3: > cyclotomic factorisation: x^3 + 1 = (x + 1)(x^2 - x + 1) (number of > factors k = 2) > multiplications needed: 2N - k = 6 - 2 = 4 (better than the There is a general problem about specialised Toom-N functions to compute the product modulo 2^{K*b}+1, where I assume a "GMP-limb" (a register, usually) has $b$ bits. They can be applied ONLY if the number of limbs (K in the above formula) is a multiple of N, not one more, not one less... I mean: if you have two operands of 100 "GMP-limbs", and you need the product modulo 2^{100*b}+1, you can always multiply them with a generic multiplication function using Toom-3, then reduce the result modulo 2^{100*b}+1. But you can not use a specialised Toom-3 as the one you are proposing. For the latter to be usable, you must be working modulo 2^{99*b}+1 or modulo 2^{102*b}+1... Multiplication modulo 2^K+1 (for small K) is used basically by the FFT-based multiplication and by mulmod_bnm1, the function that compute a product mod (2^b)^n-1. Both are widely used in the library... We should understand how the limitation on the sizes of those Toom-N, and the recursive structure of the functions that would use them, can interact... > 'evaluation' matrix for each operand (array of rows): [[1, -1, 1], [1, 0, > -1], [1, 1, 0], [0, 1, 1]] (can involve 5 additions / subtractions) > This is probably better than Bodrato's optimal sequence for Toom-3, as it > has the same addition / subtraction count and saves 2 shifts. > 'interpolation' matrix: [[1, 1, 1, -2], [-1, -1, 2, -1], [1, -2, 1, 1]] / > 3 (6 additions / subtractions and 1 division by 3) Yes, using the notation I used in "Towards Optimal Toom-Cook Multiplication [...]", for Toom-3 we need: Evaluation: 5*2 add/sub, 2 shift; 5mul Interpolation: 8 add/sub, 3 shift, 1 Sdiv Recomposition: 4 overlapping parts (+ 3 for reduction mod x^3+1) (each one costs as half an interpolation add/sub) But the current implementation in GMP mixes interpolation and recomposition, using some tricks suggested in the paper "High Degree Toom'n'Half [...]" and perform the following operations. Interpolation: 7 add/sub, 3 shift, 1 Sdiv Recomposition: 5 overlap (+3 after the call) The direct computation of Toom-3 modulo x^3+1 you are proposing, would then need the following. Evaluation: 5*2 add/sub; 4mul Interpolation: 6 add/sub, 1 Sdiv Recomposition: 3 overlapping parts (or, by mixing Int. and Rec.) Interpolation: 4 add/sub, 1 Sdiv Recomposition: 5 overlapping parts As you suggested, one product is saved, and also the linear operations are lighter. It should also be possible to reduce memory usage, because not all the results of the products are needed for all the time... But the algorithm should really be implemented, to understand how it can interact with the current functions in the library. Unfortunately it can not unconditionally be used for all sizes... and not even for every size in a range as the current Toom-based implemented functions. ?is, m -- http://bodrato.it/papers/ -------------- next part -------------- A non-text attachment was scrubbed... Name: mat3.gp Type: application/x-gnuplot Size: 1308 bytes Desc: not available URL: