Toom symmetries

Tue Jun 9 23:19:17 CEST 2009

On Jun 9, 2009, at 4:11 PM, Niels Möller wrote:

> The simplest example, 4 points,
>
>   w0 = f(0)
>   w1 = f(1)
>   w2 = f(-1)
>   w3 = f(oo)
>
> corresponds to the matrix
>
>   1  0  0  0
>   1  1  1  1
>   1 -1  1 -1
>   0  0  0  1
>
> After addsub/2, we have
>
>   1  0  0  0
>   0  1  0  1
>   1  0  1  0
>   0  0  0  1

Hmmm... sorry to interrupt your train of thought, this gives me an  
idea. Maybe you can delay some of the division by two until after  
part of the recomposition?

i.e.: instead of addsub/2, just do addsub, then you have

   1  0  0  0
   0  2  0  2
   2  0  2  0
   0  0  0  1

i.e. if the output poly coefficients are c0, c1, c2, c3, you have

c0
2c1 + 2c3
2c0 + 2c2
c3

Now start doing recomposition:

          |=== 2c1 + 2c3 ===|
                   |=== 2c0 + 2c2 ===|

Then divide by 2:

          |==== c1 + c3 ====|
                   |==== c0 + c2 ====|

Then continue as usual, i.e. subtract c3, c0 from right places, add  
them at beginning and end (and also use the hi(c0) - lo(c3)  
redundancy trick...), to get

|====== c0 =======|
          |====== c1 =======|
                   |====== c2 =======|
                            |====== c3 =======|

I haven't checked the operation count, but it seems plausible that  
something like this could save half a shift.

> In total 12 subtractions, 2 left shifts, 5 right shifts and 2
> divisions. This actually not so bad compared to the current code,
> there's just one more left shift (and then evaluation obviously also
> needs more shifts). Using the four points corresponding to a = sqrt(B)
> = 2^{GMP_NUMB_BITS /2} might also be worth investigating, in this case
> the same reduction procedure will involved only division by (B-1).

I did try the 2^(GMP_NUMB_BITS/2) idea at some stage --- one  
difficulty is that the pointwise multiplications get slightly bigger.

david