Toom symmetries

Tue Jun 9 22:11:51 CEST 2009

Hi,

I've been looking into switching points for interpolate_7pts, from (0,
oo, 1, -1, 2, 1/2, -1/2) to (0, oo, +1, -1, 2, -2, 1/2). Evaluation
and interpolation is equivalent for most practical purposes, so the
reason I tried this was very mundane: Then I can reuse the same
helper function for evaluation in 2 and -2 for both toom44 and toom43
(the latter uses interpolation with the points 0, oo, 1, -1, 2, -2).

This got me to look a little closer to the symmetries of the choosen
points.

The most symmetry we can have is if the set of points is invariant
under both negation and inversion. Then the point set is a union of
sets of the types

  {0, oo}       Each point invariant under negation, and they are swapped
                by inversion.

  {1, -1}       Each point invariant under inversion, and they are
                swapped by negation.

  {a, -a, 1/a, -1/a} Swapped pairwise by the two operations.

For a point pair (a, -a), a reasonable interpolation strategy (which seems to
be part of at least some of the optimal sequences) is to split in even
and odd parts, by a butterfly-like operation

  (w_i, w_j) <- ( (w_i + w_j)/2, (w_i - w_j)/(2a) )

Then one question is how to best do this, with current operations that
would be either one addsub followed by two shifts, or one rshadd and
and rshsub. A combined operation might be useful. Currently, I don't
think any interpolation code uses addsub.

Furthermore, if we also have symmetry under inversion for all points,
then the resulting matrices for even and odd parts are the same!

The simplest example, 4 points,

  w0 = f(0)
  w1 = f(1)
  w2 = f(-1)
  w3 = f(oo)

corresponds to the matrix

  1  0  0  0
  1  1  1  1
  1 -1  1 -1
  0  0  0  1

After addsub/2, we have

  1  0  0  0
  0  1  0  1
  1  0  1  0
  0  0  0  1

We get (1, 0; 1,1) for the even coefficients and (1,1; 0,1) for the
even coefficients, which only differ by permutation.

For the next example, say we use eight points, 0, oo, 1, -1, 2, -2,
1/2, -1/2. The full matrix is

   1     0  0  0   0   0  0    0
   1     1  1  1   1   1  1    1
   1    -1  1 -1   1  -1  1   -1
   1     2  4  8  16  32 64  128
   1    -2  4 -8  16 -32 64 -128
   128  64 32  16  8   4  2    1
   128 -64 32 -16  8  -4  2   -1
   0     0  0   0  0   0  0    1

After butterfly operations on the ± pairs, we get

   1     0  0   0  0   0  0    0
   0     1  0   1  0   1  0    1
   1     0  1   0  1   0  1    0
   0     1  0   4  0  16  0   64
   1     0  4   0  16  0 64    0
   0    64  0  16  0   4  0    1
  64     0  16  0  4   0   1   0
   0     0  0   0  0   0  0    1

We get an even part

   1  0  0  0
   1  1  1  1
   1  4 16 64
  64 16  4  1

and an odd part

   1  1  1  1
   1  4 16 64
  64 16  4  1
   0  0  0  1

which again are equivalent. We can use the same row operations for both
matrices, which means that 8 point interpolation can be implemented 
compactly using butterfly operations and reductions of the above
4x4 matrix as building blocks.

Then the difficult question is choice of points. Next example with
full symmetry like above would by 12 points, where we might dd the
four points corresponding to a = 4, or a = 2^(GMP_NUMB_BITS) / 2.

Also for 6 points, we will get more symmetry (and hence compactness of
interpolation) if we use 1/2, -1/2 rather than +1 and -1. This is
likely slower than the current strategy. Anyway, after butterflies, we
get the same matrix

   1 0  0
   1 4 16
  16 4  1

for odd and even parts, which can be reduced by

  w1 = w1 - w0
  w2 = w2 - 16*w0
  w1 = (w1 - w2)/15
  w2 = (w2 - w1)/4

(I think this sequence is close to optimal). The butterfly operations
(with no special addsub primitive) would be

  w2 = (w1 - w2)/2
  w1 = w1 - w2
  w4 = (w3 - w4)/2
  w3 = (w3 - w4)/2

In total 12 subtractions, 2 left shifts, 5 right shifts and 2
divisions. This actually not so bad compared to the current code,
there's just one more left shift (and then evaluation obviously also
needs more shifts). Using the four points corresponding to a = sqrt(B)
= 2^{GMP_NUMB_BITS /2} might also be worth investigating, in this case
the same reduction procedure will involved only division by (B-1).

/Niels