> results for "almost all" parameter sets. Then it is extremely slow. (2) We
> have to optimize the code and make it faster by a speedup of hundreds to
> thousands times.

 A speedup of that size can only come from an algorithmic improvement, if one
 exists. Assembly-level optimizations typically give only a 1x to 10x speedup
 (generally < 2x).
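
To illustrate the difference in scale, here is a minimal sketch on a toy problem that has nothing to do with your actual code: counting the pairs in a list that sum to a target. The function names and the step counters are made up for the illustration. Both versions return the same answer, but the first checks every pair while the second makes a single pass with a hash table.

```python
from collections import Counter


def count_pairs_quadratic(values, target):
    """Check every pair (i < j): n*(n-1)/2 steps for n values."""
    pairs = 0
    steps = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            steps += 1
            if values[i] + values[j] == target:
                pairs += 1
    return pairs, steps


def count_pairs_linear(values, target):
    """Single pass with a hash table of earlier values: n steps."""
    seen = Counter()  # how many times each earlier value occurred
    pairs = 0
    steps = 0
    for v in values:
        steps += 1
        pairs += seen[target - v]  # earlier values that complete a pair
        seen[v] += 1
    return pairs, steps


values = list(range(2000))
target = 1999

slow_pairs, slow_steps = count_pairs_quadratic(values, target)
fast_pairs, fast_steps = count_pairs_linear(values, target)

print("same answer:", slow_pairs == fast_pairs)
print("step ratio:", slow_steps / fast_steps)
```

For n values the step ratio is (n - 1) / 2, so it grows with the input size. Hand-tuning the inner loop of the first version in assembly would shrink the cost of each step by a constant factor, but it would not change the number of steps.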

