Improvements to powerpc32 asm code

Mark Rodenkirch mrodenkirch@wi.rr.com
Mon, 2 Jun 2003 20:41:49 -0500


On Monday, June 2, 2003, at 08:33 PM, Kevin Ryde wrote:

> Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:
>>
>> Yes, that was -C.  Here are the -CD results if you are interested:
>>
>> 1           (21.1270)    (#10.0622)
>> 2              5.0496       #4.0293
>> 3              4.0124       #3.0227
>> 4             #4.0166        8.0599
>
> No, you need to apply it over steps of 16 limbs or similar, especially
> if the code is unrolled to a size like that and hence has special case
> finish-ups for various modulo sizes.  See tune/README,
>
> 	./speed -s 16-64 -t 16 -C -D mpn_add_n

This output looks much better (and makes more sense):
             mpn_add_n mpn_add_n_new
16          (82.5206)    (#64.4188)
32             4.0290       #3.2692
48             4.0322       #3.2662
64             4.0131       #3.2816
80             4.0142       #3.2718
96             4.0381       #3.2712
112            4.0161       #3.2642
128            4.0496       #3.2593
144            4.0188       #3.2857
160            4.0300       #3.2648
176            4.0142       #3.2628
192            4.0207       #3.2581
208            4.0322       #3.2851
224            4.0111       #3.2908
240            4.0024       #3.2520
256            4.0257       #3.2626
272            4.0599       #3.2919
288            4.0719       #3.2804
304            3.9912       #3.2590
320            4.0109       #3.2430
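
For anyone who wants a single number out of the table above, here is a rough Python sketch that averages the two columns and takes their ratio. It is not part of the GMP tune tools. The names SPEED_OUTPUT and summarize are made up for illustration, and it assumes the figures are per-limb cycle counts, with "#" marking the faster routine and the parenthesised first row (which has no earlier size to difference against) left out of the average.

```python
from statistics import mean

# Rows copied from the speed output above: size, mpn_add_n, mpn_add_n_new.
SPEED_OUTPUT = """\
16          (82.5206)    (#64.4188)
32             4.0290       #3.2692
48             4.0322       #3.2662
64             4.0131       #3.2816
80             4.0142       #3.2718
96             4.0381       #3.2712
112            4.0161       #3.2642
128            4.0496       #3.2593
144            4.0188       #3.2857
160            4.0300       #3.2648
176            4.0142       #3.2628
192            4.0207       #3.2581
208            4.0322       #3.2851
224            4.0111       #3.2908
240            4.0024       #3.2520
256            4.0257       #3.2626
272            4.0599       #3.2919
288            4.0719       #3.2804
304            3.9912       #3.2590
320            4.0109       #3.2430
"""


def summarize(text):
    """Return (mean old, mean new, new/old) over the non-parenthesised rows."""
    old, new = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the parenthesised first row,
        # which is a raw figure rather than a difference between sizes.
        if len(fields) != 3 or fields[1].startswith("("):
            continue
        # "#" only flags the faster column, so drop it before converting.
        old.append(float(fields[1].lstrip("#")))
        new.append(float(fields[2].lstrip("#")))
    old_mean, new_mean = mean(old), mean(new)
    return old_mean, new_mean, new_mean / old_mean


if __name__ == "__main__":
    old_mean, new_mean, ratio = summarize(SPEED_OUTPUT)
    print(f"mpn_add_n     {old_mean:.4f}")
    print(f"mpn_add_n_new {new_mean:.4f}")
    print(f"new/old       {ratio:.3f}")
```

On the rows above this comes out at roughly 4.0 for mpn_add_n against roughly 3.3 for mpn_add_n_new.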

--Mark