Improvements to powerpc32 asm code
Mark Rodenkirch
mrodenkirch@wi.rr.com
Mon, 2 Jun 2003 20:41:49 -0500
On Monday, June 2, 2003, at 08:33 PM, Kevin Ryde wrote:
> Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:
>>
>> Yes, that was -C. Here are the -CD results if you are interested:
>>
>> 1 (21.1270) (#10.0622)
>> 2 5.0496 #4.0293
>> 3 4.0124 #3.0227
>> 4 #4.0166 8.0599
>
> No, you need to apply it over steps of 16 limbs or similar, especially
> if the code is unrolled to a size like that and hence has special case
> finish-ups for various modulo sizes. See tune/README,
>
> ./speed -s 16-64 -t 16 -C -D mpn_add_n

This output looks much better (and makes more sense):
        mpn_add_n   mpn_add_n_new
 16     (82.5206)   (#64.4188)
 32      4.0290      #3.2692
 48      4.0322      #3.2662
 64      4.0131      #3.2816
 80      4.0142      #3.2718
 96      4.0381      #3.2712
112      4.0161      #3.2642
128      4.0496      #3.2593
144      4.0188      #3.2857
160      4.0300      #3.2648
176      4.0142      #3.2628
192      4.0207      #3.2581
208      4.0322      #3.2851
224      4.0111      #3.2908
240      4.0024      #3.2520
256      4.0257      #3.2626
272      4.0599      #3.2919
288      4.0719      #3.2804
304      3.9912      #3.2590
320      4.0109      #3.2430
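
For a rough summary of the figures above, the per-size values can be averaged. What follows is only a minimal Python sketch and was not part of the original measurements. It assumes the two columns are cycles per limb for mpn_add_n and mpn_add_n_new as printed by speed with -C -D, and it skips the parenthesised first row. The names TABLE, parse and summarize are hypothetical, chosen for illustration.

```python
# Rows copied from the speed output above: size, mpn_add_n, mpn_add_n_new.
# A leading '#' marks the faster routine on that row.
TABLE = """\
16 (82.5206) (#64.4188)
32 4.0290 #3.2692
48 4.0322 #3.2662
64 4.0131 #3.2816
80 4.0142 #3.2718
96 4.0381 #3.2712
112 4.0161 #3.2642
128 4.0496 #3.2593
144 4.0188 #3.2857
160 4.0300 #3.2648
176 4.0142 #3.2628
192 4.0207 #3.2581
208 4.0322 #3.2851
224 4.0111 #3.2908
240 4.0024 #3.2520
256 4.0257 #3.2626
272 4.0599 #3.2919
288 4.0719 #3.2804
304 3.9912 #3.2590
320 4.0109 #3.2430
"""


def parse(text):
    """Return (size, old, new) tuples, skipping parenthesised rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: parenthesised values are not comparable to the rest.
        if len(fields) != 3 or "(" in line:
            continue
        size = int(fields[0])
        old = float(fields[1].lstrip("#"))
        new = float(fields[2].lstrip("#"))
        rows.append((size, old, new))
    return rows


def summarize(rows):
    """Return (mean old, mean new, old/new ratio) over all rows."""
    old_avg = sum(r[1] for r in rows) / len(rows)
    new_avg = sum(r[2] for r in rows) / len(rows)
    return old_avg, new_avg, old_avg / new_avg


rows = parse(TABLE)
old_avg, new_avg, ratio = summarize(rows)
print(f"mpn_add_n:     {old_avg:.2f}")
print(f"mpn_add_n_new: {new_avg:.2f}")
print(f"ratio:         {ratio:.2f}")
```

On the numbers above this works out to roughly 4.03 versus 3.27, so the new code comes out about 1.23 times faster over these sizes.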
--Mark