bdiv_q_2.c improved

Thu Oct 8 06:37:02 UTC 2015

Joe keane <jgk at panix.com> writes:

>>How do you measure the speed gain of your new code?
>
> I haven't measured it.

I'm curious. And measurements are important when developing
high-performance code. The relevant comparison is against
mpn_sbpi1_bdiv_q with size 2, I think, or mpn_divexact with size 2,
before and after the new code is hooked in.

>>To be able to include new code in GMP, we ask for assignment of copyright
>>to the FSF. Are you willing to do that?
>
> That's no problem.

Good. I'm going to be mostly offline for the next few days, and I don't
remember the procedure off the top of my head, so I'm sorry I can't
provide you with the right forms right away. But it can take some time,
so it's good to get that process started pretty soon.

>>Maybe add_ssaaaa or sub_ddmmss could be used for some of the additions.
>
> I need to use carry/add-with-carry somehow.  My 'solution' is to code the
> whole thing in assembler, but maybe that is not necessary.

The add_ssaaaa and sub_ddmmss macros do add/sub with carry on two-limb
numbers, and use inline assembly on many architectures. So when
applicable, they may help both to gain a little performance and to make
the code more structured and easier to read. There are also some
extensions; e.g., look at the macros in mpn/generic/mod_1_1.c.
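To illustrate what "add/sub with carry on two-limb numbers" means, here is a portable C sketch of the semantics only. It assumes 64-bit limbs (the limb_t typedef is an assumption), and the names sketch_add_ssaaaa and sketch_sub_ddmmss are hypothetical. The real add_ssaaaa and sub_ddmmss live in GMP's longlong.h and are replaced by inline assembly on many targets, so this is not their actual definition, just the same argument order and result.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs */

/* (sh, sl) = (ah, al) + (bh, bl) mod 2^128.
   The carry out of the low limb is detected by unsigned wrap-around:
   the low sum is smaller than an addend exactly when it overflowed. */
#define sketch_add_ssaaaa(sh, sl, ah, al, bh, bl)                     \
  do {                                                                \
    limb_t sketch_lo = (limb_t)(al) + (limb_t)(bl);                   \
    (sh) = (limb_t)(ah) + (limb_t)(bh) + (sketch_lo < (limb_t)(al));  \
    (sl) = sketch_lo;                                                 \
  } while (0)

/* (sh, sl) = (ah, al) - (bh, bl) mod 2^128.
   A borrow out of the low limb occurs exactly when al < bl. */
#define sketch_sub_ddmmss(sh, sl, ah, al, bh, bl)                     \
  do {                                                                \
    limb_t sketch_lo = (limb_t)(al) - (limb_t)(bl);                   \
    (sh) = (limb_t)(ah) - (limb_t)(bh)                                \
           - ((limb_t)(al) < (limb_t)(bl));                           \
    (sl) = sketch_lo;                                                 \
  } while (0)

/* Usage: adding 1 to (0, 2^64 - 1) carries into the high limb,
   and subtracting 1 again borrows back out of it. */
void sketch_carry_demo(void)
{
  limb_t hi, lo;
  sketch_add_ssaaaa(hi, lo, 0, UINT64_MAX, 0, 1);
  assert(hi == 1 && lo == 0);
  sketch_sub_ddmmss(hi, lo, hi, lo, 0, 1);
  assert(hi == 0 && lo == UINT64_MAX);
}
```

The point of the two-limb form is that the carry or borrow never has to be materialized as a separate variable in the caller's code.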

But sure, coding the whole function in assembler is good too; we do that
a lot in GMP. For an assembly implementation, I think it's preferable to
let the inverse be an input to the function, rather than have the assembly
code call or duplicate binvert_limb.
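A rough C sketch of that interface, assuming 64-bit limbs and two-limb operands stored low limb first. The names sketch_binvert_limb and sketch_bdiv_q_2 are hypothetical, and the plain Newton iteration used for the inverse is one standard way to compute it, not necessarily how GMP's binvert_limb does it. The division computes the Hensel (2-adic) quotient Q with Q * D == N (mod 2^128), which equals the ordinary quotient when D divides N exactly; it only receives dinv = d[0]^-1 mod 2^64 as a parameter, so an assembly version of it would not need to call or duplicate the inverse computation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs */

/* Inverse of an odd limb modulo 2^64 by Newton (Hensel) iteration.
   Starting from inv = d (correct to 3 bits, since d*d == 1 mod 8 for
   odd d), each step inv = inv * (2 - d*inv) doubles the number of
   correct low bits: 3 -> 6 -> 12 -> 24 -> 48 -> 96 >= 64. */
limb_t sketch_binvert_limb(limb_t d)
{
  assert(d & 1);
  limb_t inv = d;
  for (int i = 0; i < 5; i++)
    inv *= 2 - d * inv;
  return inv;
}

/* High limb of the 128-bit product u*v, built from 32-bit halves so
   the sketch stays in standard C. */
static limb_t umul_hi(limb_t u, limb_t v)
{
  limb_t ul = u & 0xffffffffu, uh = u >> 32;
  limb_t vl = v & 0xffffffffu, vh = v >> 32;
  limb_t x0 = ul * vl, x1 = ul * vh, x2 = uh * vl, x3 = uh * vh;
  x1 += x0 >> 32;            /* cannot overflow */
  x1 += x2;                  /* may overflow ... */
  if (x1 < x2)
    x3 += (limb_t)1 << 32;   /* ... so propagate that carry */
  return x3 + (x1 >> 32);
}

/* Hypothetical 2-limb Hensel division. n and d are {low, high};
   d[0] must be odd and dinv must equal d[0]^-1 mod 2^64.
   Produces q with q * d == n (mod 2^128). */
void sketch_bdiv_q_2(limb_t q[2], const limb_t n[2],
                     const limb_t d[2], limb_t dinv)
{
  /* q0 is chosen so that the low limb of q0*d0 equals n[0]. */
  limb_t q0 = n[0] * dinv;
  /* High limb of N - q0*D (the low limb is exactly zero, no borrow):
     subtract the high half of q0*d0 and the low half of q0*d1. */
  limb_t r1 = n[1] - umul_hi(q0, d[0]) - q0 * d[1];
  q[0] = q0;
  q[1] = r1 * dinv;
}
```

Because dinv depends only on the low divisor limb, a caller dividing several numbers by the same D can compute it once and reuse it, e.g. `sketch_bdiv_q_2(q, n, d, sketch_binvert_limb(d[0]))` for a one-off call.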

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.