Reflections on mulmod_bnm1

Wed Dec 9 13:45:03 CET 2009

> Are you sure? To me, it looks like the code produces values mod B^n + 1
> by code equivalent to

You are right! Both basecase functions do work also for semi-normalised
inputs, but current code calls the bnp1 one with perfectly normalised.

>>> mpn_mul + mpn_add. Say, an = rn = 20, bn = 12, or rn = 20, an = bn = 15
>>> (and threshold > 20).
>>
>> In both cases you mention bn > rn/2, this means: no copy, no padding.
>
> Both cases have rn below the MULMOD_BNM1_THRESHOLD. Then mpn_mulmod_bnm1
> zero-pads both operands to size rn, and calls mpn_bc_mulmod_bnm1.

Sorry, on my laptop MULMOD_BNM1_THRESHOLD=16, that's why I did not
understand the examples :-D

But I do not understand anyway. Why should one call mulmod_bnm1 with rn
below the MULMOD_BNM1_THRESHOLD... and expect a fast answer?
You need to allocate more memory than for the direct use of mpn_mul. It is
slower (the threshold basically certify this). Why should one use it?
Don't tell me that it happens after recursion, or mention rn=40 as I did
in my previous mail.

> With
> even smaller bn/rn, you can get padding also for rn >
> MULMOD_BNM1_THRESHOLD.

Yes, but an unlikely padding of a few limbs (wrt rn) is the price to have
a simple interface used _always_. The other way you can avoid an unlikely
padding, but you trade it with a more complex code (mul instead of mul_n)
used all the times.

We should measure the difference...

Regards,
Marco

-- 
http://bodrato.it/