mpn_sqrtrem{1,2} - patch for pure C implem
Adrien Prost-Boucle
adrien.prost-boucle at laposte.net
Sun Apr 2 10:52:14 UTC 2017
Hi,
On my side, I do observe a little improvement (64-bit).
With original GMP correction:
1000000x 100 values of 64 bits
Time: 4.77369 s
1000000x 100 values of 48 bits
Time: 5.07621 s
1000000x 100 values of 32 bits
Time: 5.07722 s
1000000x 100 values of 16 bits
Time: 5.24327 s
With my new proposal (let's name that proposal 2):
1000000x 100 values of 64 bits
Time: 4.68028 s
1000000x 100 values of 48 bits
Time: 4.91651 s
1000000x 100 values of 32 bits
Time: 4.99838 s
1000000x 100 values of 16 bits
Time: 5.16515 s
Note that for each bit size, my test first generates 100 random values
and then does sqrt repeatedly, changing value each time.
I think this pseudo-random data should look like real-world usage...
Execution times vary slightly between runs, but the general trend is there.
I also tried a third, and fourth way of writing the correction code.
It uses another variable accu2 to pre-compute the correction.
Proposal 3:
(force sharing intermediate results)
(same speed as my proposal 2)
/* Correction: add 1 to the root if needed (test: r^2 + 2r + 1 <= a) */
accu = a0 - root * root;
accu2 = 2*root + 1;
if ((mp_limb_signed_t)accu2 <= (mp_limb_signed_t)accu) {
accu -= accu2;
root++;
}
*rp = accu;
Proposal 4:
(should better fit conditional assignment instructions)
(but appears slightly slower on my laptop)
accu = a0 - root * root;
accu2 = accu - (2*root + 1);
if ((mp_limb_signed_t)accu2 >= 0) {
accu = accu2;
root++;
}
*rp = accu;
Probably for each implementation there is a combination of compiler+machine for which it's best.
Maybe this part not critical, we just keep the version that is clearest to read?
The real gain will be with machine-specific sqrtrem1 implementation.
I'll benchmark with ABI=32 as you suggest.
Regards,
Adrien
On Sun, 2017-04-02 at 06:44 +0200, Marco Bodrato wrote:
> Ciao,
>
> Il Sab, 1 Aprile 2017 9:02 pm, Adrien Prost-Boucle ha scritto:
> > On Sat, 2017-04-01 at 18:15 +0200, Marco Bodrato wrote:
> > > After the patch:
> > > $ (cd tests/devel; make sqrtrem_1_2)&&time tests/devel/sqrtrem_1_2 c
> > > Corner cases tested, up to bits:
> > > \ 63
> > > Values of a single limb, tested.
> > >
> > > > user 1m47.889s
> > I also saved one instruction in the final correction step (for now lightly
> > tested, just make check).
> > + /* Correction: add 1 to the root if needed (test: r^2 + 2r - a < 0) */
> > + accu = a0 - root * root;
> > + if ( (mp_limb_signed_t)(2*root - accu) < 0) {
> > + accu -= 2*root + 1;
> > + root++;
> > + }
>
> Compared to your previous proposal, that's a small regression, I get:
> > user 1m49.209s
>
> I refined a little bit the exhaustive testing program for sqrtrem[12]:
> https://gmplib.org/repo/gmp/rev/c32d616089b4
>
> For ABI=32, can you please tell us the timings obtained with:
>
> make&&(cd tests/devel/;make sqrtrem_1_2&&time ./sqrtrem_1_2 x 1)
>
> before, and after your patch (maybe playing with the different flavours of
> the correction step :-)?
>
> Best regards,
> m
>
More information about the gmp-devel
mailing list