Recent changes to mpn_get_str/mpn_set_str

Torbjörn Granlund tg at gmplib.org
Tue Feb 28 22:04:47 UTC 2017


"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

  >   It is not specialised for base 10, and it is not faster than
  >   current code if only a few limbs must be converted, but around
  >   10 limbs it should be a gain.
  >
  > That's expected, I think.
  
  Well, the graphs in the paper show an earlier gain... but much depends on
  the base, on the architecture...
  
  Anyway, I attach a second attempt (the full substitute for file
  mpn/generic/get_str.c). I specialised code for base==10, refined some
  details, and I moved some of the internal logic of mpn_div_q directly into
  this code.
  Now the crossover is around 5 limbs.
  
The cross-over between old base-case code for base 10 and the new
"base-case" code?

  The code I extracted from mpn_div_q gave me the following:
  
        if (bkn == 1)
          mpn_divrem_1 (y0, 0L, tmp, un + 2, bkh);
        else if (bkn == 2)
          y0 [un + 1] = mpn_divrem_2 (y0, 0L, tmp, un + 3, bk);
        else
          {
            gmp_pi1_t dinv;
            invert_pi1 (dinv, bk[bkn - 1], bk[bkn - 2]);
            y0 [un + 1] = mpn_sbpi1_div_q (y0, tmp, un + bkn + 1, bk, bkn,
                                           dinv.inv32);
          }
  
  Are divrem_[12] the best function we currently have to obtain the quotient
  only when we divide by 1 or 2 limbs?
  
I think so, as mpn_div_qr_1n_pi2/mpn_div_qr_1u_pi2 in assembly are not
yet checked for any machine.

For a cutoff point of 5 limbs, will divrem_1/divrem_2 ever get used?
I'd expect the choice to be between mpn_sbpi1_div_q and mpn_dcpi1_div_q.
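
To make the question concrete, here is a minimal, self-contained sketch of
a dispatch on the divisor size bkn, with a divide-and-conquer branch added
next to the three cases in the quoted snippet.  It calls no GMP function.
The enum, the function name choose_div_method and the threshold value are
hypothetical, chosen only for illustration; a real threshold would be
machine-dependent and tuned, and the real code might well select on other
sizes too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the four division routines discussed above.
   These are illustration-only names, not GMP identifiers.  */
typedef enum
{
  DIV_BY_1_LIMB,          /* stands in for mpn_divrem_1 */
  DIV_BY_2_LIMBS,         /* stands in for mpn_divrem_2 */
  DIV_SCHOOLBOOK,         /* stands in for mpn_sbpi1_div_q */
  DIV_DIVIDE_AND_CONQUER  /* stands in for mpn_dcpi1_div_q */
} div_method;

/* Assumed crossover, in limbs, between schoolbook and divide-and-conquer
   division.  The value is made up for this sketch.  */
#define HYPOTHETICAL_DC_THRESHOLD 50

/* Pick a division routine from the divisor size bkn (in limbs).  */
div_method
choose_div_method (size_t bkn)
{
  assert (bkn >= 1);  /* a divisor has at least one limb */

  if (bkn == 1)
    return DIV_BY_1_LIMB;
  if (bkn == 2)
    return DIV_BY_2_LIMBS;
  if (bkn < HYPOTHETICAL_DC_THRESHOLD)
    return DIV_SCHOOLBOOK;
  return DIV_DIVIDE_AND_CONQUER;
}
```

If the divisor never has fewer than 3 limbs once the new code takes over,
the first two branches are dead and could be dropped; whether that holds
for a crossover of 5 limbs is exactly the open question.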

-- 
Torbjörn
Please encrypt, key id 0xC8601622

