div_qr_1 interface

Thu Oct 17 13:24:58 CEST 2013

nisse at lysator.liu.se (Niels Möller) writes:

> To get going, I've written C implementations of mpn_div_qr_1n_pi1 and
> mpn_divf_qr1n_pi1, and made divrem_1 call them.

Below, also an mpn_div_qr_1, using these primitives (and with some
inspiration from divrem_1). For return value, I use the type

  typedef struct { mp_limb_t r; mp_limb_t qh; } gmp_qr_1_t;

The order here is a micro-optimization. I expect that on most ABI:s, for
the typical code fragment

  gmp_qr_1_t res;
  ...
  res.r = mpn_foo (...);
  return res;

the return value from mpn_foo will already be in the right register, and
only qh needs to be moved from some callee-save register to the second
return value register.

Are there any problems with using a struct return values like here, in
terms of performance, portability, style...?

If you agree this makes sense, I'm considering checking in the div_qr_1*
functions fairly soon (maybe divrem_1 changes can wait until
div_qr_1*_pi1 has settled. The interface should allow for additional
precomputed values).

I also found some old x86_64 assembly code for the new div_qr_1
algorithm. With perfect scheduling, it could run at 10 cycles on opteron
(the main cpu of interest at the time). Main loop is 28 instructions,
two independent mul, and decoder limited).

Regards,
/Niels

/* mpn_div_qr_1 -- mpn by limb division.

   Contributed to the GNU project by Niels Möller and Torbjörn Granlund

Copyright 1991, 1993, 1994, 1996, 1998, 1999, 2000, 2002, 2003, 2013 Free Software
Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.  */

#include "gmp.h"
#include "gmp-impl.h"
#include "longlong.h"

#ifndef DIV_QR_1_NORM_THRESHOLD
#define DIV_QR_1_NORM_THRESHOLD 3
#endif
#ifndef DIV_QR_1_UNNORM_THRESHOLD
#define DIV_QR_1_UNNORM_THRESHOLD 3
#endif

#if GMP_NAIL_BITS > 0
#error Nail bits not supported
#endif

/* Divides {up, n} by d. Writes the n-1 low quotient limbs at {qp,
 * n-1}. Returns remainder and high quotient limb. */
gmp_qr_1_t
mpn_div_qr_1 (mp_ptr qp, mp_srcptr up, mp_size_t n,
	      mp_limb_t d)
{
  gmp_qr_1_t res;

  unsigned cnt;
  mp_limb_t uh;

  ASSERT (n > 0);
  ASSERT (d > 0);

  if (d & GMP_NUMB_HIGHBIT)
    {
      /* Normalized case */
      mp_limb_t dinv, q;

      uh = up[--n];

      q = (uh >= d);
      res.qh = q;
      uh -= (-q) & d;

      if (BELOW_THRESHOLD (n, DIV_QR_1_NORM_THRESHOLD))
	{
	  cnt = 0;
	plain:
	  while (n > 0)
	    {
	      mp_limb_t ul = up[--n];
	      udiv_qrnnd (qp[n], uh, uh, ul, d);
	    }
	  res.r = uh >> cnt;
	  return res;
	}
      invert_limb (dinv, d);
      res.r = mpn_div_qr_1n_pi1 (qp, up, n, uh, d, dinv);
      return res;
    }
  else
    {
      /* Unnormalized case */
      mp_limb_t dinv, ul;

      if (! UDIV_NEEDS_NORMALIZATION
	  && BELOW_THRESHOLD (n, DIV_QR_1_UNNORM_THRESHOLD))
	{
	  uh = up[--n];
	  udiv_qrnnd (res.qh, uh, 0, uh, d);
	  cnt = 0;
	  goto plain;
	}

      count_leading_zeros (cnt, d);
      d <<= cnt;

#if HAVE_NATIVE_div_qr_1u_pi1
      /* FIXME: Call loop doing on-the-fly normalization */
#endif

      /* Shift up front, use qp area for shifted copy. A bit messy,
	 since we have only n-1 limbs available, and shift the high
	 limb manually. */
      uh = up[--n];
      ul = (uh << cnt) | mpn_lshift (qp, up, n, cnt);
      uh >>= (GMP_LIMB_BITS - cnt);

      if (UDIV_NEEDS_NORMALIZATION
	  && BELOW_THRESHOLD (n, DIV_QR_1_UNNORM_THRESHOLD))
	{
	  udiv_qrnnd (res.qh, uh, uh, ul, d);
	  up = qp;
	  goto plain;
	}
      invert_limb (dinv, d);

      udiv_qrnnd_preinv (res.qh, uh, uh, ul, d, dinv);
      res.r = mpn_div_qr_1n_pi1 (qp, qp, n, uh, d, dinv) >> cnt;
      return res;
    }
}

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.