nisse at (Niels Möller) writes:

  BTW, below is one (untested) way to organize gcd_22. Wants a sub_mddmmss,
  with output carry as a mask, analogous to the add_mssaaaa defined in

  typedef struct {
    mp_limb_t d[2];
  } mp_double_limb_t;

I believe one should use separate mp_limb_t variables, not an array, as
an array will force things to memory before field accesses.
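
To illustrate the layout I have in mind, here is a rough, untested sketch.
The names limb_t, double_limb_t, d0, d1 and sub_22 are made up for this
example and are not existing GMP names, and limb_t is a 64-bit stand-in for
mp_limb_t. Whether a given compiler really keeps either layout in registers
should be checked in the generated assembly.

```c
#include <stdint.h>

/* Stand-in for mp_limb_t; assumed to be 64 bits in this sketch. */
typedef uint64_t limb_t;

/* Hypothetical layout: two named fields instead of d[2]. */
typedef struct {
  limb_t d0;  /* low limb */
  limb_t d1;  /* high limb */
} double_limb_t;

/* Hypothetical helper: computes (a1,a0) - (b1,b0) modulo 2^128 and
   returns the two result limbs by value. */
static double_limb_t
sub_22 (limb_t a1, limb_t a0, limb_t b1, limb_t b0)
{
  double_limb_t r;
  r.d0 = a0 - b0;
  r.d1 = a1 - b1 - (a0 < b0);  /* propagate borrow from the low limb */
  return r;
}
```

A caller would then read r.d0 and r.d1 directly, with no indexing involved.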

I haven't looked at the rest of the code yet.  Note that we already have
asm variants for armv6t2, armv8a, power9, and x86-64.  The loops are
probably OK, but the exit states are somewhat confused.

