Memory barrier for fat initialization

Torbjörn Granlund tg at gmplib.org
Wed Jan 14 09:09:16 UTC 2015


nisse at lysator.liu.se (Niels Möller) writes:

  Sounds doable for the sqr_basecase threshold, at least.
  
  On the other hand, on x86_64, maybe all chips we care about have the
  needed extensions, so it's *easy* to add an mfence or sfence instruction
  and not have to worry? I guess 32-bit x86 is more painful, since I guess
  we'd have to check if the instructions are available.
  
I take it that you're suggesting a fix to a store ordering problem which
exists on many platforms, but not on the one we use (for fat binaries).

It could become a problem if all these things are true:

(1) Intel's weak ordering claims of rep;movs are in fact realised in
    actual hardware.

(2) Compilers start coalescing our field-by-field struct assignments
    into a rep;movs.  I haven't studied our code in enough detail to
    tell whether that is even remotely likely.  We surely don't just
    assign one structure to another.

(3) Compilers start using the high-overhead rep;movs for small blocks
    under optimise-for-speed conditions.  (Gcc uses rep;movs for small
    blocks when optimising for size.)
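
To make condition (2) concrete, here is a minimal sketch of the two ways
a dispatch table could be filled in.  The struct, its fields and the
function names are hypothetical stand-ins, not the actual GMP code, and
what a given compiler emits for either function is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fat-binary dispatch table; the field
   names are illustrative and are not GMP's actual struct.  */
struct cpuvec_sketch
{
  long (*add_n) (long, long);
  long (*sub_n) (long, long);
  size_t sqr_basecase_threshold;
};

static long add_generic (long a, long b) { return a + b; }
static long sub_generic (long a, long b) { return a - b; }

struct cpuvec_sketch live_vec;

/* Field-by-field: each assignment is written as a separate store.  For
   condition (2) to hold, the compiler would have to merge these stores
   into one block copy.  */
void
init_field_by_field (void)
{
  live_vec.add_n = add_generic;
  live_vec.sub_n = sub_generic;
  live_vec.sqr_basecase_threshold = 20;
}

/* Whole-struct assignment: here the compiler is free to emit a block
   copy (inline moves, a memcpy call, or possibly rep;movs).  */
void
init_by_struct_copy (void)
{
  static const struct cpuvec_sketch tmpl =
    { add_generic, sub_generic, 20 };
  live_vec = tmpl;
}
```

Both functions leave live_vec in the same state; they differ only in
how much freedom the compiler has in choosing the store sequence.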

I suggest that we don't fix anything in this area, since I don't think a
problem exists.

We will need synchronisation primitives for some platforms, if we enable
fat builds for them.  Sparc under the Linux kernel is one example (but
Sparc under Solaris uses TSO, total store ordering).  We will need them
for ARM and POWER as well.  Etc.
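
On such weakly ordered platforms, one possible shape for the primitive
is a release store paired with an acquire load.  The following is only
a sketch using C11 <stdatomic.h>; the table layout and all names are
hypothetical, and it assumes a single thread does the initialisation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical table; not GMP's actual layout.  */
struct fat_table_sketch
{
  long (*mul_1) (long, long);
  size_t sqr_basecase_threshold;
};

static long mul_generic (long a, long b) { return a * b; }

static struct fat_table_sketch table;
static atomic_int table_ready;  /* static storage, so it starts as 0 */

/* Called by the one thread that performs initialisation (an assumption
   of this sketch).  The release store ensures the plain stores to
   `table' are visible to any thread that later observes
   table_ready == 1 through an acquire load.  */
void
fat_publish_sketch (void)
{
  table.mul_1 = mul_generic;
  table.sqr_basecase_threshold = 20;
  atomic_store_explicit (&table_ready, 1, memory_order_release);
}

/* Returns the table once it has been published, or NULL before that.
   The acquire load pairs with the release store above.  */
const struct fat_table_sketch *
fat_table_try_get_sketch (void)
{
  if (atomic_load_explicit (&table_ready, memory_order_acquire))
    return &table;
  return NULL;
}
```

On a TSO machine the two atomic operations typically compile to plain
loads and stores, so the cost should fall mainly on the platforms that
actually need the barrier.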

-- 
Torbjörn
Please encrypt, key id 0xC8601622

