Memory barrier for fat initialization
Torbjörn Granlund
tg at gmplib.org
Wed Jan 14 09:09:16 UTC 2015
nisse at lysator.liu.se (Niels Möller) writes:
> Sounds doable for the sqr_basecase threshold, at least.
>
> On the other hand, on x86_64, maybe all chips we care about have the
> needed extensions, so it's *easy* to add an mfence or sfence instruction
> and not have to worry? I guess 32-bit x86 is more painful, since I guess
> we'd have to check if the instructions are available.
I take it that you're suggesting a fix to a store ordering problem which
exists on many platforms, but not on the ones we use (for fat binaries).
It could become a problem if all these things are true:
(1) Intel's weak ordering claims of rep;movs are in fact realised in
actual hardware.
(2) Compilers start coalescing our field-by-field struct assignments
into a rep;movs. I haven't studied our code in enough detail to tell
if that is even remotely likely. We surely don't just assign one
structure to another.
(3) Compilers start using the high-overhead rep;movs for small blocks
under optimise-for-speed conditions. (Gcc uses rep;movs for small
blocks when optimising for size.)
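
To make the distinction in (2) concrete, here is a minimal sketch of the two
initialisation styles. The struct, its fields, the function names and the
threshold value 30 are all hypothetical stand-ins for illustration, not the
actual definitions in our fat code.

```c
#include <stddef.h>

/* Hypothetical dispatch table for a fat binary (illustrative only). */
struct cpuvec_sketch {
  int (*count_bits)(unsigned long);   /* CPU-specific routine */
  size_t sqr_basecase_threshold;      /* tuned threshold */
  int initialized;                    /* set last by the initialiser */
};

/* Generic fallback routine used to fill the table. */
static int count_bits_generic(unsigned long x)
{
  int n = 0;
  while (x != 0) {
    x &= x - 1;   /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Field-by-field style: one ordinary scalar assignment per field.
   Compilers typically emit plain stores for this, which x86 hardware
   does not reorder with respect to each other. */
void init_field_by_field(struct cpuvec_sketch *dst)
{
  dst->count_bits = count_bits_generic;
  dst->sqr_basecase_threshold = 30;   /* assumed value */
  dst->initialized = 1;
}

/* Whole-struct style: a block copy, which a compiler is free to lower
   to a memcpy call or, for some sizes and options, to rep;movs. */
void init_by_struct_copy(struct cpuvec_sketch *dst,
                         const struct cpuvec_sketch *src)
{
  *dst = *src;
}
```

Both functions leave the table in the same state; the difference lies only in
which instructions the compiler is likely to emit for the stores.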
I suggest that we don't fix anything in this area, since I don't think a
problem exists.
We will need synchronisation primitives for some platforms, if we enable
fat builds for them. SPARC under the Linux kernel is one example (but
SPARC under Solaris uses TSO, total store ordering). We will also need
them for ARM and POWER, among others.
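
For such weakly ordered platforms, one possible shape of the primitive is a
release store by the initialiser paired with an acquire load by readers. The
sketch below uses C11 <stdatomic.h>, which is an assumption on my part and not
something our code uses today; all names and the threshold value are
hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical dispatch table (illustrative only). */
struct dispatch_sketch {
  size_t sqr_basecase_threshold;
  int (*popcount)(unsigned long);
};

enum { SKETCH_UNINIT = 0, SKETCH_BUSY = 1, SKETCH_READY = 2 };

static struct dispatch_sketch table_sketch;
static atomic_int state_sketch = SKETCH_UNINIT;

static int popcount_generic(unsigned long x)
{
  int n = 0;
  while (x != 0) {
    x &= x - 1;   /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Return the table, initialising it on first use.  The release store
   that sets SKETCH_READY orders the table writes before the flag; the
   acquire loads order the flag read before any later table reads. */
const struct dispatch_sketch *get_table_sketch(void)
{
  if (atomic_load_explicit(&state_sketch, memory_order_acquire)
      != SKETCH_READY) {
    int expected = SKETCH_UNINIT;
    if (atomic_compare_exchange_strong_explicit(
            &state_sketch, &expected, SKETCH_BUSY,
            memory_order_acquire, memory_order_acquire)) {
      /* This thread won the race and fills the table alone. */
      table_sketch.sqr_basecase_threshold = 30;   /* assumed value */
      table_sketch.popcount = popcount_generic;
      atomic_store_explicit(&state_sketch, SKETCH_READY,
                            memory_order_release);
    } else {
      /* Another thread is initialising; wait until it publishes. */
      while (atomic_load_explicit(&state_sketch, memory_order_acquire)
             != SKETCH_READY)
        ;
    }
  }
  return &table_sketch;
}
```

On a TSO machine the release and acquire operations cost little or nothing
beyond ordinary loads and stores, while on ARM or POWER the compiler inserts
the barriers that are needed.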
--
Torbjörn
Please encrypt, key id 0xC8601622