Memory barrier for fat initialization

Tue Jan 13 16:10:45 UTC 2015

nisse at lysator.liu.se (Niels Möller) writes:

  I'm a bit confused about this. Then when are memory barriers (mfence and
  friends) ever needed? I have a pretty vague idea about how memory models
  work in both theory and practice. I'm thinking about something like:

    cpu0 for some reason has parts of cpuvec cached in a local L1 cache.

    cpu1 writes cpuvec and __gmpn_cpuvec_initialized = 1.

    cpu0 reads __gmpn_cpuvec_initialized, from shared L2 cache,
    gets the updated value, 1.

    cpu0 reads a cpuvec entry. Gets the old value from its local L1
    cache. Do the architecture specs rule out this possibility?

This scenario is only possible if cpu0 sees cpu1's stores out of
order.  That's not supposed to happen on x86, if plain instructions are
used.  More modern architectures make no such store ordering
guarantees.
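
For architectures without that guarantee, the ordering the scenario
depends on can be requested explicitly.  Below is a minimal sketch
using C11 <stdatomic.h>, not what GMP actually does.  The struct, the
function names and the flag are hypothetical stand-ins for cpuvec and
__gmpn_cpuvec_initialized, and the sketch assumes a single thread
performs the initialization.  It says nothing about the rep;movs or
SSE cases.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the cpuvec table of function pointers. */
struct cpuvec_sketch {
    int (*add_n)(int, int);
    int (*sub_n)(int, int);
};

static int add_impl(int a, int b) { return a + b; }
static int sub_impl(int a, int b) { return a - b; }

static struct cpuvec_sketch cpuvec;

/* Hypothetical stand-in for __gmpn_cpuvec_initialized. */
static atomic_int cpuvec_initialized;

/* Writer side (cpu1 in the scenario): fill in the table with plain
   stores, then publish the flag with a release store.  The release
   store may not become visible before the table stores that precede
   it in program order. */
static void cpuvec_init_sketch(void)
{
    cpuvec.add_n = add_impl;
    cpuvec.sub_n = sub_impl;
    atomic_store_explicit(&cpuvec_initialized, 1, memory_order_release);
}

/* Reader side (cpu0 in the scenario): an acquire load of the flag.
   If it observes 1, the table stores made before the release store
   are guaranteed to be visible too.  Returns NULL if the flag is not
   set yet. */
static const struct cpuvec_sketch *cpuvec_try_get_sketch(void)
{
    if (!atomic_load_explicit(&cpuvec_initialized, memory_order_acquire))
        return NULL;
    return &cpuvec;
}
```

On x86 both accesses should compile to ordinary moves, so this costs
nothing there; on weaker architectures the compiler is expected to
emit whatever barrier or ordered instruction the target needs.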

Apparently rep;movs can mess things up (at least on Intel chips, I
haven't confirmed this with AMD) and so can some weakly ordered
instructions under the sse hat.  If the dear compiler decides to
initialise the cpuvec structure using any of these instructions, we
might be screwed.

  I had a quick look at what linux does. It seems to use arch-specific
  inline assembly to implement the barriers it needs.

Not neat.  It might also require configure-time tests for arch level.

Torbjörn
Please encrypt, key id 0xC8601622