If I am interacting with shared memory, gcc doesn't know if another
process changes values. This creates issues for optimization. I know
that "volatile" can help address this, but it winds up causing a giant
mess of other problems (like, for instance, I can't pass a pointer to
volatile into memcpy, or pretty much anything else from the standard
library).
So for instance, if I do something stupidly simple like busy-wait on a
shm value to change:
while (shm->index == oldindex) ;
gcc will kill that loop.
For simple things like this, and much more complicated things beyond
it, is there a way to cast something to "volatile" for one particular
access?
No, volatile cannot address this. This is not what the volatile
qualifier is for. The volatile qualifier is designed for working with
memory mapped hardware. It is not designed for multi-processor shared
memory. If a program is not multi-processor safe, then adding volatile
will never make it multi-processor safe.
This is because the issues related to making code multi-processor safe
are related to memory barriers and memory cache behaviour. Adding a
volatile qualifier will not change the program's behaviour with respect
to either.
Just as well, since the program is most likely incorrect anyhow.
Without any memory barrier, that expression could become true even
though the executing processor can't see any of the other changes made
by the other process.
Don't think volatile. Think memory barriers invoked via asm
constructs. Use the new atomic builtins.
And I can't help but add that most programs that work at this level get
it wrong. Better to use mutexes and condition variables. Or even
higher level constructs.
Ian's answer is the right one. For this particular subquestion, yes, it is
possible, just do a cast at the pointer level and access the object
through that int volatile*.
Do you have to use volatile if you're writing to memory mapped
hardware, or just reading?
Is caching the reason that makes another process sharing a memory
address different than a piece of hardware sharing a memory address?
I thought (probably incorrectly) that the atomic builtins were only
for atomic actions between threads in a process, not between separate
processes. Do they really work with the latter? The information I
got on freenode's #gcc (not oftc) was that gcc can't do anything to
protect shared memory between processes, and that you have to use a
system facility instead.
Would you be able to paste the verbiage from the C or C++ standard? (I
don't have a copy of the standard.)
The reason I ask is Microsoft appears to have a different
interpretation of the qualifier, and does not limit 'volatile' to
memory mapped hardware.
 http://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.90).aspx
It says "Microsoft Specific" ... "End Microsoft Specific" for
a reason, y'know.
See Volatile: Almost Useless for Multi-Threaded Programming
But the page does describe the keyword as "... a type qualifier used
to declare that an object can be modified in the program by something
such as the operating system, the hardware, or a concurrently
executing thread."
Ian's description is consistent with the GCC manual's description:
"Both the C and C++ standards have the concept of volatile objects.
These are normally accessed by pointers and used for accessing
hardware." (Sorry about the 3.4.6 link - it was the first Google hit
from GCC man pages.)
Unfortunately, I've never seen the C or C++ definition of the keyword.
Is 'volatile' implementation defined?
That part is true, kinda sorta.
ISO/IEC 9899:1999 6.7.3 Type qualifiers
An object that has volatile-qualified type may be modified in ways
unknown to the implementation or have other unknown side
effects. Therefore any expression referring to such an object shall be
evaluated strictly according to the rules of the abstract machine, as
described in 5.1.2.3. Furthermore, at every sequence point the value
last stored in the object shall agree with that prescribed by the
abstract machine, except as modified by the unknown factors mentioned
previously. What constitutes an access to an object that has
volatile-qualified type is implementation-defined.
There is a new standard which has proper support for atomics.
You do not /have/ to use volatile for writing or reading memory mapped
hardware, but it is usually a good idea.
When you use a volatile access, you are telling the compiler "do a read
or write here, exactly once - don't do it speculatively, don't re-use
old results, and keep the specified ordering between all other volatile
accesses".
You can get the same results by using inline assembly, or external
function calls, or anything else that also specifically forces the read
or write. Most operating systems have some sort of calls or macros to
do the same.
A plain (non-volatile) write may happen eventually - but you have no
guarantees. In particular, the compiler can "save up" the write and do
it later, it can roll together multiple writes, and it can (if it can
prove the data is not visible elsewhere, as with file or function
static data) omit the write altogether or use a register instead.
Caching is one of the reasons, and is usually the culprit when reading
data does not give the expected results - even if the read is marked
"volatile". The compiler cannot force the memory system to give the
results from main memory - that depends on things like the MMU setup,
the cache hardware, cache snooping and cache consistency hardware and
setup, etc. Reads are also affected by speculation - speculative reads,
speculative execution, branch prediction, etc., which can wildly change
when and how often a read actually happens (sometimes you need a cache
flush or invalidate followed by a memory barrier) - the compiler knows
nothing about caches.
Writes typically have additional buffers and queues, and are
re-ordered for memory bus efficiency.
This is why "volatile" is generally not enough - and why you are
normally better off using the OS's API for such shared data. The
/implementation/ of such APIs typically makes use of "volatile"
accesses - but also often other things, such as cache control assembly
instructions and memory barriers.
For memory-mapped hardware, the MMU is /usually/ configured so that
such regions are uncached, and thus "volatile" accesses are /usually/
enough. But you would have to check the details for your OS and target
to see how it configures such regions.
Most memory mapped hardware responds to both read and write requests, so
you normally have to use volatile in both cases. Of course the details
are going to depend on the specific hardware in question.
I'm not sure I completely understand the question. Memory mapped
hardware is not memory in the conventional sense. It's hardware that
is accessed through the address space. It is not cached, but that is
not the most important difference.
Shared memory on a multiprocessor machine is shared memory. It really
doesn't matter whether the memory is shared between threads or between
processes. The only significant difference between a thread and a
process on a modern OS is whether memory is shared by default or not
(there are other differences regarding signal delivery that are
irrelevant here). Once you create memory shared between processes, you
are effectively dealing with threads.
So, yes, the atomic builtins work fine. However, my observation is that
very few people can use them correctly. I would never use them myself,
except for the limiting cases of atomic increment and atomic compare and
swap with __ATOMIC_SEQ_CST.
Use mutexes instead. Not all operating systems support mutexes in
process shared memory, but they should work fine on GNU/Linux. You do
have to be careful to ensure that only one process initializes the
mutex.
Ah, thanks Andrew.
I see they are referring to their wonderful abstract machine. That's
the same machine that has unlimited register sizes and never suffers
overflow or wrap. I should have known there was trouble afoot with the
differences in definitions between MS and GCC.
Drafts of the standards are freely available and for something like
the definition of 'volatile' the drafts match the final standards.
No it isn't.
"Certain aspects and operations of the abstract machine are described
in this International Standard as implementation-defined (for
example, ...)."
Overflow: "If during the evaluation of an expression, the result is
not mathematically defined or not in the range of representable values
for its type, the behavior is undefined."
Wrapping: "Unsigned integers, declared unsigned, shall obey the laws
of arithmetic modulo 2^n where n is the number of bits in the value
representation of that particular size of integer"
Registers are outside the scope of the standard.
Section 6.7.3, paragraph 6
An object that has volatile-qualified type may be modified in ways
unknown to the implementation or have other unknown side effects.
Therefore any expression referring to such an object shall be
evaluated strictly according to the rules of the abstract machine,
as described in 5.1.2.3. Furthermore, at every sequence point the
value last stored in the object shall agree with that prescribed by
the abstract machine, except as modified by the unknown factors
mentioned previously.(114) What constitutes an access to an object
that has volatile-qualified type is implementation-defined.
Footnote 114 (non-normative):
(114) A volatile declaration may be used to describe an object
corresponding to a memory-mapped input/output port or an object
accessed by an asynchronously interrupting function. Actions on
objects so declared shall not be "optimized out" by an
implementation or reordered except as permitted by the rules for
evaluating expressions.
That is a choice made by Microsoft (in my opinion, an unfortunate one).
You will note that all the relevant docs are in a section labelled
"Microsoft Specific." The important addition that Microsoft is making
is that reads of a volatile object have acquire semantics and writes
have release semantics. That is not in the language standard.