Arm Memory ModelArm Architecture Formal Team |
The Armv8 memory model gives ordering requirements over memory accesses, or more precisely, the Effects generated by AArch64 instructions. The External visibility requirement intuitively states which local orderings must be respected on a given processor, or thread.
Formally, those constraints forbid certain shapes in program executions. For example the External visibility requirement forbids cycles in the Ordered-before relation, which consists of local orderings (Locally-ordered-before relation) and interactions between threads (Observed-by relation).
In this chapter, will try to build such a model progressively.
The following model aarch64-tso.cat is a simplified TSO model for AArch64 code.
The name TSO (Total Store Order) has been introduced by Sun microsystems in the eighties. Modern Intel systems have a very similar model.
Here we have chosen to write this cat model following the style
adopted by Arm for their actual memory model, namely by defining a partial
order ob (Ordered-before) that is made of two components:
obs (Observed-by), and
lob (Locally-ordered-before) which represents the local orderings a thread or a processor must respect.
The TSO model relaxes the SC model, in the following sense: writes that precede reads in program order can be reordered before those reads. In other words, write-to-read pairs in program order are not considered ordered by TSO, whereas they are on SC.
This relaxation is illustrated by running the six
basic tests of the previous section on top of the TSO model.
The non-SC executions of the tests SB and S are now
allowed, because those execution contain write-to-read pairs in program
order. The four other tests remain forbiden because they do not
contain such po pairs.
At Arm, we define our memory model in cat and then transliterate it in English to appear in the Arm Architecture Reference Manual. We do so using a tool called miaou.
Similarly here, we can ask miaou for an English transliteration of the relations defined by the AArch64-TSO model.
Let us start with the relation obs (Observed-by), which reads:
Try and do:
% miaou7 -tex ./catdefs.tex -show obs aarch64-tso.cat
This should give you the following text (but in LaTex):
An Effect E1 is Observed-by an Effect E2 if and only if one of the following applies:
Apart from using the term “Read-before” as an alternative to “From-read” for
naming for the relation fr,
one novelty is the rfe relation (Reads-from-external). Two Effects
are related by rfe when they are related by rf and are
generated by different threads.
The central relation of the model is ob (Ordered-before), which reads
Try and do:
% miaou7 -tex ./catdefs.tex -show ob aarch64-tso.cat
This should give you the following text (but in LaTex):
An Effect E1 is Ordered-before an Effect E2 if and only if one of the following applies:
The text essentially paraphrases the code. Both describe the
transive closure of the union of the relations obs and lob.
We now turn to the definition of Locally-ordered-before, which reads:
Try and do:
% miaou7 -tex ./catdefs.tex -show lob aarch64-tso.cat
This should give you the following text (but in LaTex): An Effect E1 is Locally-ordered-before an Effect E2 if and only if one of the following applies:
The definition of the relation lob introduces new cat
constructs.
The uppercase notations conventionally stand for sets of Effects. Memory Effects represent accesses to the memory:
R collects Memory Read Effects,
W Memory Write Effects, and
M all Memory Effects (let M = R|W).
As a distinctive feature of TSO, writes that precede reads in program order (write-to-read pairs for short) are not ordered by the model.
All other pairs of accesses are ordered. Our rules follow this positive
formulation: read-to-write and read-to-read pairs (or in other words read-to-all-memory-access pairs, written [R];po;[M] in cat), as well as write-to-write
pairs ([W]; po; [W]), are locally ordered.
About the sequence [S1];r;[S2]: this
can be read out loud as “starting on an Effect from the set S1, I do one
step of the relation r and land on an Effect from the set S2”.
Formally, given a set of Effects S the expression
[S] evaluates to the identity relation over
the set S. Then, the sequence
[S1];r;[S2] is a neat notation for the
restriction of relation r from the Effects of S1 to the Effects of S2.
The AArch64 architecture provides fence or barrier instructions,
such as dmb (Data Memory Barrier). Fence instructions are here for
restoring order where the model has none.
Unsurprisingly, executing fence instructions generate fence
Effects. Here, the “full-fence” instruction dmb sy generates
the Effects that the DMB.SY set collects.
Again, the notation [M];po;[DMB.SY];po;[M] neatly follows the
intuition that inserting a full-fence instruction in between two
instructions will order the Memory Effects generated by these instructions.
As an example, inserting dmb sy instructions
in between the write-to-read pairs of the tests SB
and R suffices to suppress their non-SC behaviours:
| |||
|
|
The model aarch64-tso is much too strong for AArch64, as the actual AArch64 model allows all the six basic tests. Hence, we need to weaken it in order to allow those basic tests. To that aim, as a first step, we simply define an empty Locally-ordered-before relation, resulting in the model aarch64-weak.cat.
This new model is barely usable as programmers have very little order guarantees to rely upon. At least, the effect of some fence instructions have to be accounted for. Here is the transliteration of a stronger Locally-observed-before relation.
An Effect E1 is Barrier-ordered-before an Effect E2 if and only if one of the following applies:
An Effect E1 is Locally-ordered-before an Effect E2 if and only if E1 is Barrier-ordered-before E2.
Implement the new lob relation, resulting in the new model
model aarch64-fence.cat. The tool herd pre-defines
the fence Effect sets DMB.SY, DMB.ST, DMB.LD,
which are generated by the approprtiate fence instructions.
Check your model by comparing its output with the reference file.
% herd7 -cat aarch64-fence.cat @fence > FEN.txt % mcompare7 FEN-REF.txt FEN.txt
Many languages (such as C) and architectures provide another mean to restore order: special store and load instructions, sometimes called “atomic”, that impact the memory model.
The architecture AArch64 provides Store-Release (Effect set L) and two
varieties of Load-Acquire: Acquire (Effect set A) and the weaker
Acquire-PC (Effect set Q).
Here is the English transliteration of the memory model impact of the new Effect
sets, as additions to the definition of bob:
An Effect E1 is Barrier-ordered-before an Effect E2 if and only if one of the following applies:
One may notice that atomic instructions have a “half-barrier” behaviour, as Memory Write Release Effects order Effects before them in program order, while Memory Read Acquire Effects order Effects after them in program order.
Extend your model aarch64-fence.cat (or ours) as a new model aarch64-atomic.cat that accounts for the existence of the atomic instructions.
Check your model by comparing its output with the reference file.
% herd7 -cat aarch64-atomic.cat > ATO.txt % mcompare7 ATO-REF.txt ATO.txt
This document was translated from LATEX by HEVEA.