Arm Memory Model

Arm Architecture Formal Team

Overall

The Armv8 memory model gives ordering requirements over memory accesses, or more precisely, the Effects generated by AArch64 instructions. The External visibility requirement intuitively states which local orderings must be respected on a given processor, or thread.

Formally, those constraints forbid certain shapes in program executions. For example the External visibility requirement forbids cycles in the Ordered-before relation, which consists of local orderings (Locally-ordered-before relation) and interactions between threads (Observed-by relation).

In this chapter, will try to build such a model progressively.

First steps: defining TSO in cat

The following model aarch64-tso.cat is a simplified TSO model for AArch64 code.

"AArch64, TSO-like" include "cos.cat" // SC per-location acyclic po&loc|rf|co|fr as internal let lob = [R];po;[M] | [W];po;[W] | [M];po;[DMB.SY];po;[M] let obs = rfe|co|fr let rec ob = obs | lob | ob;ob irreflexive ob as external

The name TSO (Total Store Order) has been introduced by Sun microsystems in the eighties. Modern Intel systems have a very similar model.

Here we have chosen to write this cat model following the style adopted by Arm for their actual memory model, namely by defining a partial order ⁠ob (Ordered-before) that is made of two components:

The TSO model relaxes the SC model, in the following sense: writes that precede reads in program order can be reordered before those reads. In other words, write-to-read pairs in program order are not considered ordered by TSO, whereas they are on SC.

This relaxation is illustrated by running the six basic tests of the previous section on top of the TSO model. The non-SC executions of the tests SB and S are now allowed, because those execution contain write-to-read pairs in program order. The four other tests remain forbiden because they do not contain such po pairs.

  
      

Further steps: fundamental relations of the Arm memory model

At Arm, we define our memory model in cat and then transliterate it in English to appear in the Arm Architecture Reference Manual. We do so using a tool called miaou.

Similarly here, we can ask miaou for an English transliteration of the relations defined by the AArch64-TSO model.

Observed-by

Let us start with the relation ⁠obs (Observed-by), which reads:

let obs = rfe|co|fr

Try and do:

% miaou7 -tex ./catdefs.tex -show obs aarch64-tso.cat

This should give you the following text (but in LaTex):

An Effect E1 is Observed-by an Effect E2 if and only if one of the following applies:

Apart from using the term “Read-before” as an alternative to “From-read” for naming for the relation fr, one novelty is the rfe relation (Reads-from-external). Two Effects are related by rfe when they are related by rf and are generated by different threads.

Ordered-before

The central relation of the model is ⁠ob (Ordered-before), which reads

let rec ob = obs | lob | ob;ob

Try and do:

% miaou7 -tex ./catdefs.tex -show ob aarch64-tso.cat

This should give you the following text (but in LaTex):

An Effect E1 is Ordered-before an Effect E2 if and only if one of the following applies:

The text essentially paraphrases the code. Both describe the transive closure of the union of the relations ⁠obs and ⁠lob.

Locally-ordered-before

We now turn to the definition of Locally-ordered-before, which reads:

let lob = [R];po;[M] | [W];po;[W] | [M];po;[DMB.SY];po;[M]

Try and do:

% miaou7 -tex ./catdefs.tex -show lob aarch64-tso.cat

This should give you the following text (but in LaTex): An Effect E1 is Locally-ordered-before an Effect E2 if and only if one of the following applies:

The definition of the relation ⁠lob introduces new cat constructs.

Some new cat constructs

The uppercase notations conventionally stand for sets of Effects. Memory Effects represent accesses to the memory:

As a distinctive feature of TSO, writes that precede reads in program order (write-to-read pairs for short) are not ordered by the model.

All other pairs of accesses are ordered. Our rules follow this positive formulation: read-to-write and read-to-read pairs (or in other words read-to-all-memory-access pairs, written [R];po;[M] in cat), as well as write-to-write pairs ([W]; po; [W]), are locally ordered.

About the sequence ⁠[S1⁠];r⁠;[S2⁠]: this can be read out loud as “starting on an Effect from the set S1, I do one step of the relation r and land on an Effect from the set S2”.

Formally, given a set of Effects S the expression ⁠[S⁠] evaluates to the identity relation over the set S. Then, the sequence ⁠[S1⁠];r⁠;[S2⁠] is a neat notation for the restriction of relation r from the Effects of S1 to the Effects of S2.

Further steps: Barriers and Release/Acquire

The AArch64 architecture provides fence or barrier instructions, such as dmb (Data Memory Barrier). Fence instructions are here for restoring order where the model has none.

Unsurprisingly, executing fence instructions generate fence Effects. Here, the “full-fence” instruction dmb sy generates the Effects that the DMB.SY set collects.

Again, the notation ⁠[⁠M⁠];⁠po⁠;[⁠DMB.SY⁠];⁠po⁠;[⁠M⁠] neatly follows the intuition that inserting a full-fence instruction in between two instructions will order the Memory Effects generated by these instructions.

As an example, inserting dmb sy instructions in between the write-to-read pairs of the tests SB and R suffices to suppress their non-SC behaviours:

AArch64 SB+dmb.sys
{
0:X1=x; 0:X2=y;
1:X1=y; 1:X2=x;
}
 P0          | P1          ;
 MOV W0,#1   | MOV W0,#1   ;
 STR W0,[X1] | STR W0,[X1] ;
 DMB SY      | DMB SY      ;
 LDR W3,[X2] | LDR W3,[X2] ;
exists (0:X3=0 /\ 1:X3=0)
    
AArch64 R+po+dmb.sy
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X2=x;
}
 P0          | P1          ;
 MOV W0,#1   | MOV W0,#2   ;
 STR W0,[X1] | STR W0,[X1] ;
 MOV W2,#1   | DMB SY      ;
 STR W2,[X3] | LDR W3,[X2] ;
exists ([y]=2 /\ 1:X3=0)
 
    

Question

The model aarch64-tso is much too strong for AArch64, as the actual AArch64 model allows all the six basic tests. Hence, we need to weaken it in order to allow those basic tests. To that aim, as a first step, we simply define an empty Locally-ordered-before relation, resulting in the model aarch64-weak.cat.

"AArch64, (too) weak" include "cos.cat" // SC per-location acyclic po&loc|rf|co|fr as internal let lob = 0 let obs = rfe|co|fr let rec ob = obs | lob | ob;ob irreflexive ob as external

This new model is barely usable as programmers have very little order guarantees to rely upon. At least, the effect of some fence instructions have to be accounted for. Here is the transliteration of a stronger Locally-observed-before relation.

An Effect E1 is Barrier-ordered-before an Effect E2 if and only if one of the following applies:

An Effect E1 is Locally-ordered-before an Effect E2 if and only if E1 is Barrier-ordered-before E2.

Implement the new ⁠lob relation, resulting in the new model model aarch64-fence.cat. The tool herd pre-defines the fence Effect sets DMB.SY, DMB.ST, DMB.LD, which are generated by the approprtiate fence instructions.

Check your model by comparing its output with the reference file.

% herd7 -cat aarch64-fence.cat @fence > FEN.txt
% mcompare7  FEN-REF.txt FEN.txt
Solution.

Question

Many languages (such as C) and architectures provide another mean to restore order: special store and load instructions, sometimes called “atomic”, that impact the memory model.

The architecture AArch64 provides Store-Release (Effect set ⁠L) and two varieties of Load-Acquire: Acquire (Effect set ⁠A) and the weaker Acquire-PC (Effect set ⁠Q).

Here is the English transliteration of the memory model impact of the new Effect sets, as additions to the definition of ⁠bob:

An Effect E1 is Barrier-ordered-before an Effect E2 if and only if one of the following applies:

One may notice that atomic instructions have a “half-barrier” behaviour, as Memory Write Release Effects order Effects before them in program order, while Memory Read Acquire Effects order Effects after them in program order.

Extend your model aarch64-fence.cat (or ours) as a new model aarch64-atomic.cat that accounts for the existence of the atomic instructions.

Check your model by comparing its output with the reference file.

% herd7 -cat aarch64-atomic.cat > ATO.txt
% mcompare7  ATO-REF.txt ATO.txt
Solution.

This document was translated from LATEX by HEVEA.