Part III
Simulating memory models with herd7

The tool herd7 is a memory model simulator. Users may write simple, single events, axiomatic models of their own and run litmus tests on top of their model. The herd7 distribution already includes some models.

The authors of herd7 are Jade Alglave and Luc Maranget.

14 Writing simple models

This section introduces cat, our language for describing memory models. The cat language is a domain specific language for writing and executing memory models. From the language perspective, cat is loosely inspired by OCaml. That is, it is a functional language, with similar syntax and constructs. The basic values of cat are sets of events, which include memory events but also additional events such as fence events, and relations over events.

14.1 Sequential consistency

The simulator herd7 accepts models written in text files. For instance here is sc.cat, the definition of the sequentially consistent (SC) model in the partial-order style:

SC

include "fences.cat"
include "cos.cat"

(* Atomic *)
empty rmw & (fre;coe) as atom

(* Sequential consistency *)
show sm\id as si
acyclic po | ((fr | rf | co);sm) as sc

The model above illustrates some features of model definitions:

A model file starts with a tag (here SC), which can also be a string (in double quotes) in case the tag includes special characters or spaces.
Pre-defined bindings. Here po (program order) and rf (read from) are pre-defined. The remaining two communication relations (co and fr) are computed by the included file cos.cat, which we describe later — See Sec. 14.4. For simplicity, we may as well assume that co and fr are pre-defined.
The computation of new relations from other relations, and their binding to a name with the let construct. Here, a new relation com is the union “|” of the three pre-defined communication relations.
The performance of some checks. Here the relation “po | com” (i.e. the union of program order po and of communication relations) is required to be acyclic. Checks can be given names by suffixing them with “as name”. This last feature will be used in Sec. 15.2

One can then run some litmus test, for instance SB (for Store Buffering, see also Sec. 1.1), on top of the SC model:

% herd7 -model ./sc.cat SB.litmus
Test SB Allowed
States 3
0:EAX=0; 1:EAX=1;
0:EAX=1; 1:EAX=0;
0:EAX=1; 1:EAX=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=0)
Observation SB Never 0 3
Hash=7dbd6b8e6dd4abc2ef3d48b0376fb2e3

The output of herd7 mainly consists in the list of final states that are allowed by the simulated model. Additional output relates to the test condition. One sees that the test condition does not validate on top of SC, as “No” appears just after the list of final states and as there is no “Positive” witness. Namely, the condition “exists (0:EAX=0 /\ 1:EAX=0)” reflects a non-SC behaviour, see Sec. 14.1.

The simulator herd7 works by generating all candidate executions of a given test. By “candidate execution” we mean a choice of events, program order po, of the read-from relation rf and of final writes to memory (last write to a given location)⁶. In the case of the SB example, we get the following four executions:

Indeed, there is no choice for the program order po, as there are no conditional jumps in this example; and no choice for the final writes either, as there is only one store per location, which must be co-after the initial stores (pictured as small red dots). Then, there are two read events from locations x and y respectively, which take their values either from the initial stores or from the stores in program. As a result, there are four possible executions. The model sc.cat gets executed on each of the four candidate executions. The three first executions are accepted and the last one is rejected, as it presents a cycle in po | fr. On the following diagram, the cycle is obvious:

14.2 Total Store Order (TSO)

However, the non-SC execution shows up on x86 machines, whose memory model is TSO. As TSO relaxes the write-to-read order, we attempt to write a TSO model tso-00.cat, by simply removing write-to-read pairs from the acyclicity check:

"A first attempt for TSO"

include "cos.cat"

(* Communication relations that order events*)
let com-tso = rf | co | fr
(* Program order that orders events *)
let po-tso = po & (W*W | R*M)

(* TSO global-happens-before *)
let ghb = po-tso | com-tso
acyclic ghb as tso
show ghb

This model illustrates several features of model definitions:

New predefined sets: W, R and M, which are the sets of write events, read events and of memory events, respectively.
The cartesian product operator “*” that returns the cartesian product of two event sets as a relation.
The intersection operator “&” that operates on sets and relations.

As a result, the effect of the declaration let po-tso = po & (W*W | R*M) is to define po-tso as the program order on memory events minus write-to-read pairs.

We run SB on top of the tentative TSO model:

% herd7 -model tso-00.cat SB.litmus
Test SB Allowed
States 4
0:EAX=0; 1:EAX=0;
0:EAX=0; 1:EAX=1;
0:EAX=1; 1:EAX=0;
0:EAX=1; 1:EAX=1;
Ok
Witnesses
Positive: 1 Negative: 3
...

The non-SC behaviour is now accepted, as write-to-read po-pairs do not participate to the acyclicity check any more. In effect, this allows the last execution above, as ghb (i.e. po-tso | com-tso) is acyclic.

However, our model tso-00.cat is flawed: it is still to strict, forbidding some behaviours that the TSO model should accept. Consider the test SB+rfi-pos, which is test STFW-PPC for X86 from Sec. 1.3 with a normalised name (see Sec. 12.1). This test targets the following execution:

Namely the test condition exists (0:EAX=1 /\ 0:EBX=0 /\ 1:EAX=1 /\ 1:EBX=0) specifies that Thread 0 writes 1 into location x, reads the value 1 from the location x (possibly by store forwarding) and then reads the value 0 from the location y; while Thread 1 writes 1 into y, reads 1 from y and then reads 0 from x. Hence, this test derives from the previous SB by adding loads in the middle, those loads being satisfied from local stores. As can be seen by running the test on top of the tso-00.cat model, the target execution is forbidden:

% herd7 -model tso-00.cat SB+rfi-pos.litmus
Test SB+rfi-pos Allowed
States 15
0:EAX=0; 0:EBX=0; 1:EAX=0; 1:EBX=0;
...
0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=1;
No
Witnesses
Positive: 0 Negative: 15
..

However, running the test with litmus demonstrates that the behaviour is observed on some X86 machine:

% arch
x86_64
% litmus7 -mach x86 SB+rfi-pos.litmus
...
Test SB+rfi-pos Allowed
Histogram (4 states)
11589 *>0:EAX=1; 0:EBX=0; 1:EAX=1; 1:EBX=0;
3993715:>0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=0;
3994308:>0:EAX=1; 0:EBX=0; 1:EAX=1; 1:EBX=1;
388   :>0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=1;
Ok

Witnesses
Positive: 11589, Negative: 7988411
Condition exists (0:EAX=1 /\ 0:EBX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
...

As a conclusion, our tentative TSO model is too strong. The following diagram pictures its ghb relation:

One easily sees that ghb is cyclic, whereas it should not. Namely, the internal read-from relation rfi does not create global order in the TSO model. Hence, rfi is not included in ghb. We rephrase our tentative TSO model, resulting into the new model tso-01.cat:

"A second attempt for TSO"

include "cos.cat"

(* Communication relations that order events*)
let com-tso = rfe | co | fr
(* Program order that orders events *)
let po-tso = po & (W*W | R*M)

(* TSP global-happens-before *)
let ghb = po-tso | com-tso
acyclic ghb
show ghb

As can be observed above rfi (internal read-from) is no longer included in ghb. However, rfe (external read-from) still is. Notice that rfe and rfi are pre-defined.

As intended, this new tentative TSO model allows the behaviour of test SB+rfi-pos:

%  herd7 -model tso-01.cat SB+rfi-pos.litmus
Test SB+rfi-pos Allowed
States 16
...
0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=0;
...
Ok
Witnesses
Positive: 1 Negative: 15
...

And indeed, the global-happens-before relation is no-longer cyclic:

We are not done yet, as our model is too weak in two aspects. First, it has no semantics for fences. As a result the test SB+mfences is allowed, whereas it should be forbidden, as this is the very purpose of the fence mfence.

One easily solves this issue by first defining the mfence that relates events with a MFENCE event po-in-between them; and then by adding mfence to the definition of po-tso:

let mfence = po & (_ * MFENCE) ; po
let po-tso = po & (W*W | R*M) | mfence

Notice how the relation mfence is defined from two pre-defined sets: “_” the universal set of all events and MFENCE the set of fence events generated by the X86 mfence instruction. An alternative, more precise definition, is possible:

let mem-to-mfence = po & M * MFENCE
let mfence-to-mem = po & MFENCE * M
let mfence = mem-to-mfence; mfence-to-mem

This alternative definition of mfence, although yielding a smaller relation, is equivalent to the original one for our purpose of checking ghb acyclicity.

But the resulting model is still too weak, as it allows some behaviours that any model must reject for the sake of single thread correctness. The following test CoRWR illustrates the issue:

X86 CoRWR
{ }
 P0          ;
 MOV EAX,[x] ;
 MOV [x],$1  ;
 MOV EBX,[x] ;
exists (0:EAX=1 /\ 0:EBX=0)

The test final condition targets the following execution candidate:

The TSO check “acyclic po-tso|com-tso” does not suffice to reject two absurd behaviours pictured in the execution diagram above: (1) the read a is allowed to read from the po-after write b, as rfi is not included in com-tso; and (2) the read c is allowed to read the initial value of location x although the initial write d is co-before the write b, since po & (W * R) is not in po-tso.

For any model, we rule out those very untimely behaviours by the so-called uniproc check that states that executions projected on events that access one variable only are SC. In practice, having defined po-loc as po restricted to events that touch the same address (i.e. as po & loc), we further require the acyclicity of the relation po-loc|fr|rf|co. In the TSO case, the uniproc check can be somehow simplified by considering only the cycles in po-loc|fr|rf|co that are not already rejected by the main check of the model. This amounts to design specific checks for the two relations that are not global in TSO: rfi and po & (W*R). Doing so, we finally produce a correct model for TSO tso-02.cat:

"A third attempt for TSO"

include "cos.cat"

(* Uniproc check specialized for TSO *)
irreflexive po-loc & (R*W); rfi as uniprocRW
irreflexive po-loc & (W*R); fri as uniprocWR

(* Communication relations that order events*)
let com-tso = rfe | co | fr

(* Program order that orders events *)
let mfence = po & (_ * MFENCE) ; po
let po-tso = po & (W*W | R*M) | mfence

(* TSP global-happens-before *)
let ghb = po-tso | com-tso
show mfence,ghb
acyclic ghb as tso

This last model illustrates another feature of cat: herd7 may also performs irreflexivity checks with the keyword “irreflexive”.

14.3 Sequential consistency, total order definition

We now illustrate another style of model. We consider the original definition of sequential consistency [3]. An execution is SC when there exists a total (strict) order S on events such that:

S includes the program order po;
and read events read from the most recent write events in the past, i.e. a read r from location x reads the value stored by the S-maximal write amongst those writes to location x that S-precede r.

So we could just generate all total strict orders amongst events, and filter those “scheduling order candidates” according to the two rules above.

Things are a bit more complex in herd7, due to the presence of initial and final writes. Up to now we have ignored those writes, we are now going to integrate them explicitly.

Initial writes are write events that initialise the memory locations. Initial writes are not generated by the instructions of the test. Instead, they are created by herd7 machinery, and are available from model text as the set IW.

Final writes may be generated by program instructions, and, when such, they must be ordered by S. A final write is a write to a phantom read performed once program execution is over. The constraint on final writes originates from herd7 technique to enumerate execution candidates: actual execution candidates also include a choice of final writes for the locations that are observed in the test final condition⁷. As test outcome (i.e. the final values of observed locations) is settled before executing the model, it is important not to accept executions that yield a different outcome. Doing so may validate outcomes that should be rejected. In practice, the final write w_f to location x must follow all other writes to x in S. Considering that the set of final writes is available to cat models as the pre-defined set FW, the constraint on final writes can be expressed as a relation:

let preSC = loc & (W \ FW) * FW

Where loc is a predefined relation that relates all events that access the same location.

By contrast with final writes, initial writes are not generated by program instructions, and it is possible not to order them completely. In particular, it is not useful to order initial writes to different locations, nor the initial write to location x with any access to location y. Notice that we could include initial writes in S as we did for final writes. Not doing so will slightly improve efficiency.

Finally, the strict order S is not just any order on events, it is a topological order of the events generated by threads (implemented as the set ~IW. i.e. the complement of the set of initial writes) that extends the pre-order preSC. We can generate all such topological orders with the cat primitive linearisations:

let allS = linearisations(~IW,preSC)

The call linearisations(E,r), where E is a set of events and r is a relation on events, returns the set of all total strict orders defined on E that extend r. Notice that if r is cyclic, the empty set is returned.

We now need to iterate over the set allS. We do so with the with construct:

with S from allS

It is important to notice that the construct above extends the current execution candidate (i.e. a choice of events, plus a choice of two relations po and rf) with a candidate strict order S. In other words, the scope of the iteration is the remainder of the model text. Once model execution terminates for a choice of S (some element of allS), model execution restarts just after the with construct, with variable S bound to the next choice picked in allS.

As a first consistency check, we check that S includes the program order:

empty po \ S as PoCons

Notice that, to check for inclusion, we test the emptiness of relation difference (operator “\”).

It remains to check that the rf relation of the execution candidate is the same as the one defined by condition 2. To that aim, we complement S with the constraint over initial writes that must precede all events to their location:

let S = S | loc & IW * (M \ IW)

Observe that S is no longer a total strict order. However, it is still a total strict order when restricted to events that access a given location, which is all that matters for condition 2 to give a value to all reads. As regards our SC model, we define rf-S the read-from relation induced by S as follows:

let WRS = W * R & S & loc  (* Writes from the past, same location *)
let rf-S = WRS \ (S;WRS)   (* Most recent amongst them *)

The definition is a two-step process: we first define a relation WRS from writes to reads (to the same location) that follow them in S. Observe that, by complementing S with initial writes, we achieve that for any read r there exists at least a write w such that (w,r) ∈ WRS. It then remains to filter out non-maximal writes in WRS as we do in the definition of rf-S, by the means of the difference operator “\”. We then check the equality of rf (pre-defined as part of the candidate execution) and of rf-S by double inclusion:

empty rf \ rf-S as RfCons
empty rf-S \ rf as RfCons

As an example, he show six attempts of po compatible S orders for the non-SC outcome of the test SB in figure 5.

Figure 5: Failed attempts of SC scheduling orders S.

Observe that all attempts fail as rf and rf-S are different in all diagrams.

We also show all successful SC scheduling in figure 6.

Figure 6: SC executions of test SB.

For reference we provide our complete model lamport.cat

"SC, L. Lamport style"

(* writes to location x precede final write to location x *)
let preSC =  loc & (W \ FW) * FW

(* Compute the set of total orders that extend preSC on program events *)
let allS = linearisations(~IW,preSC)

(* For all such orders *)
with S from allS

(* Check compatibility with po *)
empty po \ S as ScPo

(* Add initial writes *)
let S = S | loc & (IW * (M \ IW))


(* Define most recent write in the past  *)
let WRS = W * R & S & loc  (* Writes from the past, same location *)
let rf-S = WRS \ (S;WRS)   (* Most recent amongst them *)

(* Check equality with rf *)
empty rf \ rf-S as RfCons
empty rf-S \ rf as RfCons

14.4 Computing coherence orders

All the models seen so far include the file cos.cat that define “coherence relations”, written co. This section describes a possible computation of coherence orders⁸. It can be skipped in first reading, as users may find sufficient to include the file.

For a given location x the coherence order is a total strict order on the write events to location x. The coherence relation co is the union of those orders for all locations. In this section, we show how to compute all possible coherence orders for a candidate execution. We seize the opportunity to introduce advanced features of the cat language, such as functions and pattern matching over sets.

Possible coherence orders for a given location x are not totally arbitrary in two aspects:

The write events to location x include the initial write event to location x. The initial write to x must come first in any coherence order for x.
One of the writes to x performed by the test (may) have been declared to be final by herd7 machinery prior to model execution. In that case, the final write to x must come last in any coherence order for x.

See Sec. 14.3 for details on initial and final writes.

We can express the two conditions above for all locations of the program as a relation co0:

let co0 = loc & (IW*(W\IW)|(W\FW)*FW)

Where the pre-defined sets IW and FW are the sets of all initial and final writes respectively.

Then, assuming that W_x is the set of all writes to location x, one can compute the set of all possible coherence orders for x with the linearisations primitive as linearisations(W_x,co0). In practice, we define a function that takes the set W_x as an argument:

let makeCoX(Wx) = linearisations(Wx,co0)

The linearisations primitive is introduced in Sec. 14.3. It returns all topological sorts of the events of the set Wx that are compatible with the relation co0.

In fact, we want to compute the set of all possible co relations, i.e. all the unions of all the possible coherence orders for all locations x. To that end we use another cat primitive: partition(S), which takes a set of events as argument and returns a set of set of events T = {S₁,…,S_n}, where each S_i is the set of all events in S that act on location L_i, and, of course S is the union ∪_i=1ⁱ⁼ⁿ S_i. Hence we shall compute the set of all Wx sets as partition(W), where W is the pre-defined set of all writes (including initial writes).

For combining the effect of the partition and linearisations primitives, we first define a map function⁹ that, given a set S= {e₁,…,e_n} and a function f, returns the set {f(e₁),…,f(e_n)}:

let map f  =
  let rec do_map S = match S with
  || {} -> {}
  || e ++ S -> f e ++ do_map S
  end in
  do_map

The map function is written in curried style. That is one calls it as map f S, parsed as (map f) S. More precisely, the left-most function call (map f) returns a function. Here it returns do_map with free variable f being bound to the argument f. The definition of map illustrate several new features:

The empty set constant “{}”, and the set addition operator e ++ S that returns the set S augmented with element e.
Recursive function definitions. The function do_map is recursive as it calls itself.
Pattern matching on sets. This construct, similar to OCaml pattern matching on lists, discriminates between empty (|| {} -> e₀) and non-empty (|| e ++ es -> e₁) sets. In the second case of a non-empty set, the expression e₁ is evaluated in a context extended with two bindings: a binding from the variable e to an arbitrary element of the matched set, and a binding from the variable es to the matched set minus the arbitrary element.

Then, we generate the set of all possible coherence orders for all locations x as follows:

let allCoX = map makeCoX (partition(W))

Notice that allCoX is a set of sets of relations, each element being the set of all possible coherence orders for a specific x.

We still need to generate all possible co relations, that is all unions of the possible coherence orders for all locations x. It can be done by another cat function: cross, which takes a set of sets S = {S₁, S₂, …, S_n} as argument and returns all possible unions built by picking elements from each of the S_i:

⎧
⎨
⎩

  e₁ ⋃ e₂ ⋃ ⋯ ⋃ e_n ∣ e₁ ∈ S₁, e₂ ∈ S₂, …, e_n ∈ S_n  

⎫
⎬
⎭

One may notice that if S is empty, then cross should return one relation exactly: the empty relation, i.e. the neutral element of the union operator. This choice for cross(∅) is natural when we define cross inductively:

cross(S₁ ++S) =

∪

e₁ ∈ S₁, t ∈ cross(S)

⎧
⎨
⎩

e₁ ⋃ t

⎫
⎬
⎭

In the definition above, we simply build cross(S₁ ++S) by building the set of all unions of one relation e₁ picked in S₁ and of one relation t picked in cross(S).

So as to write cross, we first define a classical fold function over sets: given a set S = { e₁, e₂, …, e_n}, an initial value y₀ and a function f that takes a pair (e,y) as argument, fold computes:

f (e_i₁,f (e_i₂, …, f(e_{i_n},y₀)))

where i₁, i₂, …, i_n defines a permutation of the indices 1, 2, …, n.

let fold f =
  let rec fold_rec (es,y) = match es with
  || {} -> y
  || e ++ es -> fold_rec (es,f (e,y))
  end in
  fold_rec

The function fold is written in the same curried style as map. Notice that the inner function fold_rec takes one argument. However this argument is a pair. As a gentle example of fold usage, we could have defined map as:

let map f = fun S -> fold (fun (e,y) -> f e ++ y) (S,{})

This example also introduce “anonymous” functions.

As a more involved example of fold usage, we write the function cross.

let rec cross S = match S with
  || {} -> { 0 } (* 0 is the empty relation *)
  || S1 ++ S ->
      let ts = cross S in
      fold
        (fun (e1,r) -> map (fun t -> e1 | t) ts | r)
        (S1,{})
  end

The function cross is a recursive function over a set (of sets). Its code follows the inductive definition given above.

Finally, we generate all possible co relations by:

let allCo = cross allCoX

Computation goes on by iterating over allCo using the with x from S construct:

with co from allCo

See Sec. 14.3 for details on this construct.

Once co has been defined, one defines fr and internal and external variations:

(* From now, co is a coherence relation *)
let coi = co & int
let coe = co & ext

(* Compute fr *)
let fr = rf^-1 ; co
let fri = fr & int
let fre = fr & ext

The pre-defined relation ext (resp. int) relates events generated by different (resp. the same) threads.

15 Producing pictures of executions

The simulator herd7 can be instructed to produce pictures of executions. Those pictures are instrumental in understanding and debugging models. It is important to understand that herd7 does not produce pictures by default. To get pictures one must instruct herd7 to produce pictures of some executions with the -show option. This option accepts specific keywords, its default being “none”, instructing herd7 not to produce any picture.

A frequently used keyword is “prop” that means “show the executions that validate the proposition in the final condition”. Namely, the final condition in litmus test is a quantified boolean proposition as for instance “exists (0:EAX=0 /\ 1:EAX=0)” at the end of test SB.

But this is not enough, users also have to specify what to do with the picture: save it in file in the DOT format of the graphviz graph visualization software, or display the image,¹⁰ or both. One instructs herd7 to save images with the -o dirname option, where dirname is the name of a directory, which must exists. Then, when processing the file name.litmus, herd7 will create a file name.dot into the directory dirname. For displaying images, one uses the -gv option.

As an example, so as to display the image of the non-SC behaviour of SB, one should invoke herd7 as:

% herd7 -model tso-02.cat -show prop -gv SB.litmus

As a result, users should see a window popping and displaying this image:

Notice that we got the PNG version of this image as follows:

% herd7 -model tso-02.cat -show prop -o /tmp SB.litmus
% dot -Tpng /tmp/SB.dot -o SB+CLUSTER.png

That is, we applied the dot tool from the graphviz package, using the appropriate option to produce a PNG image.

One may observe that there are ghb arrows in the diagram. This results from the show ghb instruction at the end of the model file tso-02.cat.

15.1 Graph modes

The image above much differs from the one in Sec. 14.2 that describes the same execution and that is reproduced in Fig. 7

Figure 7: The non-SC behaviour of SB is allowed by TSO

In effect, herd7 can produce three styles of pictures, dot clustered pictures, dot free pictures, and neato pictures with explicit placement of the events of one thread as a column. The style is commanded by the -graph option that accepts three possible arguments: cluster (default), free and columns. The following pictures show the effect of graph styles on the SB example:

-graph cluster	-graph free	-graph columns

Notice that we used another option -squished true that much reduces the information displayed in nodes. Also notice that the first two pictures are formatted by dot, while the rightmost picture is formatted by neato.

One may also observe that the “-graph columns” picture does not look exactly like Fig. 7. For instance the ghb arrows are thicker in the figure. There are many parameters to control neato (and dot), many of which are accessible to herd7 users by the means of appropriate options. We do not intend to describe them all. However, users can reproduce the style of the diagram of this manual using yet another feature of herd7: configuration files that contains settings for herd7 options and that are loaded with the -conf name option. In this manual we mostly used the doc.cfg configuration file. As this file is present in herd7 distribution, users can use the diagram style of this manual:

% herd7 -conf doc.cfg ...

15.2 Showing forbidden executions

Images are produced or displayed once the model has been executed. As a consequence, forbidden executions won’t appear by default. Consider for instance the test SB+mfences, where the mfence instruction is used to forbid SB non-SC execution. Running herd7 as

% herd7 -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus

will produce no picture, as the TSO model forbids the target execution of SB+mfences.

To get a picture, we can run SB+mfences on top of the minimal model, a pre-defined model that allows all executions:

% herd7 -model minimal -conf doc.cfg -show prop -gv SB+mfences.litmus

And we get the picture:

It is worth mentioning again that although the minimal model allows all executions, the final condition selects the displayed picture, as we have specified the -show prop option.

The picture above shows mfence arrows, as all fence relations are displayed by the minimal model. However, it does not show the ghb relation, as the minimal model knows nothing of it. To display ghb we could write another model file that would be just as tso-02.cat, with checks erased. The simulator herd7 provides a simpler technique: one can instruct herd7 to ignore either all checks (-through invalid), or a selection of checks (-skipchecks name₁,…,name_n). Thus, either of the following two commands

% herd7 -through invalid -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus
% herd7 -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus

will produce the picture we wish:

Notice that mfence and ghb are displayed because of the instruction “show mfence ghb” (fence relation are not shown by default); while -skipcheck tso works because the tso-02.cat model names its main check with “as tso”.

The image above is barely readable. For such graphs with many relations, the cluster and free modes are worth a try. The commands:

% herd7 -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -graph cluster -gv SB+mfences.litmus
% herd7 -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -graph free -gv SB+mfences.litmus

will produce the images:

Namely, command line options are scanned left-to-right, so that most of the settings of doc.cfg are kept¹¹ (for instance thick ghb arrows), while the graph mode is overridden.

16 Model definitions

We describe our cat language for defining models. The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (like this). Non-terminal symbols are set in italic font (like that). A vertical bar …∣ … denotes alternative. Square brackets […] denote optional components. Curly brackets {…} denotes zero, one or several repetitions of the enclosed components. Parentheses (…) denote grouping.

Model source files may contain comments of the OCaml type ((*…*), can be nested), or line comments starting with “//” and running until end of line.

16.1 Overview

The cat language is much inspired by OCaml, featuring immutable bindings, first-class functions, pattern matching, etc. However, cat is a domain specific language, with important differences from OCaml.

Base values are specialised, they are sets of events and relations over events. There are also tags, akin to C enumerations or OCaml “constant” constructors and first class functions. Moreover, events can be extracted from sets and pair of events (element of relations) from relations.
There are two structured values: tuples of values and sets of values. One should notice that primitive set of events and structured set of events are not the same thing. In fact, the language prevents the construction of structured set of events. Similarily, there are no structured sets of elements of relations, there are only relations.
There is a distinction between expressions that evaluate to some value, and instructions that are executed for their effect.

A model, or cat program is a sequence of instructions. At startup, pre-defined identifiers are bound to event sets and relations over events. Those pre-defined identifiers describe a candidate execution (in the sense of the memory model). Executing the model means allowing or forbidding that candidate execution.

16.2 Identifiers

letter	::=	a … z ∣ A … Z

digit	::=	0 … 9

id	::=	letter { letter ∣ digit ∣ _ ∣ . ∣ - }

Identifiers are rather standard: they are a sequence of letters, digits, “_” (the underscore character), “.” (the dot character) and “-” (the minus character), starting with a letter. Using the minus character inside identifiers may look a bit surprising. We did so as to allow identifiers such as po-loc.

At startup, pre-defined identifiers are bound to event sets and to relations between events.

Those pre-defined identifiers first describe the events of the candidate execution as various sets, as described by the first table of figure 8.

Figure 8: Pre-defined event sets.

identifier name description

emptyset empty set of events

W write events

R read events

M memory events we have M = W ∪ R

IW initial writes feed reads that read from the initial state

FW final writes writes that are observed at the end of test execution

B branch events

RMW read-modify-write events

F fence events

NAME specific fence events those depend on the test architecture

architecture fence sets

X86 MFENCE, SFENCE, LFENCE

PPC SYNC, LWSYNC, EIEIO, ISYNC

ARM DMB, DMB.ST, DSB, DSB.ST, ISB

MIPS SYNC

AArch64 DMB.SY, DMB.ST, DMB.ST, …DSB.SY, DSB.ST, DSB.ST, …

Specific fence event sets depend on the test architecture, their name is always uppercase and derives from the mnemonic of the instruction that generates them. The second table of figure 8 shows a (non-exhaustive) list.

Other pre-defined identifiers are relations. Most of those are the program order po and its refinements:

identifier	name	description

po	program order	instruction order lifted to events
addr	address dependency	the address of the second event depends on the value loaded by the first (read) event
data	data dependency	the value stored by the second (write) event depends on the value loaded by the first (read) event
ctrl	control dependency	the second event is in a branch controlled by the value loaded by the commfirst (read) event
rmw	read-exclusive write-exclusive pair	relate the read and write events emitted by matching successful load-reserve store conditional instructions, or atomic rmw instructions.
amo	atomic modify	relate the read and write events emitted by atomic rmw instructions.

Finally, a few pre-defined relations describe the execution candidate structure and write-to-read communication:

identifier	name	description

id	identity	relates each event to itself
loc	same location	events that touch the same address
ext	external	events from different threads
int	internal	events from the same thread
rf	read-from	links a write w to a read r taking its value from w

Some additional relations are defined by library files written in the cat language, see Sec. 16.7.

16.3 Expressions

Expressions are evaluated by herd7, yielding a value.

expr	::=	0
	∣	id
	∣	tag
	∣	( ) ∣ ( expr , expr { , expr } )
	∣	{ } ∣ { expr { , expr } }
	∣	expr * ∣ expr + ∣ expr ? ∣ expr ^-1
	∣	~ expr
	∣	[ expr ]
	∣	expr \| expr ∣ expr ++ expr ∣ expr ; expr ∣ expr \ expr ∣ expr & expr ∣ expr * expr
	∣	expr expr
	∣	fun pat -> expr
	∣	let [ rec ] binding { and binding } in expr
	∣	match expr with clauses end
	∣	( expr ) ∣ begin expr end
	∣	instructions id [ taglist ]



tag	::=	’ id

taglist	::=	tag , taglist

pat	::=	id ∣ ( ) ∣ ( id { , id } )

binding	::=	valbinding ∣ funbinding

valbinding	::=	id = expr

funbinding	::=	id pat = expr



clauses	::=	tagclauses ∣ setclauses

tagclauses	::=	[ \|\| ] tag -> expr { \|\| tag -> expr } [ _ -> expr ]

setclauses	::=	[ \|\| ] { } -> expr \|\| id ++ id -> expr

Simple expressions

Simple expressions are the empty relation (keyword 0), identifiers id and tags tag. Identifiers are bound to values, either before the execution (see pre-defined identifiers in Sec. 16.2), or by the model itself. Tags are constants similar to C enum values or OCaml constant constructors. Tags must be declared with the enum instruction. We go back to enum and tags in Sec. 16.4 and 16.4.

Tuples

Tuples include a constant, the empty tuple (), and constructed tuples ( expr₁ , expr₁ ,… , expr_n), with n ≥ 2. In other words there is no tuple of size one. Syntax ( expr ) denotes grouping and has the same value as expr.

Explicit sets of values

Explicit sets are written as the comma separated list of their elements between curly braces: { expr₁ , expr₁ ,… , expr_n}, with n ≥ 0. Sets are homogenous, in the sense that sets hold elements of the same type.

In case the values expr_k are events, the result will be a primitive event set. In case they are elements of relations, the result will be a relation (and not a set of event pairs).

The empty set {} is the empty set of events and the empty relation.

Operator expressions

Most operators are overloaded and apply to event sets, relations and explicit sets. However, by nature, some operators apply to relations only (as sequence and transitive closure below) or to sets only (as the cartesian product). Additionally, an event in a context where an event set is expected will be promoted to a singleton event set. The situation of an elementary relation in a relation context is similar.

The transitive and reflexive-transitive closure of an expression are performed by the postfix operators + and *. The postfix operator ^-1 performs relation inversion. The construct expr? (option) evaluates to the union of expr value and of the identity relation. Notice that postfix operators operate on relations only.

There is one prefix operator ~ that performs relation and set complement.

Finally, there is one last unary operator: [expr] that evaluate expr to an event set and returns the identity relation over this set.

Infix operators are | (union), ++ (set addition), ; (sequence), & (intersection), \ (set difference), and * (cartesian product). Infix operators are listed in order of increasing precedence, while postfix and prefix operators bind tighter than infix operators. All infix operators are right-associative, except set difference which is left-associative, and cartesian product which is non-associative.

The union, intersection and difference operators apply to relations and all kinds of sets. The cartesian product takes two event sets as arguments and returns a relation.

The addition operator expr₁ ++ expr₂ operates on sets: the value of expr₂ must be a set of values (or an event, or an element of relation) S and the operator returns the set S augmented with the value of expr₁ (or a new event set, or a new relation). By exception, the arguments to the addition operator can also be an elementary relation. It then yields a relation.

For the record, given two relations r₁ and r₂, the sequence r₁; r₂ is defined as { (x,y) ∣ ∃ z, (x,z) ∈ r₁ ∧ (z,y) ∈ r₂}.

Function calls

Functions calls are written expr₁ expr₂. That is, functions are of arity one and the application operator is left implicit. Notice that function application binds tighter than all binary operators and looser that postfix operators. Furthermore the implicit application operator is left-associative.

The cat language has call-by-value semantics. That is, the effective parameter expr₂ is evaluated before being bound to the function formal parameter(s).

N-ary functions can be encoded either using tuples as arguments or by curryfication (i.e. as functions that return functions). Considering binary functions, in the former case, a function call is written expr₁ ( expr₂ , expr₃); while in the latter case, a function call is written expr₁ expr₂ expr₃ (which by left-associativity, is to be understood as (expr₁ expr₂) expr₃). The two forms of function call are not interchangeable, using one or the other depends on the definition of the function.

Functions

Functions are first class values, as reflected by the anonymous function construct fun pat -> expr. A function takes one argument only.

In the case where this argument is a tuple, it may be destructured by the means of a tuple pattern. That is pat above is ( id₁ , … id_n). For instance here is a function that takes a tuple of relations (or sets) as argument and return their symmetric difference:

fun (a,b) -> (a\b)|(b\a)

Functions have the usual static scoping semantics: variables that appear free in function bodies (expr above) are bound to the value of such free variable at function creation time. As a result one may also write the symmetric difference function as follows:

fun a -> fun b -> (a\b)|(b\a)

Local bindings

The local binding construct let [ rec ] bindings in expr binds the names defined by bindings for evaluating the expression expr. Both non-recursive and recursive bindings are allowed. The function binding id pat = expr is syntactic sugar for id = fun pat -> expr.

The construct

let pat₁ = expr₁ and … and pat_n = expr_n in expr

evaluates expr₁,…, expr_n, and binds the names in the patterns pat₁,…, pat_n to the resulting values. The bindings for pat = expr are as follows: if pat is (), then expr must evaluate to the empty tuple; if pat is id or (id), then id is bound to the value of expr; if pat is a proper tuple pattern (id₁,… ,id_n) with n ≥ 2, then expr must evaluate to a tuple value of size n (v₁,…,v_n) and the names id₁,…,id_n are bound to the values v₁,…,v_n. By exception, in the case n=2, the expression expr may evaluate to an elementary relation. If so, the value behaves as a tuple of arity two.

The construct

let rec pat₁ = expr₁ and … and pat_n = expr_n in expr

computes the least fixpoint of the equations pat₁ = expr₁,…, pat_n = expr_n. It then binds the names in the patterns pat₁,…, pat_n to the resulting values. The least fixpoint computation applies to set and relation values, (using inclusion for ordering); and to functions (using the usual definition ordering).

Pattern matching over tags

The syntax for pattern matching over tags is:

match expr with tag₁ -> expr₁ || ⋯ || tag_n -> expr_n || _ -> expr_d end

The value of the match expression is computed as follow: first evaluate expr to some value v, which must be a tag t. Then v is compared with the tags tag₁,…,tag_n, in that order. If some tag pattern tag_i equals t, then the value of the match is the value of the corresponding expression expr_i. Otherwise, the value of the match is the value of the default expression expr_d. As the default clause _ -> expr_d is optional, the match construct may fail.

Pattern matching over sets

The syntax for pattern matching over sets is:

match expr with {} -> expr₁ || id₁ ++ id₂ -> expr₂ end

The value of the match expression is computed as follow: first evaluate expr to some value v, which must be a set of values. If v is the empty set, that the value of the match is the value of the corresponding expression expr₁. Otherwise, v is a non-empty set, then let v_e be some element in v and v_r be the set v minus the element v_e. The value of the match is the value of expr₂ in a context where id₁ is bound to v_e and id₂ is bound to v_r.

The construct also also applies to primitive event sets and relations. If the matched expression is non-empty, then the bound element is an event and an element of relation, respectively. One easily rebuilds an event set (or a relation) for instance by using the singleton construct as { id₁ }.

Parenthesised expressions

The expression (expr) has the same value as expr. Notice that a parenthesised expression can also be written as begin expr end.

16.4 Instructions

Instruction are executed for their effect. There are three kinds of effects: adding new bindings, checking a condition, and specifying relations that are shown in pictures.

instruction	::=	let [ rec ] binding { and binding }
	∣	[ flag ] check expr [ as id]
	∣	enum id = [ \|\| tag { \|\| tag }
	∣	procedure id pat = { instruction } end
	∣	call id expr [ as id]
	∣	show expr as id
	∣	show id { , id }
	∣	unshow id { , id }
	∣	forall id in expr do { instruction } end
	∣	with id from expr
	∣	include string
	∣	if variant string { instruction } [ else { instruction } ] end

check	::=	checkname ∣ ~ checkname

checkname	::=	acyclic ∣ irreflexive ∣ empty

Bindings

The let and let rec constructs bind value names for the rest of model execution. See the subsection on bindings in Section 16.3 for additional information on the syntax and semantics of bindings.

Recursive definitions computes fixpoints of relations. For instance, the following fragment computes the transitive closure of all communication relations:

let com = rf | co | fr
let rec complus = com | (complus ; complus)

Notice that the instruction let complus = (rf|co|fr)+ is equivalent. Notice that herd7 assumes that recursive definitions are well-formed, i.e. that they yield an increasing functional. The result of ill-formed definitions is undefined.

Although herd7 features recursive functions, those cannot be used to compute a transitive closure, due to the lack of some construct say to test relation equality. Nevertheless, one can write a generic transitive closure function by using a local recursive binding:

let tr(r) = let rec t = r | (t;t) in t

Again, notice that the instruction let tr (r) = r+ is equivalent.

Thanks to pattern matching constructs, recursive functions are useful to compute over sets (and tags). For instance here is the definition of a function power that compute power sets:

let rec power S = match S with
|| {} -> { {} }
|| e ++ S ->
    let rec add_e RR = match RR with
    || {} -> { }
    || R ++ RR -> R ++ (e ++ R) ++ add_e RR
    end in
    add_e (power S)
end

Checks

The construct

check expr

evaluates expr and applies the check check. There are six checks: the three basic acyclicity (keyword acyclic), irreflexivity (keyword irreflexive) and emptyness (keyword empty); and their negations. If the check succeeds, execution goes on. Otherwise, execution stops.

The performance of a check can optionally be named by appending as id after it. The feature permits not to perform some checks at user’s will, thanks to the -skipchecks id command line option.

A check can also be flagged, by prefixing it with the flag keyword. Flagged checks must be named with the as construct. Failed flagged checks do not stop execution. Instead successful flagged checks are recorded under their name, for herd7 machinery to handle flagged executions later. Flagged checks are useful for models that define conditions over executions that impact the semantics of the whole program. This is typically the case of data races. Let us assume that some relation race has been defined, such that an non-empty race relation in some execution would make the whole program undefined. We would then write:

flag ~empty race as undefined

Then, herd7 will indicate in its output that some execution have been flagged as undefined.

Procedure definition and call

Procedures are similar to functions except that they have no results: the body of a procedure is a list of instructions and the procedure will be called for the effect of executing those instructions. Intended usage of procedures is to define checks that are executed later. However, the body of a procedure may consist in any kind of instructions. Notice that procedure calls can be named with the as keyword. The intention is to control the performance of procedure calls from the command line, exactly as for checks (see above).

As an example of procedure, one may define the following uniproc procedure with no arguments:

procedure uniproc() =
  let com = fr | rf | co in
  acyclic com | po
end

Then one can perform the acyclicity check (see previous section) by executing the instruction:

call uniproc()

As a result the execution will stop if the acyclicity check fails, or continue otherwise.

Procedures are lexically scoped as functions are. Additionally, the bindings performed during the execution of a procedure call are discarded when the procedure returns, all other effects performed (namely flags and shows) are retained.

Show (and unshow) directives

The constructs:

show id { , id } and unshow id { , id }

take (non-empty, comma separated) lists of identifiers as arguments. The show construct adds the present values of identifiers for being shown in pictures. The unshow construct removes the identifiers from shown relations.

The more sophisticated construct

show expr as id

evaluates expr to a relation, which will be shown in pictures with label id. Hence show id can be viewed as a shorthand for show id as id

Iteration over sets

The forall iteration construct permits the iteration of checks (in fact of any kind of instructions) over a set. Syntax is:

forall id in expr do instructions end

The expression expr must evaluate to a set S. Then, the list of instructions instructions is executed for all bindings of the name id to some element of S. In practice, as failed checks stop execution, this amounts to check the conjunction of the checks performed by instructions for all the elements of S. Similarly to procedure calls, the bindings performed during the execution of an iteration are discarded at iteration ends, all other effects performed are retained.

Candidate execution extension

This construct permits the extension of the current candidate execution by one binding. Syntax is with id from expr. The expression expr is evaluated to a set S. Then the remainder of the model is executed for each choice of element e in S in a context extended by a binding of the name id to e. An example of the construct usage is described in Sec. 14.3.

Model inclusion

The construct include "filename" is interpreted as the inclusion of the model contained in the file whose name is given as an argument to the include instruction. In practice, the list of instructions defined by the included model file are executed. The string argument is delimited by double quotes “"”, which, of course, are not part of the filename. Files are searched according to herd7 rules — see Sec. 17.4.

Inclusion is performed only once. Subsequent include instructions are not executed and a warning is issued.

Conditional execution

The conditional instruction allows some control over model execution by setting variants. Variants are predefined tags set on the command line or in configuration files.

The construct if variant "tag" instructions₁ else instructions₂ end behaves as follows: if the variant tag is set, then the (possibly empty) list of instructions instructions₁ is executed, otherwise the (optional, possibly empty) list of instructions instructions₂ is executed. If tag is not a recognised variant tag, a warning is issued and the non-existent variant is assumed to be unset.

Bell extensions

Users can attain more genericity in their models by defining a bell file, as an addendum, or rather preamble, to a cat file.

Enumerations

The enum construct defines a set of enumerated values or tags. Syntax is

enum id = tag₁ || ⋯ || tag_n

The construct has two main effects. It first defines the tags tag₁,…,tag_n. Notice that tags do not exist before being defined, that is evaluating the expression tag is an error without a prior enum that defines the tag tag. Tags are typed in the sense that they belong to the tag type id and that tags from different types cannot be members of the same set. The second effect of the construct is to define a set of tags id as the set of all tags listed in the construct. That is, the enum construct performs the binding of id to { tag₁,…,tag_n}.

Scopes are a special case of enumeration: the construct enum scopes must be used to define hierarchical models such as Nvidia GPUs.

An enum scopes declaration must be paired with two functions narrower and wider that implement the hierarchy amongst scopes. For example:

enum scopes = 'discography || 'I || 'II || 'III || 'IV

let narrower(t) = match t with
  || 'discography -> {'I, 'II, 'III, 'IV}
end

let wider(t) = match t with
  || 'I -> 'discography
  || 'II -> 'discography
  || 'III -> 'discography
  || 'IV -> 'discography
end

Here we define five scopes, where the first one, discography, is wider than all the other ones.

Instructions

The predefined sets of events W, R, RMW, F, and B can be annotated with user-defined tags (see Sec. 16.4).

The construct :

instructions id[ taglist ]

takes the identifier of a pre-defined set and a possibly empty, square bracketed list of tags.

The primitive tag2instrs yields, given a tag ’t, the set of instructions bearing the annotation t that was previously declared in an enumeration type.

The primitive tag2scope yields, given a tag ’t, the relation between instructions TODO

16.5 Models

model	::=	[ id ] [ model-comment ] { instruction }

model-comment	::=	id ∣ string

A model is a list of instructions preceded by an optional architecture specification and an optional comment. Architecture specification is a name that follows herd7 conventions for identifiers. Identifiers that are not valid architecture names, such as AArch64, PPC etc., are silently ignored. The following model-comment can be either a name or a string enclosed in double quotes “"”. If only one name is present, it will act both as tentative architecture specification and as a comment. When present, model architecture specifications will be checked against test architectures, see also option -archcheck.

Models operate on candidate executions (see Sec. 16.2), instructions are executed in sequence, until one instruction stops, or until the end of the instruction list. In that latter case, the model accepts the execution. The accepted execution is then passed over to the rest of herd7 engine, in order to collect final states of locations and to display pictures. Notice that the model-comment will appear in those picture legends.

16.6 Primitives

TODO:

16.7 Library

Standard library

The standard library is a cat file stdlib.cat which all models include by default. It defines a few convenient relations that are thus available to all models.

identifier	name	description

po-loc	po restricted to the same address	events are in po and touch the same address, namely po ∩ loc
rfe	external read-from	read-from by different threads, namely rf ∩ ext
rfi	internal read-from	read-from by the same thread, namely rf ∩ int

Coherence orders

For most models, a complete list of communication relations would also include co and fr. Those can be defined by including the file cos.cat (see Sec. 14.4).

identifier	name	description

co	coherence	total strict order over writes to the same address
fr	from-read	links a read r to a write w′ co-after the write w from which r takes its value
coi, fri	internal communications	communication between events of the same thread
coe, fre	external communications	communication between events of different threads

Notice that the internal and external sub-relations of co and fr are also defined.

Fences

Fence relations denote the presence of a specific fence (or barrier) in-between two events. Those can be defined by including architecture specific files.

file		relations

x86fences.cat		mfence, sfence, lfence
ppcfences.cat		sync, lwsync, eieio, isync, ctrlisync
armfences.cat		dsb, dmb, dsb.st, dmb.st, isb, ctrlisb
mipsfences.cat		sync
aarch64fences.cat		…

In other words, models for, say, ARM machines should include the following instruction:

include "armfences.cat"

Notice that for the Power (PPC) (resp. ARM) architecture, an additional relation ctrlisync (res. ctrlisb) is defined. The relation ctrlisync reads control +isync. It means that the branch to the instruction that generates the second event additionally contains a isync fence preceding that instruction. For reference, here is a possible definition of ctrlisync:

let ctrlisync  = ctrl & (_ * ISYNC); po

One may define all fence relations by including the file fences.cat. As a result, fence relations that are relevant to the architecture of the test being simulated are properly defined, while irrelevant fence relations are the empty relation. This feature proves convenient for writing generic models that apply to several concrete architectures.

17 Usage of herd7

17.1 Arguments

The command herd7 handles its arguments like litmus7. That is, herd7 interprets its argument as file names. Those files are either a single litmus test when having extension .litmus, or a list of file names when prefixed by @.

17.2 Options

There are many command line options. We describe the more useful ones:

General behaviour

-version: Show version number and exit.
-libdir: Show installation directory and exit.
-I <name>: Add directory name at the end of herd search path. If the given directory starts with + it is taken relative to herd7 library directory.
-v: Be verbose, can be repeated to increase verbosity.
-q: Be quiet, suppress any diagnostic message.
-conf <name>: Read configuration file name. Configuration files have a very simple syntax: a line “opt arg” has the same effect as the command-line option “-opt arg”.
-o <dest>: Output files into directory <dest>. Notice that <dest> must exist. At the moment herd7 may output one .dot file per processed test: the file for test base.litmus is named base.dot. By default herd7 does not generate .dot files.
-suffix <suf>: Change the name of .dot files into basesuff.dot. Useful when several .dot files derive from the same test. Default is the empty string (no suffix).
-gv: Fork the gv Postscript viewer to display execution diagrams.
-evince: Fork the evince document viewer to display execution diagrams. This option provides an alternative to the gv viewer.
-dumpes <bool>: Dump generated event structures and exit. Default is false. Event structures will be dumped in a .dot file whose name is determined as usual — See options -o and -suffix above. Optionally the event structures can be displayed with the -gv option.
-unroll <int>: The setting -unroll n performs backwards jumps n times. This is a workaround for one of herd7 main limitations: herd7 does not really handle loops. Default is 2.
-hexa <bool>: Print numbers in hexadecimal. Default is false (numbers are printed in decimal).

Engine control

The main purpose of herd7 is to run tests on top of memory models. For a given test, herd7 performs a three stage process:

Generate candidate executions.
For each candidate execution, run the model. The model may reject or accept the execution.
For each candidate execution that the model accepts, record observed locations and, if so instructed, a diagram of the execution.

We now describe options that control those three stages.

-model (cav12|minimal|uniproc|<filename>.cat)

Select model, this option accepts one tag or one file name with extension .cat. Tags instruct herd7 to select an internal model, while file names are read for a model definition. Documented model tags are:

cav12, the model of [4] (Power);
minimal, the minimal model that allows all executions;
uniproc, the uniproc model that checks single-thread correctness.

In fact, herd7 accepts potentially infinitely many models, as models can be given in text files in an adhoc language described in Sec. 16. The herd7 distribution includes several such models: minimal.cat and uniproc.cat are the text file versions of the homonymous internal models, but may produce pictures that show different relations. The herd7 distribution also includes models for a variety of architectures and several models for the C language. Model files are searched according to the same rules as configuration files. Some architectures have a default model: arm.cat model for ARM, ppc.cat model for PPC, x86tso.cat for X86, and aarch64.cat for AArch64 (ARMv8).

-through (all|invalid|none)

Let additional executions reach the final stage of herd7 engine. This option permits users to generate pictures of forbidden executions, which are otherwise rejected at an early stage of herd7 engine — see Sec. 15.2. Namely, the default “none” let only valid (according to the active model) executions through. The behaviour of this option differs between internal and text file models:

For internal models: the tag all let all executions go through; while the tag invalid will reject executions that violate uniproc, while letting other forbidden execution go through.
Text file models: the tags all and invalid let all executions go through. For such models, a more precise control over executions that reach herd7 final phase can be achieved with the option -skipcheck — see next option.

Default is none.

-skipchecks <name₁,…,name_n>

This option applies to text file models. It instructs herd7 to ignore the outcomes of the given checks. For the option to operate, checks must be named in the model file with the as name construct – see Sec. 16.4. Notice that the arguments to -skipcheck options cumulate. That is, “-skipcheck name₁ -skipcheck name₂” acts like “-skipcheck name₁,name₂”.

-strictskip <bool>

Setting this option (-strictskip true), will change the behaviour of the previous option -skipcheck: it will let executions go through when the skipped checks yield false and the unskipped checks yield true. This option comes in handy when one wants to observe the executions that fail one (or several) checks while passing others. Default is false.

-optace <bool>

Optimise the axiomatic candidate execution stage. When enabled by -optace true, herd7 does not generate candidate executions that fail the uniproc test. The default is “true” for internal models (except the minimal model), and “false” for text file models. Notice that -model uniproc.cat and -model minimal.cat -optace true should yield identical results, the second being faster. Setting -optace true can lower the execution time significantly, but one should pay attention not to design models that forget the uniproc condition.

-archcheck <bool>

Control compatibily check of test and cat file architecture. More precisely, some cat files are specific to an architecture. Running tests of a different architecture on them will result in cryptic error messages. In that situation, compatibility check will yield a more understandable diagnostic. Default is true.

-show (prop|neg|all|cond|wit|none)

Select execution diagrams for picture display and generation. Execution diagrams are shown according to the final condition of test. The final condition is a quantified boolean proposition exists p, ~exists p, or forall p. The semantics of recognised tags is as follows:

prop Picture executions for which p is true.
neg Picture executions for which p is false.
all Picture all executions.
cond Picture executions that validate the condition, i.e. p is true for exists and forall, and false for ~exists.
wit Picture “interesting” executions, i.e. p is true for exists and ~exists, and false for forall.
none Picture no execution.

Default is none.

-initwrites <bool>

Represent init writes as plain write events, default is false except for specifically tagged generic models — see “Model options” in Sec. 16.5.

-variant <tag₁,…,tag_n>

Activate variation(s) of models. Most variations are minor change in instruction semantics, are used for experimental purpose and remain undocumented. With the exception of tag “mixed” that command mixed-size mode. Mixed-size mode is appropriate for mixed-size tests that perform accesses of different sizes, as well as indexed accesses.

Discard some observations

Those options intentionally omit some of the final states that herd7 would normally generate.

-speedcheck (false|true|fast): When enabled by -speedcheck true or -speedcheck fast, attempt to settle the test condition. That is, herd7 will generate a subset of executions (those named “interesting” above) in place of all executions. With setting -speedcheck fast, herd7 will additionally stop as soon as a condition exists p is validated, and as soon as a condition ~exists p or forall p is invalidated. Default is false.
-nshow <int>: Stop once <int> pictures have been collected. Default is to collect all (specified, see option -show) pictures.

Control dot pictures

These options control the content of DOT images.

We first describe options that act at the general level.

-graph (cluster|free|columns)

Select main mode for graphs. See Sec. 15.1. The default is cluster.

-dotmode (plain|fig)

The setting -dotmode fig produces output that includes the proper escape sequence for translating .dot files to .fig files (e.g. with dot -Tfig…). Default is plain.

-dotcom (dot|neato|circo)

Select the command that formats graphs displayed by the -gv option. The default is dot for the cluster and free graph modes, and neato for the columns graph mode.

-showevents (all|mem|noregs)

Control which events are pictured:

all Picture all events.
mem Picture memory events.
noregs Picture all events except register events, i.e. memory, fences and branch events.

Default is noregs.

-showinitwrites <bool>

Show initial write events (when existing, see option -initwrites) in pictures. Default is true.

-mono <bool>

The setting -mono true commands monochrome pictures. This option acts upon default color selection. Thus, it has no effect on colors given explicitly with the -edgeattr option.

-scale <float>

Global scale factor for graphs in columns mode. Default is 1.0.

-xscale <float>

Global scale factor for graphs in columns mode, x direction. Default is 1.0.

-yscale <float>

Global scale factor for graphs in columns mode, y direction. Default is 1.0.

-showthread <bool>

Show thread numbers in figures. In cluster mode where the events of a thread are clustered, thread cluster have a label. In free mode po edges are suffixed by a thread number. In columns mode, columhs have a header node that shows the thread number. Default is true.

-texmacros <bool>

Use latex commands in some text of pictures. If activated (-showthread true), thread numbers are shown as \myth{n}. Assembler instructions are locations in nodes are argument to an \asm command. It user responsability to define those commands in their L^AT_EX documents that include the pictures. Possible definitions are \newcommand{\myth}[1]{Thread~#1} and \newcommand{\asm}[1]{\texttt{#1}}. Default is false.

A few options control picture legends.

-showlegend <bool>: Add a legend to pictures. By default legends show the test name and a comment from the executed model. This comment is the first item of model syntax — see Sec 16.5. Default is true.
-showkind <bool>: Show test kind in legend. The kind derive from the quantifier of test final condition, kind Allow being exists, kind Forbid being ~exists, and kind Require being forall. Default is false.
-shortlegend <bool>: Limit legend to test name. Default is false.

A few options control what is shown in nodes and on their sizes, i.e. on how events are pictured.

-squished <bool>: The setting -squished true drastically limits the information displayed in graph nodes. This is usually what is wanted in modes free and columns. Default is false.
-fixedsize <bool>: This setting is meaningfull in columns graph mode and for squished nodes. When set by -fixedsize true it forces node width to be 65% of the space between columns. This may sometime yield a nice edge routing. Default is false
-extrachars <float>: This setting is meaningful in columns graph mode and for squished nodes. When the size of nodes is not fixed (i.e. -fixedsize false and default), herd7 computes the width of nodes by counting characters in node labels and scaling the result by the font size. The setting -extrachars v commands adding the value v before scaling. Negative values are of course accepted. Default is 0.0.
-showobserved <bool>: Highlight observed memory read events with stars “*”. A memory read is observed when the value it reads is stored in a register that appears in final states. Default is false.
-brackets <bool>: Show brackets around locations. Default is false.

Then we list options that offer some control on which edges are shown. We recall that the main controls over the shown and unshown edges are the show and unshow directives in model definitions — see Sec. 16.4. However, some edges can be controlled only with options (or configuration files) and the -unshow option proves convenient.

-showpo <bool>: Show program order (po) edges. Default is true. Default is false.
-showinitrf <bool>: Show read-from edges from initial state. Default is false.
-showfinalrf <bool>: Show read-from edges to the final state, i.e show the last store to locations. Default is false. i.e show the last store to locations. Default is false.
-showfr <bool>: Show from-read edges. Default is true.
-doshow <name₁,…,name_n>: Do show edges labelled with name₁,…,name_n. This setting applies when names are bound in model definition.
-unshow <name₁,…,name_n>: Do not show edges labelled with name₁,…,name_n. This setting applies at the very last moment and thus cancels any show directive in model definition and any -doshow command line option.

Other options offer some control over some of the attributes defined in Graphviz software documentation. Notice that the controlled attributes are omitted from DOT files when no setting is present. For instance in the absence of a -spline <tag> option, herd7 will generate no definition for the splines attribute thus resorting to DOT tools defaults. Most of the following options accept the none argument that restores their default behaviour.

-splines (spline|true|line|false|polyline|ortho|curved|none): Define the value of the splines attribute. Tags are replicated in output files as the value of the attribute, except for none.
-margin <float|none>: Specifies the margin attribute of graphs.
-pad <float|none>: Specifies the pad attribute of graphs.
-sep <string|none>: Specifies the sep attribute of graphs. Notice that the argument is an arbitrary string, so as to allow DOT general syntax for this attribute.
-fontname <string|none>: Specifies the graph fontname attribute.
-fontsize <int|none>: Specifies the fontsize attribute n of all text in the graph.
-edgefontsizedelta <int>: option -edgefontsizedelta m sets the fontsize attributes of edges to n+m, where n is the argument to the -fontsize option. Default is 0. This option has no effect if fontsize is unset.
-penwidth <float|none>: Specifies the penwidth attribute of edges.
-arrowsize <float|none>: Specifies the arrowsize attribute of edges.
-edgeattr <label,attribute,value>: Give value value to attribute attribute of all edges labelled label. This powerful option permits alternative styles for edges. For instance, the ghb edges of the diagrams of this document are thick purple (blueviolet) arrows thanks to the settings: -edgeattr ghb,color,blueviolet -edgeattr ghb,penwidth,3.0 -edgeattr ghb,arrowsize,1.2. Notice that the settings performed by the -edgeattr option override other settings. This option has no default.

Change input

Those options are the same as the ones of litmus7 — see Sec. 4.

-names <file>: Run herd7 only on tests whose names are listed in <file>.
-rename <file>: Change test names.
-kinds <file>: Change test kinds. This amonts to changing the quantifier of final conditions, with kind Allow being exists, kind Forbid being ~exists and kind Require being forall.
-conds <file>: Change the final condition of tests. This is by far the most useful of these options: in combination with option -show prop it permits a fine grain selection of execution pictures.

17.3 Configuration files

The syntax of configuration files is minimal: lines “key arg” are interpreted as setting the value of parameter key to arg. Each parameter has a corresponding option, usually -key, except for the single letter option -v whose parameter is verbose.

As command line option are processed left-to-right, settings from a configuration file (option -conf) can be overridden by a later command line option. Configuration files will be used mostly for controling pictures. Some configuration files are are present in the distribution. As an example, here is the configuration file apoil.cfg, which can be used to display images in free mode.

#Main graph mode
graph free
#Show memory events only
showevents memory
#Minimal information in nodes
squished true
#Do not show a legend at all
showlegend false

The configuration above is commented with line comments that starts with “#”. The above configuration file comes handy to eye-proof model output, even for relatively complex tests, such as IRIW+lwsyncs and IRIW+syncs:

% herd7 -conf apoil.cfg -show prop -doshow prop -gv IRIW+lwsyncs.litmus
% herd7 -through invalid -conf apoil.cfg -show prop -doshow prop -gv IRIW+syncs.litmus

In the command lines above, -show prop instructs herd7 to produce images of the executions that validate the final test condition, while -doshow prop instructs herd7 to display the relation named “prop”. In spite of the unfortunate name clashes, those are not to be confused…

We run the two tests on top of the default Power model that computes, amongst others, a prop relation. The model rejects executions with a cyclic prop. One can then see that the relation prop is acyclic for IRIW+lwsyncs and cyclic for IRIW+syncs:

Notice that we used the option -through invalid in the case of IRIW+syncs as we would otherwise have no image.

17.4 File searching

Configuration and model files are searched first in the current directory; then in the search path; then in any directory specified by setting the shell environment variable HERDLIB; and then in herd installation directory, which is defined while compiling herd7.

6: Alternatively, we may adopt the simpler view that a candidate execution includes a choice of all communication relations.
7: Doing so permits pruning executions that are irrelevant to the test final condition, see herd7 option -speedcheck
8: The actual computation of coherence orders slightly differs. Interested readers can look at the cross.cat file.
9: Function map is defined at the end of herd7standard library stdlib.cat
10: This option requires the Postscript visualiser gv.
11: The setting of showthread is also changed, by the omitted -showthread true command line option

architecture		fence sets

X86		MFENCE, SFENCE, LFENCE
PPC		SYNC, LWSYNC, EIEIO, ISYNC
ARM		DMB, DMB.ST, DSB, DSB.ST, ISB
MIPS		SYNC
AArch64		DMB.SY, DMB.ST, DMB.ST, …DSB.SY, DSB.ST, DSB.ST, …

Part III Simulating memory models with herd7

14 Writing simple models

14.1 Sequential consistency

14.2 Total Store Order (TSO)

14.3 Sequential consistency, total order definition

14.4 Computing coherence orders

15 Producing pictures of executions

15.1 Graph modes

15.2 Showing forbidden executions

16 Model definitions

16.1 Overview

16.2 Identifiers

16.3 Expressions

Simple expressions

Tuples

Explicit sets of values

Operator expressions

Function calls

Functions

Local bindings

Pattern matching over tags

Pattern matching over sets

Parenthesised expressions

16.4 Instructions

Bindings

Checks

Procedure definition and call

Show (and unshow) directives

Iteration over sets

Candidate execution extension

Model inclusion

Conditional execution

Bell extensions

Enumerations

Instructions

16.5 Models

16.6 Primitives

16.7 Library

Standard library

Coherence orders

Fences

17 Usage of herd7

17.1 Arguments

17.2 Options

General behaviour

Engine control

Discard some observations

Control dot pictures

Change input

17.3 Configuration files

17.4 File searching

Part III
Simulating memory models with herd7