Part III
Simulating memory models with herd

The tool herd is a memory model simulator. Users may write simple, single events, axiomatic models of their own and run litmus tests on top of their model. The herd distribution already includes some models.

The authors of herd are Jade Alglave and Luc Maranget.

12 Writing simple models

12.1 Sequential consistency

The simulator herd accepts models written in text files. For instance here is sc.cat, the definition of the sequentially consistent (SC) model:

SC
(* Atomic *)
empty atom & (fre;coe) as atom

(* Sequential consistency *)
acyclic po | fr | rf | co as sc

The model above illustrate two features of model definitions:

The computation of new relations from other relations, and their binding to a name with the let construct. Here, a new relation com is the union “|” of the three pre-defined communication relations.
The peformance of some checks. Here the relation “po|com” (i.e. the union of program order po and of communication relations) is required to be acyclic.

We postpone the discussion of the show instruction, see Sec. 13.

One can then run some litmus test, for instance SB (for Store Buffering, see also Sec. 1.1), on top of the SC model:

% herd -model ./sc.cat SB.litmus
Test SB Allowed
States 3
0:EAX=0; 1:EAX=1;
0:EAX=1; 1:EAX=0;
0:EAX=1; 1:EAX=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=0)
Observation SB Never 0 3
Hash=7dbd6b8e6dd4abc2ef3d48b0376fb2e3

The output of herd mainly consists in the list of final states that are allowed by the simulated model. Additional output relates to the test condition. One sees that the test condition does not validate on top of SC, as “No” appears just after the list of final states and as there is no “Positive” witness. Namely, the condition “exists (0:EAX=0 /\ 1:EAX=0)” reflects a non-SC behaviour, see Sec. 5.2.

The simulator herd works by generating all executions of a given test. By “execution” we mean a choice of events, program order po, read-from rf, and coherence orders co⁴. In the case of the SB example, we get the following four executions:

Indeed, there is no choice for the program order po, as there are no conditional jumps in this example; and no choice for the coherence orders co either, as there is only one store per location, which must be co-after the initial stores (not pictured for clarity). Then, there are two read events from locations x and y respectively, which take their values either from the initial stores or from the stores in program. As a result, there are four possible executions. The model sc.cat gets executed on each of the four (candidate) executions. The three first executions are accepted and the last one is rejected, as it presents a cycle in po ∪ fr. The following diagram pictures the ghb relation. The cycle is obvious:

12.2 Total Store Order (TSO)

However, the non-SC execution shows up on x86 machines, whose memory model is TSO. As TSO relaxes the write-to-read order, we attempt to write a TSO model tso-00.cat, by simply removing write-to-read pairs from the acyclicity check:

"A first attempt for TSO"

(* Communication relations that order events*)
let com-tso = rf | co | fr
(* Program order that orders events *)
let po-tso = WW(po) | RM(po)

(* TSP global-happens-before *)
let ghb = po-tso | com-tso
acyclic ghb
show ghb

This model illustrates another feature of model definitions: filters such as WW and RM restrict their argument by selecting some sort of events. As W stands for write events, R for read events and M for all memory events, the effect of let po-tso = WW(po)|RM(po) is to define po-tso as the program order minus write-to-read pairs.

We run SB on top of the tentative TSO model:

% herd -model tso-00.cat SB.litmus 
Test SB Allowed
States 4
0:EAX=0; 1:EAX=0;
0:EAX=0; 1:EAX=1;
0:EAX=1; 1:EAX=0;
0:EAX=1; 1:EAX=1;
Ok
Witnesses
Positive: 1 Negative: 3
...

The non-SC behaviour is now accepted, as write-to-read po-pairs do not participate to the acyclicity check any more. In effect, this allows the last execution above, as ghb (i.e. po-tso ∪ com-tso) is acyclic.

However, our model tso-00.cat is flawed: it is still to strict, forbidding some behaviours that the TSO model should accept. Consider the test SB+rfi-pos, which is test STFW-PPC for X86 from Sec. 1.3 with a normalised name (see Sec. 10.1). This test targets the following execution:

Namely the test condition exists (0:EAX=1 /\ 0:EBX=0 /\ 1:EAX=1 /\ 1:EBX=0) specifies that Thread 0 writes 1 into location x, reads the value 1 from the location x (possibly by store forwarding) and then reads the value 0 from the location y; while Thread 1 writes 1 into y, reads 1 from y and then reads 0 from x. Hence, this test derives from the previour SB by adding loads in the middle, those loads being satisfied from local stores. As can be seen by running the test on top of the tso-00.cat model, the target execution is forbidden:

% herd -model tso-00.cat SB+rfi-pos.litmus 
Test SB+rfi-pos Allowed
States 15
0:EAX=0; 0:EBX=0; 1:EAX=0; 1:EBX=0;
...
0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=1;
No
Witnesses
Positive: 0 Negative: 15
..

However, running the test with litmus demonstrates that the behaviour is observed on some X86 machine:

% arch
x86_64
% litmus -mach x86 SB+rfi-pos.litmus
...
Test SB+rfi-pos Allowed
Histogram (4 states)
11589 *>0:EAX=1; 0:EBX=0; 1:EAX=1; 1:EBX=0;
3993715:>0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=0;
3994308:>0:EAX=1; 0:EBX=0; 1:EAX=1; 1:EBX=1;
388   :>0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=1;
Ok

Witnesses
Positive: 11589, Negative: 7988411
Condition exists (0:EAX=1 /\ 0:EBX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
...

As a conclusion, our tentative TSO model is too strong. The following diagram pictures its ghb relation:

One easily sees that ghb is cyclic, wheras it should not. Namely, the internal read-from relation rfi does not create global order in the TSO model. Hence, rfi is not included in ghb. We rephrase our tentative TSO model, resulting into the new model tso-01.cat:

"A second attempt for TSO"

(* Communication relations that order events*)
let com-tso = rfe | co | fr
(* Program order that orders events *)
let po-tso = WW(po) | RM(po)

(* TSP global-happens-before *)
let ghb = po-tso | com-tso
acyclic ghb
show ghb

As can be observed above rfi (internal read-from) is no longer included in ghb. However, rfe (external read-from) still is.

As intended, this new tentative TSO model allows the behaviour of test SB+rfi-pos:

%  herd -model tso-01.cat SB+rfi-pos.litmus
Test SB+rfi-pos Allowed
States 16
...
0:EAX=1; 0:EBX=1; 1:EAX=1; 1:EBX=0;
...
Ok
Witnesses
Positive: 1 Negative: 15
...

And indeed, the global-happens-before relation is no-longer cyclic:

We are not done yet, as our model is too weak in two aspects. First, it has no semantics for fences. As a result the test SB+mfences is allowed, whereas it should be forbidden, as this is the very purpose of the fence mfence.

One easily solves this issue by adding the mfence relation to the definition of po-tso:

let po-tso = WW(po) | RM(po) | mfence

But the resulting model is still too weak, as it allows some behaviours that any model must reject for the sake of single thread correctness. The following test CoRWR illustrates the issue:

X86 CoRWR
{ }
 P0          ;
 MOV EAX,[x] ;
 MOV [x],$1  ;
 MOV EBX,[x] ;
exists (0:EAX=1 /\ 0:EBX=0)

The TSO check “acyclic po-tso|com-tso” does not suffice to reject two absurd behaviours pictured in the execution diagram above: (1) the read a is allowed to read from the po-after write b, as rfi is not included in com-tso; and (2) the read c is allowed to read the initial value of location x although this (unpictured) write is co-before the write b, as WR(po) is not in po-tso.

For any model, we rule out those very untimely behaviours by the so-called uniproc check that states that executions projected on events that access one variable only are SC. In practice, having defined po-loc as po restricted to events that touch the same address, we further require the acyclicity of the relation po-loc|fr|rf|co. In the TSO case, the uniproc check can be somehow simplified by considering only the cycles in po-loc|fr|rf|co that are not already rejected by the main check of the model. This amounts to design specific checks for the two relations that are not global in TSO: rfi and WR(po). Doing so, we finally produce a correct model for TSO tso-02.cat:

"A third attempt for TSO"

(* Uniproc check specialized for TSO *)
irreflexive RW(po-loc);rfi as uniprocRW
irreflexive WR(po-loc);fri as uniprocWR

(* Communication relations that order events*)
let com-tso = rfe | co | fr
(* Program order that orders events *)
let po-tso = WW(po) | RM(po) | mfence

(* TSP global-happens-before *)
let ghb = po-tso | com-tso
show mfence ghb
acyclic ghb as tso

This last model illustrates other features of herd: herd may also performs irreflexivity checks with the keyword “irreflexive”; and the checks can be given names by suffixing them with “as name”. This last feature will be used in Sec. 13.2

13 Producing pictures of executions

The simulator herd can be instructed to produce pictures of executions. Those pictures are instrumental in understanding and debugging models. It is important to understand that herd does not produce pictures by default. To get pictures one must instruct herd to produce pictures of some executions with the -show option. This option accepts specific keywords, its default being “none”, instructing herd not to produce any picture.

A frequentlty used keyword is “prop” that means “show the executions that validate the proposition in the final condition”. Namely, the final condition in litmus test is a quantified boolean proposition as for instance “exists (0:EAX=0 /\ 1:EAX=0)” at the end of test SB.

But this is not enough, users also have to specify what to do with the picture: save it in file in the DOT format of the graphviz graph visualization software, or display the image,⁵ or both. One instructs herd to save images with the -o dirname option, where dirname is the name of a directory, which must exists. Then, when processing the file name.litmus, herd will create a file name.dot into the directory dirname. For displaying images, one uses the -gv option.

As an example, so as to display the image of the non-SC behaviour of SB, one should invoke herd as:

% herd -model tso-02.cat -show prop -gv SB.litmus

As a result, users should see a window popping and displaying this image:

Notice that we got the PNG version of this image as follows:

% herd -model tso-02.cat -show prop -o /tmp SB.litmus
% dot -Tpng /tmp/SB.dot -o SB+CLUSTER.png

That is, we applied the dot tool from the graphviz package, using the appropriate option to produce a PNG image.

One may observe that there are ghb arrows in the diagram. This results from the show ghb instruction at the end of the model file tso-02.cat.

13.1 Graph modes

The image above much differs from the one in Sec. 12.2 that describes the same execution and that is reproduced in Fig. 3

Figure 3: The non-SC behaviour of SB is allowed by TSO

In effect, herd can produce three styles of pictures, dot clustered pictures, dot free pictures, and neato pictures with explicit placement of the events of one thread as a colum. The style is commanded by the -graph option that accepts three possible arguments: cluster (default), free and columns. The following pictures show the effect of graph styles on the SB example:

-graph cluster	-graph free	-graph columns

Notice that we used another option -squished true that much reduces the information displayed in nodes. Also notice that the first two pictures are formatted by dot, while the rightmost picture is formatted by neato.

One may also observe that the “-graph columns” picture does not look exactly like Fig. 3. For instance the ghb arrows are thicker in the figure. There are many parameters to control neato (and dot), many of which are accessible to herd users by the means of appropriate options. We do not intend to describe them all. However, users can reproduce the style of the diagram of this manual using yet another feature of herd: configuration files that contains settings for herd options and that are loaded with the -conf name option. In this manual we mostly used the doc.cfg configuration file. As this file is present in herd distribution, users

can use the diagram style of this manual:

% herd -conf doc.cfg ...

13.2 Showing forbidden executions

Images are produced or displayed once the model has been executed. As a consequence, forbidden executions won’t appear by default. Consider for instance the test SB+mfences, where the mfence instruction is used to forbid SB non-SC execution. Runing herd as

% herd -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus

will produce no picture, as the TSO model forbids the target execution of SB+mfences.

To get a picture, we can run SB+mfences on top of the mininal model, a pre-defined model that allows all executions:

% herd -model minimal -conf doc.cfg -show prop -gv SB+mfences.litmus

And we get the picture:

It is worth mentioning again that although the minimal model allows all executions, the final condition selects the displayed picture, as we have specified the -show prop option.

The picture above shows mfence arrows, as all fence relations are displayed by the minimal model. However, it does not show the ghb relation, as the minimal model knows nothing of it. To display ghb we could write another model file that would be just as tso-02.cat, with checks erased. The simulator herd provides a simpler technique: one can instruct herd to ignore either all checks (-through invalid), or a selection of checks (-skipcheck name₁… name_n). Thus, either of the following two commands

% herd -through invalid -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus
% herd -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -gv SB+mfences.litmus

will produce the picture we wish:

Notice that mfence and ghb are displayed because of the instruction “show mfence ghb” (fence relation are not shown by default); while -skipcheck tso works because the tso-02.cat model names its main check with “as tso”.

The image above is barely readable. For such graphs with many relations, the cluster and free modes are worth a try. The commands:

% herd -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -graph cluster -gv SB+mfences.litmus
% herd -skipcheck tso -model tso-02.cat -conf doc.cfg -show prop -graph free -gv SB+mfences.litmus

will produce the images:

Namely, command line options are scanned left-to-right, so that most of the settings of doc.cfg are kept⁶ (for instance thick ghb arrows), while the graph mode is overriden.

14 Model definitions

We describe our langage for defining models. The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (like this). Non-terminal symbols are set in italic font (like that). An unformatted vertical bar …∣ … denotes alternative. Square brackets […] denote optional components. Curly brackets {…} denotes zero, one or several repetitions of the enclosed components.

Model source files may contain comments of the OCaml type ((*…*), can be nested), or line comments starting with “#” and running until end of line.

14.1 Identifiers

letter	::=	a … z ∣ A … Z

digit	::=	0 … 9

id	::=	letter { letter ∣ digit ∣ _ ∣ . ∣ - }

Identifiers are rather standard: they are a sequence of letters, digits, “_” (the underscore character), “.” (the dot character) and “-” (the minus character), starting with a letter. Using the minus character inside identifiers may look a bit surprising. We did so as to allow identifiers such as po-loc.

At startup, pre-defined identifiers are bound to relations between memory events. Those relations describe a candidate execution. Executing the model means allowing or forbiding that candidate execution.

A first pre-defined identifier is id, the identity. Other pre-defined identifiers are the program order po and its refinements:

identifier	name	description

po	program order	instruction order lifted to events
po-loc	po restricted to the same address	events are in po and touch the same address
addr	address dependency	the address of the second event depends on the value loaded by the first (read) event
data	data dependency	the value stored by the second (write) event depends on the value loaded by the first (read) event
ctrl	control dependency	the second event is in a branch controled by the value loaded by the commfirst (read) event
ctrlisync/ctrlisb	control dependency + isync/isb	the branch additionally contains a isync/isb barrier before the second event

Other pre-defined relations denote the presence of a specific fence (or barrier) in-between two events, those are mfence, sfence, lfence (X86); sync, lwsync, lwsync, isync (Power); and dsb, dmb, dsb.st, dmb.st, isb (ARM).

Finally, pre-defined identifiers are bound the communication relations:

identifier	name	description

rf	read-from	links a write w to a read r taking its value from w
co	coherence	total order over writes to the same address
fr	from-read	links a read r to a write w′ co-after the write w from which r takes its value
rfi, fri, coi	internal communications	communication between events of the same thread
rfe, fre, coe	external communications	communication between events of different threads

14.2 Expressions

Expressions are evaluated by herd, yielding a relation over memory events as a value. A relation can be seen as a set of pairs of memory events.

expr	::=	0
	∣	id
	∣	expr * ∣ expr + ∣ expr ? ∣ expr ^-1
	∣	expr \| expr ∣ expr ; expr ∣ expr \ expr ∣ expr & expr
	∣	selector ( expr )
	∣	id ( args )
	∣	fun ( params ) -> expr
	∣	let binding { and binding } in expr
	∣	let rec valbinding { and valbinding } in expr
	∣	( expr ) ∣ begin expr end

selector	::=	MM ∣ MR ∣ RM ∣ MW ∣ WM ∣ RR ∣ WR ∣ RW ∣ WW
	∣	AA ∣ AP ∣ PA ∣ PP
	∣	ext ∣ int ∣ noid

args	::=	є
	∣	expr { , expr }

params	::=	є
	∣	id { , id }

binding	::=	valbinding ∣ funbinding

valbinding	::=	id = expr

funbinding	::=	id ( params ) = expr

Simple expressions

Simple expressions are the empty relation (keyword 0) and identifiers id. Identifiers are bound to values, either before the execution (see pre-defined identifiers in Sec. 14.1), or by the model itself.

Operator expressions

The transitive and reflexive-transitive closure of an expression are performed by the postfix operators + and *. The postfix operator ^-1 performs relation inversion. The construct expr? (option) evaluates to the union of expr value and of the identity relation.

Infix operators are | (union), ; (sequence), & (intersection) and \ (set difference). The precedence of operators is as follows: postfix operators bind tighter than infix operators. Infix operators from higher precedence to lower precedence are: &, \, ; and |. All infix operators are right-associative, except set difference which is left-associative.

For the record, given two relations r₁ and r₂, the sequence r₁; r₂ is defined as { (x,y) ∣ ∃ z, (x,z) ∈ r₁ ∧ (z,y) ∈ r₂}.

Selectors

Selectors are filters that operate on relations. The value of selector( expr) is some subrelation of the value of expr. Which subrelation depends upon selector.

Most selectors are two-letters keywords with the first letter operating on the first component of pairs and the second letter operating on the second component of pairs. More precisely, we define the semantics of selector letters as predicates over memory events: M always yield true, R yields true on read events, W yields true on write events, A yields true on atomic events (produced by X86 locked instructions, or ARM/Power load reserve and store conditional instructions), and P yields true on ordinary (plain) events. Then, we define:

XY(r) = { (x,y) ∈ r ∣ X(x) ∧ Y(y)}

For instance, the expression RW(po) yields the read-to-write pairs in program order.

Three additional selectors ext, int, and noid performs specific operations. The first selects pairs of events in different threads (“external”), the second selects pairs of events in the same thread (“internal”), and the third selects pairs of events that are different (“not identical”).

Function calls

Functions calls are written id( expr₁ , … , expr_n). That is, functions have an arity (n above, which can be zero) and arguments are given as a comma separated list of expressions. Our language have call-by-value semantics. That is, the effective parameters expr₁ , … , expr_n are evaluated before being bound to formal parameters.

Functions

Functions are first class values, as reflected by the anonymous function construct fun ( params ) -> expr. Function arity is defined by the length of its formal parameter list params, which can be empty. Notice that functions have the usual static scoping semantics: variables that appear free in function bodies (expr above) are bound to the value of such free variable at function creation time.

Local bindings

The local binding construct let [ rec ] bindings in expr binds the names defined by bindings for evaluating the expression expr.

Both non-recursive and recursive bindings are allowed. The function binding id ( params ) = expr is syntactic sugar for id = fun ( params ) -> expr. It is allowed in non-recursive bindings only as there is little point in defining recursive function in our setting.

In the following we only consider value bindings id = expr. The construct

let id₁ = expr₁ and … and id_n = expr_n in expr

evaluates expr₁,…, expr_n, and binds the names id₁,…, id_n to the resulting values.

The construct

let rec id₁ = expr₁ and … and id_n = expr_n in expr

computes the least fixpoint of the equations id₁ = expr₁,…, id_n = expr_n. It then binds the names id₁,…, id_n to the resulting values. The least fixpoint computation applies only to relations, using inclusion for ordering. Notice that let rec id = fun ( params ) -> expr in… is legal syntax. Such a definition will yield a runtime error.

Parenthesised expressions

The expression (expr) has the same value as expr. Notice that a parenthesised expression can also be written as begin expr end.

14.3 Instructions

Instruction are executed for their effect. There are three kind of effects: adding new bindings, checking a condition, and specifying relations that are shown in pictures.

instruction	::=	let binding { and binding }
	∣	let rec valbinding { and valbinding }
	∣	check expr [ as id]
	∣	procedure id ( params ) = { instruction } END
	∣	call id ( args )
	∣	show expr as id
	∣	show id { , id }
	∣	unshow id { , id }
	∣	include string

check	::=	acyclic ∣ irreflexive ∣ empty

Bindings

The let and let rec constructs bind value names for the rest of model execution. See the subsection on bindings in Section 14.2 for additional information on the syntax and semantics of bindings.

Recursive definitions computes fixpoints of relations. For instance, the following fragment computes the transitive closure of all communication relations:

let com = rf | co | fr
let rec complus = com | (complus ; complus)

Notice that the instruction let complus = (rf|co|fr)+ is equivalent.

There are no recursive functions, as those would not be very useful in our limited language. Nevertheless, one may for instance write a generic transitive closure function by using a local recursive binding:

let tr(r) = let rec t = r | (t;t) in t
let complus = tr(rf|co|fr)

Again notice that the instruction let complus = (rf|co|fr)+ is equivalent.

Checks

The construct

check expr

evaluates expr and applies check. If the check succeeds, that is if the relation is acyclic, irreflexive or empty; depending on check being acyclic, irreflexive or empty, execution goes on. Otherwise, execution stops.

The performance of a check can optionally be named by appending as id after it. The feature permits not to perform some checks at user’s will, thanks to the -skipcheck id command line option.

Procedure definition and call

Procedures are similar to functions except that they have no results: the body of a procedure is a list of instructions and the procedure will be called for the effect of executing those instructions. Intended usage of procedures is to define checks that are executed later. However, the body of a procedure may consist in any kind of instructions.

For instance, one may define the following uniproc procedure with no arguments:

procedure uniproc() =
  let com = fr | rf | co in
  acyclic com | po

Then one can perform the acyclicity check (see previous section) by executing the instruction:

call uniproc()

As a result the execution will stop if the acyclicity check fails, or continue otherwise.

Procedures are lexically scoped as functions are. Additionally, the bindings performed during the execution of a procedure call are discarded when the procedure returns, all other effects performed are retained.

Show (and unshow) directives

The constructs:

show id { , id } and unshow id { , id }

take (non-empty, comma separated) lists of identifiers as arguments. The show construct adds the present values of identifiers for being shown in pictures. The unshow construct removes the identifiers from shown relations.

The more sophisticated construct

show expr as id

evaluates expr to a relation, which will be shown in pictures with label id. Hence show id can be viewed as a shorthand for show id as id

Model inclusion

The construct include "filename" is interpreted as the inclusion of the model contained in the file whose name is given as an argument to the include instruction. In practice the list of intructions defined by the included model file are executed. The string argument is delimited by double quotes “"”, which, of course, are not part of the filename.

Notice that:

Files are searched according to herd rules — see Sec. 15.4.
Model options defined in the model file filename are ignored.

14.4 Models

model	::=	model-comment { model-option } { instruction }

model-comment	::=	id ∣ string

model-option	::=	withco ∣ withoutco ∣ withinit ∣ withoutinit

A model is a a list of instruction preceded by a small comment, which can be either a name that follows herd conventions for identifiers, or a string enclosed in double quotes “"”, and a list of options.

Models operate on candidate executions (see Sec. 14.1), instructions are executed in sequence, until one instruction stops, or until the end of the instruction list. In that latter case, the model accepts the execution. The accepted execution is then passed over to the rest of herd engine, in order to collect final states of locations and to display pictures.

Model options

Model options control some experimental features of herd. More precisely, by default, herd includes a complete coherence order relation in every candidate execution, and does not represent initial writes by plain memory write events. Said otherwise, by default, model files have options withco and whithoutinit.

The generation of all possible coherence orders by herd engine is a source of inefficiency that can be alleviated by having the model itself compute the sub-relation of co that is really useful. Such models must have option withoutco, so as to prevent herd engine from generating all coherence orders. Instead, herd will represent initial writes as plain write events (i.e. option withoutco implies withinit) identify last writes in coherence oders, and pass the model a reduced co relation, named co0, that will, for any memory location x, relate the initial write to x to all writes to x, and all writes to x to the final write to x. It is then the model responsability to compute the remainder of co from the program read events. The model uniproccat.cat from the distribution gives an example of such an advanced model.

The option withinit can also be given alone so as to instruct herd engine to represent initial writes as plain write events. In such a situation, herd will compute complete coherence orders co that include those explicit initial writes as minimal elements. Observe that the representation of initial writes as events can be also controlled from the command-line (see option -initwrites) and that command line settings override model options.

15 Usage of herd

15.1 Arguments

The command herd handles its arguments like litmus. That is, herd interprets its argument as file names. Those files are either a single litmus test when having extension .litmus, or a list of file names when prefixed by @.

15.2 Options

There are many command line options. We describe the more useful ones:

General behaviour

-version: Show version number and exit.
-libdir: Show installation directory and exit.
-v: Be verbose, can be repeated to increase verbosity.
-q: Be quiet, suppress any diagnostic message.
-conf <name>: Read configuration file name. Configuration files have a very simple syntax: a line “opt arg” has the same effect as the command-line option “-opt arg”.
-o <dest>: Output files into directory <dest>. Notice that <dest> must exist. At the moment herd may output one .dot file per processed test: the file for test base.litmus is named base.dot. By default herd does not generate .dot files.
-suffix <suf>: Change the name of .dot files into basesuff.dot. Useful when several .dot files derive from the same test. Default is the empty string (no suffix).
-gv: Fork the gv Postscript viewer to display execution diagrams.
-dumpes <bool>: Dump genererated event structures and exit. Default is false. Event structures will be dumped in a .dot file whose name is determined as usual — See options -o and -suffix above. Optionally the event structures can be displayed with the -gv option.
-unroll <int>: The setting -unroll n performs backwards jumps n times. This is a workaround for one of herd main limitation: herd does not really handle loops. Default is 2.
-hexa <bool>: Print numbers in hexadecimal. Default is false (numbers are printed in decimal).

Engine control

The main purpose of herd is to run tests on top of memory models. For a given test, herd performs a three stage process:

Generate candidate executions.
For each exection, run the model. The model may reject or accept the execution.
For each candidate that the model accepts, record observed locations and a diagram of the execution (if instructed so).

We now describe options that control those three stages.

Select model, this option accept one tag or one file name with extension .cat. Tags instruct herd to select an internal model, while file names are read for a model definition. By default, herd run tests on top of an advanced (herd) model for Power and ARM, and on top of x86tso for X86. Other (documented) model tags are:

cav12, the model of [4] (Power);
minimal, the minimal model that allows all executions;
uniproc, the uniproc model that checks single-thread correctness.

In fact, herd accepts potentially infinitely many models, as models can given in text files in an adhoc language described in Sec. 14. The herd distribution includes several such models: herd.cat, minimal.cat, uniproc.cat and x86tso.cat are text file versions of the homonymous internal models, but may produce pictures that show different relations. Model files are searched according to the same rules as configuration files.

-through (all|invalid|none)

Let additional executions reach the final stage of herd engine. This option permits users to generate pictures of forbidden executions, which are otherwise rejected at an early stage of herd engine — see Sec. 13.2. Namely, the default “none” let only valid (according to the active model) executions through. The behaviour of this option differs between internal and text file models:

For internal models: the tag all let all executions go through; while the tag invalid will reject executions that violate uniproc, while letting other forbidden execution go through.
Text file models: the tags all and invalid let all executions go through. For such models, a more precise control over executions that reach herd final phase can be achieved with the option -skipcheck — see next option.

Default is none.

-skipcheck <name₁,…,name_n>

This option applies to text file models. It instructs herd to ignore the outcomes of the given checks. For the option to operate, checks must be named in the model file with the as name construct – see Sec. 14.3. Notice that the arguments to -skipcheck options cumulate. That is, “-skipcheck name₁ -skipcheck name₂” acts like “-skipcheck name₁,name₂”.

-strictskip <bool>

Setting this option (-strictskip true), will change the behaviour of the previous option -skipcheck: it will let executions go through when the skipped checks yield false and the unskipped checks yield true. This option comes handy when one want to observe the executions that fail one (or several) checks while passing others. Default is false.

-optace <bool>

Optimise the axiomatic candidate execution stage. When enabled by -optace true, herd does not generate candidate executions that fail the uniproc test. The default is “true” for internal models (except the minimal model), and “false” for text file models. Notice that -model uniproc.cat and -model minimal.cat -optace true should yield identical results, the second being faster. Setting -optace true can lower the execution time significantly, but one should pay attention not to design models that forget the uniproc condition.

-show (prop|neg|all|cond|wit|none)

Select execution diagrams for picture display and generation. Execution diagrams are shown according to the final condition of test. The final condition is a quantified boolean proposition exists p, ~exists p, or forall p. The semantics of recognised tags is as follows:

prop Picture executions for which p is true.
neg Picture executions for which p is false.
all Picture all executions.
cond Picture executions that validate the condition, i.e. p is true for exists and forall, and false for ~exists.
wit Picture “interesting” executions, i.e. p is true for exists and ~exists, and false for forall.
none Picture no execution.

Default is none.

-initwrites <bool>

Represent init writes as plain write events, default is false except for specifically tagged generic models — see “Model options” in Sec. 14.4.

Discard some observations

Those options intentionally omit some of the final states that herd would normally generate.

-speedcheck (false|true|fast): When enabled by -speedcheck true or -speedcheck fast, attempt to settle the test condition. That is, herd will generate a subset of executions (those named “interesting” above) in place of all executions. With setting -speedcheck fast, herd will additionally stop as soon as a condition exists p is validated, and as soon as a condition ~exists p or forall p is invalidated. Default is false.
-nshow <int>: Stop once <int> pictures have been collected. Default is to collect all (specified, see option -show) pictures.

Control dot pictures

These options control the content of DOT images.

We first describe options that act at the general level.

-graph (cluster|free|columns)

Select main mode for graphs. See Sec. 13.1. The default is cluster.

-dotmode (plain|fig)

The setting -dotmode fig produces output that includes the proper escape sequence for translating .dot files to .fig files (e.g. with dot -Tfig…). Default is plain.

-dotcom (dot|neato|circo)

Select the command that formats graphs displayed by the -gv option. The default is dot for the cluster and free graph modes, and neato for the columns graph mode.

-showevents (all|mem|noregs)

Control which events are pictured:

all Picture all events.
mem Picture memory events.
noregs Picture all events except register events, i.e. memory, fences and branch events.

Default is noregs.

-showinitwrites <bool>

Show initial write events (when existing, see option -initwrites) in pictures. Default is true.

-mono <bool>

The setting -mono true commands monochrome pictures. This option acts upon default color selection. Thus, it has no effect on colors given explicitely with the -edgeattr option.

-scale <float>

Global scale factor for graphs in columns mode. Default is 1.0.

-xscale <float>

Global scale factor for graphs in columns mode, x direction. Default is 1.0.

-yscale <float>

Global scale factor for graphs in columns mode, y direction. Default is 1.0.

-showthread <bool>

Show thread numbers in figures. In cluster mode where the events of a thread are clustered, thread cluster have a label. In free mode po edges are suffixed by a thread number. In columns mode, columhs have a header node that shows the thread number. Default is true.

-texmacros <bool>

Use latex commands in some text of pictures. If activated (-showthread true), thread numbers are shown as \myth{n}. Assembler instructions are locations in nodes are argument to an \asm command. It user responsability to define those commands in their L^AT_EX documents that include the pictures. Possible definitions are \newcommand{\myth}[1]{Thread~#1} and \newcommand{\asm}[1]{\texttt{#1}}. Default is false.

A few options control picture legends.

-showlegend <bool>: Add a legend to pictures. By default legends show the test name and a comment from the executed model. This comment is the first item of model syntax — see Sec 14.4. Default is true.
-showkind <bool>: Show test kind in legend. The kind derive from the quantifier of test final condition, kind Allow being exists, kind Forbid being ~exists, and kind Require being forall. Default is false.
-shortlegend <bool>: Limit legend to test name. Default is false.

A few options control what is shown in nodes and on their sizes, i.e. on how events are pictured.

-squished <bool>: The setting -squished true drastically limits the information displayed in graph nodes. This is usually what is wanted in modes free and columns. Default is false.
-fixedsize <bool>: This setting is meaningfull in columns graph mode and for squished nodes. When set by -fixedsize true it forces node width to be 65% of the space between columns. This may sometime yield a nice edge routing. Default is false
-extrachars <float>: This setting is meaningful in columns graph mode and for squished nodes. When the size of nodes is not fixed (i.e. -fixedsize false and default), herd computes the width of nodes by counting caracters in node labels and scaling the result by the font size. The setting -extrachars v commands adding the value v before scaling. Negative values are of course accepted. Default is 0.0.
-showobserved <bool>: Highlight observed memory read events with stars “*”. A memory read is observed when the value it reads is stored in a register that appears in final states. Default is false.
-brackets <bool>: Show brackets around locations. Default is false.

Then we list options that offer some control on which edges are shown. We recall that the main controls over the shown and unshown edges are the show and unshow directives in model definitions — see Sec. 14.3. However, some edges can be controled only with options (or configuration files) and the -unshow option proves convenient.

-showpo <bool>: Show program order (po) edges. Default is true. Default is false.
-showinitrf <bool>: Show read-from edges from initial state. Default is false.
-showfinalrf <bool>: Show read-from edges to the final state, i.e show the last store to locations. Default is false. i.e show the last store to locations. Default is false.
-showfr <bool>: Show from-read edges. Default is true.
-doshow <name₁,…,name_n>: Do show edges labelled with name₁,…,name_n. This setting applies when names are bound in model definition.
-unshow <name₁,…,name_n>: Do not show edges labelled with name₁,…,name_n. This setting applies at the very last momement and thus cancels any show directive in model definition and any -doshow command line option.

Other options offer some control over some of the attributes defined in Graphviz software documentation. Notice that the controlled attributes are omitted from DOT files when no setting is present. For instance in the absence of a -spline <tag> option, herd will generate no definition for the splines attribute thus resorting to DOT tools defaults. Most of the following options accept the none argument that restores their default behaviour.

-splines (spline|true|line|false|polyline|ortho|curved|none): Define the value of the splines attribute. Tags are replicated in output files as the value of the attribute, except for none.
-margin <float|none>: Specifies the margin attribute of graphs.
-pad <float|none>: Specifies the pad attribute of graphs.
-sep <string|none>: Specifies the sep attribute of graphs. Notice that the argument is an arbitray string, so as to allow DOT general syntax for this attribute.
-fontname <string|none>: Specifies the graph fontname attribute.
-fontsize <int|none>: Specifies the fontsize attribute n of all text in the graph.
-edgefontsizedelta <int>: option -edgefontsizedelta m sets the fontsize attributes of edges to n+m, where n is the argument to the -fontsize option. Default is 0. This option has no effect if fontsize is unset.
-penwidth <float|none>: Specifies the penwidth attribute of edges.
-arrowsize <float|none>: Specifies the arrowsize attribute of edges.
-edgeattr <label,attribute,value>: Give value value to attribute attribute of all edges labelled label. This powerful option permits alternative styles for edges. For instance, the ghb edges of the diagrams of this document are thick purple (blueviolet) arrows thanks to the settings: -edgeattr ghb,color,blueviolet -edgeattr ghb,penwidth,3.0 -edgeattr ghb,arrowsize,1.2. Notice that the settings performed by the -edgeattr option override other settings. This option has no default.

Change input

Those options are the same as the ones of litmus — see Sec. 4.

-names <file>: Run herd only on tests whose names are listed in <file>.
-rename <file>: Change test names.
-kinds <file>: Change test kinds. This amonts to changing the quantifier of final conditions, with kind Allow being exists, kind Forbid being ~exists and kind Require being forall.
-conds <file>: Change the final condition of tests. This is by far the most useful of these options: in combination with option -show prop it permits a fine grain selection of execution pictures — see Sec. 20.

15.3 Configuration files

The syntax of configuration files is minimal: lines “key arg” are interpreted as setting the value of parameter key to arg. Each parameter has a corresponding option, usually -key, except for the single letter option -v whose parameter is verbose.

As command line option are processed left-to-right, settings from a configuration file (option -conf) can be overridden by a later command line option. Configuration files will be used mostly for controling pictures. Some configuration files are are present in the distribution. As an example, here is the configuration file apoil.cfg, which can be used to display images in free mode.

#Main graph mode
graph free
#Show memory events only
showevents memory
#Minimal information in nodes
squished true
#Do not show a legend at all
showlegend false

The configuration above is commented with line comments that starts with “#”. The above configuration file comes handy to eye-proof model output, even for relatively complex tests, such as IRIW+lwsyncs and IRIW+syncs:

% herd -conf apoil.cfg -show prop -gv IRIW+lwsyncs.litmus
% herd -through invalid -conf apoil.cfg -show prop -gv IRIW+syncs.litmus

We run the two tests on top of the default model that computes, amongst others, a prop relation. The model rejects executions with a cyclic prop. One can then see that the relation prop is acyclic for IRIW+lwsyncs and cyclic for IRIW+syncs:

Notice that we used the option -through invalid in the case of IRIW+syncs as we would otherwise have no image.

15.4 File searching

Configuration and model files are searched first in the current directory; then in any directory specified by setting the shell environment variable HERDDIR; and then in herd installation directory, which is defined while compiling herd.

16 Extensions to Herd

This section describes several extensions to herd that have been implemented by Tyler Sorensen and John Wickerson in collaboration with the original authors. In due course, the contents of this section will probably be merged into the previous sections to form a cohesive manual.

16.1 Additional command line options

-dumplem: Convert the given herd model to Lem format, and exit. The resultant Lem file is generated on stdout.
Note. It is necessary to provide a litmus test when invoking herd in this way, even though the litmus test will not be examined. This is due to a minor technical problem.
-evince: Fork the evince document viewer to display execution diagrams. This option was implemented to provide an alternative to the gv viewer.

16.2 Curried function application

This is an amendment to Section 14.2.

The syntax of expressions with respect to function application is currently described as:

expr	::=	…
	∣	id ( args )

In fact, the syntax is more general: multiple arguments may be given, in a currying fashion. For instance, one may write such expressions as f (a,b) (c,d,e). The syntax should therefore be updated as follows:

expr	::=	…
	∣	id { ( args ) }

16.3 Expressions over sets

This is an appendix to Section 14.2.

Models can now define expressions over sets, and also bind identifiers to sets. The syntax of expressions thus becomes:

expr	::=	…
	∣	{}
	∣	_
	∣	~ expr
	∣	! expr
	∣	expr * expr
	∣	[ expr ]

We write {} for the empty set (in contrast to 0 for the empty relation), _ for the universal set of all events in the event structure, ~ for the complement of a relation, and ! for the complement of a set. If s and t represent sets, then s * t builds their cross product. We also provide [ s ], which denotes the relation that links each element of s to itself.

16.4 Provided conditions and undefined-unless conditions

This is an appendix to Section 14.3.

A test-instruction is used to enforce a contract between the programming language and the programmer. There are two types: provided conditions and undefined-unless conditions. The latter are indicated with the undefined_unless keyword. A provided condition is an obligation on the programming language; that is, the programmer can assume that every execution of their program will satisfy every provided condition. An undefined-unless condition is an obligation on the programmer; that is, the compiler-writer can assume that every execution of the program will satisfy every undefined-unless condition. In other words, if a program has an execution that violates an undefined-unless condition, then its behaviour is completely undefined.

The syntax for instructions thus becomes:

instruction	::=	…
	∣	undefined_unless check expr [ as id]

16.5 Additional identifiers

This is an appendix to Section 14.1.

Pre-defined relations

Here are some more pre-defined relations.

identifier	name	description

unv	universal relation	relates every event in the structure to every other event
int-s	internal at given scope	(applicable only to scoped memory models) relates events that are in the same part of the execution hierarchy
ext-s	external at given scope	(applicable only to scoped memory models) relates events that are in different parts of the execution hierarchy

Here, s ranges over the following values:

value of s	description (in OpenCL terminology)

wi, thread	work-item
sg, warp	sub-group
wg, block, cta	work-group
kernel	kernel
dev	device

For example, int-cta relates all events that are in the same work-group, while ext-wi relates all events that are in different work-items (threads).

We provide the following additional fence relations: membar.cta, membar.gl, membar.sys (PTX). C++ and OpenCL fences do not appear in this list because those fences are modelled as events rather than relations. By modelling these fences as events, we are better able to attach parameters to them, such as the memory-order (C++ and OpenCL) and the memory-scope (OpenCL).

In C++ models, the following relations are pre-defined.

identifier	name	description

asw	additional synchronises-with	links every initial write to every event that is not an initial write
lo	lock order	a total order on all mutex operations (similar to co, but for mutexes instead of writes)

Pre-defined sets

Here are some pre-defined sets, available in all models.

identifier	name	description

_	universal set	the set of all events in the structure
R	read events	set of all reads from memory
W	write events	set of all writes to memory
M	memory events	set of all reads from and writes to memory
B	barrier events	is a barrier
A	atomic events	is an atomic event
P	plain events	is not an atomic event
I	initial writes	is an initial write event

Having defined these sets, it is now possible to write expressions of the form RW(e) as [R * W] & e. However, the simulation in the latter case may be less efficient, owing to the need to construct the intermediate relation [R * W].

In C++ and OpenCL models, the following sets are pre-defined.

identifier	name	description

rmw	read-modify-writes	the set of all read-modify-write events
brmw	blocked read-modify-writes	events representing an attempted read-modify-write operation that has become stuck
F	fences	the set of all fences
acq	acquire	atomic events with “acquire” memory-order
rel	release	atomic events with “release” memory-order
acq_rel	acquire/release	atomic events with “acquire/release” memory-order
rlx	relaxed	atomic events with “relaxed” memory-order
sc	sequentially consistent	atomic events with “sequentially consistent” memory-order
na	non-atomics	non-atomic events

In C++ models, the following sets are pre-defined.

identifier	name	description

lk	locks	the set of all lock events
ls	successful locks	the set of all successful lock events
ul	unlocks	the set of all unlock events
con	consume	atomic events with “consume” memory-order
atomicloc	atomic locations	events acting on atomic locations
nonatomicloc	non-atomic locations	events acting on non-atomic locations
mutexloc	mutex locations	events acting on mutex locations

In OpenCL models, the following sets are pre-defined.

identifier	name	description

gF	global fences	the set of all global fences
lF	local fences	the set of all local fences
wi	work-item scope	events with “work-item” memory-scope
sg	sub-group scope	events with “sub-group” memory-scope
wg	work-group scope	events with “work-group” memory-scope
dev	device scope	events with “device” memory-scope
all_dev	all-devices scope	events with “all_svn_devices” memory-scope

4: The last communication relation from-read fr, derives from rf and co. A read event r is fr-before a write event w when r takes its value from a write w₀ that is co-before w.
5: This option requires the Postscript visualiser gv.
6: The setting of showthread is also changed, by the omitted -showthread true command line option

Part III Simulating memory models with herd