Previous Up Next

Part I
Running tests with litmus

Traditionally, a litmus test is a small parallel program designed to exercise the memory model of a parallel, shared-memory, computer. Given a litmus test in assembler (X86, Power or ARM) litmus runs the test.

Using litmus thus requires a parallel machine, which must additionally feature gcc and the pthreads library. At the moment, litmus is a prototype and has numerous limitations (recognised instructions, limited porting). Nevertheless, litmus should accept all tests produced by the companion diy tool and has been successfully used on Linux, MacOS and on AIX.

The authors of litmus are Luc Maranget and Susmit Sarkar. The present litmus is inspired from a prototype by Thomas Braibant (INRIA Rhône-Alpes) and Francesco Zappa Nardelli (INRIA Paris-Rocquencourt).

1  A tour of litmus

1.1  A simple run

Consider the following (rather classical, store buffering) SB.litmus litmus test for X86:

X86 SB
"Fre PodWR Fre PodWR"
{ x=0; y=0; }
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MOV EAX,[y] | MOV EAX,[x] ;
exists (0:EAX=0 /\ 1:EAX=0)

A litmus test source has three main sections:

  1. The initial state defines the initial values of registers and memory locations. Initialisation to zero may be omitted.
  2. The code section defines the code to be run concurrently — above there are two threads. Yes we know, our X86 assembler syntax is a mistake.
  3. The final condition applies to the final values of registers and memory locations.

Run the test by (complete run log):

% litmus SB.litmus
%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for SB.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%
X86 SB
"Fre PodWR Fre PodWR"

{x=0; y=0;}

 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MOV EAX,[y] | MOV EAX,[x] ;

exists (0:EAX=0 /\ 1:EAX=0)
Generated assembler
#START _litmus_P1
 movl $1,(%r10)
 movl (%r9),%eax
#START _litmus_P0
 movl $1,(%r9)
 movl (%r10),%eax

Test SB Allowed
Histogram (4 states)
40    *>0:EAX=0; 1:EAX=0;
499923:>0:EAX=1; 1:EAX=0;
500009:>0:EAX=0; 1:EAX=1;
28    :>0:EAX=1; 1:EAX=1;
Ok

Witnesses
Positive: 40, Negative: 999960
Condition exists (0:EAX=0 /\ 1:EAX=0) is validated
Hash=7dbd6b8e6dd4abc2ef3d48b0376fb2e3
Observation SB Sometimes 40 999960
Time SB 0.44
...

The litmus test is first reminded, followed by actual assembler — the machine is a 64 bits one, in-line address references disappeared, registers may change, and assembler syntax is now more familiar. The test has run one million times, producing one million final states, or outcomes for the registers EAX of threads P0 and P1. The test run validates the condition, with 40 positive witnesses.

1.2  Cross compilation

With option -o <name.tar>, litmus does not run the test. Instead, it produces a tar archive that contains the C sources for the test.

Consider SB-PPC.litmus, a Power version of the previous test:

PPC SB-PPC
"Fre PodWR Fre PodWR"
{
0:r2=x; 0:r4=y;
1:r2=y; 1:r4=x;
}
 P0           | P1           ;
 li r1,1      | li r1,1      ;
 stw r1,0(r2) | stw r1,0(r2) ;
 lwz r3,0(r4) | lwz r3,0(r4) ;
exists (0:r3=0 /\ 1:r3=0)

Our target machine (ppc) runs MacOS, wich we specify with the -os option:

% litmus -o /tmp/a.tar -os mac SB-PPC.litmus
% scp /tmp/a.tar ppc:/tmp

Then, on the remote machine ppc:

ppc% mkdir SB && cd SB
ppc% tar xf /tmp/a.tar
ppc% ls
Makefile comp.sh run.sh SB-PPC.c outs.c utils.c

Test is compiled by the shell script comp.sh (or by (Gnu) make, at user’s choice) and run by the shell script run.sh:

ppc% sh comp.sh
ppc% sh run.sh
...
Test SB-PPC Allowed
Histogram (3 states)
1784  *>0:r3=0; 1:r3=0;
498564:>0:r3=1; 1:r3=0;
499652:>0:r3=0; 1:r3=1;
Ok

Witnesses
Positive: 1784, Negative: 998216
Condition exists (0:r3=0 /\ 1:r3=0) is validated
Hash=4edecf6abc507611612efaecc1c4a9bc
Observation SB-PPC Sometimes 1784 998216
Time SB-PPC 0.55
...

(Complete run log.) As we see, the condition validates also on Power. Notice that compilation produces an executable file, SB-PPC.exe, which can be run directly, for a less verbose output.

1.3  Running several tests at once

Consider the additional test STFW-PPC.litmus:

PPC STFW-PPC
"Rfi PodRR Fre Rfi PodRR Fre"
{
0:r2=x; 0:r5=y;
1:r2=y; 1:r5=x;
}
 P0           | P1           ;
 li r1,1      | li r1,1      ;
 stw r1,0(r2) | stw r1,0(r2) ;
 lwz r3,0(r2) | lwz r3,0(r2) ;
 lwz r4,0(r5) | lwz r4,0(r5) ;
exists
(0:r3=1 /\ 0:r4=0 /\ 1:r3=1 /\ 1:r4=0)

To compile the two tests together, we can give two file names as arguments to litmus:

$ litmus -o /tmp/a.tar -os mac SB-PPC.litmus STFW-PPC.litmus 

Or, more conveniently, list the litmus sources in a file whose name starts with @:

$ cat @ppc
SB-PPC.litmus
STFW-PPC.litmus
$ litmus -o /tmp/a.tar -os mac @ppc

To run the test on the remote ppc machine, the same sequence of commands as in the one test case applies:

ppc% tar xf /tmp/a.tar && make && sh run.sh
...
Test SB-PPC Allowed
Histogram (3 states)
1765  *>0:r3=0; 1:r3=0;
498741:>0:r3=1; 1:r3=0;
499494:>0:r3=0; 1:r3=1;
Ok

Witnesses
Positive: 1765, Negative: 998235
Condition exists (0:r3=0 /\ 1:r3=0) is validated
Hash=4edecf6abc507611612efaecc1c4a9bc
Observation SB-PPC Sometimes 1765 998235
Time SB-PPC 0.57
...
Test STFW-PPC Allowed
Histogram (4 states)
480   *>0:r3=1; 0:r4=0; 1:r3=1; 1:r4=0;
499560:>0:r3=1; 0:r4=1; 1:r3=1; 1:r4=0;
499827:>0:r3=1; 0:r4=0; 1:r3=1; 1:r4=1;
133   :>0:r3=1; 0:r4=1; 1:r3=1; 1:r4=1;
Ok

Witnesses
Positive: 480, Negative: 999520
Condition exists (0:r3=1 /\ 0:r4=0 /\ 1:r3=1 /\ 1:r4=0) is validated
Hash=92b2c3f6332309325000656d0632131e
Observation STFW-PPC Sometimes 480 999520
Time STFW-PPC 0.56
...

(Complete run log.) Now, the output of run.sh shows the result of two tests.

2  Controlling test parameters

Users can control some of testing conditions. Those impact efficiency and outcome variability.

Sometimes one looks for a particular outcome — for instance, one may seek to get the outcome 0:r3=1; 1:r3=1; that is missing in the previous experiment for test SB-PPC. To that aim, varying test conditions may help.

2.1  Architecture of tests

Consider a test a.litmus designed to run on t threads P0,…, Pt−1. The structure of the executable a.exe that performs the experiment is as follows:

Hence, running a.exe produces n × r × s outcomes. Parameters n, a, r and s can first be set directly while invoking a.exe, using the appropriate command line options. For instance, assuming t=2, ./a.exe -a 201 -r 10k -s 1 and ./a.exe -n 1 -r 1 -s 1M will both produce one million outcomes, but the latter is probably more efficient. If our machine has 8 cores, ./a.exe -a 8 -r 1 -s 1M will yield 4 millions outcomes, in a time that we hope not to exceed too much the one experienced with ./a.exe -n 1. Also observe that the memory allocated is roughly proportional to n × s, while the number of Tk threads created will be t × n × r (t × n in cache mode). The run.sh shell script transmits its command line to all the executable (.exe) files it invokes, thereby providing a convenient means to control testing condition for several tests. Satisfactory test parameters are found by experimenting and the control of executable files by command line options is designed for that purpose.

Once satisfactory parameters are found, it is a nuisance to repeat them for every experiment. Thus, parameters a, r and s can also be set while invoking litmus, with the same command line options. In fact those settings command he default values of .exe files controls. Additionally, the synchronisation technique for iterations, the memory mode, and several others compile time parameters can be selected by appropriate litmus command line options. Finally, users can record frequently used parameters in configuration files.

2.2  Affinity

We view affinity as a scheduler property that binds a (software, POSIX) thread to a given (hardware) logical processor. In the most simple situation a logical processor is a core. However in the presence of hyperthreading (x86) or simultaneous multi threading (SMT, Power) a given core can host several logical processors.

2.2.1  Introduction to affinity

In our experience, binding the threads of test programs to selected logical processors yields significant speedups and, more importantly, greater outcome variety. We illustrate the issue by the means of an example.

We consider the test ppc-iriw-lwsync.litmus:

PPC ppc-iriw-lwsync
{
0:r2=x; 1:r2=x; 1:r4=y;
2:r4=y; 3:r2=x; 3:r4=y;
}
 P0           | P1           | P2           | P3           ;
 li r1,1      | lwz r1,0(r2) | li r1,1      | lwz r1,0(r4) ;
 stw r1,0(r2) | lwsync       | stw r1,0(r4) | lwsync       ;
              | lwz r3,0(r4) |              | lwz r3,0(r2) ;
exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0)

The test consists of four threads. There are two writers (P0 and P2) that write the value one into two different locations (x and y), and two readers that read the contents of x and y in different orders — P1 reads x first, while P3 reads y first. The load instructions lwz in reader threads are separated by a lightweight barrier instruction lwsync. The final condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) characterises the situation where the reader threads see the writes by P0 and P2 in opposite order. The corresponding outcome 1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0; is the only non-sequential consistent (non-SC, see Part II) possible outcome. By any reasonable memory model for Power, one expects the condition to validate, i.e. the non-SC outcome to show up.

The tested machine vargas is a Power 6 featuring 32 cores (i.e. 64 logical processors, since SMT is enabled) and running AIX in 64 bits mode. So as not to disturb other users, we run only one instance of the test, thus specifying four available processors. The litmus tool is absent on vargas. All these conditions command the following invocation of litmus, performed on our local machine:

$ litmus -r 1000 -s 1000 -a 4 -os aix -ws w64 ppc-iriw-lwsync.litmus -o ppc.tar
$ scp ppc.tar vargas:/var/tmp

On vargas we unpack the archive and compile the test:

vargas% tar xf /var/tmp/ppc.tar && sh comp.sh

Then we run the test:

vargas% ./ppc-iriw-lwsync.exe
Test ppc-iriw-lwsync Allowed
Histogram (15 states)
163674:>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=0;
34045 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=0;
40283 :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=0;
95079 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=0;
33848 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=0;
72201 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=0;
32452 :>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=0;
43031 :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=1;
73052 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=1;
1     :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=1;
42482 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=1;
90470 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=1;
30306 :>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=1;
43239 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=1;
205837:>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=1;
No

Witnesses
Positive: 0, Negative: 1000000
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is NOT validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Never 0 1000000
Time ppc-iriw-lwsync 1.32

The non-SC outcome does not show up.

Altering parameters may yield this outcome. In particular, we may try using all the available logical processors with option -a 64. Affinity control offers an alternative, which is enabled at compilation time with litmus option -affinity:

$ litmus ... -affinity incr1 ppc-iriw-lwsync.litmus -o ppc.tar
$ scp ppc.tar vargas:/var/tmp

Option -affinity takes one argument (incr1 above) that specifies the increment used while allocating logical processors to test threads. Here, the (POSIX) threads created by the test (named T0, T1, T2 and T3 in Sec. 2.1) will get bound to logical processors 0, 1, 2, and 3, respectively.

Namely, by default, the logical processors are ordered as the sequence 0, 1, …, A−1 — where A is the number of available logical processors, which is inferred by the test executable2. Furthermore, logical processors are allocated to threads by applying the affinity increment while scanning the logical processor sequence. Observe that since the launch mode is changing (the default) threads Tk correspond to different test threads Pi at each run. The unpack compile and run sequence on vargas now yields the non-SC outcome, better outcome variety and a lower running time:

vargas% tar xf /var/tmp/ppc.tar && sh comp.sh
vargas% ./ppc-iriw-lwsync.exe
Test ppc-iriw-lwsync Allowed
Histogram (16 states)
180600:>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=0;
3656  :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=0;
18812 :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=0;
77692 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=0;
2973  :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=0;
9     *>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0;
28881 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=0;
75126 :>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=0;
20939 :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=1;
30498 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=1;
1234  :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=1;
89993 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=1;
75769 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=1;
76361 :>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=1;
87864 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=1;
229593:>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=1;
Ok

Witnesses
Positive: 9, Negative: 999991
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Sometimes 9 999991
Time ppc-iriw-lwsync 0.68

One may change the affinity increment with the command line option -i of executable files. For instance, one binds the test threads to logical processors 0, 2, 4 and 6 as follows:

vargas% ./ppc-iriw-lwsync.exe -i 2 
Test ppc-iriw-lwsync Allowed
Histogram (15 states)
160629:>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=0;
33389 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=0;
43725 :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=0;
93114 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=0;
33556 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=0;
64875 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=0;
34908 :>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=0;
43770 :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=1;
64544 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=1;
4     :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=1;
54633 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=1;
92617 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=1;
34754 :>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=1;
54027 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=1;
191455:>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=1;
No

Witnesses
Positive: 0, Negative: 1000000
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is NOT validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Never 0 1000000
Time ppc-iriw-lwsync 0.92

One observes that the non-SC outcome does not show up with the new affinity setting.

One may also bind test thread to logical processors randomly with executable option +ra.

vargas% ./ppc-iriw-lwsync.exe +ra
Test ppc-iriw-lwsync Allowed
Histogram (15 states)
...
No

Witnesses
Positive: 0, Negative: 1000000
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is NOT validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Never 0 1000000
Time ppc-iriw-lwsync 1.85

As we see, the condition does not validate either with random affinity. As a matter of fact, logical processors are taken at random in the sequence 0, 1, …, 63; while the successful run with -i 1 took them in the sequence 0, 1, 2, 3. One can limit the sequence of logical processor with option -p, which takes a sequence of logical processors numbers as argument:

vargas% ./ppc-iriw-lwsync.exe +ra -p 0,1,2,3
Test ppc-iriw-lwsync Allowed
Histogram (16 states)
...
8     *>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0;
...
Ok

Witnesses
Positive: 8, Negative: 999992
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Sometimes 8 999992
Time ppc-iriw-lwsync 0.70

The condition now validates.

2.2.2  Study of affinity

As illustrated by the previous example, both the running time and the outcomes of a test are sensitive to affinity settings. We measured running time for increasing values of the affinity increment from 0 (which disables affinity control) to 20, producing the following figure:

As regards outcome variety, we get all of the 16 possible outcomes only for an affinity increment of 1.

The differences in running times can be explained by reference to the mapping of logical processors to hardware. The machine vargas consists in four MCM’s (Multi-Chip-Module), each MCM consists in four “chips”, each chip consists in two cores, and each core may support two logical processors. As far as we know, by querying vargas with the AIX commands lsattr, bindprocessor and llstat, the MCM’s hold the logical processors 0–15, 16–31, 32–47 and 48–63, each chip holds the logical processors 4k, 4k+1, 4k+2, 4k+3 and each core holds the logical processors 2k, 2k+1.

The measure of running times for varying increments reveals two noticeable slowdowns: from an increment of 1 to an increment of 2 and from 5 to 6. The gap between 1 and 2 reveals the benefits of SMT for our testing application. An increment of 1 yields both the greatest outcome variety and the minimal running time. The other gap may perhaps be explained by reference to MCM’s: for a value of 5 the tests runs on the logical processors 0, 5, 10, 15, all belonging to the same MCM; while the next affinity increment of 6 results in running the test on two different MCM (0, 6, 12 on the one hand and 18 on the other).

As a conclusion, affinity control provides users with a certain level of control over thread placement, which is likely to yield faster tests when threads are constrained to run on logical processors that are “close” one to another. The best results are obtained when SMT is effectively enforced. However, affinity control is no panacea, and the memory system may be stressed by other means, such as, for instance, allocating important chunks of memory (option -s).

2.2.3  Advanced control

For specific experiments, the technique of allocating logical processors sequentially by following a fixed increment may be two rigid. litmus offers a finer control on affinity by allowing users to supply the logical processors sequence. Notice that most users will probably not need this advanced feature.

Anyhow, so as to confirm that testing ppc-iriw-lwsync benefits from not crossing chip boundaries, one may wish to confine its four threads to logical processors 16 to 19, that is to the first chip of the second MCM. This can be done by overriding the default logical processors sequence by an user supplied one given as an argument to command-line option -p:

vargas% ./ppc-iriw-lwsync.exe -p 16,17,18,19 -i 1
Test ppc-iriw-lwsync Allowed
Histogram (16 states)
169420:>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=0;
1287  :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=0;
17344 :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=0;
85329 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=0;
1548  :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=0;
3     *>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0;
27014 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=0;
75160 :>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=0;
19828 :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=1;
29521 :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=1;
441   :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=1;
93878 :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=1;
81081 :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=1;
76701 :>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=1;
93623 :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=1;
227822:>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=1;
Ok

Witnesses
Positive: 3, Negative: 999997
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Sometimes 3 999997
Time ppc-iriw-lwsync 0.63

Thus we get results similar to the previous experiment on logical processors 0 to 3 (option -i 1 alone).

We may also run four simultaneous instances (-n 4, parameter n of section 2.1) of the test on the four available MCM’s:

vargas% ./ppc-iriw-lwsync.exe -p 0,1,2,3,16,17,18,19,32,33,34,35,48,49,50,51 -n 4 -i 1
Test ppc-iriw-lwsync Allowed
Histogram (16 states)
...
57    *>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0;
...
Ok

Witnesses
Positive: 57, Negative: 3999943
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is validated
Hash=4fbfaafa51f6784d699e9bdaf5ba047d
Observation ppc-iriw-lwsync Sometimes 57 3999943
Time ppc-iriw-lwsync 0.75

Observe that, for a negligible penalty in running time, the number of non-SC outcomes increases significantly.

By contrast, binding threads of a given instance of the test to different MCM’s results in poor running time and no non-SC outcome.

vargas% ./ppc-iriw-lwsync.exe -p 0,1,2,3,16,17,18,19,32,33,34,35,48,49,50,51 -n 4 -i 4
Test ppc-iriw-lwsync Allowed
Histogram (15 states)
...
Witnesses
Positive: 0, Negative: 4000000
Condition exists (0:r1=1 /\ 0:r3=0 /\ 2:r1=1 /\ 2:r3=0) is NOT validated
Time ppc-iriw-lwsync 1.48

In the experiment above, the increment is 4, hence the logical processors allocated to the first instance of the test are 0, 16, 32, 48, of which indices in the logical processors sequence are 0, 4, 8, 12, respectively. The next allocated index in the sequence is 12+4 = 16. However, the sequence has 16 items. Wrapping around yields index 0 which happens to be the same as the starting index. Then, so as to allocate fresh processors, the starting index is incremented by one, resulting in allocating processors 1, 17, 33, 49 (indices 1, 5, 9, 13) to the second instance — see section 2.3 for the full story. Similarly, the third and fourth instances will get processors 2, 18, 34, 50 and 3, 19, 35, 51, respectively. Attentive readers may have noticed that the same experiment can be performed with option -i 16 and no -p option.

Finally, users should probably be aware that at least some versions of Linux for x86 feature a less obvious mapping of logical processors to hardware. On a bi-processor, dual-core, 2-ways hyperthreading, Linux, AMD64 machine, we have checked that logical processors residing on the same core are k and k+4, where k is an arbitrary core number ranging from 0 to 3. As a result, a proper choice for favouring effective hyperthreading on such a machine is -i 4 (or -p 0,4,1,5,2,6,3,7 -i 1). More worthwhile noticing, perhaps, the straightforward choice -i 1 disfavours effective hyperthreading…

2.2.4  Custom control

Most tests run by litmus are produced by the litmus test generators described in Part II. Those tests include meta-information that may direct affinity control. For instance we generate one test with the diyone tool, see Sec. 4.2. More specifically we generate IRIW+lwsyncs for Power (ppc-iriw-lwsync in the previous section) as follows:

% diyone -arch PPC -name IRIW+lwsyncs Rfe LwSyncdRR Fre Rfe LwSyncdRR Fre

We get the new source file IRIW+lwsyncs.litmus:

PPC IRIW+lwsyncs
"Rfe LwSyncdRR Fre Rfe LwSyncdRR Fre"
Prefetch=0:x=T,1:x=F,1:y=T,2:y=T,3:y=F,3:x=T
Com=Rf Fr Rf Fr
Orig=Rfe LwSyncdRR Fre Rfe LwSyncdRR Fre
{
0:r2=x;
1:r2=x; 1:r4=y;
2:r2=y;
3:r2=y; 3:r4=x;
}
 P0           | P1           | P2           | P3           ;
 li r1,1      | lwz r1,0(r2) | li r1,1      | lwz r1,0(r2) ;
 stw r1,0(r2) | lwsync       | stw r1,0(r2) | lwsync       ;
              | lwz r3,0(r4) |              | lwz r3,0(r4) ;
exists
(1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0)

The relevant meta-information is the “Com” line that describes how test threads are related — for instance, thread 0 stores a value to memory that is read by thread 1, written “Rf” (see Part II for more details). Custom affinity control will tend to run threads related by “Rf” on “close” logical processors, where we can for instance consider that close logical processors belong to the same physical core (SMT for Power). This minimal logical processor topology is described by two litmus command-line option: -smt <n> that specifies n-way SMT; and -smt_mode (seq|end) that specifies how logical processors from the same core are numbered. For a 8-cores 4-ways SMT power7 machine we invoke litmus as follows:

% litmus -mem direct -smt 4 -smt_mode seq -affinity custom -o a.tar IRIW+lwsyncs.litmus

Notice that memory mode is direct and that the number of available logical processors is unspecified, resulting in running one instance of the test. More importantly, notice that affinity control is enabled -affinity custom, additionally specifying custom affinity mode.

We then upload the archive a.tar to our Power7 machine, unpack, compile and run the test:

power7% tar xmf a.tar
power7% make
...
power7% ./IRIW+lwsyncs.exe -v
./IRIW+lwsyncs.exe  -v
IRIW+lwsyncs: n=1, r=1000, s=1000, +rm, +ca, p='0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31'
thread allocation: 
[23,22,3,2] {5,5,0,0}

Option -v instructs the executable to show settings of the test harness: we see that one instance of the test is run (n=1), size parameters are reminded (r=1000, s=1000) and shuffling of indirect memory mode is performed (+rm). Affinity settings are also given: mode is custom (+ca) and the logical processor sequence inferred is given (-p 0,1,…,31). Additionally, the allocation of test threads to logical processors is given, as […], as well as the allocation of test threads to physical cores, as {…}.

Here is the run output proper:

Test IRIW+lwsyncs Allowed
Histogram (15 states)
2700  :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=0;
142   :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=0;
37110 :>1:r1=0; 1:r3=1; 3:r1=0; 3:r3=0;
181257:>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=0;
78    :>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=0;
15    *>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=0;
103459:>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=0;
149486:>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=0;
30820 :>1:r1=0; 1:r3=0; 3:r1=0; 3:r3=1;
9837  :>1:r1=1; 1:r3=0; 3:r1=0; 3:r3=1;
2399  :>1:r1=1; 1:r3=1; 3:r1=0; 3:r3=1;
204629:>1:r1=0; 1:r3=0; 3:r1=1; 3:r3=1;
214700:>1:r1=1; 1:r3=0; 3:r1=1; 3:r3=1;
5186  :>1:r1=0; 1:r3=1; 3:r1=1; 3:r3=1;
58182 :>1:r1=1; 1:r3=1; 3:r1=1; 3:r3=1;
Ok

Witnesses
Positive: 15, Negative: 999985
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is validated
Hash=836eb3085132d3cb06973469a08098df
Com=Rf Fr Rf Fr
Orig=Rfe LwSyncdRR Fre Rfe LwSyncdRR Fre
Affinity=[2, 3] [0, 1] ; (1,2) (3,0)
Observation IRIW+lwsyncs Sometimes 15 999985
Time IRIW+lwsyncs 0.70

As we see, the test validates. Namely we observe the non-SC behaviour of IRIW in spite of the presence of two lwsync barriers. We may also notice, in the executable output some meta-information related to affinity: it reads that threads 2 and 3 one the one hand and threads 0 and 1 on the other are considered “close” (i.e. will run on the same physical core); while threads 1 and 2 on the one hand and threads 3 and 0 on the other are considered “far” (i.e. will run on different cores).

Custom affinity can be disabled by enabling another affinity mode. For instance with -i 0 we specify an affinity increment of zero. That is, affinity control is disabled altogether:

power7% ./IRIW+lwsyncs.exe -i 0 -v
./IRIW+lwsyncs.exe -i 0  -v
IRIW+lwsyncs: n=1, r=1000, s=1000, +rm, i=0, p='0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31'
Test IRIW+lwsyncs Allowed
Histogram (15 states)
...
No

Witnesses
Positive: 0, Negative: 1000000
Condition exists (1:r1=1 /\ 1:r3=0 /\ 3:r1=1 /\ 3:r3=0) is NOT validated
Hash=836eb3085132d3cb06973469a08098df
Com=Rf Fr Rf Fr
Orig=Rfe LwSyncdRR Fre Rfe LwSyncdRR Fre
Observation IRIW+lwsyncs Never 0 1000000
Time IRIW+lwsyncs 0.90

As we see, the test does not validate under those conditions.

Notice that section 12 describes a complete experiment on affinity control.

2.3  Controlling executable files

Test conditions

Any executable file produced by litmus accepts the following command line options.

-v
Be verbose, can be repeated to increase verbosity. Specifying -v is a convenient way to look at the default of options.
-q
Be quiet.
-a <n>
Run maximal number of tests concurrently for n available logical processors — parameter a in Sec. 2.1. Notice that if affinity control is enabled (see below), -a 0 will set parameter a to the number of logical processors effectively available.
-n <n>
Run n tests concurrently — parameter n in Sec. 2.1.
-r <n>
Perform n runs — parameter r in Sec. 2.1.
-fr <f>
Multiply r by f (f is a floating point number).
-s <n>
Size of a run — parameter s in Sec. 2.1.
-fs <f>
Multiply s by f.
-f <f>
Multiply s by f and divide r by f.

Notice that options -s and -r accept a generalised syntax for their integer argument: when suffixed by k (resp. M) the integer gets multiplied by 103 (resp. 106).

The following options are accepted only for tests compiled in indirect memory mode (see Sec. 2.1):

-rm
Do not shuffle pointer arrays, resulting a behaviour similar do direct mode, without recompilation.
+rm
Shuffle pointer arrays, provided for regularity.

The following option is accepted only for tests compiled with a specified stride value (see Sec. 2.1).

-st <n>
Change stride to <n>. The default stride is specified at compile time by litmus option -stride.

The following option is accepted when enabled at compile time:

-l <n>
Insert the assembly code of each thread in a loop of size <n>.
Affinity

If affinity control has been enabled at compilation time (for instance, by supplying option -affinity incr1 to litmus), the executable file produced by litmus accepts the following command line options.

-p <ns>
Logical processors sequence. The sequence <ns> is a comma separated list of integers, The default sequence is inferred by the executable as 0,1,…,A−1, where A is the number of logical processors featured by the tested machine; or is a sequence specified at compile time with litmus option -p.
-i <n>
Increment for allocating logical processors to threads. Default is specified at compile time by litmus option -affinity incr<n>. Notice that -i 0 disable affinity and that .exe files reject the -i option when affinity control has not been enabled at compile time.
+ra
Perform random allocation of affinity at each test round.
+ca
Perform custom affinity.

Notice that when custom affinity is not available, would it be that the test source lacked meta-information or that logical processor topology was not specified at compile-time, then +ca behaves as +ra.

Logical processors are allocated test instance by test instance (parameter n of Sec. 2.1) and then thread by thread, scanning the logical processor sequence left-to-right by steps of the given increment. More precisely, assume a logical processor sequence P = p0, p1, …, pA−1 and an increment i. The first processor allocated is p0, then pi, then p2i etc, Indices in the sequence P are reduced modulo A so as to wrap around. The starting index of the allocation sequence (initially 0) is recorded, and coincidence with the index of the next processor to be allocated is checked. When coincidence occurs, a new index is computed, as the previous starting index plus one, which also becomes the new starting index. Allocation then proceeds from this new starting index. That way, all the processors in the sequence will get allocated to different threads naturally, provided of course that less than A threads are scheduled to run. See section 2.2.3 for an example with A=16 and i=4.

2.4  Timebase synchronisation mode

Timebase synchronisation of the testing loop iterations (see Sec. 2.1) is selected by litmus command line option -barrier timebase. Some tests demonstrate that timebase synchronisation is more precise than user synchronisation (-barrier user and default).

For instance, consider the x86 test 6.SB.litmus, a 6-thread analog of the SB test:

X86 6.SB
"Fre PodWR Fre PodWR Fre PodWR Fre PodWR Fre PodWR Fre PodWR"
{
}
 P0          | P1          | P2          | P3          | P4          | P5          ;
 MOV [x],$1  | MOV [y],$1  | MOV [z],$1  | MOV [a],$1  | MOV [b],$1  | MOV [c],$1  ;
 MOV EAX,[y] | MOV EAX,[z] | MOV EAX,[a] | MOV EAX,[b] | MOV EAX,[c] | MOV EAX,[x] ;
exists
(0:EAX=0 /\ 1:EAX=0 /\ 2:EAX=0 /\ 3:EAX=0 /\ 4:EAX=0 /\ 5:EAX=0)

We first compile the test in user synchronisation mode, saving litmus output files into the pre-existing directory R:

% litmus -barrier user -vb true -o R 6.SB.litmus
% cd R
% make

The additional command line option -vb true activates the printing of some timing information on synchronisations.

We then directly run the test executable 6.SB.exe:

% ./6.SB.exe
Test 6.SB Allowed
Histogram (62 states)
7569  :>0:EAX=1; 1:EAX=0; 2:EAX=0; 3:EAX=0; 4:EAX=0; 5:EAX=0;
8672  :>0:EAX=0; 1:EAX=1; 2:EAX=0; 3:EAX=0; 4:EAX=0; 5:EAX=0;
...
326   :>0:EAX=1; 1:EAX=0; 2:EAX=1; 3:EAX=1; 4:EAX=1; 5:EAX=1;
907   :>0:EAX=0; 1:EAX=1; 2:EAX=1; 3:EAX=1; 4:EAX=1; 5:EAX=1;
No

Witnesses
Positive: 0, Negative: 1000000
Condition exists (0:EAX=0 /\ 1:EAX=0 /\ 2:EAX=0 /\ 3:EAX=0 /\ 4:EAX=0 /\ 5:EAX=0) is NOT validated
Hash=107f1303932972b3abace3ee4027408e
Observation 6.SB Never 0 1000000
Time 6.SB 0.85

The targeted outcome — reading zero in the EAX registers of the 6 threads — is not observed. We can observe synchronisation times for all tests runs with the executable command line option +vb:

% ./6.SB.exe  +vb
99999: 162768 420978 564546   -894 669468
99998:    -93      3     81   -174   -651
99997:   -975    -30    -33     93   -192
99996:    990   1098    852   1176    774
...

We see five columns of numbers that list, for each test run, the starting delays of P1, P2 etc. with respect to P0, expressed in timebase ticks. Obviously, synchronisation is rather loose, there are always two threads whose starting delays differ of of about 1000 ticks.

We now compile the same test in timebase synchronisation mode, saving litmus output files into the pre-existing directory RT:

% litmus -barrier timebase -vb true -o RT 6.SB.litmus
% cd RT
% make

And we run the test directly (option -vb disable the printing of any synchronisation timing information):

% ./6.SB.exe -vb
Test 6.SB Allowed
Histogram (64 states)
60922 *>0:EAX=0; 1:EAX=0; 2:EAX=0; 3:EAX=0; 4:EAX=0; 5:EAX=0;
38299 :>0:EAX=1; 1:EAX=0; 2:EAX=0; 3:EAX=0; 4:EAX=0; 5:EAX=0;
...
598   :>0:EAX=0; 1:EAX=1; 2:EAX=1; 3:EAX=1; 4:EAX=1; 5:EAX=1;
142   :>0:EAX=1; 1:EAX=1; 2:EAX=1; 3:EAX=1; 4:EAX=1; 5:EAX=1;
Ok

Witnesses
Positive: 60922, Negative: 939078
Condition exists (0:EAX=0 /\ 1:EAX=0 /\ 2:EAX=0 /\ 3:EAX=0 /\ 4:EAX=0 /\ 5:EAX=0) is validated
Hash=107f1303932972b3abace3ee4027408e
Observation 6.SB Sometimes 60922 939078
Time 6.SB 1.62

We now see that the test validates. Moreover all of the 64 possible outcomes are observed.

Timebase synchronisation works as follows: at every iteration,

  1. one of the threads reads timebase T;
  2. all threads synchronise by the means of a polling synchronisation barrier;
  3. each thread computes Ti = T + δi, where δi is the timebase delay, a thread specific constant;
  4. each thread loops, reading the timebase until the read value exceeds Ti.

By default the timebase delay δi is 211 = 2048 for all threads.

The precision of timebase synchronisation can be illustrated by enabling the printing of all synchronisation timings:

% ./6.SB.exe +vb
99999: 672294[1] 671973[1] 672375[1] 672144[1] 672303[1] 672222[1]
99998:  4524[1]  4332[1]  4446[1]  2052[65]  2064[73]  4095[1]
...
99983:  4314[1]  3036[1]  3141[1]  2769[1]  4551[1]  3243[1]
99982:* 2061[36]  2064[33]  2067[11]  2079[12]  2064[14]  2064[24]
99981:  2121[1]  2382[1]  2586[1]  2643[1]  2502[1]  2592[1]
...

For each test iteration and each thread, two numbers are shown (1) the last timebase value read by and (2) (in brackets []) how many iterations of loop 4. were performed. Additionally a star “*” indicates the occurrence of the targeted outcome. Here, we see that a nearly perfect synchronisation can be achieved (cf. line 99982: above).

Once timebase synchronisation have been selected (litmus option -barrier timebase), test executable behaviour can be altered by the following two command line options:

-ta <n>
Change the timebase delay δi of all threads.
-tb <0:n0;1:n1;>
Change the timebase delay δi of individual threads.

The litmus command line option -vb true (verbose barrier) governs the printing of synchronisation timings. It comes handy when choosing values for the -ta and -tb options. When set, the executable show synchronisation timings for outcomes that validate the test final condition. This default behaviour can be altered with the following two command line options:

-vb
Do not show synchronisation timings.
+vb
Show synchronisation timings for all outcomes.

Synchronisation timings are expressed in timebase ticks. The format depends on the synchronisation mode (litmus option -barrier)). This section just gave two examples for user mode (timings are show as differences from thread P0); and for timebase mode (timings are shown as differences from a commonly agreed by all thread timebase value). Notice that, when affinity control is enabled, the running logical processors of threads are also shown.

3  Usage of litmus

Arguments

litmus takes file names as command line arguments. Those files are either a single litmus test, when having extension .litmus, or a list of file names, when prefixed by @. Of course, the file names in @files can themselves be @files.

Options

There are many command line options. We describe the more useful ones:

General behaviour
-version
Show version number and exit.
-libdir
Show installation directory and exit.
-v
Be verbose, can be repeated to increase verbosity.
-mach <name>
Read configuration file name.cfg. See the next section for the syntax of configuration files.
-o <dest>
Save C-source of test files into <dest> instead of running them. If argument <dest> is an archive (extension .tar) or a compressed archive (extension .tgz), litmus builds an archive: this is the “cross compilation feature” demonstrated in Sec. 1.2. Otherwise, <dest> is interpreted as the name of an existing directory and tests are saved in it.
-crossrun <(user@)?host(:port)?|adb>
Run individual tests on a distant <host> (where a the ssh daemon may listen on the non-standard port port), default: disabled — i.e. run tests on the machine where the run.sh script runs. This option may be useful when the tested machine has little disk space or a crippled installation. The experimental adb test run tests by the means of the Adroid Debug Bridge in remote directory /data/tmp.
-index <@name>
Save the source names of compiled files in index file @name.
Test conditions

The following options set the default values of the options of the executable files produced:

-a <n>
Run maximal number of tests concurrently for n available logical processors — set default value for -a of Sec. 2.3. Default is 1 (run one test). When affinity control is enabled, the value 0 has the special meaning of having executables to set the number of available logical processors according to how many are actually present.
-limit <bool>
Do not process tests with more than n threads, where n is the number of available cores defined above. Default is false.
-r <n>
Perform n runs — set default value for option -r of Sec. 2.3. The option accepts generalised syntax for integers and default is 10.
-s <n>
Size of a run — set default value for option -s of Sec. 2.3. The option accepts generalised syntax for integers and default is 100000 (or 100k).

The following additional options control the various modes described in Sec. 2.1, and more. Those cannot be changed without running litmus again:

-barrier (user|userfence|pthread|none|timebase)
Set synchronisation mode, default user. Synchronisation modes are described in Sec. 2.1
-launch (changing|fixed)
Set launch mode, default changing.
-mem (indirect|direct)
Set memory mode, default indirect. It is possible to instruct executables compiled in indirect mode to behave almost as if compiled in direct mode, see Sec. 2.3.
-stride <n>
Specify a stride value of <n> — set default value for option -st of Sec. 2.3. See Sec. 2.1 for details on the stride parameter. If <n> is negative or zero, restore the default, which is stride feature disabled.
-st <n>
Alias for -stride <n>.
-para (self|shell)
Perform several tests concurrently, either by forking POSIX threads (as described in Sec. 2.1), or by forking Unix processes. Only applies for cross compilation. Default is self.
-prealloc <bool>
Enable or disable pre-allocation mode, default disabled. In pre-allocation mode, memory is allocated before forking any thread.
-preload (no|random|custom)
Specify preload mode (see Sec. 2.1), default is random. Starting from version 5.0 we provide the additional custom preload mode, for a finer control of prefetching and flushing of some memory locations by some threads. The feature is experimental and undocumented.
-safer (no|all|write)
Specify safer mode, default is write. When instructed to do so, executable files perform some consistency checks. Those are intended both for debugging and for dynamically checking some assumptions on POSIX threads that we rely upon. More specifically the test harness checks for the stabilisation of memory locations after a test round in the “all” and “write” mode, while the initial values of memory locations are checked in “all” mode.
-speedcheck (no|some|all)
Quick condition check mode, default is “no”. In mode “some”, test executable will stop as soon as its condition is settled. In mode “all”, the run.sh script will additionally not run the test if invoked once more later.

The following option commands affinity control:

-affinity (none|incr<n>|random|custom)
Enable (of disable with tag none) affinity control, specifying default affinity mode of executables. Default is none, i.e. executables do not include affinity control code. The various tags are interpreted as follows:
  1. incr<n>: integer <n> is the increment for allocating logical processors to threads — see Sec. 2.2. Notice that with -affinity incr0 the produced code features affinity control, which executable files do not exercise by default.
  2. random: executables perform random allocation of test threads to logical processors.
  3. custom: executables perform custom allocation of test threads to logical processors.
Notice that the default for executables can be overidden using options -i,+ra and +ca of Sec. 2.3.
-i <n>
Alias for -affinity incr<n>.

Notice that affinity control is not implemented for MacOs.

The following option is significant when affinity control is enabled. Otherwise it is a silent no-op.

-p <ns>
Specify the sequence of logical processors. The notation <ns> stands for a comma separated list of integers. Set default value for option -p of Sec. 2.3. Default for this -p option will let executable files compute the logical processor sequence themselves.

Custom affinity control (see Sec. 2.2.4) is enabled, first by enabling affinity control (e.g. with -affinity …), and then by specifying a logical processor topology with options -smt and -smt_mode.

-smt <n>
Specify that logical processors are close by groups of n, default is 1.
-smt_mode (none|seq|end)
Specify how “close” logical processors are numbered, default is none. In mode “end”, logical processors of the same core are numbered as c, c+Ac etc. where c is a physical core number and Ac is the number of physical cores available. In mode “seq”, logical processors of the same core are numbered in sequence.

Notice that custom affinity works only for those tests that include the proper meta-information. Otherwise, custom affinity silently degrades to random affinity.

Finally, a few miscellaneous options are documented:

-l <n>
Insert the assembly code of each thread in test in a loop of size <n>. Accepts generalised integer syntax, disabled by default. Sets default value for option -l of Sec. 2.3.

This feature may prove useful for measuring running times that are not too much perturbed by the test harness, in combination with options -s 1 -r 1.

-vb <bool>
Disable/enable the printing of synchronisation timings, default is false.

This feature may prove useful for analysing the synchronisation behaviour of a specific test, see Sec. 2.4.

-ccopts <flags>
Set gcc compilation flags (defaults: X86="-fomit-frame-pointer -O2", PPC/ARM="-O2").
-gcc <name>
Change the name of C compiler, default gcc.
-linkopt <flags>
Set gcc linking flags. (default: void).
-gas <bool>
Emit Gnu as extensions (default Linux/Mac=true, AIX=false)
Target architecture description

Litmus compilation chain may slightly vary depending on the following parameters:

-os (linux|mac|aix)
Set target operating system. This parameter mostly impacts some of gcc options. Default linux.
-ws (w32|w64)
Set word size. This option first selects gcc 32 or 64 bits mode, by providing it with the appropriate option (-m32 or -m64 on linux, -maix32 or -maix64 on AIX). It also slightly impacts code generation in the corner case where memory locations hold other memory locations. Default is a bit contrived: it acts as w32 as regards code generation, while it provides no 32/64 bits mode selection option to gcc.

Configuration files

The syntax of configuration files is minimal: lines “key = arg” are interpreted as setting the value of parameter key to arg. Each parameter has a corresponding option, usually -key, except for single-letter options:

  option    key    arg  
-aavailinteger
-ssize_of_testinteger
-rnumber_of_runinteger
-pprocslist of integers
-lloopinteger

Notice that litmus in fact accepts long versions of options (e.g. -avail for -a).

As command line option are processed left-to-right, settings from a configuration file (option -mach) can be overridden by a later command line option. Some configuration files for the machines we have tested are present in the distribution. As an example here is the configuration file hpcx.cfg.

size_of_test = 2000
number_of_run = 20000
os = AIX
ws = W32
# A node has 16 cores X2 (SMT)
avail = 32

Lines introduced by # are comments and are thus ignored.

Configuration files are searched first in the current directory; then in any directory specified by setting the shell environment variable LITMUSDIR; and then in litmus installation directory, which is defined while compiling litmus.


1
Power and x86-based systems provide a user accessible timebase counter that should provide consistent times to all cores and processors.
2
Parameter A is not to be confused with a of section 2.1. The former serves to compute logical threads while the latter governs the number of tests that run simultaneously. However parameters a will be set to A when affinity control is enabled and when a value is 0.

Previous Up Next