Part I |
Traditionally, a litmus test is a small parallel program designed to exercise the memory model of a parallel, shared-memory computer. Given a litmus test in assembler (X86 or Power), litmus runs the test.
Using litmus thus requires a parallel machine, which must additionally feature gcc and the pthreads library. At the moment, litmus is a prototype and has numerous limitations (recognised instructions, limited porting). Nevertheless, litmus should accept all tests produced by the companion diy tool and has been successfully used on Linux, Mac OS, and on two versions of AIX.
The authors of litmus are Luc Maranget and Susmit Sarkar. The present litmus is inspired by a prototype by Thomas Braibant (INRIA Rhône-Alpes) and Francesco Zappa Nardelli (INRIA Paris-Rocquencourt).
Consider the following (rather classical) classic.litmus litmus test for X86:
    X86 classic
    "Fre PodWR Fre PodWR"
    { x=0; y=0; }
     P0          | P1          ;
     MOV [y],$1  | MOV [x],$1  ;
     MOV EAX,[x] | MOV EAX,[y] ;
    exists (0:EAX=0 /\ 1:EAX=0)
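A litmus test source has three main sections: the initial state (here { x=0; y=0; }), the code of the threads (here the two columns P0 and P1), and the final condition (here exists (0:EAX=0 /\ 1:EAX=0)). Running litmus on the test gives: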
    $ litmus classic.litmus
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Results for classic.litmus %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    X86 classic
    "Fre PodWR Fre PodWR"
    { x=0; y=0; }
     P0          | P1          ;
     MOV [y],$1  | MOV [x],$1  ;
     MOV EAX,[x] | MOV EAX,[y] ;
    exists (0:EAX=0 /\ 1:EAX=0)
    Generated assembler
    _litmus_P0_0_: movl $1,(%rcx)
    _litmus_P0_1_: movl (%rsi),%eax
    _litmus_P1_0_: movl $1,(%rsi)
    _litmus_P1_1_: movl (%rcx),%eax
    Test classic Allowed
    Histogram (4 states)
    34    :>0:EAX=0; 1:EAX=0;
    499911:>0:EAX=1; 1:EAX=0;
    499805:>0:EAX=0; 1:EAX=1;
    250   :>0:EAX=1; 1:EAX=1;
    Ok
    Witnesses
    Positive: 34, Negative: 999966
    Condition exists (0:EAX=0 /\ 1:EAX=0) is validated
    Hash=eb447b2ffe44de821f49c40caa8e9757
    Time classic 0.60
    ...
The output first recalls the litmus test, then shows the actual assembler. Since the machine is an AMD64, in-line address references have disappeared, registers may have changed, and the assembler syntax is now more familiar.

The test ran one million times, producing one million final states, or outcomes, for the registers EAX of threads P0 and P1.

The test run validates the condition, with 34 positive witnesses.
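The condition is worth testing because no simple interleaving of the two threads can produce it. The following Python sketch is not part of litmus, and all names in it are illustrative. It enumerates every interleaving of the four instructions of classic.litmus over a single shared memory and collects the resulting pairs (0:EAX, 1:EAX).

```python
# Each thread is a list of (operation, location) pairs in program order.
P0 = [("store", "y"), ("load", "x")]   # MOV [y],$1 ; MOV EAX,[x]
P1 = [("store", "x"), ("load", "y")]   # MOV [x],$1 ; MOV EAX,[y]

def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def sc_outcomes():
    """Return the set of (0:EAX, 1:EAX) pairs reachable by plain interleaving."""
    tagged0 = [(0, op, loc) for op, loc in P0]
    tagged1 = [(1, op, loc) for op, loc in P1]
    outcomes = set()
    for schedule in interleavings(tagged0, tagged1):
        memory = {"x": 0, "y": 0}          # initial state { x=0; y=0; }
        eax = {}
        for thread, op, loc in schedule:
            if op == "store":
                memory[loc] = 1            # both stores write the value 1
            else:
                eax[thread] = memory[loc]  # the load fills the thread's EAX
        outcomes.add((eax[0], eax[1]))
    return outcomes

print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```

The pair (0, 0) is absent from the result, so the 34 positive witnesses reported above cannot be explained by interleaving alone: the hardware memory model is more relaxed than this simple model.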
With option -o <name.tar>, litmus does not run the test. Instead, it produces a tar archive that contains the C sources for the test.
Consider ppc-classic.litmus, a Power version of the previous test:
    PPC ppc-classic
    "Fre PodWR Fre PodWR"
    {
    0:r2=y; 0:r4=x;
    1:r2=x; 1:r4=y;
    }
     P0           | P1           ;
     li r1,1      | li r1,1      ;
     stw r1,0(r2) | stw r1,0(r2) ;
     lwz r3,0(r4) | lwz r3,0(r4) ;
    exists (0:r3=0 /\ 1:r3=0)
Our target machine (ppc) runs Mac OS, which we specify with the -os option:
    $ litmus -o /tmp/a.tar -os mac ppc-classic.litmus
    $ scp /tmp/a.tar ppc:/tmp
Then, on the remote machine ppc:
    ppc$ mkdir classic && cd classic
    ppc$ tar xf /tmp/a.tar
    ppc$ ls
    comp.sh run.sh ppc-classic.c outs.c utils.c
The test is compiled by the shell script comp.sh and run by the shell script run.sh:
    $ sh comp.sh
    $ sh run.sh
    ...
    Test ppc-classic Allowed
    Histogram (3 states)
    3947  :>0:r3=0; 1:r3=0;
    499357:>0:r3=1; 1:r3=0;
    496696:>0:r3=0; 1:r3=1;
    Ok
    Witnesses
    Positive: 3947, Negative: 996053
    Condition exists (0:r3=0 /\ 1:r3=0) is validated
    ...
As we see, the condition is also validated on Power.
The compilation script comp.sh produces an executable: ppc-classic.exe. Notice that ppc-classic.exe can be run directly, for a less verbose output.
Consider the additional test ppc-storefwd.litmus:
    PPC ppc-storefwd
    "DpdR Fre Rfi DpdR Fre Rfi"
    {
    0:r2=x; 0:r6=y;
    1:r2=y; 1:r6=x;
    }
     P0            | P1            ;
     li r1,1       | li r1,1       ;
     stw r1,0(r2)  | stw r1,0(r2)  ;
     lwz r3,0(r2)  | lwz r3,0(r2)  ;
     xor r4,r3,r3  | xor r4,r3,r3  ;
     lwzx r5,r4,r6 | lwzx r5,r4,r6 ;
    exists (0:r3=1 /\ 0:r5=0 /\ 1:r3=1 /\ 1:r5=0)
To compile the two tests together, we can give two file names as arguments to litmus:
$ litmus -o /tmp/a.tar -os mac ppc-classic.litmus ppc-storefwd.litmus
Or, more conveniently, list the litmus sources in a file whose name starts with @:
    $ cat @ppc
    ppc-classic.litmus
    ppc-storefwd.litmus
    $ litmus -o /tmp/a.tar -os mac @ppc
To run the tests on the remote ppc machine, the same sequence of commands applies as in the single-test case:
    ppc$ tar xf /tmp/a.tar && sh comp.sh && sh run.sh
    ...
    Test ppc-classic Allowed
    Histogram (3 states)
    4167  :>0:r3=0; 1:r3=0;
    499399:>0:r3=1; 1:r3=0;
    496434:>0:r3=0; 1:r3=1;
    Ok
    Witnesses
    Positive: 4167, Negative: 995833
    Condition exists (0:r3=0 /\ 1:r3=0) is validated
    ...
    Test ppc-storefwd Allowed
    Histogram (4 states)
    37    :>0:r3=1; 0:r5=0; 1:r3=1; 1:r5=0;
    499837:>0:r3=1; 0:r5=1; 1:r3=1; 1:r5=0;
    499912:>0:r3=1; 0:r5=0; 1:r3=1; 1:r5=1;
    214   :>0:r3=1; 0:r5=1; 1:r3=1; 1:r5=1;
    Ok
    Witnesses
    Positive: 37, Negative: 999963
    Condition exists (0:r3=1 /\ 0:r5=0 /\ 1:r3=1 /\ 1:r5=0) is validated
    ...
Now, the output of run.sh shows the result of two tests.
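When several tests are run together, it can be handy to post-process the output. The following Python sketch is not part of the litmus distribution. It assumes that the output has one item per line, that each test starts with a line "Test <name> ..." and that histogram lines have the form "<count>:><final state>"; the function and variable names are illustrative.

```python
import re

TEST_RE = re.compile(r"^Test (\S+)")
STATE_RE = re.compile(r"^(\d+)\s*:>(.*)$")

def parse_histograms(text):
    """Map each test name to a {final state: count} dictionary."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = TEST_RE.match(line)
        if m:
            # A new test begins: following histogram lines belong to it.
            current = results.setdefault(m.group(1), {})
            continue
        m = STATE_RE.match(line)
        if m and current is not None:
            current[m.group(2).strip()] = int(m.group(1))
    return results

# Histogram lines copied from the run shown above.
SAMPLE = "\n".join([
    "Test ppc-classic Allowed",
    "Histogram (3 states)",
    "4167  :>0:r3=0; 1:r3=0;",
    "499399:>0:r3=1; 1:r3=0;",
    "496434:>0:r3=0; 1:r3=1;",
    "Test ppc-storefwd Allowed",
    "Histogram (4 states)",
    "37    :>0:r3=1; 0:r5=0; 1:r3=1; 1:r5=0;",
    "499837:>0:r3=1; 0:r5=1; 1:r3=1; 1:r5=0;",
    "499912:>0:r3=1; 0:r5=0; 1:r3=1; 1:r5=1;",
    "214   :>0:r3=1; 0:r5=1; 1:r3=1; 1:r5=1;",
])

for name, histogram in parse_histograms(SAMPLE).items():
    print(name, len(histogram), "states,", sum(histogram.values()), "outcomes")
# ppc-classic 3 states, 1000000 outcomes
# ppc-storefwd 4 states, 1000000 outcomes
```

Summing the counts confirms that each test produced one million outcomes in this run.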
Users can control some of the testing conditions. Those impact efficiency and outcome variability.
Sometimes one looks for a particular outcome: for instance, one may seek the outcome 0:r3=1; 1:r3=1; which is missing from the previous experiment for test ppc-classic. To that end, varying the test conditions may help.
Consider a test a.litmus designed to run on t threads P0,…, Pt−1. The structure of the executable a.exe that performs the experiment is as follows:
How this array cell is accessed depends upon the memory mode. In direct mode the array cell is accessed directly as x[i]; as a result, cells are accessed sequentially and false sharing effects are likely. In indirect mode the array cell is accessed by means of a shuffled array of pointers; as a result, we observed a much greater variability of outcomes.
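The difference between the two memory modes can be pictured with a small model. The Python sketch below is only an illustration of the addressing pattern, not the code generated by litmus: the generated C code uses real pointers, and Python cannot exhibit false sharing. The function name and the seed parameter are hypothetical.

```python
import random

def access_order(size, mode, seed=0):
    """Return the index of the cell of x visited at each of `size` iterations."""
    if mode == "direct":
        # Iteration i accesses x[i]: cells are visited sequentially.
        return list(range(size))
    if mode == "indirect":
        # Iteration i accesses the cell designated by ptr[i], where ptr
        # is a shuffled array standing in for the array of pointers.
        ptr = list(range(size))
        random.Random(seed).shuffle(ptr)
        return ptr
    raise ValueError("unknown memory mode: " + mode)

print(access_order(8, "direct"))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(access_order(8, "indirect"))  # same cells, in a shuffled order
```

In both modes every cell is used exactly once; only the order of accesses, and thus the memory locality between successive iterations, differs.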
If the preload mode is enabled, a preliminary loop of size s reads a random subset of the memory locations accessed by Pk. Preload has a noticeable effect.
The iterations performed by the different threads Tk may be unsynchronised, exactly synchronised by a pthread-based barrier, or approximately synchronised by specific code. Absence of synchronisation may be interesting when t exceeds a. As a matter of fact, in this situation, any kind of synchronisation leads to prohibitive running times. However, for a large value of parameter s and small t, we have observed spontaneous concurrent execution of some iterations amongst many. Pthread-based barriers are exact, but they are slow and in fact offer poor synchronisation for short code sequences. The approximate synchronisation is thus the preferred technique.
Hence, running a.exe produces n × r × s outcomes.
Parameters n, a, r and s can first be set directly while invoking a.exe, using the appropriate command line options. For instance, assuming t=2, ./a.exe -a 201 -r 10000 -s 1 and ./a.exe -n 1 -r 1 -s 1000000 will both produce one million outcomes, but the latter is probably more efficient. If our machine has 8 cores, ./a.exe -a 8 -r 1 -s 1000000 will yield 4 million outcomes, in a time that we hope will not much exceed the time taken by ./a.exe -n 1. Also observe that the memory allocated is roughly proportional to n × s, while the number of Tk threads created will be t × n × r.
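The arithmetic behind these figures can be checked with a few lines of Python. This sketch assumes that, when n is not given explicitly, it is derived from a and t as n = max(1, a / t) with integer division; that assumption is consistent with the figures quoted above. The function name and the returned keys are illustrative.

```python
def experiment_size(t, r, s, a=None, n=None):
    """Return the sizes involved in one run of a.exe.

    Assumption: when n is not given, n = max(1, a // t).
    """
    if n is None:
        n = max(1, a // t)
    return {
        "n": n,
        "outcomes": n * r * s,         # n x r x s outcomes
        "threads_created": t * n * r,  # t x n x r threads Tk
        "memory_units": n * s,         # proportional to memory, not bytes
    }

print(experiment_size(t=2, a=201, r=10000, s=1)["outcomes"])   # 1000000
print(experiment_size(t=2, n=1, r=1, s=1000000)["outcomes"])   # 1000000
print(experiment_size(t=2, a=8, r=1, s=1000000)["outcomes"])   # 4000000
```

The first two settings yield the same number of outcomes, but the first creates 2,000,000 threads where the second creates only 2, which is one reason to expect the second to be more efficient.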
The run.sh shell script transmits its command line to all the executable (.exe) files it invokes, thereby providing a convenient means to control testing conditions for several tests.

Satisfactory test parameters are found by experimenting, and the control of executable files by command line options is designed for that purpose.
Once satisfactory parameters are found, it is a nuisance to repeat them for every experiment. Thus, parameters a, r and s can also be set while invoking litmus, with the same command line options. In fact, those settings determine the default values of the controls of the .exe files. Additionally, the synchronisation technique for iterations, the memory mode, and several other compile-time parameters can be selected by appropriate litmus command line options. Finally, users can record frequently used parameters in configuration files.
Any executable file produced by litmus accepts the following command line options.
litmus takes file names as command line arguments. Each file is either a single litmus test, when its name has extension .litmus, or a list of file names, when its name is prefixed by @. Of course, the file names in @files can themselves be @files.
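The recursive treatment of @files can be sketched as follows. This Python fragment is an illustration, not the litmus implementation. It assumes that an @file lists one file name per line, that names are used as written, and that there are no cycles between @files; the function name is hypothetical.

```python
import os
import tempfile

def expand(names):
    """Flatten command line arguments into a list of litmus test names."""
    tests = []
    for name in names:
        if os.path.basename(name).startswith("@"):
            # An @file: read the names it lists and expand them in turn.
            with open(name) as f:
                listed = [line.strip() for line in f if line.strip()]
            tests.extend(expand(listed))
        else:
            tests.append(name)
    return tests

# Usage example with two temporary @files, one listing the other.
with tempfile.TemporaryDirectory() as d:
    inner = os.path.join(d, "@inner")
    outer = os.path.join(d, "@ppc")
    with open(inner, "w") as f:
        f.write("ppc-storefwd.litmus\n")
    with open(outer, "w") as f:
        f.write("ppc-classic.litmus\n" + inner + "\n")
    print(expand([outer]))  # ['ppc-classic.litmus', 'ppc-storefwd.litmus']
```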
There are many command line options. We describe the more useful ones:
The following options set the default values of the options of the executable files produced:
The following additional options control the various modes described in Section 2.1. Those cannot be changed without running litmus again:
The litmus compilation chain may vary slightly depending on the following parameters:
The syntax of configuration files is minimal: lines “key = arg” are interpreted as setting the value of parameter key to arg. Each parameter has a corresponding option, usually -key, except for single-letter options:
option | key | arg |
-a | avail | integer |
-s | size_of_test | integer |
-r | number_of_run | integer |
As command line options are processed left-to-right, settings from a configuration file (option -mach) can be overridden by a later command line option. Some configuration files for the machines we have tested are present in the distribution. As an example, here is the configuration file hpcx.cfg.
    size_of_test = 2000
    number_of_run = 20000
    os = AIX
    ws = W32
    # A node has 16 cores X2 (SMT)
    avail = 32
Lines introduced by # are comments and are thus ignored.
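The configuration syntax is simple enough to be read in a few lines. The following Python sketch is not the parser used by litmus. It treats each non-comment line as "key = arg", keeps values as strings, and models left-to-right processing by letting later settings replace earlier ones; the function names are illustrative.

```python
def parse_config(text):
    """Read "key = arg" lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, arg = line.partition("=")
        settings[key.strip()] = arg.strip()
    return settings

def merge_left_to_right(*sources):
    """Combine settings in order: a later source overrides an earlier one."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged

# Contents of hpcx.cfg, as given above.
HPCX = """\
size_of_test = 2000
number_of_run = 20000
os = AIX
ws = W32
# A node has 16 cores X2 (SMT)
avail = 32
"""

from_file = parse_config(HPCX)
print(from_file["avail"])  # 32

# A later -a option on the command line sets key avail and wins.
final = merge_left_to_right(from_file, {"avail": "16"})
print(final["avail"], final["size_of_test"])  # 16 2000
```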
Configuration files are searched first in the current directory; then in any directory specified by setting the shell environment variable LITMUSDIR; and then in the litmus installation directory, which is defined while compiling litmus.
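This search order can be sketched as follows. The Python fragment is an illustration only: it assumes that LITMUSDIR names a single directory, and the function name, the install_dir parameter and the installation path in the usage line are hypothetical.

```python
import os

def find_config(name, install_dir, environ=os.environ):
    """Return the first existing path for `name`, or None if not found."""
    candidates = ["."]                    # 1. the current directory
    litmusdir = environ.get("LITMUSDIR")
    if litmusdir:
        candidates.append(litmusdir)      # 2. the directory given by LITMUSDIR
    candidates.append(install_dir)        # 3. the installation directory
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None

# Hypothetical installation directory; prints None if hpcx.cfg is not found.
print(find_config("hpcx.cfg", install_dir="/usr/local/lib/litmus"))
```

Because the current directory comes first, a local copy of a configuration file shadows the one shipped with the distribution.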