Analysis of remaining invalid tests

So as to select the most simple tests, we first consider the tests of which invalid behaviours all become allowed, when a subset of checks amongst the four sc per location, no thin air, observation and propagation is removed from model definition (see items 1, 2 and 3 below). Then, we consider some behaviours that our models do not explain (item 4). The remaining invalid tests are complex, we list them at item 5.

Some invalid tests can be explained solely by new violations of sc per location. By “new” we here mean violations other than read-after-read hazards.
In practice we select tests of which invalid behaviours all belong to the “S” batch. Those are the 9 tests S+PPO230, MOREDETOUR0058, MOREDETOUR0052, MOREDETOUR0185, MOREDETOUR0188, MOREDETOUR0326, MOREDETOUR0068, MOREDETOUR0216 and MOREDETOUR0342. Examination of the above tests most often reveals violations of sc per location in the style of test CoRW (i.e. co; rf vs. po):
```
Executions for behaviour: "0:R2=2 ; x=2"
```
We have observed several such violations on Tegra3, Exynos4412 and A6X, the vast majority of observations being on Tegra3.
Notice that we do not observe the test CoRW directly. Instead we find the CoRW pattern (i.e. a cycle po vs. co; rf) in more complex tests. This applies for instance to test MOREDETOUR0058, for which we observed invalid behaviours on Exynos4412.
```
Executions for behaviour: "0:R2=1 ; 1:R0=3 ; 2:R0=2 ; 2:R2=1 ; y=3"
```
The diagrams above may look complex, but the violation of the CoRW pattern is quite easy to find. In all diagram above a sequence co; rf contradicts the programm order from the read event e to the write event f (both by Thread 1), which both access location y. The “co” component may be a bit hidden as there is no edge in execution diagrams for sequences of transitive relations. In any case, the violation of CoRW is very apparent from the test condition only. This condition specifies a final value of 3 for the location y. Morever, Thread 1 reads that final value of location y and then writes again to location y. This consists in a obvious violation of sc per location.
Notice that we observed the following invalid behaviour of the test MOREDETOUR0052 both on Tegra3 and Exynos4412:
```
Executions for behaviour: "0:R3=2 ; 1:R0=4 ; 2:R0=4 ; 2:R2=1 ; y=4"
```
Again, the violation of CoRW is easy to find, considering that the location y has the final value 4 and that Thread 1 reads that final value before writing again to y.
Finally, we observe 3 invalid behaviours for the test S+PPO230, all on A6X.
1. Behaviour 00 features a CoRW style violation: e −co→ b −rf→ c −po-loc→ e.
2. Behaviours 01 and 02 feature a rare CoRW1 style violation: e −rf→ c −po-loc→ e.

Some invalid behaviours, observed on Tegra3 only and rather unfrequently, correspond to executions that include simpler tests, which are highly likely to be forbidden by any reasonable model. Those behaviours all violate the observation check. Notice that, due to the lack of lightweight fences in ARM, such behaviours are also rejected by the propagation check.

In practice, we select tests of which invalid behaviours all belong to the “OP” batch. Those are the 15 tests WRR+2W+addr+dmb, WRR+2W+addr+dmb.st, WRR+2W+ctrlisb+dmb, WRR+2W+ctrlisb+dmb.st, WRR+2W+ctrlisb+dsb.st, DETOUR0661, DETOUR0687, ISA2+dmb+fri+addr, MP+dmb+pos-ctrlisb+BIS, DETOUR0689, MOREDETOUR0457, MOREDETOUR0458, MOREDETOUR0785, MOREDETOUR0461 and MOREDETOUR0788. Examination of the above tests reveals violations of the MP+fence+ppo idiom, with various instances of fence and ppo. It should be noticed that no violation of those simpler idioms is observed when the idioms are tested directly.

As an example, consider the test MP+dmb+pos-ctrlisb+BIS, for which we have observed three invalid behaviours w.r.t the armllh.cat model:

Executions for behaviour: "1:R0=0 ; 1:R1=1 ; 1:R2=0 ; y=1"

Executions for behaviour: "1:R0=1 ; 1:R1=1 ; 1:R2=0 ; y=1"

Executions for behaviour: "1:R0=1 ; 1:R1=1 ; 1:R2=0 ; y=2"

The executions above (which are all unambiguously observed, as the final value of location y is collected) all include the forbidden execution MP+dmb+ctrlisb, whose status is uncontroversial. Tests DETOUR0687 and DETOUR0689 are similar, if not identical, the difference with MP+dmb+pos-ctrlisb+BIS being that the final value of the location y is not collected.

The following table lists the offending tests, indexed by the specific MP+fence+ppo idiom which is violated:

Idiom	Tests

MP+dmb+ctrlisb	MP+dmb+pos-ctrlisb+BIS, DETOUR0687, DETOUR0689, WRR+2W+ctrlisb+dmb, MOREDETOUR0457, MOREDETOUR0458
MP+dmb.st+ctrlisb	WRR+2W+ctrlisb+dmb.st
MP+dsb.st+ctrlisb	WRR+2W+ctrlisb+dsb.st
MP+dmb+addr	DETOUR0661, WRR+2W+addr+dmb, ISA2+dmb+fri+addr, MOREDETOUR0785, MOREDETOUR0461 MOREDETOUR0788
MP+dmb.st+addr	WRR+2W+addr+dmb.st

We also have found some tests, of which invalid behaviours all violate the sc per location, observation and propagation checks.
Those are MOREDETOUR0684 and MOREDETOUR0927. Those invalid behaviours are observed on Tegra3 only.
As an example, consider MOREDETOUR0927 for which we observed one invalid behaviour that corresponds to 24 invalid executions, due a long, impossible to observe, co-chain for location x. We here show one such execution:
As can be seen by following the link MOREDETOUR0927, we observe a final value of 4 in location x. This leads to a dramatic violation of sc per location by Thread 3, which first reads 4 from x, then writes 3 to x and then reads 5 from x. In effect we observe two violations of sc per location here, first a CoRW style (co;rf vs. po-loc from read h to write i) and a read-after-read hazard anomaly CoRR (fr;rf vs. po-loc from read h to read j). One may also notice that there is a dmb femce between i and j.
One could legitimately argue that the test is complex, that the invalid behaviour is observed unfrequently, and, above all, that our explanation is based upon the observation of x final value being 4. As a matter of fact, our harness might be wrong here, or suffer from the acknowledged read-after-read hazard anomaly. Nevertheless, we have enough other simpler observations of invalid behaviours to assert safely that we have discovered anomalies.
The other test in the SOP category is MOREDETOUR0684, it presents the same violation of sc per location as MOREDETOUR0927, but that time on Thread 2 and relative to location y.
For two tests, we observed behaviours that are out of the enveloppe of our model. We call those “unclassified” tests.
More precisely, we observed some memory reads that cannot be explained by the rf relation. Or, some load instruction from the test acts as if it reads a value that is not written by any store instruction present in the test.
Let us first examine DETOUR0008, which we observed exactly 1000 times, in a single run, on Tegra3.
The “execution diagram” above has been drawn by hand, as herd cannot generate such an execution. Nevertheless the diagram depicts the troublesome situation where the read b reads the unexpected value −239487. This value is not random, it is a sentinel initial value for some array which collects observed reads. More precisely, the test threads save the value of observed reads into this array, which is later read by the test harness. Overall, the test behaves as if Thread 0 has not run at all. This may be due to a flaw in the POSIX threads library, or more precisely in the interaction of the pthread_create (create a thread) and pthread_join (wait for a thread to terminate) functions. The count of 1000 invalid behaviours supports this hypothesis, as the precise test run that exhibited the anonaly was running 1000 times the test for a given allocation of two POSIX threads. (See our paper Litmus: Running Tests Against Hardware for some details on our testing infrastructure).
The second unclassified test LB+PPO0536 has been observed on Tegra2 and only once.
The “execution diagram” above shows that the read a reads the value 64 from the location z, while no store instruction from the test stores 64 in z. The anomaly may originate from a hardware failure of the SSD persistent storage device of our Trimslice computer, which has been running tests with few interrupts since July 2011. Namely, the value 64 is only one bit flip away from the legitimate value 0. Morever we have also observed similar bit flips in test logs from our Trimslice computer, including in text parts. Further notice that we have stopped running tests on this machine. We have not discarded the here discussed behaviour, as an indication of the difficulty of analysing our test results.

Finally there remains some tests of which invalid behaviours belong to several classification batches.

The complexity of these tests results from two causes, first we have observed several invalid behaviours for one test, which behaviours may belong do different batches; and second, those tests are ambiguous because they feature long co-chains, a situation that leads to numerous (potential) executions, wich may belong to different batches. Those tests are the 7 tests MOREDETOUR0740, MOREDETOUR0824, MOREDETOUR0782, MOREDETOUR0503, MOREDETOUR0810, MOREDETOUR0812 and MOREDETOUR0864. The following table gives links to all invalid behaviours indexed by test.

Test		Behaviours

MOREDETOUR0740		00, 01, 02
MOREDETOUR0824		00, 01, 02, 03, 04, 05, 06
MOREDETOUR0782		00, 01, 02, 03, 04, 05, 06, 07, 08, 09
MOREDETOUR0503		00, 01, 02, 03
MOREDETOUR0810		00, 01
MOREDETOUR0812		00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11
MOREDETOUR0864		00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27

Also notice that all the invalid behaviours above are observed on Tegra3, except for the ones of MOREDETOUR0810 that are observed on Exynos4412.

As an example of an analysis we consider MOREDETOUR0824, which features 7 invalid behaviours. Amongst those, behaviours 00, 01, 02, 03, 04 and 05 all feature a relatively obvious violation of the MP+dmb+addr idiom: Thread 0 stores the value 4 into the location x, executes the dmb fence instruction and then stores the value 1 into the location y; while Thread 1 reads the value 1 from y and then, by some address dependent load, reads the initial value 0 from the location x. The behaviour can be identified from execution diagram without paying much attention to the other accesses to the location x by Threads 1,2 and 3, which accesses apparently act as noise that favors the occurrence of the forbidden behaviour of MP+dmb+addr.

The behaviour 06, which we reproduce below, is more interesting.

	Model	Tegra3	Exynos4412	Exynos5410	APQ8064
MOREDETOUR0824	Forbid	Ok, 1/3.4G	No, 0/11G	No, 0/3.1G	No, 0/2.3G
			Allow unseen	Allow unseen	Allow unseen

Executions for behaviour: "1:R0=1 ; 1:R2=2 ; 2:R0=0 ; 2:R2=4 ; x=3"

The offending behaviour correspond to six execution diagrams, because there are 6 = 3! possible total orders for the four writes to location x, considering that the observed final value of 3 is maximal. Observe that we cannot decide what happened at runtime. In other words, the six possible executions are only six possibilities amongst which we cannot decide.

Moreover, Thread 0 stores the value 4 into the location x (event a), while Thread 1 reads the value 2 from the location x (event d). As a result we witness a violation of the MP+dmb+addr idiom, if and only if the write of 2 into the location x (event h) is co-before the write of 4 into the location x (event a). This corresponds to the last three execution diagrams above, were the cycle a −dmb→ b −rf→ c −addr→ d −fr→ a is clearly visible. By contrast, no such cycle appears when the write a (stored value 4) is co-before the write h (stored value 2) in the first three execution diagrams above. But then, Thread 2 performs a sc per location violation by writing 2 into x (event h) and then reading 4 from x (event i). The violation is in the CoWR style, i.e. fr vs. po. Finally, we here observe either a violation of the MP+dmb+addr idiom, or a sc per location violation of a new kind. Which we observe we cannot decide.