Classification of forbidden behaviours

Our test sets (power-tests.tgz, 8121 tests, and arm-tests.tgz, 9763 tests) are quite large and some tests are rather complex. Thus, when some behaviour that is forbidden by the model is observed on a machine — an invalid behaviour, it is quite difficult to find an explanation in terms of the model.

To alleviate this problem, we classify forbidden executions according to the checks that forbid them. The classification is automated, which enables us to categorise invalid behaviours automatically. A benefit of the approach is to select the most simple and thus most significant invalid behaviours automatically. As a matter of fact, our model consists in four checks, sc per location, no thin air, observation and propagation; which from now we designate with the letters S, T, O and P.

We partition forbidden executions into 15 batches, according to the checks that reject them. For instance, the executions of batch S are forbidden because they are rejected by the S check, and by that check only; or the executions of batch OP are forbidden because they are rejected by the O and P checks, and by those checks only.

As an example we consider a few tests, which we list below with the count of all their possible executions:

CoWW (2), CoRR1 (4), CO-MP+dmb+po (18), CO-W+RR+fri-dmb (18), LB+ctrls (4), LB+dmbs (4), MP (4), MP+dmb+addr (4), SB+dmbs (4), DETOUR0153 (12), LB+PPO0518 (36), DETOUR0391 (192), WRC+ctrl+addr (8), WRC+dmb+addr (8), RDW (16), MOREDETOUR0443 (768), MOREDETOUR0714 (768)

CoWW (1), CoRR1 (1), CO-MP+dmb+po (6), CO-W+RR+fri-dmb (14), LB+ctrls (1), LB+dmbs (1), MP+dmb+addr (1), SB+dmbs (1), DETOUR0153 (4), LB+PPO0518 (18), DETOUR0391 (150), WRC+dmb+addr (1), RDW (5), MOREDETOUR0443 (672), MOREDETOUR0714 (684)

Notice that MP and WRC+ctrl+addr do not appear in the list above, as all their possible executions are allowed.

The following table summarises the classification of forbidden executions: numbers are execution counts and, when not zero, hold a link to execution diagrams.

	ALL	S	T	O	P	ST	SO	SP	TO	TP	OP	STO	STP	SOP	TOP	STOP

CoWW	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
CoRR1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
CO-MP+dmb+po	6	3	0	0	0	0	0	3	0	0	0	0	0	0	0	0
CO-W+RR+fri-dmb	14	9	0	0	0	0	2	0	0	0	0	0	0	3	0	0
LB+ctrls	1	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
LB+dmbs	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
MP+dmb+addr	1	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
SB+dmbs	1	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
DETOUR0153	4	0	2	0	0	2	0	0	0	0	0	0	0	0	0	0
LB+PPO0518	18	18	0	0	0	0	0	0	0	0	0	0	0	0	0	0
DETOUR0391	150	88	1	0	0	17	38	0	0	0	0	6	0	0	0	0
WRC+dmb+addr	1	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
RDW	5	4	0	0	0	0	0	0	0	0	1	0	0	0	0	0
MOREDETOUR0443	672	496	0	0	0	42	0	0	0	0	12	0	0	116	0	6
MOREDETOUR0714	684	444	0	0	19	42	0	173	0	0	0	0	6	0	0	0

The table above helps to identify simple tests, with one forbidden execution that have a simple explanation. For instance, CoRR1 has only one forbidden execution and it is a violation of sc per location alone. Similar tests are LB+ctrls (violation of no thin air) and SB+dmbs (violation of propagation). Things are only slightly more complex for MP+dmb+addr, whose forbidden execution is rejected by observation and propagation and thus goes into the OP batch; or for LB+dmbs whose forbidden execution is rejected by no thin air and propagation and thus goes into the TP batch.

For non-ambiguous tests, such as RDW, forbidden behaviours match forbidden executions and we have a clear explanation for all forbidden behaviours: namely, any of the 5 forbidden behaviours of RDW is either amongst the 4 violations of sc per location, or is the unique violation of OP.

Ambiguous tests require a more careful examination, for instance CO-MP+dmb+po has only three forbidden behaviours — see the values read by Thread 1, which can be 2 then 1, 2 then 0, or 1 then 0. However, those correspond to 6 forbidden executions, due to the non-observation of the final value of location x. More precisely, any of the forbidden behaviours corresponds to two executions: one in the S batch, and the other in the SP batch. Notice that the executions from the SP batch feature a cycle of dmb;co, which is both a violation of sc per location (po-loc contradicts com) and of propagation (there is a cycle in dmb union com). Thus, if we observe some forbidden behaviour of CO-MP+dmb+po on hardware (and we do!), we cannot decide between a simple violation of sc per location or a very serious flaw involving the dmb fence. Some other tests, such as CoRR1 (forbidden and observed) and CoWW (forbidden and not observed), help us to select the first possibility. Sometimes the forbidden behaviours of an ambiguous test have a simple explanation: test LB+PPO0518 is ambiguous, due to the non-observation of the second load of location x by Thread 1 (event d). Nevertheless all its forbidden executions are in the S batch.

In practice we apply classification to invalid behaviours, i.e. behaviours that are forbidden by model yet observed on hardware — see the classification of all the invalid behaviours of our model on ARM. As we found invalid behaviours on ARM only, we only classify ARM forbidden behaviours: S, T, O, P, ST, SO, SP, TO, TP, OP, STO, STP, SOP, TOP, and STOP.