The Phat Experiment

Jade Alglave   Luc Maranget
INRIA

Abstract: We ran The Phat Experiment from December 2009 to January 2010. We tested 3 Power machines with our diy tool. We present here the experimental protocol and results.

1  Experimental Protocol

  1. Make a directory exp, with a sub-directory src for sources. We provide the sources of our tests as the archive src.tgz. Those tests were produced with the diy tool from configuration files that are present in the archive.
  2. The structure of the src sub-directory is as follows:
  3. Make the same directory hierarchy as the one of src in each of the machine sub-directories. In each leaf directory, make a symbolic link to the corresponding source directory. This way, litmus can be run as litmus src/@all
  4. For a given machine, create a file rem that initially lists all the leaf directories. One complete run is a (litmus) run of all the tests in rem.
  5. After each complete run, check the output logs (we provide a simple tool readRelax). If a relaxation is confirmed, remove the corresponding line from rem.
  6. The parameters of litmus have been previously tuned so that most expected relaxations are confirmed after the first complete run. We run a total of about 10 complete runs.

During our experiment on hpcx, some tests in safe yielded Ok — see Sec. 2.2. We then followed a different procedure — see Sec. 5

2  Summary of the experimental results

2.1  Relaxations observed on squale, vargas and hpcx

RelaxDefinition1hpcxsqualevargas
PosRRrpo r2/40M3/2M0/4745M
PodRRrpo rℓ′2275/320M12/2M0/16725M
PodRWrpo wℓ′8653/400M3178/2M30/6305M
PodWWwpo wℓ′2029/4M2299/2M2092501/32M
PodWRwpo rℓ′51085/40M178286/2M672001/32M
Rfirfi7286/4M1133/2M145/32M
Rferfe177/400M0/1776M9/21M
LwSyncsWRwlwsync r243423/600M2/40M385/32M
LwSyncdWRwlwsync rℓ′103814/640M11/2M117670/32M
ACLwSyncsRRwrfe rlwsync r11/320M0/960M1/21M
ACLwSyncdRRwrfe rlwsync rℓ′124/400M0/7665M2/21M
BCLwSyncsWWwlwsync wrfe r68/400M0/560M2/160M
BCLwSyncdWWwlwsync wℓ′rfe rℓ′158/400M0/11715M1/21M

1
Notation: r (w) is a read (write) event with location ℓ.

When a relaxed relaxation is confirmed, the array cell above holds a link to the log of one run of one litmus test that confirms the relaxation. The cell also shows the number of outcomes that yielded Ok/total number of outcomes, for the selected log.

When a relaxed relaxation is not confirmed, the array cell holds 0/total number of outcomes for all tests of all performed complete runs.

2.2  Anomalies observed on hpcx

SourceCyclehpcxName in [1]
safe503Rfe SyncdRR Fre Rfe SyncdRR Fre2/320Miriw
safe388Rfe SyncdRR Fre SyncdWR Fre3/400Mrwc
safe180Rfe SyncdRR Fre LwSyncdWW Wse1/400M 
safe400Rfe SyncdRW Rfe SyncdRR Fre SyncdWR Fre1/320M 
safe071Rfe SyncdRW Wse SyncdWR Fre SyncdWW Wse1/320M 
acssrr002DpdR Fre Rfe SyncsRR DpdR Fre Rfe SyncsRR1/320M 
acsdrw027Wse LwSyncdWW Wse Rfe SyncdRW0/400M1 1/400M 
aclwdrw031Wse SyncdWR Fre Rfe LwSyncdRW1/400M 

1
Ok showed up on second run

Incidentally, those anomalies explain why we selected such a high number of outcomes per run for hpcx.

3  Detailed results on squale (Power 5)

3.1  Description of the machine

squale is a PowerPC G5 with 4 cores, running Mac Os X (a Power Mac G5).

3.2  Plain relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
PosRRn/aposrr1 [003] 3/2M00[003] DpdR Fre SyncdWW Rfe DpdR PosRR
Rfip. 661, 1st col, 1st bulletrfi[000] 388/2M, [001] 140/2M, [002] 1133/2M00[000] DpdR Fre Rfi DpdR Fre Rfi
PodRRn/apodrr1[000] 12/2M00[000] Fre SyncdWW Rfe PodRR
PodRWn/apodrw10/1796M3  Logs: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10,11,12
PodWWn/apodww[000] 2299/2M [001] 3/2M00[000] Wse PodWW Wse PodWW
PodWRn/apodwr[000] 178286/2M [001] 93911/2M00[000] Fre PodWR Fre PodWR
Rfen/arfe0/1776M  Logs: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10,11,12

When an expected relaxation is confirmed, the relevant line in the table above gives the number of Ok states observed (and the number of outcomes collected) for a few typical tests, with a link to the complete litmus log of the run that yielded Ok. When an expected relaxation is not confirmed, we rather give the total number of outcomes collected for this relaxation, and links to all logs. This number may vary a lot, since it critically depends on the number of tests produced by diy. See for instance ACLwSyncsRR and ACLwSyncdRR below.

We notice that no relaxation of Rfe is observed.

3.3  Barriers relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
LwSyncsWRp. 700, 1st col, L=1 caselwswr[002] 2/40M04[002] DpdR Fre LwSyncsWR DpdR Fre LwSyncsWR
LwSyncdWRp. 700, 1st col, L=1 caselwdwr[020] 11/2M, [020] 81/40M01,02[020] Fre LwSyncdWR Fre LwSyncdWR

3.4  A-Cumulativity relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
ACLwSyncsRRp. 700, 2nd col, 3rd bulletaclwsrr0/960M  Logs: 04, 05, 06, 07, 08, 09, 10,11,12
ACLwSyncdRRp. 700, 2nd col, 3rd bulletaclwdrr0/7665M  Logs: 02, 03, 04, 05, 06, 07, 08, 09, 10,11,12

Not observing expected cumulativity relaxations is compatible with the result of experiments on Rfe.

3.5  B-Cumulativity relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
BCLwSyncsWWp. 700, 2nd col, 3rd bulletbclwsww0/560M  Logs: 02, 04, 05, 06, 07, 08, 09, 10,11,12
BCLwSyncdWWp. 700, 2nd col, 3rd bulletbclwdww0/11715M  Logs: 02, 03, 04, 05, 06, 07, 08, 09, 10,11,12

Again, we did not observed the expected cumulativity relaxations.

3.6  Safe (non)-relaxations

SourcesLogsnoutstotal
safe02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12  341M–682M     198462M   

The nouts entry above counts the total number of final states (or outcomes) collected for a given test in safe. The number of outcomes varies from test to test, depending on the number of threads in the test. It also varies from log to log due to changes in litmus parameters. This number is to be compared to the number of outcomes needed to observe a relaxation (typically a few millions).

The last column shows the total number of outcomes collected.

4  Detailed results on vargas (Power 6)

4.1  Description of the machine

vargas is a Power 6 with 112 nodes of 32 cores. The operating system is AIX. We run our experiments on one node, i.e. on 32 cores.

4.2  Plain relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
PosRRn/aposrr10/4745M  Logs: 00, 01, 02, 03, 04, 10, 11, 12, 13, 14, 15
Rfip. 661, 1st col, 1st bulletrfi[000] 948/32M [001] 377/32M [002] 145/32M00[002] DpdW Wse Rfi DpdW Wse Rfi
PodRRn/apodrr0/16725M  Logs: 00, 01, 02, 03, 04, 05 10, 11, 12, 13, 14, 15
PodRWn/apodrw0/6305M  Logs: 00, 01, 02, 03, 04, 10, 11, 12, 13, 14, 15
PodWWn/apodww[000] 2092501/32M [001] 37638/21M00[000] Wse PodWW Wse PodWW
PodWRn/apodwr[000] 672001/32M, [001] 5762/21M00[000] Fre PodWR Fre PodWR
Rfen/arfe[002] 1/21M [003] 9/21M00[002] DpdW Wse Rfe DpdW Rfe

Our experiments do not confirm PodRR, which is confirmed on Power 5. This may be related to Power 6 being more in-order.

4.3  Barriers relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
LwSyncsWRp. 700, 1st col, L=1 caselwswr[001] 17/21M [002] 385/32M [003] 5/21M00[003] DpdR Fre LwSyncsWR DpdR Fre LwSyncsWR
LwSyncdWRp. 700, 1st col, L=1 caselwdwr[020] 117670/32M00[020] Fre LwSyncdWR Fre LwSyncdWR

4.4  A-Cumulativity relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
ACLwSyncsRRp. 700, 2nd col, 3rd bulletaclwsrr[000] 1/21M [002] 3/16M00[000] DpdR Fre SyncdWW Wse Rfe LwSyncsRR
ACLwSyncdRRp. 700, 2nd col, 3rd bulletaclwdrr[010] 1/16M [017] 2/21M00[010] Fre Rfe LwSyncdRR Fre Rfe LwSyncdRR

It is to be noticed that the test aclwdrr010 is iriw with lwsync.

4.5  B-Cumulativity relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
BCLwSyncsWWp. 700, 2nd col, 3rd bulletbclwsww[000] 2/160M13[000] DpdR Fre LwSyncsWW Rfe DpdR Fre LwSyncsWW Rfe
BCLwSyncdWWp. 700, 2nd col, 3rd bulletbclwdww[011] 1/21M01[011] LwSyncdRR Fre LwSyncdWW Wse LwSyncdWW Rfe

4.6  Safe (non)-relaxations

SourcesLogsnoutstotal
safe00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12  1568M–3136M     969906M   

The nouts entry above counts the total number of final states (or outcomes) collected for a given test in safe. The number of outcomes varies from test to test, depending on the number of threads in the test. It also varies from log to log due to changes in litmus parameters. This number is to be compared to the number of outcomes needed to observe a relaxation (typically 16–32 millions).

5  Detailed results on hpcx (Power 5)

Due to the discovery of relaxations that do not follow our Power model, we have altered our procedure:

As a result, we find anomalous relaxations for A-cumulativity, and none for B-cumulativity nor safe plain. Thus, we may venture to assume that the problem originates in either our model of A-cumulativity, or in hpcx implementation of A-cumulativity.

5.1  Description of the machine

hpcx is a Power 5 with 160 nodes of 16 physical cores. The operating system is AIX. We run our experiments on one node, i.e. on 16 cores, with the possibility to enable SMT.

5.2  Plain relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
PosRRn/aposrr1[003] 2/40M, [005] 1/40M00[003] DpdR Fre SyncdWW Rfe DpdR PosRR
Rfip. 661, 1st col, 1st bulletrfi[000] 15/4M, [001] 2848/4M, [002] 7286/4M00[000] DpdR Fre Rfi DpdR Fre Rfi
PodRRn/apodrr12[000] 54328/640M, [001] 2275/320M, [002] 3596/320M,…02[000] Fre SyncdWW Rfe PodRR
PodRWn/apodrw10/31680M3  Logs: 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17
PodWWn/apodww[000] 2029/4M, [001] 26/4M00[001] Wse PodWW Wse PodWW Wse PodWW
PodWRn/apodwr[000] 5714/40M, [001] 51085/40M00[001] Fre PodWR Fre PodWR Fre PodWR
Rfen/arfe2[002] 21/400M, [003] 177/400M, [004] 8/320M, [005] 6/320M, [006] 29/320M00[002] DpdW Wse Rfe DpdW Rfe

5.3  Barriers relaxations

 Documentation [2]SourcesOkLog(s)Example of cycle
LwSyncsWRp. 700, 1st col, L=1 caselwswr2[000] 137/230M, [001] 757/400M, [002] 243423/640M, [003] 944/400M04[002] DpdR Fre LwSyncsWR DpdR Fre LwSyncsWR
LwSyncdWRp. 700, 1st col, L=1 caselwdwr2[020] 103814/640M02[020] Fre LwSyncdWR Fre LwSyncdWR

5.4  A-Cumulativity relaxations

We confirm expected relaxations.

 Documentation [2]SourcesOkLog(s)Example of cycle
ACLwSyncsRRp. 700, 2nd col, 3rd bulletaclwsrr2[000] 11/320M19[000] DpdR Fre Rfe LwSyncsRR DpdR Fre Rfe LwSyncsRR
ACLwSyncdRRp. 700, 2nd col, 3rd bulletaclwdrr2[000] 2/320M, [001] 3/320M, [002] 33/400M, [003] 2/320M,…02[002] Fre SyncdWW Wse Rfe LwSyncdRR

All the remaining cases of A-cumulativity are anomalously relaxed.

 Documentation [2]SourcesOkLog(s)Example of cycle
ACSyncsRRp. 700, 2nd col, 3rd bulletacssrr2[002] 1/320M04[002] DpdR Fre Rfe SyncsRR DpdR Fre Rfe SyncsRR
ACSyncdRRp. 700, 2nd col, 3rd bulletacsdrr2[017] 2/400M02[017] Fre SyncdWR Fre Rfe SyncdRR
ACSyncdRWp. 700, 2nd col, 3rd bulletacsdrw2[027] 0/400M + [027] 1/400M = 1/800M02, 03[027] Wse LwSyncdWW Wse Rfe SyncdRW
ACLwSyncdRWp. 700, 2nd col, 3rd bulletaclwdrw2[031] 1/400M02[031] Wse SyncdWR Fre Rfe LwSyncdRW

5.5  B-Cumulativity relaxations

We easily confirm the expected relaxations involving the B-cumulativity of lwsync.

 Documentation [2]SourcesOkLog(s)Example of cycle
BCLwSyncsWWp. 700, 2nd col, 3rd bulletbclwsww2[000] 7/320M, [001] 3/320M, [002] 6/320M, [003] 19/400M,…02[003] DpdW Wse SyncdWR Fre LwSyncsWW Rfe
BCLwSyncdWWp. 700, 2nd col, 3rd bulletbclwdww2[001] 11/320M, [002] 8/320M, [003] 11/320M, [004] 19/400M,…02 [020] pdR Fre LwSyncdWW Rfe DpdR Fre LwSyncdWW Rfe

None of the remaining cases of B-cumulativity is observed to be relaxed.

RelaxSourcesLogsnoutstotal
BCSyncsRWbcssrw2 08, 09, 10, 11, 12, 13, 14, 16, 173360M–6400M172960M
BCSyncdRWbcsdrw2 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 16, 175280M–10560M22440M
BCSyncsWWbcssww2 02, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 16, 174640M–6200M66640M
BCSyncdWWbcsdww2 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 16, 175280M–10560M245520M
BCLwSyncsRWbclwsrw2 08, 09, 10, 11, 12, 13, 14, 16, 173360M–6400M172960M
BCLwSyncdRWbclwdrw2 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 16, 175280M–10560M22440M

5.6  Safe (non)-relaxations

The safe set is restricted to plain relaxations.

SourcesLogsnoutstotal
safe202, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12  7040M–11264M     184760M   

6  PodRW is relaxed

To confirm the relaxation of PodRW, which our models predicts, we used an advanced feature of diy: composite relaxations. Namely, the configuration file podrwposwr.conf sets the relax list to [PodRW,PosWR]. That is, PodRW followed by PosWR.

Observe that PosWR is not safe, hence absent from the cycles of standard experiments on PodRW. The litmus tests are in podrwposwr.

MachineOkLog(s)  Example of cycle
squale [023] 178/2M13[0231] DpdR Fre SyncdWW Rfe PodRW PosWR
vargas 0/1244M  Logs: 00
hpcx [003] 179/320M,…[030] 8653/400M,…202[0301] DpdR Fre SyncdWR Fre SyncdWW Rfe PodRW PosWR

The experiments show the sequence “PodRW followed by PosWR” to be relaxed on squale and hpcx. If PodRW was not relaxed, then PodRW followed by PosWR would not be either, by the definition of preserved programming order we give in the associated paper. As the sequence is observed to be relaxed, we consider that PodRW is observed to be relaxed.

References

[1]
H.-J. Boehm and S.V. Adve. Foundations of the C++ concurrency memory model. In Proc. PLDI, 2008.
[2]
Power ISA Version 2.06. 2009.

3
See Sec. 6 for detailed explanations.
1
This test makes use of BCSyncdWW, which has not been observed to be relaxed, thus is considered as safe.
2
We enabled SMT for this test.

This document was translated from LATEX by HEVEA.