Comparing two Power models

This document shows tables for the “Comparing two Power models” experiments. The models we compare are:

We used our toolset, including the automated front-end dont, to find tests that show that the two models are not comparable. Namely, behaviours exist that are allowed by one model and forbidden by the other.

As all such behaviours are unobserved on PowerG5, Power6 and Power7 machines. Both models remain experimentally valid.

Behaviours forbidden by Power2010, allowed by Power2011

In a previous experiment, we have explored Power2010, yielding a particular safe set. By definition, all tests built from those candidate relaxations are forbidden by Power2010. We now use dont in conformance mode to run ppcmem the (off-line) executable version of Power2011 on all such tests up to three processors.

The procedure finally yields five tests allowed by Power2011 and forbidden by Power2010, as shown immediately by the experiment log. See a complete description of the experiment. We give names to those five tests, following the scheme introduced in “Understanding POWER Multiprocessors” and summarised in this document. We then run them on Power machines (or benefit from previous runs, some of the tests being already known to us), resulting in the following table:

 Power2010Power2011PowerG5Power6Power7
R+lwsync+syncForbidAllow0/2.5G0/8.3G0/41G
WRR+2W+sync+lwsyncForbidAllow0/1.3G0/4.8G0/26G
WRW+WR+lwsync+syncForbidAllow0/1.3G0/4.8G0/26G
Z6.3+lwsync+sync+addrForbidAllow0/908M0/2.4G0/14G
Z6.3+lwsync+sync+ctrlisyncForbidAllow0/380M0/693M0/8.5G

The table above reminds the Allow/Forbid status of the tests according to the two models. It also shows how many times the targeted behaviour has been observed on hardware and how many time a test has been run. For instance 0/2.1G means that no observation occured out of about 2.1 × 109 runs of a given test performed on a given implementation.

Strictly speaking, only the last test Z6.3+lwsync+sync+ctrlisync is new. However our tool set proved useful, enabling a painless generation of some tests forbidden by Power2010, and the automation of checking the result of running ppcmem on those.

Behaviours allowed by Power2010, forbidden by Power2011

One can now expect that we explore Power2011 by using dont, deduce forbidden behaviours and check with dont that Power2010 allows some of those. Unfortunately, it does not works, as the prototype ppcmem is two slow and consumes too much memory to withstand a complete exploration in an acceptable time.

Fortunately, the exploration of Power2010 already yielded a bunch of observed behaviours as a by product. Hence, we proceed in the opposite direction. Namely, we simply run ppcmem on those tests, with decent time and memory limits (complete run log), and then select amongst tests that has terminated those that did not produced the targetted behaviour — using the “Observation name Never” information present in logs.

The procedure yields 197 tests that we rename according to our systematic scheme and shall now comment. In this discussion we refer to the “architectural intent”. As Power2011 was designed with help from IBM, we claim it to be close to “architectural intent”. By contrast, Power2010 was designed from black box experiments. However, Power2011 does not, in our opinion, supersedes Power2010, as being still valid valid after two years of intensive experiments on hardware, and given its simplicity, which proves beneficial in applications such as automatic fence insertion.

Simple tests

The MP (message passing) and WRC (write to read causality) tests correspond to known programming idioms. From our understanding, the architectural intent commands that “minimal” protection (the lightweight barrier lwsync and dependencies) suffices.

 Power2010Power2011PowerG5Power6Power7
MP+lwsync+addrAllowForbid0/7.4G0/22G0/130G
MP+lwsync+ctrlisyncAllowForbid0/5.0G0/17G0/102G

 

 Power2010Power2011PowerG5Power6Power7
WRC+lwsync+addrAllowForbid0/3.7G0/18G0/146G
WRC+lwsync+ctrlisyncAllowForbid0/2.5G0/9.0G0/64G

As demonstrated by the run counts above, implementations follow the architectural intent, to the best of our knowledge.

The following two series are vaguely similar to MP and WRC, but are not claimed to be related to significant programming idioms. Nevertheless, a model has to handle any situation, and here again “minimal” protection suffices.

 Power2010Power2011PowerG5Power6Power7
S+lwsync+addrAllowForbid0/1.8G0/4.5G0/22G
S+lwsync+ctrlAllowForbid0/2.6G0/9.3G0/43G
S+lwsync+dataAllowForbid0/2.6G0/9.3G0/43G

 

 Power2010Power2011PowerG5Power6Power7
WWC+lwsync+addrAllowForbid0/2.7G0/12G0/89G
WWC+lwsync+ctrlAllowForbid0/380M0/693M0/8.5G
WWC+lwsync+dataAllowForbid0/380M0/693M0/8.5G

A new familly of tests

We notice the presence of a previously unknown to us and thus untested familly of tests: W+RW+RW+RR, an extension of WRC (which can be spelled out as W+RW+RR). One may remark that, again, minimal protection suffices, as shown for instance by the first test below:

 Power2010Power2011PowerG5Power6Power7
W+RW+RW+RR+lwsync+addr+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+addr+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+addr+fri+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+addr+fri+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+ctrl+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+ctrl+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+data+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+lwsync+data+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+addr+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+addr+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+addr+fri+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+addr+fri+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+ctrl+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+ctrl+ctrlisyncAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+data+addrAllowForbid0/380M0/528M0/6.8G
W+RW+RW+RR+sync+data+ctrlisyncAllowForbid0/380M0/528M0/6.8G

“Out of thin air”-like tests

Any “causal” cycle must be rejected by a realistic model. Power2010 defines such a cycle by a combination of read to write (data or control dependency) and of Rf (the relation from a write to any read that takes its value from it).

The prototype of a test that exhibit such a causal dependency is LB+addrs (load buffer, with address dependencies):

        
PPC LB+addrs
"DpAddrdW Rfe DpAddrdW Rfe"
Cycle=Rfe DpAddrdW Rfe DpAddrdW
{
0:r2=x; 0:r5=y;
1:r2=y; 1:r5=x;
}
 P0            | P1            ;
 lwz r1,0(r2)  | lwz r1,0(r2)  ;
 xor r3,r1,r1  | xor r3,r1,r1  ;
 li r4,1       | li r4,1       ;
 stwx r4,r3,r5 | stwx r4,r3,r5 ;
exists
(0:r1=1 /\ 1:r1=1)

The following tests are variations on LB and 3.LB (the three processor extension of LB) which suggests that pure dependencies can be supplemented with read to write and write to write memory accesses to the same location and by the same processor (Fri and Wsi). Namely, in “causal” loops, data, address and control dependencies do not fully account for the natural ordering constraints that core designers follow.

 Power2010Power2011PowerG5Power6Power7
LB+addr+addr+friAllowForbid0/760M0/992M0/14G
LB+addr+addr+wsiAllowForbid0/760M0/992M0/14G
LB+addr+data+wsiAllowForbid0/760M0/992M0/14G
LB+addr+fri+addr+friAllowForbid0/760M0/992M0/14G
LB+addr+fri+ctrlAllowForbid0/760M0/992M0/14G
LB+addr+fri+dataAllowForbid0/760M0/992M0/14G
LB+addr+fri+data+wsiAllowForbid0/760M0/992M0/14G
LB+addr+wsi+addr+friAllowForbid0/760M0/992M0/14G
LB+addr+wsi+addr+wsiAllowForbid0/760M0/992M0/14G
LB+addr+wsi+ctrlAllowForbid0/760M0/992M0/14G
LB+addr+wsi+dataAllowForbid0/760M0/992M0/14G
LB+addr+wsi+data+wsiAllowForbid0/760M0/992M0/13G
LB+data+data+wsiAllowForbid0/760M0/992M0/14G
LB+data+wsi+ctrlAllowForbid0/760M0/992M0/14G
LB+data+wsi+data+wsiAllowForbid0/760M0/992M0/14G
LB+lwsync+addr+friAllowForbid0/760M0/992M0/14G
LB+lwsync+addr+wsiAllowForbid0/760M0/992M0/14G
LB+lwsync+data+wsiAllowForbid0/760M0/1.0G0/14G

 

 Power2010Power2011PowerG5Power6Power7
3.LB+addr+addr+fri+addrAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+addrAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+addr+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+ctrl+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+ctrl+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+ctrl+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+addrAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+data+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+fri+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+addr+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+ctrl+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+fri+data+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+addr+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+ctrl+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+addr+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+dataAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+syncAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+wsi+addr+friAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+addr+wsi+data+wsi+dataAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+addr+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+addr+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+data+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+ctrl+data+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+addr+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+addr+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+ctrl+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+data+data+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+data+data+wsi+dataAllowForbid0/380M0/693M0/8.5G
3.LB+data+data+wsi+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+data+data+wsi+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+data+wsi+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+addr+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+addr+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+ctrl+ctrlAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+ctrl+data+wsiAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+ctrl+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+ctrl+syncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+data+lwsyncAllowForbid0/380M0/693M0/8.5G
3.LB+data+wsi+data+syncAllowForbid0/380M0/693M0/8.5G

Remaining tests

We give the classified remaining tests for completeness.

 Power2010Power2011PowerG5Power6Power7
ISA2+lwsync+addr+fri+syncAllowForbid0/380M0/693M0/8.5G
ISA2+lwsync+addr+syncAllowForbid0/2.5G0/9.0G0/64G
ISA2+lwsync+addr+wsi+syncAllowForbid0/380M0/693M0/8.5G
ISA2+lwsync+ctrl+syncAllowForbid0/2.5G0/9.0G0/64G
ISA2+lwsync+data+syncAllowForbid0/2.5G0/9.0G0/64G
ISA2+lwsync+data+wsi+syncAllowForbid0/380M0/693M0/8.5G
ISA2+sync+addr+addrAllowForbid0/2.5G0/9.0G0/64G
ISA2+sync+addr+ctrlisyncAllowForbid0/2.5G0/9.0G0/64G
ISA2+sync+addr+fri+addrAllowForbid0/380M0/693M0/8.5G
ISA2+sync+addr+fri+ctrlisyncAllowForbid0/380M0/693M0/8.5G
ISA2+sync+addr+wsi+addrAllowForbid0/380M0/693M0/8.5G
ISA2+sync+addr+wsi+ctrlisyncAllowForbid0/380M0/693M0/8.5G
ISA2+sync+ctrl+addrAllowForbid0/2.5G0/9.0G0/64G
ISA2+sync+ctrl+ctrlisyncAllowForbid0/2.5G0/9.0G0/64G
ISA2+sync+data+addrAllowForbid0/3.7G0/18G0/166G
ISA2+sync+data+ctrlisyncAllowForbid0/2.5G0/9.0G0/64G
ISA2+sync+data+wsi+addrAllowForbid0/380M0/693M0/8.5G
ISA2+sync+data+wsi+ctrlisyncAllowForbid0/380M0/693M0/8.5G

 

 Power2010Power2011PowerG5Power6Power7
Z6.1+sync+lwsync+ctrlAllowForbid0/380M0/693M0/7.5G
Z6.1+sync+lwsync+dataAllowForbid0/380M0/693M0/7.5G

 

 Power2010Power2011PowerG5Power6Power7
Z6.2+lwsync+addr+fri+lwsyncAllowForbid0/380M0/693M0/8.5G
Z6.2+lwsync+addr+fri+syncAllowForbid0/380M0/693M0/7.5G
Z6.2+lwsync+addr+lwsyncAllowForbid0/908M0/2.4G0/13G
Z6.2+lwsync+addr+syncAllowForbid0/908M0/2.4G0/13G
Z6.2+lwsync+ctrl+lwsyncAllowForbid0/380M0/693M0/8.5G
Z6.2+lwsync+ctrl+syncAllowForbid0/380M0/693M0/7.5G
Z6.2+lwsync+data+lwsyncAllowForbid0/380M0/693M0/8.5G
Z6.2+lwsync+data+syncAllowForbid0/380M0/693M0/7.5G
Z6.2+sync+addr+addrAllowForbid0/908M0/2.4G0/14G
Z6.2+sync+addr+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+dataAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+fri+addrAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+fri+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+fri+dataAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+wsi+addrAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+addr+wsi+dataAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+ctrl+addrAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+ctrl+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+ctrl+dataAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+addrAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+dataAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+wsi+addrAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+wsi+ctrlAllowForbid0/380M0/693M0/8.5G
Z6.2+sync+data+wsi+dataAllowForbid0/380M0/693M0/8.5G

This document was translated from LATEX by HEVEA.