Part IV |
The tool jingle7 is a litmus test converter. It translates tests from one architecture to another. For instance a LISA test may be translated into an AArch64 (or ARMv8) test.
Transation is directed by user-specified conversion rules. All conversion rules of a specific (i.e. one architecture to another) are regrouped in a specific “theme” files.
A theme file starts by specififing source and target architectures. For instance:
LISA to AArch64
Then conversion rules follow. For instance here are two rules that translate LISA loads and stores to AArch64 loads and stores:
"r[] %x %y" -> "LDR %x,[%y]" "w[] %x %y" -> "STR %y,[%x]"
More generally, conversions rules are akin to rewrite
rules "L" -> "R".
In the simplest case,
a rule associates a source instruction pattern L
to a target instruction pattern R.
Patterns may contain identifiers, which are bound to concrete items
during matching.
Identifiers starting with the character “%
" represent registers.
The above rules suffice to translate the LISA (register indirect)
load instruction “r[] r0 r1
into the equivalent AArch64 instruction “LDR R3,[R4]
”.
Observe that registers are translated.
Here LISA r0
and r1
are translated into AArch64 R3
and R4
.
If a later LISA instruction uses, say, r1
again,
that register will be translated to the same AArch64 register R4
.
As a result, the sequence r[] r0 r1; w[] r2 r1
will get
translated to LDR R3,[R4]; STR R5,[R4]
.
There are additional categories of identifiers:
Identifiers starting with “&
represent constants.
For instance, the following rule translates register load by a constant:
"mov %r &c" -> "MOV %r,&c"
Identifiers starting with an alphabetical letter represent memory locations. Consider for instance the following LISA to AArch64 converstion rule:
"w[] x %r" -> "STR %r,[%x]"
The identifier “x
” will match any symbol.
In LISA, such symbols embedded in code are memory locations.
By constrast modern architectures, such as AArch64, do
not permit to embed memory addresses in code.
As a consequence, the matched symbolic location from the left hand-side
pattern is replaced by a register, written “%x
” above.
The tool jingle7 will initialise the corresponding AArch64 register
appropriately.
For instance the LISA instrucation w[] z r0
will get translated
to say STR R1,[R2]
, with R2
initial value being specified
as z
in the final target litmus test.
Sometimes, a single instruction with a couple of symbolic register is not nearly enough to express every conversion in a good fashion.
A single rule ought to be enough for anybody to understand:
"w[] x &c" -> "MOV %tmp,&c; STR %tmp,[%x]"
In AArch64, it is not possible to directly store a constant value in memory, thus we have to express the LISA instruction in two AArch64 instructions with a register picked on the fly.
This example illustrates at the same time:
&
.
In case of ambiguity, jingle7 choose the rule to apply according to their order in the file: the higher the rule, the higher its priority.
The last section covers what is necessary to convert most tests from any assembly language to another. However, we might want to allow our tool to work on higher level languages.
The suite currently support a relevant subset of the C language. That means our tool must not only convert sequences of instructions but also potential control structures.
For this purpose, we must allow the expression of chunks of code:
C to LISA "if(x==constvar:c) codevar:t; else codevar:e;" -> "mov %test (eq %x &c); b[] %test then; codevar:e; b[] 1 end; then : codevar:t; end :"
This awfully looks like a compilation process (and it is!), but in practice, conditionals are used to express control dependencies which can have a much simpler form in assembly code.
Now, the important point here is the use of codevar:
to state that
arbitrary code is expected. Such code will also be converted by the same
given set of rules, thus allowing us to convert arbitrarily deep code.
Notice that labels too are subject to identification and that is perfectly fine to end a pattern with it since the tool will see a nop-like instruction.
The special keyword constvar:
is used in C for the obvious reason
that &
has an entirely different meaning.
Now with a well defined file, we can let the burden of converting our thousands of tests to jingle7.
In order to fully understand its behaviour, we shall explore more in detail its mechanisms.
The rules we define are nothing but generic patterns, for them to hold any meaning we have to find an application in the source program. Such application is simply an instance of the conversion of a part of the source program. The source part must match the pattern in the left side of the rule, the instance of the conversion is the code defined in the right side plus a set of what we call substitutions.
The substitutions are the link between the identifiers in the rule and their actual representations in both the source and target programs.
To roughly formalise, App(R, P) = (Rright, {(id,Srcid,Tgtid) ∣ id is an identifier of R}) would be the application of the rule R on a part of the source program P, where P is a possible instance of Rleft and Srcid (Tgtid) is the source (respectively the target) representation.
Now that we have a first step of local rewriting, we want to convert an entire program. Thanks to relative simplicity of the supported languages, decomposing a source program linearly is a good enough approach for our needs.
The process can be divided in two steps:
The program is decomposed witha greedy algorithm applying the rules according to their priority order.
A recursive definition would be:
|
Of course, we assume the given rule set is sufficient to assure that all part of the program will be matched. If not, users have to refine it.
With the result of the previous step, which already looks like a converted program, we have to actually substitute the abstract identifiers for their representation in the target language given by the associated substitutions, for each part. Then simply append the results to one another in order.
|
There is one key aspect that we have yet to cover. Until now, the coherence between the source and target representation of a substitution in regards to the others was assumed, i.e.:
∀ id1,id2, Srcid1 = Srcid2 ⇒ Tgtid1 = Tgtid2 |
However, if this property in the substitutions of a single application is given by the rule itself, ensuring it between different applications is not as obvious because it would not make sense to compare pattern identifiers from different rules. Thus, we need to keep track of any Src,Tgt association made by applications through the whole program.
To do so, we define a global environment that preserves the property by delivering a target language representation for each source value:
|
Nothing exotic here, this is a part of the application of a rule as it is supposed to be the only safe way to obtain a target representation.
Considering the LISA litmus test MP+poplainplain+fencedmballsyplainplain.litmus:
LISA MP+poplainplain+fencedmballsyplainplain "PodWWPlainPlain RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain" Cycle=RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain PodWWPlainPlain Relax= Safe=Rfe Fre PodWW FencedRR Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=PodWWPlainPlain RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain { } P0 | P1 ; w[] x 1 | r[] r0 y ; w[] y 1 | f[dmb,all,sy] ; | r[] r1 x ; exists (1:r0=1 /\ 1:r1=0)
and the theme file BelltoAArch64.theme in which the only relevant rules, in regards of the test above, are:
"r[] %x y" -> "LDR %x,[%y]" "w[] x &c" -> "MOV %tmp,&c; STR %tmp,[%x]" "f[dmb,all,sy]" -> "DMB SY"
the output of a call to jingle7 with those in arguments will be:
AArch64 MP+poplainplain+fencedmballsyplainplain "PodWWPlainPlain RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain" Mapping=1:X2=r1,1:X0=r0 Hash=6945a3af44248d1d826a14b204ccf067 Cycle=RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain PodWWPlainPlain Relax= Safe=Rfe Fre PodWW FencedRR Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Rf Fr Orig=PodWWPlainPlain RfePlainPlain FenceDmbAllSydRRPlainPlain FrePlainPlain {0:X1=y; 0:X0=x; 1:X3=x; 1:X1=y;} P0 | P1 ; MOV X2,#1 | LDR X0,[X1] ; STR X2,[X0] | DMB SY ; MOV X2,#1 | LDR X2,[X3] ; STR X2,[X1] | ; exists (1:X0=1 /\ 1:X2=0)
According to the algorithm described in the last section, jingle7 has converted the three instructions of P1 with, in order, the load rule, the fence rule and the load again. For P0 it apply the store rule twice.
Notice that the same AArch64 register X2
is used as
the temporary register for both applications. It is safe to do so
as long as its value is not needed but outside the scope
of each application.
The initialisation and test conditions are converted as well: the registers are from the right architecture and the location, directly used in LISA, are now bound to specific registers.
A Mapping
metadatum is also added to allow comparison tools
to work properly.
The command jingle7 handles its arguments as file names, just as herd7. Those files are either a single litmus test when having extension .litmus, or a list of file names when prefixed by @.
There is one option that must always be used:
.theme
.
When the tool fails to find a conversion in a program, it will print the remaining instructions.
It makes easy for the user to pin down missing rules in his .theme
file
as the first instruction printed is likely the one that cannot be matched.
Using those error might be helpful to build such file instead of trying to figure it out as a whole beforehand.