Frightening small children and disconcerting grown-ups: Concurrency in the Linux kernel— Companion Material

Jade Alglave         Luc Maranget         Paul E. McKenney         Andrea Parri         Alan Stern

Submission with appendix

We provide a complete version of our article with appendices.

Tests and models

Our complete test base consists of 8238 tests. The test base has been constructed both by accumulating significant tests and by systematic generation from cycles of non-SC executions. Tests are in two languages:

Using the following files, and the memory model simulator herd7, run a test by:

% herd7 -macros kernel.def -bell kernel.bell -model [filename]+

Once the files above have been copied to the local directory, your can also use the configuration file kernel.cfg:

% herd7 -conf kernel.cfg [filename]+

We run our tests on machine using a dedicated python script and the klitmus tool. We have run a subset of 2576 tests of the complete test base. Limitations are as follows:

Tested systems are “Power8”, a CHRP IBM pSeries machine with 8 POWER8E CPUs at 3.4GHz (Linux v4.4.40), “ARMv8”, a Amlogic ARMv8 board with 4 Cortex-A53 cores at 1.5GHz (Linux v3.14.29), “ARMv7”, a Raspberry Pi ARMv7 board with 4 Cortex-A7 cores at 900Mhz (Linux v4.9.20), and “X86”, a HP desktop computer with 2 (6-core) Intel Xeon E5-2620 v3 CPUs at 2.40GHz (Linux v3.16.04).

Examples from the paper

We here provide an actualized version of the result table of the paper

There are 15 such tests
LBAllowNo, 0/15GNo, 0/10GNo, 0/33GNo, 0/3.0GAllow
  Allow unseenAllow unseenAllow unseenAllow unseen 
LB+ctrl+mbForbidOk, 0/17GOk, 0/5.1GOk, 0/18GOk, 0/4.4GAllow
WRCAllowOk, 741k/7.7GOk, 13k/5.2GNo, 0/17GNo, 0/1.6GAllow
    Allow unseenAllow unseen 
WRC+wmb+acqAllowNo, 0/7.5GNo, 0/4.6GNo, 0/16GNo, 0/1.6GForbid
  Allow unseenAllow unseenAllow unseenAllow unseen 
WRC+po-rel+rmbForbidOk, 0/7.7GOk, 0/5.3GOk, 0/17GOk, 0/1.6GForbid
SBAllowOk, 4.4G/15GOk, 2.4G/10GOk, 765M/33GOk, 429M/3.0GAllow
SB+mbsForbidOk, 0/15GOk, 0/10GOk, 0/33GOk, 0/3.0GForbid
MPAllowOk, 57M/15GOk, 104M/10GNo, 0/33GOk, 3.0M/3.0GAllow
    Allow unseen  
MP+wmb+rmbForbidOk, 0/15GOk, 0/9.4GOk, 0/33GOk, 0/2.6GForbid
PeterZ-No-SynchroAllowOk, 26M/4.6GOk, 3.6M/900MOk, 351k/7.2GOk, 10k/380MAllow
PeterZForbidOk, 0/8.7GOk, 0/2.5GOk, 0/9.1GOk, 0/2.2GAllow
RCU-deferred-freeForbidOk, 0/256MOk, 0/576MOk, 0/25MOk, 0/15M
RCU-MPForbidOk, 0/672MOk, 0/336MOk, 0/336MOk, 0/336M
RWCAllowOk, 88M/11GOk, 94M/4.8GOk, 5.6M/17GOk, 7.5M/1.6GAllow
RWC+mbsForbidOk, 0/8.7GOk, 0/2.5GOk, 0/9.1GOk, 0/2.2GAllow

Detailed hardware results

Our main experimental result is here: our model is experimentally sound as demonstrated by the empty table “Forbidden by model and observed” below.

Comparison with the C11 model

In this section we compare the Linux model with the C11 model. The C11 CAT model is the default C11 model of herd7. This model is the second C11 model (partial order SC model) from “Overhauling SC atomics in C11 and OpenCL” (POPL’2016) by Mark Batty, Alastair F. Donaldson, and John Wickerson. The herd7 implementation of this model is adapted from the authors own CAT files.

Our main experimental result demonstrates that C11 SC fence atomic_thread_fence(memory_order_seq_cst) does not suffice to rule out all non-sequentially consistent executions — see the table here. By contrast and given our Linux model, smp_mb() suffices to guarantee sequential consistency. One should notice that all modern architectures provide such a strong fence that Linux implementations would map smp_mb() onto.

New this round

There is one such test
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-CEOk, 4/400M

Comparison with old model

More behaviours

There is one such test
C-WillDeacon-MP+o-r+ai-rmb-oAllowForbidNo, 0/660M
   Allow unseen

Less behaviours

There are 3 such tests
Luc17ForbidAllowNo, 0/31G
   Allow unseen
Luc19ForbidAllowNo, 0/31G
   Allow unseen
Luc15ForbidAllowNo, 0/31G
   Allow unseen

This document was translated from LATEX by HEVEA.