RCU Torture Test Operation


CONFIG_RCU_TORTURE_TEST

The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
implementations.  It creates an rcutorture kernel module that can
be loaded to run a torture test.  The test periodically outputs
status messages via printk(), which can be examined via the dmesg
command (perhaps grepping for "torture").  The test is started
when the module is loaded, and stops when the module is unloaded.


MODULE PARAMETERS

This module has the following parameters:

fqs_duration    Duration (in microseconds) of artificially induced bursts
                of force_quiescent_state() invocations.  In RCU
                implementations having force_quiescent_state(), these
                bursts help force races between forcing a given grace
                period and that grace period ending on its own.

fqs_holdoff     Holdoff time (in microseconds) between consecutive calls
                to force_quiescent_state() within a burst.

fqs_stutter     Wait time (in seconds) between consecutive bursts
                of calls to force_quiescent_state().

gp_normal       Make the fake writers use normal synchronous grace-period
                primitives.

gp_exp          Make the fake writers use expedited synchronous grace-period
                primitives.  If both gp_normal and gp_exp are set, or
                if neither gp_normal nor gp_exp is set, then randomly
                choose the primitive so that about 50% are normal and
                50% expedited.  By default, neither is set, which
                gives best overall test coverage.

irqreader       Invoke RCU readers from irq level.  This is currently
                done via timers.  Defaults to "1" for variants of RCU that
                permit this.  (Or, more accurately, variants of RCU that do
                -not- permit this know to ignore this parameter.)

n_barrier_cbs   If this is non-zero, RCU barrier testing will be conducted,
                in which case n_barrier_cbs specifies the number of
                RCU callbacks (and corresponding kthreads) to use for
                this testing.  The value cannot be negative.  If you
                specify this to be non-zero when torture_type indicates a
                synchronous RCU implementation (one for which a member of
                the synchronize_rcu() rather than the call_rcu() family is
                used -- see the documentation for torture_type below), an
                error will be reported and no testing will be carried out.

nfakewriters    This is the number of RCU fake writer threads to run.  Fake
                writer threads repeatedly use the synchronous "wait for
                current readers" function of the interface selected by
                torture_type, with a delay between calls to allow for various
                different numbers of writers running in parallel.
                nfakewriters defaults to 4, which provides enough parallelism
                to trigger special cases caused by multiple writers, such as
                the synchronize_srcu() early return optimization.

nreaders        This is the number of RCU reader threads to run.
                The default is twice the number of CPUs.  Why twice?
                To properly exercise RCU implementations with preemptible
                read-side critical sections.

onoff_interval
                The number of seconds between each attempt to execute a
                randomly selected CPU-hotplug operation.  Defaults to
                zero, which disables CPU hotplugging.  In HOTPLUG_CPU=n
                kernels, rcutorture will silently refuse to do any
                CPU-hotplug operations regardless of what value is
                specified for onoff_interval.

onoff_holdoff   The number of seconds to wait until starting CPU-hotplug
                operations.  This would normally only be used when
                rcutorture was built into the kernel and started
                automatically at boot time, in which case it is useful
                in order to avoid confusing boot-time code with CPUs
                coming and going.

shuffle_interval
                The number of seconds to keep the test threads affinitied
                to a particular subset of the CPUs.  Defaults to 3 seconds.
                Used in conjunction with test_no_idle_hz.

shutdown_secs   The number of seconds to run the test before terminating
                the test and powering off the system.  The default is
                zero, which disables test termination and system shutdown.
                This capability is useful for automated testing.

stall_cpu       The number of seconds that a CPU should be stalled while
                within both an rcu_read_lock() and a preempt_disable().
                This stall happens only once per rcutorture run.
                If you need multiple stalls, use modprobe and rmmod to
                repeatedly run rcutorture.  The default for stall_cpu
                is zero, which prevents rcutorture from stalling a CPU.

                Note that attempts to rmmod rcutorture while the stall
                is ongoing will hang, so be careful what value you
                choose for this module parameter!  In addition, too-large
                values for stall_cpu might well induce failures and
                warnings in other parts of the kernel.  You have been
                warned!

stall_cpu_holdoff
                The number of seconds to wait after rcutorture starts
                before stalling a CPU.  Defaults to 10 seconds.

stat_interval   The number of seconds between output of torture
                statistics (via printk()).  Regardless of the interval,
                statistics are printed when the module is unloaded.
                Setting the interval to zero causes the statistics to
                be printed -only- when the module is unloaded, and this
                is the default.

stutter         The length of time (in seconds) to run the test before
                pausing for this same period of time.  Defaults to
                "stutter=5", so as to run and pause for (roughly)
                five-second intervals.  Specifying "stutter=0" causes
                the test to run continuously without pausing, which is
                the old default behavior.

test_boost      Whether or not to test the ability of RCU to do priority
                boosting.  Defaults to "test_boost=1", which performs
                RCU priority-inversion testing only if the selected
                RCU implementation supports priority boosting.  Specifying
                "test_boost=0" never performs RCU priority-inversion
                testing.  Specifying "test_boost=2" performs RCU
                priority-inversion testing even if the selected RCU
                implementation does not support RCU priority boosting,
                which can be used to test rcutorture's ability to
                carry out RCU priority-inversion testing.

test_boost_interval
                The number of seconds in an RCU priority-inversion test
                cycle.  Defaults to "test_boost_interval=7".  It is
                usually wise for this value to be relatively prime to
                the value selected for "stutter".

test_boost_duration
                The number of seconds to do RCU priority-inversion testing
                within any given "test_boost_interval".  Defaults to
                "test_boost_duration=4".

test_no_idle_hz Whether or not to test the ability of RCU to operate in
                a kernel that disables the scheduling-clock interrupt to
                idle CPUs.  Boolean parameter, "1" to test, "0" otherwise.
                Defaults to omitting this test.

torture_type    The type of RCU to test, with string values as follows:

                "rcu":  rcu_read_lock(), rcu_read_unlock() and call_rcu(),
                        along with expedited, synchronous, and polling
                        variants.

                "rcu_bh": rcu_read_lock_bh(), rcu_read_unlock_bh(), and
                        call_rcu_bh(), along with expedited and synchronous
                        variants.

                "rcu_busted": This tests an intentionally incorrect version
                        of RCU in order to help test rcutorture itself.

                "srcu": srcu_read_lock(), srcu_read_unlock() and
                        call_srcu(), along with expedited and
                        synchronous variants.

                "sched": preempt_disable(), preempt_enable(), and
                        call_rcu_sched(), along with expedited,
                        synchronous, and polling variants.

                "tasks": voluntary context switch and call_rcu_tasks(),
                        along with expedited and synchronous variants.

                Defaults to "rcu".

verbose         Enable debug printk()s.  Default is disabled.

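As an illustration of how these parameters are passed at module-load
time, the following sketch builds a modprobe command line from
name=value pairs and prints it without running it.  The helper name
rcutorture_cmd and the RCUTORTURE_PARAMS variable are hypothetical and
are not part of the kernel tree; the accepted names are simply copied
from the list above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the kernel tree: print (but do not
# run) the modprobe command for a set of rcutorture module parameters.

# Parameter names copied from the list in this document.
RCUTORTURE_PARAMS="fqs_duration fqs_holdoff fqs_stutter gp_normal gp_exp
irqreader n_barrier_cbs nfakewriters nreaders onoff_interval onoff_holdoff
shuffle_interval shutdown_secs stall_cpu stall_cpu_holdoff stat_interval
stutter test_boost test_boost_interval test_boost_duration test_no_idle_hz
torture_type verbose"

rcutorture_cmd() {
	cmd="modprobe rcutorture"
	# Collapse the newlines in the list so names can be matched as words.
	known=" $(echo $RCUTORTURE_PARAMS) "
	for arg in "$@"; do
		# Every argument must have the form name=value.
		case "$arg" in
		*=*) name=${arg%%=*} ;;
		*) echo "expected name=value, got: $arg" >&2; return 1 ;;
		esac
		# Reject names that are not in the documented list.
		case "$known" in
		*" $name "*) cmd="$cmd $arg" ;;
		*) echo "unknown parameter: $name" >&2; return 1 ;;
		esac
	done
	echo "$cmd"
}
```

For example, "rcutorture_cmd torture_type=srcu nreaders=8 stat_interval=30"
prints "modprobe rcutorture torture_type=srcu nreaders=8 stat_interval=30",
which can then be run as root.
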

OUTPUT

The statistics output is as follows:

        rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
        rcu-torture: rtc:           (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
        rcu-torture: Reader Pipe:  727860534 34213 0 0 0 0 0 0 0 0 0
        rcu-torture: Reader Batch:  727877838 17003 0 0 0 0 0 0 0 0 0
        rcu-torture: Free-Block Circulation:  155440 155440 155440 155440 155440 155440 155440 155440 155440 155440 0
        rcu-torture:--- End of test: SUCCESS: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4

The command "dmesg | grep torture:" will extract this information on
most systems.  On more esoteric configurations, it may be necessary to
use other commands to access the output of the printk()s used by
the RCU torture test.  The printk()s use KERN_ALERT, so they should
be evident.  ;-)

The first and last lines show the rcutorture module parameters, and the
last line shows either "SUCCESS" or "FAILURE", based on rcutorture's
automatic determination as to whether RCU operated correctly.

The entries are as follows:

o       "rtc": The hexadecimal address of the structure currently visible
        to readers.

o       "ver": The number of times since boot that the RCU writer task
        has changed the structure visible to readers.

o       "tfle": If non-zero, indicates that the "torture freelist"
        containing structures to be placed into the "rtc" area is empty.
        This condition is important, since it can fool you into thinking
        that RCU is working when it is not.  :-/

o       "rta": Number of structures allocated from the torture freelist.

o       "rtaf": Number of allocations from the torture freelist that have
        failed due to the list being empty.  It is not unusual for this
        to be non-zero, but it is bad for it to be a large fraction of
        the value indicated by "rta".

o       "rtf": Number of frees into the torture freelist.

o       "rtmbe": A non-zero value indicates that rcutorture believes that
        rcu_assign_pointer() and rcu_dereference() are not working
        correctly.  This value should be zero.

o       "rtbe": A non-zero value indicates that one of the rcu_barrier()
        family of functions is not working correctly.

o       "rtbke": rcutorture was unable to create the real-time kthreads
        used to force RCU priority inversion.  This value should be zero.

o       "rtbre": Although rcutorture successfully created the kthreads
        used to force RCU priority inversion, it was unable to set them
        to the real-time priority level of 1.  This value should be zero.

o       "rtbf": The number of times that RCU priority boosting failed
        to resolve RCU priority inversion.

o       "rtb": The number of times that rcutorture attempted to force
        an RCU priority inversion condition.  If you are testing RCU
        priority boosting via the "test_boost" module parameter, this
        value should be non-zero.

o       "nt": The number of times rcutorture ran RCU read-side code from
        within a timer handler.  This value should be non-zero only
        if you specified the "irqreader" module parameter.

o       "Reader Pipe": Histogram of "ages" of structures seen by readers.
        If any entries past the first two are non-zero, RCU is broken.
        And rcutorture prints the error flag string "!!!" to make sure
        you notice.  The age of a newly allocated structure is zero,
        it becomes one when removed from reader visibility, and is
        incremented once per grace period subsequently -- and is freed
        after passing through (RCU_TORTURE_PIPE_LEN-2) grace periods.

        The output displayed above was taken from a correctly working
        RCU.  If you want to see what it looks like when broken, break
        it yourself.  ;-)

o       "Reader Batch": Another histogram of "ages" of structures seen
        by readers, but in terms of counter flips (or batches) rather
        than in terms of grace periods.  The legal number of non-zero
        entries is again two.  The reason for this separate view is that
        it is sometimes easier to get the third entry to show up in the
        "Reader Batch" list than in the "Reader Pipe" list.

o       "Free-Block Circulation": Shows the number of torture structures
        that have reached a given point in the pipeline.  The first element
        should closely correspond to the number of structures allocated,
        the second to the number that have been removed from reader view,
        and all but the last remaining to the corresponding number of
        passes through a grace period.  The last entry should be zero,
        as it is only incremented if a torture structure's counter
        somehow gets incremented farther than it should.

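The rules in the list above can be checked mechanically.  The following
is a minimal sketch, assuming the statistics lines have the format shown
in the sample output; the function name check_torture_stats is
hypothetical and is not part of rcutorture.  It flags non-zero "rtmbe",
"rtbe", "rtbke", "rtbre", and "rtbf" counters, along with any "Reader
Pipe" or "Reader Batch" entry past the first two that is non-zero.

```shell
#!/bin/sh
# Hypothetical checker, not part of the kernel tree: read rcutorture
# statistics lines on stdin and report values that, per the entry
# descriptions, indicate a problem when non-zero.  Exits non-zero if
# any problem is found.

check_torture_stats() {
	awk '
	{
		for (i = 1; i < NF; i++) {
			# Error counters appear as "name: value" pairs.
			if ($i ~ /^(rtmbe|rtbe|rtbke|rtbre|rtbf):$/ && $(i + 1) + 0 != 0) {
				printf "%s %s (should be zero)\n", $i, $(i + 1)
				bad = 1
			}
			# Histogram entries follow "Reader Pipe:" or "Reader Batch:".
			# The first two entries are fields i+2 and i+3, so start
			# checking at the third entry, field i+4.
			if ($i == "Reader" && ($(i + 1) == "Pipe:" || $(i + 1) == "Batch:")) {
				for (j = i + 4; j <= NF; j++) {
					if ($j + 0 != 0) {
						printf "Reader %s bucket %d is %s\n", $(i + 1), j - i - 1, $j
						bad = 1
					}
				}
			}
		}
	}
	END { exit bad + 0 }
	'
}
```

One possible use is "dmesg | grep torture: | check_torture_stats", which
prints nothing and exits with status zero on the sample output shown
earlier.
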
Different implementations of RCU can provide implementation-specific
additional information.  For example, Tree SRCU provides the following
additional line:

        srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6)

This line shows the per-CPU counter state, in this case for Tree SRCU
using a dynamically allocated srcu_struct (hence "srcud-" rather than
"srcu-").  The numbers in parentheses are the values of the "old" and
"current" counters for the corresponding CPU.  The "idx" value maps the
"old" and "current" values to the underlying array, and is useful for
debugging.  The final "T" entry contains the totals of the counters.

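Because the "T" entry holds the totals, the per-CPU values can be summed
and compared against it.  The following is a rough sketch, assuming the
line format shown above; the function name check_srcu_totals is
hypothetical and is not part of rcutorture.

```shell
#!/bin/sh
# Hypothetical checker, not part of the kernel tree: read a Tree SRCU
# per-CPU line on stdin, sum the per-CPU counter pairs, and compare the
# sums against the "T" totals entry.

check_srcu_totals() {
	awk '
	/per-CPU/ {
		first = 0; second = 0; have_total = 0
		for (i = 1; i <= NF; i++) {
			# Per-CPU entries look like "3(-26,20)", totals like "T(1,6)".
			if ($i !~ /^([0-9]+|T)\(-?[0-9]+,-?[0-9]+\)$/)
				continue
			split($i, f, /[(),]/)
			if (f[1] == "T") {
				t_first = f[2] + 0; t_second = f[3] + 0; have_total = 1
			} else {
				first += f[2]; second += f[3]
			}
		}
		if (!have_total || first != t_first || second != t_second) {
			printf "mismatch: summed (%d,%d)\n", first, second
			bad = 1
		} else {
			printf "totals match: (%d,%d)\n", first, second
		}
	}
	END { exit bad + 0 }
	'
}
```

For the sample line above, the first values sum to 1 and the second
values sum to 6, which agrees with the "T(1,6)" entry.
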

USAGE

The following script may be used to torture RCU:

        #!/bin/sh

        modprobe rcutorture
        sleep 3600
        rmmod rcutorture
        dmesg | grep torture:

The output can be manually inspected for the error flag of "!!!".
One could of course create a more elaborate script that automatically
checked for such errors.  The "rmmod" command forces a "SUCCESS",
"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed.  The first
two are self-explanatory, while the last indicates that while there
were no RCU failures, CPU-hotplug problems were detected.

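A slightly more elaborate check along these lines might look as follows.
This is only a sketch: the function name scan_torture_log is
hypothetical, and it assumes the "End of test" line has the format shown
in the OUTPUT section.

```shell
#!/bin/sh
# Hypothetical checker, not part of the kernel tree: read rcutorture
# console output on stdin, look for the "!!!" error flag, and report the
# verdict from the last "End of test" line.  Returns 0 only for a clean
# SUCCESS.

scan_torture_log() {
	log=$(cat)
	status=0
	if printf '%s\n' "$log" | grep -q '!!!'; then
		echo 'error flag "!!!" found'
		status=1
	fi
	# The verdict is the word following "End of test: ".
	verdict=$(printf '%s\n' "$log" |
		sed -n 's/.*End of test: \([A-Z_]*\).*/\1/p' | tail -n 1)
	case "$verdict" in
	SUCCESS) ;;
	"") echo "no End of test line found"; status=1 ;;
	*) echo "verdict: $verdict"; status=1 ;;
	esac
	return $status
}
```

In the script above, the final line could then become
"dmesg | grep torture: | scan_torture_log", with the exit status telling
the caller whether the run was clean.
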
However, the tools/testing/selftests/rcutorture/bin/kvm.sh script
provides better automation, including automatic failure analysis.
It assumes a qemu/kvm-enabled platform, and runs guest OSes out of initrd.
See tools/testing/selftests/rcutorture/doc/initrd.txt for instructions
on setting up such an initrd.
 313 See tools/testing/selftests/rcutorture/doc/initrd.txt for instructions
 314 on setting up such an initrd.