th_define(1M) System Administration Commands th_define(1M)NAMEth_define - create fault injection test harness error specifications
SYNOPSISth_define [-n name -i instance| -P path] [-a acc_types]
[-r reg_number] [-l offset [length]]
[-c count [failcount]] [-o operator [operand]]
[-f acc_chk] [-w max_wait_period [report_interval]]
or
th_define [-n name -i instance| -P path]
[-a log [acc_types] [-r reg_number] [-l offset [length]]]
[-c count [failcount]] [-s collect_time] [-p policy]
[-x flags] [-C comment_string]
[-e fixup_script [args]]
or
th_define [-h]
DESCRIPTION
The th_define utility provides an interface to the bus_ops fault injec‐
tion bofi device driver for defining error injection specifications
(referred to as errdefs). An errdef corresponds to a specification of
how to corrupt a device driver's accesses to its hardware. The command
line arguments determine the precise nature of the fault to be
injected. If the supplied arguments define a consistent errdef, the
th_define process will store the errdef with the bofi driver and sus‐
pend itself until the criteria given by the errdef become satisfied (in
practice, this will occur when the access counts go to zero).
You use the th_manage(1M) command with the start option to activate the
resulting errdef. The effect of th_manage with the start option is that
the bofi driver acts upon the errdef by matching the number of hardware
accesses—specified in count, that are of the type specified in
acc_types, made by instance number instance—of the driver whose name is
name, (or by the driver instance specified by path) to the register set
(or DMA handle) specified by reg_number, that lie within the range off‐
set to offset + length from the beginning of the register set or DMA
handle. It then applies operator and operand to the next failcount
matching accesses.
If acc_types includes log, th_define runs in automatic test script gen‐
eration mode, and a set of test scripts (written in the Korn shell) is
created and placed in a sub-directory of the current directory with the
name <driver>.test.<id> (for example, glm.test.978177106). A separate,
executable script is generated for each access handle that matches the
logging criteria. The log of accesses is placed at the top of each
script as a record of the session. If the current directory is not
writable, file output is written to standard output. The base name of
each test file is the driver name, and the extension is a number that
discriminates between different access handles. A control script (with
the same name as the created test directory) is generated that will run
all the test scripts sequentially.
Executing the scripts will install, and then activate, the resulting
error definitions. Error definitions are activated sequentially and the
driver instance under test is taken offline and brought back online
before each test (refer to the -e option for more information). By
default, logging applies to all PIO accesses, all interrupts, and all
DMA accesses to and from areas mapped for both reading and writing. You
can constrain logging by specifying additional acc_types, reg_number,
offset and length. Logging will continue for count matching accesses,
with an optional time limit of collect_time seconds.
Either the -n or -P option must be provided. The other options are
optional. If an option (other than -a) is specified multiple times,
only the final value for the option is used. If an option is not speci‐
fied, its associated value is set to an appropriate default, which will
provide maximal error coverage as described below.
OPTIONS
The following options are available:
-n name
Specify the name of the driver to test. (String)
-i instance
Test only the specified driver instance (-1 matches all instances
of driver). (Numeric)
-P path
Specify the full device path of the driver to test. (String)
-r reg_number
Test only the given register set or DMA handle (-1 matches all reg‐
ister sets and DMA handles). (Numeric)
-a acc_types
Only the specified access types will be matched. Valid values for
the acc_types argument are log, pio, pio_r, pio_w, dma, dma_r,
dma_w and intr. Multiple access types, separated by spaces, can be
specified. The default is to match all hardware accesses.
If acc_types is set to log, logging will match all PIO accesses,
interrupts and DMA accesses to and from areas mapped for both read‐
ing and writing. log can be combined with other acc_types, in which
case the matching condition for logging will be restricted to the
specified addional acc_types. Note that dma_r will match only DMA
handles mapped for reading only; dma_w will match only DMA handles
mapped for writing only; dma will match only DMA handles mapped for
both reading and writing.
-l offset [length]
Constrain the range of qualifying accesses. The offset and length
arguments indicate that any access of the type specified with the
-a option, to the register set or DMA handle specified with the -r
option, lie at least offset bytes into the register set or DMA han‐
dle and at most offset + length bytes into it. The default for off‐
set is 0. The default for length is the maximum value that can be
placed in an offset_t C data type (see types.h). Negative values
are converted into unsigned quantities. Thus, th_define-l 0 -1 is
maximal.
-c count[failcount]
Wait for count number of matching accesses, then apply an operator
and operand (see the -o option) to the next failcount number of
matching accesses. If the access type (see the -a option) includes
logging, the number of logged accesses is given by count + fail‐
count - 1. The -1 is required because the last access coincides
with the first faulting access.
Note that access logging may be combined with error injection if
failcount and operator are nonzero and if the access type includes
logging and any of the other access types (pio, dma and intr) See
the description of access types in the definition of the -a option,
above.
When the count and failcount fields reach zero, the status of the
errdef is reported to standard output. When all active errdefs cre‐
ated by the th_define process complete, the process exits. If
acc_types includes log, count determines how many accesses to log.
If count is not specified, a default value is used. If failcount is
set in this mode, it will simply increase the number of accesses
logged by a further failcount - 1.
-o operator [operand]
For qualifying PIO read and write accesses, the value read from or
written to the hardware is corrupted according to the value of
operator:
EQ operand is returned to the driver.
OR operand is bitwise ORed with the real value.
AND operand is bitwise ANDed with the real value.
XOR operand is bitwise XORed with the real value.
For PIO write accesses, the following operator is allowed:
NO Simply ignore the driver's attempt to write to the hardware.
Note that a driver performs PIO via the ddi_getX(), ddi_putX(),
ddi_rep_getX() and ddi_rep_putX() routines (where X is 8, 16, 32 or
64). Accesses made using ddi_getX() and ddi_putX() are treated as a
single access, whereas an access made using the ddi_rep_*(9F) rou‐
tines are broken down into their respective number of accesses, as
given by the repcount parameter to these DDI calls. If the access
is performed via a DMA handle, operator and value are applied to
every access that comprises the DMA request. If interference with
interrupts has been requested then the operator may take any of the
following values:
DELAY After count accesses (see the -c option), delay delivery
of the next failcount number of interrupts for operand
number of microseconds.
LOSE After count number of interrupts, fail to deliver the next
failcount number of real interrupts to the driver.
EXTRA After count number of interrupts, start delivering operand
number of extra interrupts for the next failcount number
of real interrupts.
The default value for operand and operator is to corrupt the data
access by flipping each bit (XOR with -1).
-f acc_chk
If the acc_chk parameter is set to 1 or pio, then the driver's
calls to ddi_check_acc_handle(9F) return DDI_FAILURE when the
access count goes to 1. If the acc_chk parameter is set to 2 or
dma, then the driver's calls to ddi_check_dma_handle(9F) return
DDI_FAILURE when the access count goes to 1.
-w max_wait_period [report_interval]
Constrain the period for which an error definition will remain
active. The option applies only to non-logging errdefs. If an error
definition remains active for max_wait_period seconds, the test
will be aborted. If report_interval is set to a nonzero value, the
current status of the error definition is reported to standard out‐
put every report_interval seconds. The default value is zero. The
status of the errdef is reported in parsable format (eight fields,
each separated by a colon (:) character, the last of which is a
string enclosed by double quotes and the remaining seven fields are
integers):
ft:mt:ac:fc:chk:ec:s:"message" which are defined as follows:
ft The UTC time when the fault was injected.
mt The UTC time when the driver reported the fault.
ac The number of remaining non-faulting accesses.
fc The number of remaining faulting accesses.
chk The value of the acc_chk field of the errdef.
ec The number of fault reports issued by the driver
against this errdef (mt holds the time of the initial
report).
s The severity level reported by the driver.
"message" Textual reason why the driver has reported a fault.
-h
Display the command usage string.
-s collect_time
If acc_types is given with the -a option and includes log, the
errdef will log accesses for collect_time seconds (the default is
to log until the log becomes full). Note that, if the errdef speci‐
fication matches multiple driver handles, multiple logging errdefs
are registered with the bofi driver and logging terminates when all
logs become full or when collect_time expires or when the associ‐
ated errdefs are cleared. The current state of the log can be
checked with the th_manage(1M) command, using the broadcast parame‐
ter. A log can be terminated by running th_manage(1M) with the
clear_errdefs option or by sending a SIGALRM signal to the
th_define process. See alarm(2) for the semantics of SIGALRM.
-p policy
Applicable when the acc_types option includes log. The parameter
modifies the policy used for converting from logged accesses to
errdefs. All policies are inclusive:
o Use rare to bias error definitions toward rare accesses
(default).
o Use operator to produce a separate error definition for
each operator type (default).
o Use common to bias error definitions toward common
accesses.
o Use median to bias error definitions toward median
accesses.
o Use maximal to produce multiple error definitions for
duplicate accesses.
o Use unbiased to create unbiased error definitions.
o Use onebyte, twobyte, fourbyte, or eightbyte to select
errdefs corresponding to 1, 2, 4 or 8 byte accesses (if
chosen, the -xr option is enforced in order to ensure
that ddi_rep_*() calls are decomposed into multiple sin‐
gle accesses).
o Use multibyte to create error definitions for multibyte
accesses performed using ddi_rep_get*() and
ddi_rep_put*().
Policies can be combined by adding together these options. See the
NOTES section for further information.
-x flags
Applicable when the acc_types option includes log. The flags param‐
eter modifies the way in which the bofi driver logs accesses. It is
specified as a string containing any combination of the following
letters:
w Continuous logging (that is, the log will wrap when full).
t Timestamp each log entry (access times are in seconds).
r Log repeated I/O as individual accesses (for example, a
ddi_rep_get16(9F) call which has a repcount of N is logged N
times with each transaction logged as size 2 bytes. Without
this option, the default logging behavior is to log this
access once only, with a transaction size of twice the rep‐
count).
-C comment_string
Applicable when the acc_types option includes log. It provides a
comment string to be placed in any generated test scripts. The
string must be enclosed in double quotes.
-e fixup_script [args]
Applicable when the acc_types option includes log. The output of a
logging errdefs is to generate a test script for each driver access
handle. Use this option to embed a command in the resulting script
before the errors are injected. The generated test scripts will
take an instance offline and bring it back online before injecting
errors in order to bring the instance into a known fault-free
state. The executable fixup_script will be called twice with the
set of optional args— once just before the instance is taken off‐
line and again after the instance has been brought online. The fol‐
lowing variables are passed into the environment of the called exe‐
cutable:
DRIVER_PATH Identifies the device path of the instance.
DRIVER_INSTANCE Identifies the instance number of the device.
DRIVER_UNCONFIGURE Has the value 1 when the instance is about to
be taken offline.
DRIVER_CONFIGURE Has the value 1 when the instance has just
been brought online.
Typically, the executable ensures that the device under test is in
a suitable state to be taken offline (unconfigured) or in a suit‐
able state for error injection (for example configured, error free
and servicing a workload). A minimal script for a network driver
could be:
#!/bin/ksh
driver=xyznetdriver
ifnum=$driver$DRIVER_INSTANCE
if [[ $DRIVER_CONFIGURE = 1 ]]; then
ifconfig $ifnum plumb
ifconfig $ifnum ...
ifworkload start $ifnum
elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
ifworkload stop $ifnum
ifconfig $ifnum down
ifconfig $ifnum unplumb
fi
exit $?
The -e option must be the last option on the command line.
If the -a log option is selected but the -e option is not given, a
default script is used. This script repeatedly attempts to detach and
then re-attach the device instance under test.
EXAMPLES
Examples of Error Definitions
th_define-n foo -i 1 -a log
Logs all accesses to all handles used by instance 1 of the foo driver
while running the default workload (attaching and detaching the
instance). Then generates a set of test scripts to inject appropriate
errdefs while running that default workload.
th_define-n foo -i 1 -a log pio
Logs PIO accesses to each PIO handle used by instance 1 of the foo
driver while running the default workload (attaching and detaching the
instance). Then generates a set of test scripts to inject appropriate
errdefs while running that default workload.
th_define-n foo -i 1 -p onebyte median -e fixup arg -now
Logs all accesses to all handles used by instance 1 of the foo driver
while running the workload defined in the fixup script fixup with argu‐
ments arg and -now. Then generates a set of test scripts to inject
appropriate errdefs while running that workload. The resulting error
definitions are requested to focus upon single byte accesses to loca‐
tions that are accessed a median number of times with respect to fre‐
quency of access to I/O addresses.
th_define-n se -l 0x20 1 -a pio_r -o OR 0x4 -c 10 1000
Simulates a stuck serial chip command by forcing 1000 consecutive read
accesses made by any instance of the se driver to its command status
register, thereby returning status busy.
th_define-n foo -i 3 -r 1 -a pio_r -c 0 1 -f 1 -o OR 0x100
Causes 0x100 to be ORed into the next physical I/O read access from any
register in register set 1 of instance 3 of the foo driver. Subsequent
calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.
th_define-n foo -i 3 -r 1 -a pio_r -c 0 1 -o OR 0x0
Causes 0x0 to be ORed into the next physical I/O read access from any
register in register set 1 of instance 3 of the foo driver. This is of
course a no-op.
th_define-n foo -i 3 -r 1 -l 0x8100 1 -a pio_r -c 0 10 -o EQ 0x70003
Causes the next ten next physical I/O reads from the register at offset
0x8100 in register set 1 of instance 3 of the foo driver to return
0x70003.
th_define-n foo -i 3 -r 1 -l 0x8100 1 -a pio_w -c 100 3 -o AND
0xffffffffffffefff
The next 100 physical I/O writes to the register at offset 0x8100 in
register set 1 of instance 3 of the foo driver take place as normal.
However, on each of the three subsequent accesses, the 0x1000 bit will
be cleared.
th_define-n foo -i 3 -r 1 -l 0x8100 0x10 -a pio_r -c 0 1 -f 1 -o XOR 7
Causes the bottom three bits to have their values toggled for the next
physical I/O read access to registers with offsets in the range 0x8100
to 0x8110 in register set 1 of instance 3 of the foo driver. Subsequent
calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.
th_define-n foo -i 3 -a pio_w -c 0 1 -o NO 0
Prevents the next physical I/O write access to any register in any reg‐
ister set of instance 3 of the foo driver from going out on the bus.
th_define-n foo -i 3 -l 0 8192 -a dma_r -c 0 1 -o OR 7
Causes 0x7 to be ORed into each long long in the first 8192 bytes of
the next DMA read, using any DMA handle for instance 3 of the foo
driver.
th_define-n foo -i 3 -r 2 -l 0 8 -a dma_r -c 0 1 -o OR
0x7070707070707070
Causes 0x70 to be ORed into each byte of the first long long of the
next DMA read, using the DMA handle with sequential allocation number 2
for instance 3 of the foo driver.
th_define-n foo -i 3 -l 256 256 -a dma_w -c 0 1 -f 2 -o OR 7
Causes 0x7 to be ORed into each long long in the range from offset 256
to offset 512 of the next DMA write, using any DMA handle for instance
3 of the foo driver. Subsequent calls in the driver to
ddi_check_dma_handle() return DDI_FAILURE.
th_define-n foo -i 3 -r 0 -l 0 8 -a dma_w -c 100 3 -o AND
0xffffffffffffefff
The next 100 DMA writes using the DMA handle with sequential allocation
number 0 for instance 3 of the foo driver take place as normal. How‐
ever, on each of the three subsequent accesses, the 0x1000 bit will be
cleared in the first long long of the transfer.
th_define-n foo -i 3 -a intr -c 0 6 -o LOSE 0
Causes the next six interrupts for instance 3 of the foo driver to be
lost.
th_define-n foo -i 3 -a intr -c 30 1 -o EXTRA 10
When the thirty-first subsequent interrupt for instance 3 of the foo
driver occurs, a further ten interrupts are also generated.
th_define-n foo -i 3 -a intr -c 0 1 -o DELAY 1024
Causes the next interrupt for instance 3 of the foo driver to be
delayed by 1024 microseconds.
NOTES
The policy option in the th_define-p syntax determines how a set of
logged accesses will be converted into the set of error definitions.
Each logged access will be matched against the chosen policies to
determine whether an error definition should be created based on the
access.
Any number of policy options can be combined to modify the generated
error definitions.
Bytewise Policies
These select particular I/O transfer sizes. Specifing a byte policy
will exclude other byte policies that have not been chosen. If none of
the byte type policies is selected, all transfer sizes are treated
equally. Otherwise, only those specified transfer sizes will be
selected.
onebyte Create errdefs for one byte accesses (ddi_get8())
twobyte Create errdefs for two byte accesses (ddi_get16())
fourbyte Create errdefs for four byte accesses (ddi_get32())
eightbyte Create errdefs for eight byte accesses (ddi_get64())
multibyte Create errdefs for repeated byte accesses (ddi_rep_get*())
Frequency of Access Policies
The frequency of access to a location is determined according to the
access type, location and transfer size (for example, a two-byte read
access to address A is considered distinct from a four-byte read access
to address A). The algorithm is to count the number of accesses (of a
given type and size) to a given location, and find the locations that
were most and least accessed (let maxa and mina be the number of times
these locations were accessed, and mean the total number of accesses
divided by total number of locations that were accessed). Then a rare
access is a location that was accessed less than
(mean - mina) / 3 + mina
times. Similarly for the definition of common accesses:
maxa - (maxa - mean) / 3
A location whose access patterns lies within these cutoffs is regarded
as a location that is accessed with median frequency.
rare Create errdefs for locations that are rarely accessed.
common Create errdefs for locations that are commonly accessed.
median Create errdefs for locations that are accessed a median fre‐
quency.
Policies for Minimizing errdefs
If a transaction is duplicated, either a single or multiple errdefs
will be written to the test scripts, depending upon the following two
policies:
maximal Create multiple errdefs for locations that are repeatedly
accessed.
unbiased Create a single errdef for locations that are repeatedly
accessed.
operators For each location, a default operator and operand is typi‐
cally applied. For maximal test coverage, this default may
be modified using the operators policy so that a separate
errdef is created for each of the possible corruption
operators.
SEE ALSOkill(1), th_manage(1M), alarm(2), ddi_check_acc_handle(9F),
ddi_check_dma_handle(9F)SunOS 5.11 11 Apr 2001 th_define(1M)