
Commit c850ed3

Jon Masters authored and rostedt committed
tracing: Add documentation for hwlat_detector tracer
Added the documentation on how to use the hwlat_detector.

Signed-off-by: Jon Masters <[email protected]>
[ Various updates and modified to show hwlat as a tracer ]
Signed-off-by: Steven Rostedt <[email protected]>
1 parent e7c15cd commit c850ed3


1 file changed: 73 additions, 0 deletions

1 file changed

+73
-0
lines changed
Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
Introduction:
-------------

The tracer hwlat_detector is a special-purpose tracer that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was originally
developed to detect SMIs (System Management Interrupts) on x86 systems;
however, nothing in it is x86-specific. It was originally written for use
by the "RT" patch, since the Real Time kernel is highly latency sensitive.

SMIs are not serviced by the Linux kernel, which means that the kernel does
not even know that they are occurring. SMIs are instead set up and serviced
by BIOS code, usually for "critical" events such as management of thermal
sensors and fans. Sometimes, though, SMIs are used for other tasks, and those
tasks can spend an inordinate amount of time in the handler (sometimes
measured in milliseconds). That is a problem if you are trying to keep event
service latencies down in the microsecond range.

The hardware latency detector works by hogging one of the CPUs for a
configurable amount of time (with interrupts disabled), polling the CPU Time
Stamp Counter (TSC) for some period, then looking for gaps in the TSC data.
Any gap indicates a time when the polling was interrupted. Since interrupts
are disabled, the only thing that could cause such a gap is an SMI or some
other hardware hiccup (or an NMI, but those can be tracked).
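The polling idea can be approximated in user space. The following C sketch is an illustration only and is not the tracer's code. It reads CLOCK_MONOTONIC through clock_gettime() instead of the TSC, and it cannot disable interrupts, so ordinary interrupts and scheduler preemption will also show up as gaps. The function names now_ns() and sample_max_gap_ns() are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Spin for width_us microseconds, reading the clock back to back, and
 * return the largest gap (in ns) seen between two consecutive reads.
 * User-space sketch only: interrupts stay enabled here, so any
 * interruption (not just SMIs) widens a gap.
 */
static uint64_t sample_max_gap_ns(uint64_t width_us)
{
	uint64_t start = now_ns();
	uint64_t last = start;
	uint64_t max_gap = 0;

	while (last - start < width_us * 1000ULL) {
		uint64_t t = now_ns();

		if (t - last > max_gap)
			max_gap = t - last;
		last = t;
	}
	return max_gap;
}
```

For example, sample_max_gap_ns(1000) spins for about 1 ms and returns the largest gap in nanoseconds. Comparing that result against a threshold such as 10 usecs (10,000 ns) corresponds to the tracer's decision about whether a latency is worth recording.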

Note that the hwlat detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine whether the hardware platform
has a problem with long system firmware service routines.

Usage:
------

Write the ASCII text "hwlat" into the current_tracer file of the tracing system
(mounted at /sys/kernel/tracing or /sys/kernel/debug/tracing). The threshold,
in microseconds (us), above which latency spikes are taken into account can be
redefined by writing to tracing_thresh.

Example:

    # echo hwlat > /sys/kernel/tracing/current_tracer
    # echo 100 > /sys/kernel/tracing/tracing_thresh

The /sys/kernel/tracing/hwlat_detector interface contains the following files:

  width   - time period to sample with CPUs held (usecs);
            must be less than the total window size (enforced)
  window  - total period of sampling, width being inside (usecs)
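Width must stay below window at all times, so when both values change, the order of the two writes can matter. The C sketch below assumes the kernel rejects any single write that would break that rule, as the "enforced" note says. It also assumes the current width is already below the current window. Under those assumptions it picks a safe order: if the window is growing, write the window first; otherwise shrink the width first. The helper names are hypothetical. The directory argument is the hwlat_detector directory, for example /sys/kernel/tracing/hwlat_detector.

```c
#include <stdio.h>

/* Write one unsigned decimal value to a file; returns 0 on success. */
static int write_ulong(const char *path, unsigned long val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	int ok = fprintf(f, "%lu\n", val) > 0;
	/* fclose() flushes the buffer, so a failed write is reported here. */
	if (fclose(f) != 0 || !ok)
		return -1;
	return 0;
}

/* Read one unsigned decimal value from a file; returns 0 on success. */
static int read_ulong(const char *path, unsigned long *val)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	int n = fscanf(f, "%lu", val);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: set width and window (usecs) in hwlat_dir while
 * keeping width < window after every individual write.
 */
static int hwlat_set_width_window(const char *hwlat_dir, unsigned long width,
				  unsigned long window)
{
	char wpath[512], npath[512];
	unsigned long cur_window;

	if (width >= window)
		return -1; /* the final state itself would violate the rule */

	snprintf(wpath, sizeof(wpath), "%s/width", hwlat_dir);
	snprintf(npath, sizeof(npath), "%s/window", hwlat_dir);

	if (read_ulong(npath, &cur_window) != 0)
		return -1;

	if (window >= cur_window) {
		/* Growing: enlarge the window first so the old width still fits. */
		if (write_ulong(npath, window) || write_ulong(wpath, width))
			return -1;
	} else {
		/* Shrinking: reduce the width first so it fits the new window. */
		if (write_ulong(wpath, width) || write_ulong(npath, window))
			return -1;
	}
	return 0;
}
```

Going from the defaults (500,000 / 1,000,000) to 2,000,000 / 3,000,000 writes window first. Going back down to 100 / 200 writes width first.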

By default, width is set to 500,000 and window to 1,000,000, meaning that
for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
(0.5s). If tracing_thresh contains zero when the hwlat tracer is enabled, it
will be changed to a default of 10 usecs. If any latency that exceeds the
threshold is observed, the data will be written to the tracing ring buffer.

The minimum sleep time between periods is 1 millisecond, even if width is
less than 1 millisecond apart from window. This keeps the system from being
totally starved.
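The timing rules above can be summarized as a small calculation. This C sketch assumes the sleep between spins is simply window minus width, raised to the 1 millisecond floor when that difference is smaller. The kernel may round the sleep differently. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one hwlat sampling cycle; all values in usecs. */
struct hwlat_cycle {
	uint64_t spin_us;	/* time spent polling with the CPU held */
	uint64_t sleep_us;	/* time the CPU is handed back to the system */
};

#define HWLAT_MIN_SLEEP_US 1000ULL	/* the 1 millisecond floor */

static struct hwlat_cycle hwlat_plan(uint64_t width_us, uint64_t window_us)
{
	struct hwlat_cycle c;
	uint64_t gap = window_us > width_us ? window_us - width_us : 0;

	c.spin_us = width_us;
	/* Never sleep less than 1 ms, so the system is not totally starved. */
	c.sleep_us = gap < HWLAT_MIN_SLEEP_US ? HWLAT_MIN_SLEEP_US : gap;
	return c;
}

/* Percentage of wall-clock time the sampling CPU is held per cycle. */
static double hwlat_duty_percent(struct hwlat_cycle c)
{
	return 100.0 * (double)c.spin_us / (double)(c.spin_us + c.sleep_us);
}
```

With the defaults (width 500,000, window 1,000,000), the model gives 500,000 usecs of spinning and 500,000 usecs of sleep, which is a 50% share of one CPU. With width 999,500 and the same window, the sleep is raised to 1,000 usecs.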

If tracing_thresh was zero when the hwlat detector was started, it will be
set back to zero if another tracer is loaded. Note that the last value the
hwlat detector had in tracing_thresh will be saved, and this value will be
restored in tracing_thresh if it is still zero when the hwlat detector is
started again.
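The threshold bookkeeping described above can be modelled as two transitions: start and stop. This C sketch is a user-space model with hypothetical names, not the kernel source. It assumes the value found at start is restored on every stop. That matches the text for the zero case; the non-zero case is an assumption.

```c
#include <stdint.h>

#define HWLAT_DEFAULT_THRESH_US 10	/* default threshold from the text */

/* Hypothetical model of the tracing_thresh bookkeeping (usecs). */
struct thresh_state {
	uint64_t tracing_thresh;	/* what the tracing_thresh file shows */
	uint64_t saved;			/* value found when hwlat was started */
	uint64_t last;			/* value hwlat had when it last stopped */
};

static void thresh_init(struct thresh_state *s, uint64_t current)
{
	s->tracing_thresh = current;
	s->saved = 0;
	s->last = HWLAT_DEFAULT_THRESH_US;
}

/* hwlat becomes the current tracer. */
static void hwlat_start(struct thresh_state *s)
{
	s->saved = s->tracing_thresh;
	if (s->tracing_thresh == 0)
		s->tracing_thresh = s->last;	/* 10 on first use, else last value */
}

/* Another tracer is loaded. */
static void hwlat_stop(struct thresh_state *s)
{
	s->last = s->tracing_thresh;
	s->tracing_thresh = s->saved;
}
```

The model behaves as follows:

- Starting from zero, the threshold becomes 10.
- If the user changes it to 50 while hwlat runs, loading another tracer resets it to 0.
- Starting hwlat again brings back 50.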

The following tracing directory files are used by the hwlat_detector:

in /sys/kernel/tracing:

  tracing_thresh        - minimum latency value to be considered (usecs)
  tracing_max_latency   - maximum hardware latency actually observed (usecs)
  hwlat_detector/width  - specified amount of time to spin within window (usecs)
  hwlat_detector/window - total sampling period, with width inside it (usecs)
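After a run, the two files in /sys/kernel/tracing can be compared to see whether anything crossed the threshold. This C sketch assumes both files contain a single decimal value in usecs, as listed above. read_usecs() and hwlat_exceeded() are hypothetical helpers. The per-event records themselves remain in the tracing ring buffer.

```c
#include <stdio.h>

/* Read a single unsigned decimal value (usecs); returns 0 on success. */
static int read_usecs(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	int n = fscanf(f, "%lu", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/*
 * Returns 1 if tracing_max_latency exceeds tracing_thresh, 0 if it does
 * not, and -1 if either file in tracing_dir could not be read.
 */
static int hwlat_exceeded(const char *tracing_dir)
{
	char path[512];
	unsigned long max_lat, thresh;

	snprintf(path, sizeof(path), "%s/tracing_max_latency", tracing_dir);
	if (read_usecs(path, &max_lat))
		return -1;
	snprintf(path, sizeof(path), "%s/tracing_thresh", tracing_dir);
	if (read_usecs(path, &thresh))
		return -1;
	return max_lat > thresh;
}
```

On a real system this would be called as hwlat_exceeded("/sys/kernel/tracing").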
