[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126520
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1
Author: Balazs Benics (steakhal)
Changes
Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126520.diff
3 Files Affected:
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..c33853aca76168d 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
A translation unit contains multiple entry points, each of which take multiple steps to analyze.
+Performance analysis using ``-ftime-trace``
+===========================================
+
You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
or using `speedscope <https://speedscope.app>`_.
@@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
for a single entry point.
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble.
+
+.. code-block:: bash
+ :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+ # -F: Sampling frequency, use `-F max` for maximal frequency
+ # -g: Enable call-graph recording for both kernel and user space
+ perf record -F 99 -g -- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+ -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+ -verify ./clang/test/Analysis/string.c
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+ :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox.
+
+ perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+ /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
+ firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+
+.. code-block:: bash
+ :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+ uftrace record clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+ -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+ -verify ./clang/test/Analysis/string.c
+ uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point, which takes at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page
+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
+
+Similar filters could be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+ cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --framegraph``.
diff --git a/clang/docs/analyzer/images/flamegraph.svg b/clang/docs/analyzer/images/flamegraph.svg
new file mode 100644
index 000000000000000..d3c4a22c9ff536a
--- /dev/null
+++ b/clang/docs/analyzer/images/flamegraph.svg
@@ -0,0 +1,27641 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg version="1.1" width="1200" height="934" onload="init(evt)" viewBox="0 0 1200 934" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<!-- Flame graph stack visualization. See https://github.com/brendangregg/FlameGraph for latest version, and http://www.brendangregg.com/flamegraphs.html for examples. -->
+<!-- NOTES: -->
+<defs>
+ <linearGradient id="background" y1="0" y2="1" x1="0" x2="0" >
+ <stop stop-color="#eeeeee" offset="5%" />
+ <stop stop-color="#eeeeb0" offset="95%" />
+ </linearGradient>
+</defs>
+<style type="text/css">
+ text { font-family:Verdana; font-size:12px; fill:rgb(0,0,0); }
+ #search, #ignorecase { opacity:0.1; cursor:pointer; }
+ #search:hover, #search.show, #ignorecase:hover, #ignorecase.show { opacity:1; }
+ #subtitle { text-anchor:middle; font-color:rgb(160,160,160); }
+ #title { text-anchor:middle; font-size:17px}
+ #unzoom { cursor:pointer; }
+ #frames > *:hover { stroke:black; stroke-width:0.5; cursor:pointer; }
+ .hide { display:none; }
+ .parent { opacity:0.5; }
+</style>
+<script type="text/ecmascript">
+<![CDATA[
+ "use strict";
+ var details, searchbtn, unzoombtn, matchedtxt, svg, searching, currentSearchTerm, ignorecase, ignorecaseBtn;
+ function init(evt) {
+ details = document.getElementById("details").firstChild;
+ searchbtn = document.getElementById("search");
+ ignorecaseBtn = document.getElementById("ignorecase");
+ unzoombtn = document.getElementById("unzoom");
+ matchedtxt = document.getElementById("matched");
+ svg = document.getElementsByTagName("svg")[0];
+ searching = 0;
+ currentSearchTerm = null;
+
+ // use GET parameters to restore a flamegraphs state.
+ var params = get_params();
+ if (params.x && params.y)
+ zoom(find_group(document.querySelector('[x="' + params.x + '"][y="' + params.y + '"]')));
+ if (params.s) search(params.s);
+ }
+
+ // event listeners
+ window.addEventListener("click", function(e) {
+ var target = find_group(e.target);
+ if (target) {
+ if (target.nodeName == "a") {
+ if (e.ctrlKey === false) return;
+ e.preventDefault();
+ }
+ if (target.classList.contains("parent")) unzoom(true);
+ zoom(target);
+ if (!document.querySelector('.parent')) {
+ // we have basically done a clearzoom so clear the url
+ var params = get_params();
+ if (params.x) delete params.x;
+ if (params.y) delete params.y;
+ history.replaceState(null, null, parse_params(params));
+ unzoombtn.classList.add("hide");
+ return;
+ }
+
+ // set parameters for zoom state
+ var el = target.querySelector("rect");
+ if (el && el.attributes && el.attributes.y && el.attributes._orig_x) {
+ var params = get_params()
+ params.x = el.attributes._orig_x.value;
+ params.y = el.attributes.y.value;
+ history.replaceState(null, null, parse_params(params));
+ }
+ }
+ else if (e.target.id == "unzoom") clearzoom();
+ else if (e.target.id == "search") search_prompt();
+ else if (e.target.id == "ignorecase") toggle_ignorecase();
+ }, false)
+
+ // mouse-over for info
+ // show
+ window.addEventListener("mouseover", function(e) {
+ var target = find_group(e.target);
+ if (target) details.nodeValue = "Function: " + g_to_text(target);
+ }, false)
+
+ // clear
+ window.addEventListener("mouseout", function(e) {
+ var target = find_group(e.target);
+ if (target) details.nodeValue = ' ';
+ }, false)
+
+ // ctrl-F for search
+ // ctrl-I to toggle case-sensitive search
+ window.addEventListener("keydown",function (e) {
+ if (e.keyCode === 114 || (e.ctrlKey && e.keyCode === 70)) {
+ e.preventDefault();
+ search_prompt();
+ }
+ else if (e.ctrlKey && e.keyCode === 73) {
+ e.preventDefault();
+ toggle_ignorecase();
+ }
+ }, false)
+
+ // functions
+ function get_params() {
+ var params = {};
+ var paramsarr = window.location.search.substr(1).split('&');
+ for (var i = 0; i < paramsarr.length; ++i) {
+ var tmp = paramsarr[i].split("=");
+ if (!tmp[0] || !tmp[1]) continue;
+ params[tmp[0]] = decodeURIComponent(tmp[1]);
+ }
+ return params;
+ }
+ function parse_params(params) {
+ var uri = "?";
+ for (var key in params) {
+ uri += key + '=' + encodeURIComponent(params[key]) + '&';
+ }
+ if (uri.slice(-1) == "&")
+ uri = uri.substring(0, uri.length - 1);
+ if (uri == '?')
+ uri = window.location.href.split('?')[0];
+ return uri;
+ }
+ function find_child(node, selector) {
+ var children = node.querySelectorAll(selector);
+ if (children.length) return children[0];
+ }
+ function find_group(node) {
+ var parent = node.parentElement;
+ if (!parent) return;
+ if (parent.id == "frames") return node;
+ return find_group(parent);
+ }
+ function orig_save(e, attr, val) {
+ if (e.attributes["_orig_" + attr] != undefined) return;
+ if (e.attributes[attr] == undefined) return;
+ if (val == undefined) val = e.attributes[attr].value;
+ e.setAttribute("_orig_" + attr, val);
+ }
+ function orig_load(e, attr) {
+ if (e.attributes["_orig_"+attr] == undefined) return;
+ e.attributes[attr].value = e.attributes["_orig_" + attr].value;
+ e.removeAttribute("_orig_"+attr);
+ }
+ function g_to_text(e) {
+ var text = find_child(e, "title").firstChild.nodeValue;
+ return (text)
+ }
+ function g_to_func(e) {
+ var func = g_to_text(e);
+ // if there's any manipulation we want to do to the function
+ // name before it's searched, do it here before returning.
+ return (func);
+ }
+ function update_text(e) {
+ var r = find_child(e, "rect");
+ var t = find_child(e, "text");
+ var w = parseFloat(r.attributes.width.value) -3;
+ var txt = find_child(e, "title").textContent.replace(/\([^(]*\)$/,"");
+ t.attributes.x.value = parseFloat(r.attributes.x.value) + 3;
+
+ // Smaller than this size won't fit anything
+ if (w < 2 * 12 * 0.59) {
+ t.textContent = "";
+ return;
+ }
+
+ t.textContent = txt;
+ var sl = t.getSubStringLength(0, txt.length);
+ // check if only whitespace or if we can fit the entire string into width w
+ if (/^ *$/.test(txt) || sl < w)
+ return;
+
+ // this isn't perfect, but gives a good starting point
+ // and avoids calling getSubStringLength too often
+ var start = Math.floor((w/sl) * txt.length);
+ for (var x = start; x > 0; x = x-2) {
+ if (t.getSubStringLength(0, x + 2) <= w) {
+ t.textContent = txt.substring(0, x) + "..";
+ return;
+ }
+ }
+ t.textContent = "";
+ }
+
+ // zoom
+ function zoom_reset(e) {
+ if (e.attributes != undefined) {
+ orig_load(e, "x");
+ orig_load(e, "width");
+ }
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_reset(c[i]);
+ }
+ }
+ function zoom_child(e, x, ratio) {
+ if (e.attributes != undefined) {
+ if (e.attributes.x != undefined) {
+ orig_save(e, "x");
+ e.attributes.x.value = (parseFloat(e.attributes.x.value) - x - 10) * ratio + 10;
+ if (e.tagName == "text")
+ e.attributes.x.value = find_child(e.parentNode, "rect[x]").attributes.x.value + 3;
+ }
+ if (e.attributes.width != undefined) {
+ orig_save(e, "width");
+ e.attributes.width.value = parseFloat(e.attributes.width.value) * ratio;
+ }
+ }
+
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_child(c[i], x - 10, ratio);
+ }
+ }
+ function zoom_parent(e) {
+ if (e.attributes) {
+ if (e.attributes.x != undefined) {
+ orig_save(e, "x");
+ e.attributes.x.value = 10;
+ }
+ if (e.attributes.width != undefined) {
+ orig_save(e, "width");
+ e.attributes.width.value = parseInt(svg.width.baseVal.value) - (10 * 2);
+ }
+ }
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_parent(c[i]);
+ }
+ }
+ function zoom(node) {
+ var attr = find_child(node, "rect").attributes;
+ var width = parseFloat(attr.width.value);
+ var xmin = parseFloat(attr.x.value);
+ var xmax = parseFloat(xmin + width);
+ var ymin = parseFloat(attr.y.value);
+ var ratio = (svg.width.baseVal.value - 2 * 10) / width;
+
+ // XXX: Workaround for JavaScript float issues (fix me)
+ var fudge = 0.0001;
+
+ unzoombtn.classList.remove("hide");
+
+ var el = document.getElementById("frames").children;
+ for (var i = 0; i < el.length; i++) {
+ var e = el[i];
+ var a = find_child(e, "rect").attributes;
+ var ex = parseFloat(a.x.value);
+ var ew = parseFloat(a.width.value);
+ var upstack;
+ // Is it an ancestor
+ if (0 == 0) {
+ upstack = parseFloat(a.y.value) > ymin;
+ } else {
+ upstack = parseFloat(a.y.value) < ymin;
+ }
+ if (upstack) {
+ // Direct ancestor
+ if (ex <= xmin && (ex+ew+fudge) >= xmax) {
+ e.classList.add("parent");
+ zoom_parent(e);
+ update_text(e);
+ }
+ // not in current path
+ else
+ e.classList.add("hide");
+ }
+ // Children maybe
+ else {
+ // no common path
+ if (ex < xmin || ex + fudge >= xmax) {
+ e.classList.add("hide");
+ }
+ else {
+ zoom_child(e, xmin, ratio);
+ update_text(e);
+ }
+ }
+ }
+ search();
+ }
+ function unzoom(dont_update_text) {
+ unzoombtn.classList.add("hide");
+ var el = document.getElementById("frames").children;
+ for(var i = 0; i < el.length; i++) {
+ el[i].classList.remove("parent");
+ el[i].classList.remove("hide");
+ zoom_reset(el[i]);
+ if(!dont_update_text) update_text(el[i]);
+ }
+ search();
+ }
+ function clearzoom() {
+ unzoom();
+
+ // remove zoom state
+ var params = get_params();
+ if (params.x) delete params.x;
+ if (params.y) delete params.y;
+ history.replaceState(null, null, parse_params(params));
+ }
+
+ // search
+ function toggle_ignorecase() {
+ ignorecase = !ignorecase;
+ if (ignorecase) {
+ ignorecaseBtn.classList.add("show");
+ } else {
+ ignorecaseBtn.classList.remove("show");
+ }
+ reset_search();
+ search();
+ }
+ function reset_search() {
+ var el = document.querySelectorAll("#frames rect");
+ for (var i = 0; i < el.length; i++) {
+ orig_load(el[i], "fill")
+ }
+ var params = get_params();
+ delete params.s;
+ history.replaceState(null, null, parse_params(params));
+ }
+ function search_prompt() {
+ if (!searching) {
+ var term = prompt("Enter a search term (regexp " +
+ "allowed, eg: ^ext4_)"
+ + (ignorecase ? ", ignoring case" : "")
+ + "\nPress Ctrl-i to toggle case sensitivity", "");
+ if (term != null) search(term);
+ } else {
+ reset_search();
+ searching = 0;
+ currentSearchTerm = null;
+ searchbtn.classList.remove("show");
+ searchbtn.firstChild.nodeValue = "Search"
+ matchedtxt.classList.add("hide");
+ matchedtxt.firstChild.nodeValue = ""
+ }
+ }
+ function search(term) {
+ if (term) currentSearchTerm = term;
+
+ var re = new RegExp(currentSearchTerm, ignorecase ? 'i' : '');
+ var el = document.getElementById("frames").children;
+ var matches = new Object();
+ var maxwidth = 0;
+ for (var i = 0; i < el.length; i++) {
+ var e = el[i];
+ var func = g_to_func(e);
+ var rect = find_child(e, "rect");
+ if (func == null || rect == null)
+ continue;
+
+ // Save max width. Only works as we have a root frame
+ var w = parseFloat(rect.attributes.width.value);
+ if (w > maxwidth)
+ maxwidth = w;
+
+ if (func.match(re)) {
+ // highlight
+ var x = parseFloat(rect.attributes.x.value);
+ orig_save(rect, "fill");
+ rect.attributes.fill.value = "rgb(230,0,230)";
+
+ // remember matches
+ if (matches[x] == undefined) {
+ matches[x] = w;
+ } else {
+ if (w > matches[x]) {
+ // overwrite with parent
+ matches[x] = w;
+ }
+ }
+ searching = 1;
+ }
+ }
+ if (!searching)
+ return;
+ var params = get_params();
+ params.s = currentSearchTerm;
+ history.replaceState(null, null, parse_params(params));
+
+ searchbtn.classList.add("show");
+ searchbtn.firstChild.nodeValue = "Reset Search";
+
+ // calculate percent matched, excluding vertical overlap
+ var count = 0;
+ var lastx = -1;
+ var lastw = 0;
+ var keys = Array();
+ for (k in matches) {
+ if (matches.hasOwnProperty(k))
+ keys.push(k);
+ }
+ // sort the matched frames by their x location
+ // ascending, then width descending
+ keys.sort(function(a, b){
+ return a - b;
+ });
+ // Step through frames saving only the biggest bottom-up frames
+ // thanks to the sort order. This relies on the tree property
+ // where children are always smaller than their parents.
+ var fudge = 0.0001; // JavaScript floating point
+ for (var k in keys) {
+ var x = parseFloat(keys[k]);
+ var w = matches[keys[k]];
+ if (x >= lastx + lastw - fudge) {
+ count += w;
+ lastx = x;
+ lastw = w;
+ }
+ }
+ // display matched percent
+ matchedtxt.classList.remove("hide");
+ var pct = 100 * count / maxwidth;
+ if (pct != 100) pct = pct.toFixed(1)
+ matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%";
+ }
+]]>
+</script>
+<rect x="0.0" y="0" width="1200.0" height="934.0" fill="url(#background)" />
+<text id="title" x="600.00" y="24" >Flame Graph</text>
+<text id="details" x="10.00" y="917" > </text>
+<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text>
+<text id="search" x="1090.00" y="24" >Search</text>
+<text id="ignorecase" x="1174.00" y="24" >ic</text>
+<text id="matched" x="1090.00" y="917" > </text>
+<g id="frames">
+<g >
+<title>[unknown] (140,334 samples, 0.09%)</title><rect x="311.8" y="469" width="1.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
+<text x="314.78" y="479.5" ></text>
+</g>
+<g >
+<title>clang::Sema::CheckSingleAssignmentConstraints (259,475 samples, 0.17%)</title><rect x="665.4" y="325" width="2.0" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" />
+<text x="668.36" y="335.5" ></text>...
[truncated]
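Before rendering the SVG, it can help to look at the folded stacks as plain text. The following is a minimal sketch, assuming the usual `stackcollapse-perf.pl` output format: one line per unique stack, frames joined by `;`, followed by a space and the sample count. The sample data, function names, and counts are made up and only stand in for a real `perf.folded`.

```shell
# Hypothetical stand-in for the perf.folded file written by stackcollapse-perf.pl.
folded=$(mktemp)
cat > "$folded" <<'EOF'
clang;main;analyze_body;assume 40
clang;main;analyze_body;malloc 10
clang;main;parse_expr;malloc 25
EOF

# Sum the sample counts per leaf frame (the function that was on-CPU
# when the sample was taken), then list the heaviest leaves first.
leaf_totals=$(awk '{
    count = $NF                   # the last field is the sample count
    sub(/ [0-9]+$/, "")           # drop the count, keep only the stack
    n = split($0, frames, ";")    # frames[n] is the leaf frame
    total[frames[n]] += count
} END {
    for (f in total) print total[f], f
}' "$folded" | sort -rn)

echo "$leaf_totals"
rm -f "$folded"
```

A listing like this shows which leaf functions dominate the samples, which hints at where to zoom in once the flame graph is open.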
One note about the illustrations:
Thank you for adding them, I think they are invaluable to understand the docs!
Yet, I feel they could be smaller, both in level of detail and in file size.
- Fewer details mean the reader will more quickly understand the points you want to make.
- Smaller files will save bandwidth and disk space for the thousands of people and robots that check out LLVM every day, especially for docs that are unlikely to be viewed by more than a hundred people.
For example, here is how I minimized my PNG illustration using ImageMagick:
convert speedscope.png -resize 1825x900 +dither -colors 64 -strip -quality 90 -define png:compression-level=9 speedscope-low.png
I played with the resolution, colors, and quality parameters trying to find the smallest values that don't destroy the readability completely.
Looks wonderful! Added some nits inline.
# -F: Sampling frequency, use `-F max` for maximal frequency
# -g: Enable call-graph recording for both kernel and user space
perf record -F 99 -g -- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
I wonder if a simpler/smaller CSA invocation would suffice here for demonstration purposes.
I wanted to stick with an invocation similar to the one at the beginning of this file.
If we were to simplify this, we should harmonize them.
Fixed in steakhal@169664c.
======================================
`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
that you could use to focus and drill down into the timeline of your application.
s/could/can/
Also do that substitution in other places.
Replaced both `could` with `can` in steakhal@1b105e0.
Performance analysis using ``perf``
===================================
`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.
While I agree that `perf` is excellent, I wonder if we in general want to stay away from value judgements in documentation.
Rephrased into "Perf is a tool for conducting sampling-based profiling."
Fixed in steakhal@aa5a285.
`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
that you could use to focus and drill down into the timeline of your application.
We will use it to generate Chromium trace JSON.
In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
-In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than the sampling-based approaches like ``perf``.
Thanks, fixed in steakhal@3b2d323.
There is only one prerequisite to use this tool.
You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
This will make it run substantially slower but allows rich instrumentation.
It would be interesting to include the typical slowdown factor.
Also, I think it is important to note the substantial disk space requirement.
I didn't measure. The best I could find on the internet was that it's not as slow as cachegrind. I'd avoid mentioning this though.
I added a remark about high storage requirement in steakhal@7a76bd5.
You can also see how deep function calls we may have due to AST visitors.
Using different filters can reduce the number of functions to record.
For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
To me it makes more sense to put the link on "documentation" rather than on the generic noun "common options":
-For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
+For the common options, refer to `the ``uftrace`` documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
Be careful -- the RST format has a dumb limitation that inline formatting cannot be nested, so I'd guess that the suggested "monospace text within link text" nesting wouldn't work (but I'm not 100% sure).
I moved the link to the word `documentation` in steakhal@4aa1f34.
recording to selectively focus on some special part using a refinement of the filter flags.
Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
thus it needs to be of a limited size.
In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
This looks out of place. I guess it goes with the second sentence of this paragraph, but not with the third (which now immediately precedes it).
I think you can make it more clear if you avoid "that":
-In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+If you do not apply filters on recording, you will collect a large trace and every dump operation would need to sieve through the much larger recording which may be annoying if done repeatedly.
Accepted as-is in steakhal@196dd50.
thus it needs to be of a limited size.
In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
-If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
+If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
Accepted as-is in steakhal@004b8a6.
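The "look for frequent entries, then hide them" step can be sketched as follows. This is only an illustration under assumptions: the sample trace is a made-up stand-in for `uftrace dump --chrome` output with one event per line, the function names are hypothetical, and the `--hide=NAME` spelling should be checked against the `uftrace` documentation for the installed version. The script only prints the follow-up command; it does not run `uftrace`.

```shell
# Hypothetical stand-in for a trace written by `uftrace dump --chrome`;
# only begin events are shown, and real traces carry more fields.
trace=$(mktemp)
cat > "$trace" <<'EOF'
{"ts":1,"ph":"B","pid":1,"name":"helper_a"},
{"ts":2,"ph":"B","pid":1,"name":"helper_b"},
{"ts":3,"ph":"B","pid":1,"name":"helper_a"},
{"ts":4,"ph":"B","pid":1,"name":"interesting_fn"},
{"ts":5,"ph":"B","pid":1,"name":"helper_a"},
{"ts":6,"ph":"B","pid":1,"name":"helper_b"},
EOF

# Extract the names of the two most frequent entries.
top_names=$(grep -oE '"name":"[^"]+"' "$trace" \
    | sort | uniq -c | sort -rn | head -n 2 \
    | sed -E 's/^ *[0-9]+ "name":"(.*)"$/\1/')

# Turn each name into a --hide flag for a later `uftrace dump` call.
hide_flags=$(printf '%s\n' "$top_names" | sed 's/^/--hide=/' | paste -sd' ' -)

echo "uftrace dump $hide_flags --chrome > trace.json"
rm -f "$trace"
```

Real C++ names often contain spaces, templates, or characters that the shell or the filter syntax treats specially, so review and quote the generated flags before using them.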
Sorry about the noise of "closed PRs" due to the removal of my repo.
I had difficulties with Github that made me try different solutions to solve that.
In any case, I fixed all the review comments.
Thanks for the excellent recommendations!