[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126520

steakhal · 2025-02-10T14:14:57Z

No description provided.

…mance issues

steakhal · 2025-02-10T14:15:05Z

@necto

llvmbot · 2025-02-10T14:15:17Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126520.diff

3 Files Affected:

(modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+91)
(added) clang/docs/analyzer/images/flamegraph.svg (+27641)
(added) clang/docs/analyzer/images/uftrace_detailed.png ()

diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..c33853aca76168d 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
 Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
 A translation unit contains multiple entry points, each of which take multiple steps to analyze.
 
+Performance analysis using ``-ftime-trace``
+===========================================
+
 You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
 You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
 or using `speedscope <https://speedscope.app>`_.
@@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
 Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
 for a single entry point.
 You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble.
+
+.. code-block:: bash
+   :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g --  clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+         -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+         -verify ./clang/test/Analysis/string.c
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+   :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox.
+
+   perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+   /path/to/FlameGraph/flamegraph.pl perf.folded  > perf.svg
+   firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+
+.. code-block:: bash
+   :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+   uftrace record clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+         -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+         -verify ./clang/test/Analysis/string.c
+   uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point, which takes at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page
+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
+
+Similar filters could be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+   cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --framegraph``.
diff --git a/clang/docs/analyzer/images/flamegraph.svg b/clang/docs/analyzer/images/flamegraph.svg
new file mode 100644
index 000000000000000..d3c4a22c9ff536a
--- /dev/null
+++ b/clang/docs/analyzer/images/flamegraph.svg
@@ -0,0 +1,27641 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg version="1.1" width="1200" height="934" onload="init(evt)" viewBox="0 0 1200 934" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<!-- Flame graph stack visualization. See https://github.com/brendangregg/FlameGraph for latest version, and http://www.brendangregg.com/flamegraphs.html for examples. -->
+<!-- NOTES:  -->
+<defs>
+	<linearGradient id="background" y1="0" y2="1" x1="0" x2="0" >
+		<stop stop-color="#eeeeee" offset="5%" />
+		<stop stop-color="#eeeeb0" offset="95%" />
+	</linearGradient>
+</defs>
+<style type="text/css">
+	text { font-family:Verdana; font-size:12px; fill:rgb(0,0,0); }
+	#search, #ignorecase { opacity:0.1; cursor:pointer; }
+	#search:hover, #search.show, #ignorecase:hover, #ignorecase.show { opacity:1; }
+	#subtitle { text-anchor:middle; font-color:rgb(160,160,160); }
+	#title { text-anchor:middle; font-size:17px}
+	#unzoom { cursor:pointer; }
+	#frames > *:hover { stroke:black; stroke-width:0.5; cursor:pointer; }
+	.hide { display:none; }
+	.parent { opacity:0.5; }
+</style>
+<script type="text/ecmascript">
+<![CDATA[
+	"use strict";
+	var details, searchbtn, unzoombtn, matchedtxt, svg, searching, currentSearchTerm, ignorecase, ignorecaseBtn;
+	function init(evt) {
+		details = document.getElementById("details").firstChild;
+		searchbtn = document.getElementById("search");
+		ignorecaseBtn = document.getElementById("ignorecase");
+		unzoombtn = document.getElementById("unzoom");
+		matchedtxt = document.getElementById("matched");
+		svg = document.getElementsByTagName("svg")[0];
+		searching = 0;
+		currentSearchTerm = null;
+
+		// use GET parameters to restore a flamegraphs state.
+		var params = get_params();
+		if (params.x && params.y)
+			zoom(find_group(document.querySelector('[x="' + params.x + '"][y="' + params.y + '"]')));
+                if (params.s) search(params.s);
+	}
+
+	// event listeners
+	window.addEventListener("click", function(e) {
+		var target = find_group(e.target);
+		if (target) {
+			if (target.nodeName == "a") {
+				if (e.ctrlKey === false) return;
+				e.preventDefault();
+			}
+			if (target.classList.contains("parent")) unzoom(true);
+			zoom(target);
+			if (!document.querySelector('.parent')) {
+				// we have basically done a clearzoom so clear the url
+				var params = get_params();
+				if (params.x) delete params.x;
+				if (params.y) delete params.y;
+				history.replaceState(null, null, parse_params(params));
+				unzoombtn.classList.add("hide");
+				return;
+			}
+
+			// set parameters for zoom state
+			var el = target.querySelector("rect");
+			if (el && el.attributes && el.attributes.y && el.attributes._orig_x) {
+				var params = get_params()
+				params.x = el.attributes._orig_x.value;
+				params.y = el.attributes.y.value;
+				history.replaceState(null, null, parse_params(params));
+			}
+		}
+		else if (e.target.id == "unzoom") clearzoom();
+		else if (e.target.id == "search") search_prompt();
+		else if (e.target.id == "ignorecase") toggle_ignorecase();
+	}, false)
+
+	// mouse-over for info
+	// show
+	window.addEventListener("mouseover", function(e) {
+		var target = find_group(e.target);
+		if (target) details.nodeValue = "Function: " + g_to_text(target);
+	}, false)
+
+	// clear
+	window.addEventListener("mouseout", function(e) {
+		var target = find_group(e.target);
+		if (target) details.nodeValue = ' ';
+	}, false)
+
+	// ctrl-F for search
+	// ctrl-I to toggle case-sensitive search
+	window.addEventListener("keydown",function (e) {
+		if (e.keyCode === 114 || (e.ctrlKey && e.keyCode === 70)) {
+			e.preventDefault();
+			search_prompt();
+		}
+		else if (e.ctrlKey && e.keyCode === 73) {
+			e.preventDefault();
+			toggle_ignorecase();
+		}
+	}, false)
+
+	// functions
+	function get_params() {
+		var params = {};
+		var paramsarr = window.location.search.substr(1).split('&');
+		for (var i = 0; i < paramsarr.length; ++i) {
+			var tmp = paramsarr[i].split("=");
+			if (!tmp[0] || !tmp[1]) continue;
+			params[tmp[0]]  = decodeURIComponent(tmp[1]);
+		}
+		return params;
+	}
+	function parse_params(params) {
+		var uri = "?";
+		for (var key in params) {
+			uri += key + '=' + encodeURIComponent(params[key]) + '&';
+		}
+		if (uri.slice(-1) == "&")
+			uri = uri.substring(0, uri.length - 1);
+		if (uri == '?')
+			uri = window.location.href.split('?')[0];
+		return uri;
+	}
+	function find_child(node, selector) {
+		var children = node.querySelectorAll(selector);
+		if (children.length) return children[0];
+	}
+	function find_group(node) {
+		var parent = node.parentElement;
+		if (!parent) return;
+		if (parent.id == "frames") return node;
+		return find_group(parent);
+	}
+	function orig_save(e, attr, val) {
+		if (e.attributes["_orig_" + attr] != undefined) return;
+		if (e.attributes[attr] == undefined) return;
+		if (val == undefined) val = e.attributes[attr].value;
+		e.setAttribute("_orig_" + attr, val);
+	}
+	function orig_load(e, attr) {
+		if (e.attributes["_orig_"+attr] == undefined) return;
+		e.attributes[attr].value = e.attributes["_orig_" + attr].value;
+		e.removeAttribute("_orig_"+attr);
+	}
+	function g_to_text(e) {
+		var text = find_child(e, "title").firstChild.nodeValue;
+		return (text)
+	}
+	function g_to_func(e) {
+		var func = g_to_text(e);
+		// if there's any manipulation we want to do to the function
+		// name before it's searched, do it here before returning.
+		return (func);
+	}
+	function update_text(e) {
+		var r = find_child(e, "rect");
+		var t = find_child(e, "text");
+		var w = parseFloat(r.attributes.width.value) -3;
+		var txt = find_child(e, "title").textContent.replace(/\([^(]*\)$/,"");
+		t.attributes.x.value = parseFloat(r.attributes.x.value) + 3;
+
+		// Smaller than this size won't fit anything
+		if (w < 2 * 12 * 0.59) {
+			t.textContent = "";
+			return;
+		}
+
+		t.textContent = txt;
+		var sl = t.getSubStringLength(0, txt.length);
+		// check if only whitespace or if we can fit the entire string into width w
+		if (/^ *$/.test(txt) || sl < w)
+			return;
+
+		// this isn't perfect, but gives a good starting point
+		// and avoids calling getSubStringLength too often
+		var start = Math.floor((w/sl) * txt.length);
+		for (var x = start; x > 0; x = x-2) {
+			if (t.getSubStringLength(0, x + 2) <= w) {
+				t.textContent = txt.substring(0, x) + "..";
+				return;
+			}
+		}
+		t.textContent = "";
+	}
+
+	// zoom
+	function zoom_reset(e) {
+		if (e.attributes != undefined) {
+			orig_load(e, "x");
+			orig_load(e, "width");
+		}
+		if (e.childNodes == undefined) return;
+		for (var i = 0, c = e.childNodes; i < c.length; i++) {
+			zoom_reset(c[i]);
+		}
+	}
+	function zoom_child(e, x, ratio) {
+		if (e.attributes != undefined) {
+			if (e.attributes.x != undefined) {
+				orig_save(e, "x");
+				e.attributes.x.value = (parseFloat(e.attributes.x.value) - x - 10) * ratio + 10;
+				if (e.tagName == "text")
+					e.attributes.x.value = find_child(e.parentNode, "rect[x]").attributes.x.value + 3;
+			}
+			if (e.attributes.width != undefined) {
+				orig_save(e, "width");
+				e.attributes.width.value = parseFloat(e.attributes.width.value) * ratio;
+			}
+		}
+
+		if (e.childNodes == undefined) return;
+		for (var i = 0, c = e.childNodes; i < c.length; i++) {
+			zoom_child(c[i], x - 10, ratio);
+		}
+	}
+	function zoom_parent(e) {
+		if (e.attributes) {
+			if (e.attributes.x != undefined) {
+				orig_save(e, "x");
+				e.attributes.x.value = 10;
+			}
+			if (e.attributes.width != undefined) {
+				orig_save(e, "width");
+				e.attributes.width.value = parseInt(svg.width.baseVal.value) - (10 * 2);
+			}
+		}
+		if (e.childNodes == undefined) return;
+		for (var i = 0, c = e.childNodes; i < c.length; i++) {
+			zoom_parent(c[i]);
+		}
+	}
+	function zoom(node) {
+		var attr = find_child(node, "rect").attributes;
+		var width = parseFloat(attr.width.value);
+		var xmin = parseFloat(attr.x.value);
+		var xmax = parseFloat(xmin + width);
+		var ymin = parseFloat(attr.y.value);
+		var ratio = (svg.width.baseVal.value - 2 * 10) / width;
+
+		// XXX: Workaround for JavaScript float issues (fix me)
+		var fudge = 0.0001;
+
+		unzoombtn.classList.remove("hide");
+
+		var el = document.getElementById("frames").children;
+		for (var i = 0; i < el.length; i++) {
+			var e = el[i];
+			var a = find_child(e, "rect").attributes;
+			var ex = parseFloat(a.x.value);
+			var ew = parseFloat(a.width.value);
+			var upstack;
+			// Is it an ancestor
+			if (0 == 0) {
+				upstack = parseFloat(a.y.value) > ymin;
+			} else {
+				upstack = parseFloat(a.y.value) < ymin;
+			}
+			if (upstack) {
+				// Direct ancestor
+				if (ex <= xmin && (ex+ew+fudge) >= xmax) {
+					e.classList.add("parent");
+					zoom_parent(e);
+					update_text(e);
+				}
+				// not in current path
+				else
+					e.classList.add("hide");
+			}
+			// Children maybe
+			else {
+				// no common path
+				if (ex < xmin || ex + fudge >= xmax) {
+					e.classList.add("hide");
+				}
+				else {
+					zoom_child(e, xmin, ratio);
+					update_text(e);
+				}
+			}
+		}
+		search();
+	}
+	function unzoom(dont_update_text) {
+		unzoombtn.classList.add("hide");
+		var el = document.getElementById("frames").children;
+		for(var i = 0; i < el.length; i++) {
+			el[i].classList.remove("parent");
+			el[i].classList.remove("hide");
+			zoom_reset(el[i]);
+			if(!dont_update_text) update_text(el[i]);
+		}
+		search();
+	}
+	function clearzoom() {
+		unzoom();
+
+		// remove zoom state
+		var params = get_params();
+		if (params.x) delete params.x;
+		if (params.y) delete params.y;
+		history.replaceState(null, null, parse_params(params));
+	}
+
+	// search
+	function toggle_ignorecase() {
+		ignorecase = !ignorecase;
+		if (ignorecase) {
+			ignorecaseBtn.classList.add("show");
+		} else {
+			ignorecaseBtn.classList.remove("show");
+		}
+		reset_search();
+		search();
+	}
+	function reset_search() {
+		var el = document.querySelectorAll("#frames rect");
+		for (var i = 0; i < el.length; i++) {
+			orig_load(el[i], "fill")
+		}
+		var params = get_params();
+		delete params.s;
+		history.replaceState(null, null, parse_params(params));
+	}
+	function search_prompt() {
+		if (!searching) {
+			var term = prompt("Enter a search term (regexp " +
+			    "allowed, eg: ^ext4_)"
+			    + (ignorecase ? ", ignoring case" : "")
+			    + "\nPress Ctrl-i to toggle case sensitivity", "");
+			if (term != null) search(term);
+		} else {
+			reset_search();
+			searching = 0;
+			currentSearchTerm = null;
+			searchbtn.classList.remove("show");
+			searchbtn.firstChild.nodeValue = "Search"
+			matchedtxt.classList.add("hide");
+			matchedtxt.firstChild.nodeValue = ""
+		}
+	}
+	function search(term) {
+		if (term) currentSearchTerm = term;
+
+		var re = new RegExp(currentSearchTerm, ignorecase ? 'i' : '');
+		var el = document.getElementById("frames").children;
+		var matches = new Object();
+		var maxwidth = 0;
+		for (var i = 0; i < el.length; i++) {
+			var e = el[i];
+			var func = g_to_func(e);
+			var rect = find_child(e, "rect");
+			if (func == null || rect == null)
+				continue;
+
+			// Save max width. Only works as we have a root frame
+			var w = parseFloat(rect.attributes.width.value);
+			if (w > maxwidth)
+				maxwidth = w;
+
+			if (func.match(re)) {
+				// highlight
+				var x = parseFloat(rect.attributes.x.value);
+				orig_save(rect, "fill");
+				rect.attributes.fill.value = "rgb(230,0,230)";
+
+				// remember matches
+				if (matches[x] == undefined) {
+					matches[x] = w;
+				} else {
+					if (w > matches[x]) {
+						// overwrite with parent
+						matches[x] = w;
+					}
+				}
+				searching = 1;
+			}
+		}
+		if (!searching)
+			return;
+		var params = get_params();
+		params.s = currentSearchTerm;
+		history.replaceState(null, null, parse_params(params));
+
+		searchbtn.classList.add("show");
+		searchbtn.firstChild.nodeValue = "Reset Search";
+
+		// calculate percent matched, excluding vertical overlap
+		var count = 0;
+		var lastx = -1;
+		var lastw = 0;
+		var keys = Array();
+		for (k in matches) {
+			if (matches.hasOwnProperty(k))
+				keys.push(k);
+		}
+		// sort the matched frames by their x location
+		// ascending, then width descending
+		keys.sort(function(a, b){
+			return a - b;
+		});
+		// Step through frames saving only the biggest bottom-up frames
+		// thanks to the sort order. This relies on the tree property
+		// where children are always smaller than their parents.
+		var fudge = 0.0001;	// JavaScript floating point
+		for (var k in keys) {
+			var x = parseFloat(keys[k]);
+			var w = matches[keys[k]];
+			if (x >= lastx + lastw - fudge) {
+				count += w;
+				lastx = x;
+				lastw = w;
+			}
+		}
+		// display matched percent
+		matchedtxt.classList.remove("hide");
+		var pct = 100 * count / maxwidth;
+		if (pct != 100) pct = pct.toFixed(1)
+		matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%";
+	}
+]]>
+</script>
+<rect x="0.0" y="0" width="1200.0" height="934.0" fill="url(#background)"  />
+<text id="title" x="600.00" y="24" >Flame Graph</text>
+<text id="details" x="10.00" y="917" > </text>
+<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text>
+<text id="search" x="1090.00" y="24" >Search</text>
+<text id="ignorecase" x="1174.00" y="24" >ic</text>
+<text id="matched" x="1090.00" y="917" > </text>
+<g id="frames">
+<g >
+<title>[unknown] (140,334 samples, 0.09%)</title><rect x="311.8" y="469" width="1.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
+<text  x="314.78" y="479.5" ></text>
+</g>
+<g >
+<title>clang::Sema::CheckSingleAssignmentConstraints (259,475 samples, 0.17%)</title><rect x="665.4" y="325" width="2.0" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" />
+<text  x="668.36" y="335.5" ></text>...
[truncated]

necto

One note about the illustrations:
Thank you for adding them, I think they are invaluable to understand the docs!
Yet, I feel they could be smaller both in detail and in the file sizes.

Fewer details means the reader will quicker understand the points you want to make.
Lower file sizes will save bandwidth and disk space to thousands of people and robots that checkout LLVM every day. Especially for such docs that are unlikely to be viewed by more than a hundred of people.

For example, here is how I minimized my PNG illustration using ImageMagick:

convert speedscope.png -resize 1825x900 +dither -colors 64 -strip -quality 90 -define png:compression-level=9 speedscope-low.png

I played with the resolution, colors, and quality parameters trying to find the smallest values that don't destroy the readability completely.

Xazax-hun

Looks wonderful! Added some nits inline.

Xazax-hun · 2025-02-10T14:56:02Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g --  clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \


I wonder if a simpler/smaller CSA invocation would suffice here for demonstration purposes.

I wanted to sick with a similar invocation as was present in the beginning of this file.
If we were t simplify this, we should harmonize them.

Fixed in steakhal@169664c.

Xazax-hun · 2025-02-10T14:58:40Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.


s/could/can/

Also do that substitution on other places.

Replaced both could with can in steakhal@1b105e0.

Xazax-hun · 2025-02-10T14:59:42Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.


While I agree that perf is excellent, I wonder if we in general want to stay away from value judgements in documentation.

Rephrased into Perf is a tool for conducting sampling-based profiling.
Fixed in steakhal@aa5a285.

necto · 2025-02-10T15:07:57Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.


Suggested change

In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.

In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than the sampling-based approaches like ``perf``.

Thanks, fixed in steakhal@3b2d323.

necto · 2025-02-10T15:08:53Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.


Would be interesting to include the typical slowdown factor.
Also, I think, it is important to note the substantial disk space requirement

I didn't measure. The best I could find on the internet was that it's not as slow as cachegrind. I'd avoid mentioning this though.
I added a remark about high storage requirement in steakhal@7a76bd5.

necto · 2025-02-10T15:11:10Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.


To me it makes more sense to put the link on the documentation rather than "generic" "common options" noun

Suggested change

For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.

For the common options, refer to `the ``uftrace`` documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.

Be careful -- the RST format has a dumb limitation that inline formatting cannot be nested, so I'd guess that the suggested "monospace text within link text" nesting wouldn't work (but I'm not 100% sure).

I moved the link to the word documentation in steakhal@4aa1f34.

necto · 2025-02-10T15:38:28Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.


This looks out of place. I guess it goes with the second sentence of this paragraph, but not with the third (which now immediately precedes it).
I think you can make it more clear if you avoid "that":

Suggested change

In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.

If you do not apply filters on recording, you will collect a large trace and every dump operation would need to sieve through the much larger recording which may be annoying if done repeatedly.

Accepted as-is in steakhal@196dd50.

necto · 2025-02-10T15:39:20Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.


Suggested change

If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.

If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.

Accepted as-is in steakhal@004b8a6.

steakhal

Sorry about the noise of "closed PRs" due to the removal of my repo.
I had difficulties with Github that made me try different solutions to solve that.

In any case, I fixed all the review comments.
Thanks for the excellent recommendations!

steakhal · 2025-02-11T12:31:19Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+
+   # -F: Sampling frequency, use `-F max` for maximal frequency
+   # -g: Enable call-graph recording for both kernel and user space
+   perf record -F 99 -g --  clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \


Fixed in steakhal@169664c.

steakhal · 2025-02-11T12:32:37Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.


Rephrased into Perf is a tool for conducting sampling-based profiling.
Fixed in steakhal@aa5a285.

steakhal · 2025-02-11T12:34:41Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.


Replaced both could with can in steakhal@1b105e0.

steakhal · 2025-02-11T12:36:32Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.


Thanks, fixed in steakhal@3b2d323.

steakhal · 2025-02-11T12:44:30Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.


I didn't measure. The best I could find on the internet was that it's not as slow as cachegrind. I'd avoid mentioning this though.
I added a remark about high storage requirement in steakhal@7a76bd5.

steakhal · 2025-02-11T12:46:59Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.


I moved the link to the word documentation in steakhal@4aa1f34.

steakhal · 2025-02-11T12:48:38Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.


Accepted as-is in steakhal@196dd50.

steakhal · 2025-02-11T12:49:29Z

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.


Accepted as-is in steakhal@004b8a6.

[analyzer][docs] Document how to use perf and uftrace to debug perfor…

94f0b3c

…mance issues

steakhal added the clang:static analyzer label Feb 10, 2025

steakhal requested review from Xazax-hun, haoNoQ and NagyDonat February 10, 2025 14:14

llvmbot added the clang Clang issues not falling into any other category label Feb 10, 2025

necto reviewed Feb 10, 2025

View reviewed changes

Xazax-hun approved these changes Feb 10, 2025

View reviewed changes

necto reviewed Feb 10, 2025

View reviewed changes

steakhal closed this by deleting the head repository Feb 11, 2025

steakhal commented Feb 11, 2025

View reviewed changes

steakhal mentioned this pull request Feb 11, 2025

[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126724

Merged

	In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
	In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and thorough than the sampling-based approaches like ``perf``.

	For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
	For the common options, refer to `the ``uftrace`` documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.

	In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
	If you do not apply filters on recording, you will collect a large trace and every dump operation would need to sieve through the much larger recording which may be annoying if done repeatedly.

	If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
	If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.

[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126520

[analyzer][docs] Document how to use perf and uftrace to debug performance issues #126520

Uh oh!

Conversation

steakhal commented Feb 10, 2025

Uh oh!

steakhal commented Feb 10, 2025

Uh oh!

llvmbot commented Feb 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

necto left a comment

Choose a reason for hiding this comment

Uh oh!

Xazax-hun left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal left a comment

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

llvmbot commented Feb 10, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading

steakhal Feb 11, 2025 •

edited

Loading