
Commit 5270c14

ctf: do not emit types in functions, or types that point into them
Even though dwarf2ctf's users are only interested in top-level types and variables, we have always recursed through functions and emitted all structure and union types found there. This is because some versions of GCC translate 'struct foo * bar' inside a function into a 'struct foo' DIE inside the subroutine and a *global* pointer DIE. Since we emit all global type DIEs, we would fail horribly if we didn't also find the things they might point to.

But this is very wasteful. READ_ONCE() defines an anonymous union, and over a thousand of these, used inside static and inline functions defined in header files, are found in the generated CTFA, along with a number of other useless structures. GCC 8 has perturbed things enough that we are now getting assertion failures when emitting some of these useless types: enough. Rather than spending time and disk space emitting types nobody will ever access, let's just remove them.

This wasn't being done already because it is rather tricky. We can avoid the obvious cases by simply not recursing into subroutine DIEs. However, we also need to avoid the top-level DIEs whose type DIEs lie inside subroutines, which are what caused this problem in the first place. So we have to detect and skip every DIE that has a type DIE inside a subroutine, recursively. And *that* requires being able to determine, for an arbitrary DIE reached by following a type DIE pointer from some other type (rather than via recursion), whether it is inside a subroutine or not.

And elfutils does not make that easy. We go through contortions in several places to avoid having to figure this out, but here we cannot get away from it. You can get from a DIE to its children easily, via dwarf_child(), but there is no dwarf_parent(): elfutils does not expose child->parent pointers at all. So we need to build a map linking children to their parents.
Since type DIEs can point forward in files as well as backward, we need the entire child->parent graph of any given file before calling any process_file() filter functions. This means we must traverse the entire file redundantly beforehand, which is not cheap. We amortize the cost by building the map once, on first opening any given file, and keeping it for the lifetime of dwarf2ctf in a hash of hashes: absolute file name -> DIE offset -> DIE offset. DIE offsets are unique within files in the kernel build process, so this is sufficient. This uses a few dozen megabytes of RAM, which is fairly insignificant. The change both saves time (by emitting less) and spends it (on the extra traversal), so the net effect is almost a wash: a 20s slowdown in total.

The CTF file shrinks substantially: in my testing, from 6398074 to 6252458 bytes. This regains most of the space we lost in the range expansion that accompanied libdtrace-ctf 1.0.0. The shared type repo loses about 220 useless types, and vmlinux.ctf loses about 2500 (!). Other CTF repos lose some too, but not nearly as many. Of course, the shared type repo is the *full* one, and even after this it is nearly blowing the 0x8000-type limit of libdtrace-ctf < 1.0.0, so the range expansion was still necessary.

Orabug: 29054887
Signed-off-by: Nick Alcock <[email protected]>
Reported-and-tested-by: Victor Erminpour <[email protected]>
Reviewed-by: Kris Van Hees <[email protected]>
1 parent 2c0658c commit 5270c14


1 file changed (+175, -66 lines changed)

scripts/dwarf2ctf/dwarf2ctf.c

Lines changed: 175 additions & 66 deletions
@@ -141,6 +141,13 @@ typedef struct ctf_memb_count {
 	size_t count;
 } ctf_memb_count_t;
 
+/*
+ * A mapping from the absolute pathname of a TU to a hashtable mapping
+ * DIE offsets of child DIEs to DIE offsets of parents. Populated on first
+ * iteration.
+ */
+static GHashTable *fn_to_die_to_parent;
+
 /*
  * Get a ctf_file out of the per_module hash for a given module.
  */
@@ -226,6 +233,17 @@ static void init_ctf_table(const char *module_name);
 static ctf_id_t ctf_void_type;
 static ctf_id_t ctf_funcptr_type;
 
+/*
+ * Initialize the child->parent DIE mapping for a single file.
+ */
+static void init_parent_die(const char *file_name, Dwfl *dwfl);
+
+/*
+ * Initialize one layer of a child->parent mapping.
+ */
+static void init_parent_die_internal(const char *file_name,
+				     GHashTable *offs, Dwarf_Die *parent);
+
 /*
  * Override the presence and value of FORM_u/sdata attributes on DWARF DIEs,
  * either adding to it, or replacing it.
@@ -297,11 +315,12 @@ static void process_file(const char *file_name,
 			 void *data);
 
 /*
- * process_file() helper, walking over subroutines recursively and picking up
- * types therein.
+ * process_file() helper, walking over the top level and picking up types
+ * therein.
  */
 static void process_tu_func(const char *module_name,
 			    const char *file_name,
+			    Dwarf *dwarf,
 			    Dwarf_Die *parent_die,
 			    Dwarf_Die *die,
 			    void (*dwarf_process)(const char *module_name,
@@ -627,7 +646,9 @@ ASSEMBLY_FUN(variable);
  * processing function: it can be used to rapidly determine that this DIE is not
  * worth processing. (It should return 0 in this case, and nonzero otherwise.)
  */
-typedef int (*ctf_assembly_filter_fun)(Dwarf_Die *die,
+typedef int (*ctf_assembly_filter_fun)(const char *file_name,
+				       Dwarf *dwarf,
+				       Dwarf_Die *die,
 				       Dwarf_Die *parent_die);
 
 /*
@@ -638,7 +659,9 @@ typedef int (*ctf_assembly_filter_fun)(Dwarf_Die *die,
  * functions, because GCC may emit references to the opaque variants of those
  * types from file scope.)
  */
-static int filter_ctf_file_scope(Dwarf_Die *die,
+static int filter_ctf_file_scope(const char *file_name,
+				 Dwarf *dwarf,
+				 Dwarf_Die *die,
 				 Dwarf_Die *parent_die);
 
 /*
@@ -647,7 +670,9 @@ static int filter_ctf_file_scope(Dwarf_Die *die,
  * interesting. (DTrace userspace contains a similar list, but the two lists
  * need not be in sync.)
  */
-static int filter_ctf_uninteresting(Dwarf_Die *die,
+static int filter_ctf_uninteresting(const char *file_name,
+				    Dwarf *dwarf,
+				    Dwarf_Die *die,
 				    Dwarf_Die *parent_die);
 
 /*
@@ -679,13 +704,13 @@ static struct assembly_tab_t
 	{ DW_TAG_subrange_type, NULL, assemble_ctf_array_dimension },
 	{ DW_TAG_const_type, filter_ctf_file_scope, assemble_ctf_cvr_qual },
 	{ DW_TAG_restrict_type, filter_ctf_file_scope, assemble_ctf_cvr_qual },
-	{ DW_TAG_enumeration_type, NULL, assemble_ctf_enumeration },
-	{ DW_TAG_enumerator, NULL, assemble_ctf_enumerator },
+	{ DW_TAG_enumeration_type, filter_ctf_file_scope, assemble_ctf_enumeration },
+	{ DW_TAG_enumerator, filter_ctf_file_scope, assemble_ctf_enumerator },
 	{ DW_TAG_pointer_type, filter_ctf_file_scope, assemble_ctf_pointer },
-	{ DW_TAG_structure_type, NULL, assemble_ctf_struct_union },
-	{ DW_TAG_union_type, NULL, assemble_ctf_struct_union },
-	{ DW_TAG_member, NULL, assemble_ctf_su_member },
-	{ DW_TAG_typedef, NULL, assemble_ctf_typedef },
+	{ DW_TAG_structure_type, filter_ctf_file_scope, assemble_ctf_struct_union },
+	{ DW_TAG_union_type, filter_ctf_file_scope, assemble_ctf_struct_union },
+	{ DW_TAG_member, filter_ctf_file_scope, assemble_ctf_su_member },
+	{ DW_TAG_typedef, filter_ctf_file_scope, assemble_ctf_typedef },
 	{ DW_TAG_variable, filter_ctf_uninteresting, assemble_ctf_variable },
 	{ DW_TAG_volatile_type, filter_ctf_file_scope, assemble_ctf_cvr_qual },
 	{ 0, NULL }};
@@ -816,6 +841,11 @@ static void private_per_module_free(void *per_module);
  */
 static void free_duplicates_id_file(void *id_file);
 
+/*
+ * Free a fn_to_die_to_parent subhash.
+ */
+static void private_fn_die_parent_free(void *ptr);
+
 /* Initialization. */
 
 int main(int argc, char *argv[])
@@ -926,6 +956,8 @@ static void run(char *output, int standalone)
 					    private_per_module_free);
 	variable_blacklist = g_hash_table_new_full(g_str_hash, g_str_equal,
 						   free, free);
+	fn_to_die_to_parent = g_hash_table_new_full(g_str_hash, g_str_equal, free,
+						    private_fn_die_parent_free);
 
 	dw_ctf_trace("Initializing...\n");
 
@@ -951,6 +983,8 @@ static void run(char *output, int standalone)
 	g_hash_table_destroy(id_to_type);
 	g_hash_table_destroy(id_to_module);
 	g_hash_table_destroy(per_module);
+	g_hash_table_destroy(variable_blacklist);
+	g_hash_table_destroy(fn_to_die_to_parent);
 }
 
 
@@ -1316,6 +1350,70 @@ static void init_ctf_table(const char *module_name)
 
 /* DWARF walkers. */
 
+/*
+ * Initialize the child->parent DIE mapping for a single file.
+ */
+static void init_parent_die(const char *file_name, Dwfl *dwfl)
+{
+	GHashTable *offs;
+	Dwarf_Die *tu_die = NULL;
+	Dwarf_Addr junk;
+
+	offs = g_hash_table_new(g_direct_hash, g_direct_equal);
+	if (offs == NULL) {
+		fprintf(stderr, "Out of memory creating DIE offset hash\n");
+		exit(1);
+	}
+
+	while ((tu_die = dwfl_nextcu(dwfl, tu_die, &junk)) != NULL) {
+		init_parent_die_internal(file_name, offs, tu_die);
+	}
+
+	g_hash_table_insert(fn_to_die_to_parent,
+			    strdup(abs_file_name(file_name)), offs);
+}
+
+/*
+ * Initialize one layer of a child->parent mapping.
+ */
+static void init_parent_die_internal(const char *file_name,
+				     GHashTable *offs, Dwarf_Die *parent)
+{
+	Dwarf_Die child;
+	int sib_ret;
+	Dwarf_Off parent_offset;
+	const char *err;
+
+	switch (dwarf_child(parent, &child)) {
+	case -1:
+		err = "child DIEs";
+		goto err;
+	case 1: /* This DIE has no children */
+		return;
+	}
+
+	parent_offset = dwarf_dieoffset(parent);
+
+	do {
+		g_hash_table_insert(offs,
+				    GUINT_TO_POINTER(dwarf_dieoffset(&child)),
+				    GUINT_TO_POINTER(parent_offset));
+		init_parent_die_internal(file_name, offs, &child);
+	} while ((sib_ret = dwarf_siblingof (&child, &child)) == 0);
+
+	if (sib_ret == -1) {
+		err = "sibling DIEs";
+		goto err;
+	}
+	return;
+err:
+	fprintf(stderr, "Cannot fetch %s of DIE at offset %lu in %s: %s\n",
+		err, (unsigned long) dwarf_dieoffset(parent), file_name,
+		dwarf_errmsg(dwarf_errno()));
+	exit(1);
+
+}
+
 /*
  * Type ID computation.
  *
@@ -1707,7 +1805,9 @@ static void process_file(const char *file_name,
 	char *fn_module_name = fn_to_module(file_name);
 	const char *module_name = fn_module_name;
 
-	Dwfl *dwfl = simple_dwfl_new(file_name, NULL);
+	Dwfl_Module *mod;
+	Dwfl *dwfl;
+	Dwarf *dwarf;
 	GHashTable *seen_before = g_hash_table_new_full(g_str_hash, g_str_equal,
 							free, free);
 	Dwarf_Die *tu_die = NULL;
@@ -1718,6 +1818,18 @@ static void process_file(const char *file_name,
 		exit(1);
 	}
 
+	dwfl = simple_dwfl_new(file_name, &mod);
+	dwarf = dwfl_module_getdwarf(mod, &junk);
+
+	/*
+	 * On first traversal, make sure the DIE parent mapping is populated,
+	 * so that filters and processing functions can use it.
+	 */
+	if (!g_hash_table_lookup_extended(fn_to_die_to_parent,
+					  abs_file_name(file_name),
+					  NULL, NULL))
+		init_parent_die(file_name, dwfl);
+
 	while ((tu_die = dwfl_nextcu(dwfl, tu_die, &junk)) != NULL) {
 		const char *tu_name;
 
@@ -1772,7 +1884,7 @@ static void process_file(const char *file_name,
 		if (tu_init != NULL)
 			tu_init(module_name, file_name, tu_die, data);
 
-		process_tu_func(module_name, file_name, tu_die, &die,
+		process_tu_func(module_name, file_name, dwarf, tu_die, &die,
 				dwarf_process, data);
 
 		if (tu_done != NULL)
@@ -1792,11 +1904,12 @@ static void process_file(const char *file_name,
 }
 
 /*
- * process_file() helper, walking over subroutines and their contained blocks
- * recursively and picking up types therein.
+ * process_file() helper, walking over the top level and picking up types
+ * therein.
  */
 static void process_tu_func(const char *module_name,
 			    const char *file_name,
+			    Dwarf *dwarf,
 			    Dwarf_Die *parent_die,
 			    Dwarf_Die *die,
 			    void (*dwarf_process)(const char *module_name,
@@ -1811,35 +1924,16 @@ static void process_tu_func(const char *module_name,
 
 	/*
 	 * We are only interested in definitions for which we can (eventually)
-	 * emit CTF: call the processing function for all such. Recurse into
-	 * subprograms to catch type declarations there as well, since there may
-	 * be definitions of aggregates referred to outside this function only
-	 * opaquely.
+	 * emit CTF: call the processing function for all such.
 	 */
 	do {
 		if ((dwarf_tag(die) <= assembly_len) &&
 		    (assembly_filter_tab[dwarf_tag(die)] == NULL ||
-		     assembly_filter_tab[dwarf_tag(die)](die, parent_die)) &&
+		     assembly_filter_tab[dwarf_tag(die)](file_name, dwarf, die,
+							 parent_die)) &&
 		    (assembly_tab[dwarf_tag(die)] != NULL))
 			dwarf_process(module_name, file_name, die,
 				      parent_die, data);
-
-		if ((dwarf_tag(die) == DW_TAG_subprogram) ||
-		    (dwarf_tag(die) == DW_TAG_lexical_block)) {
-			Dwarf_Die subroutine_die;
-
-			switch (dwarf_child(die, &subroutine_die)) {
-			case -1:
-				err = "fetch first child of subroutine";
-				goto fail;
-			case 1: /* No DIEs at all in this subroutine */
-				continue;
-			default: /* Child DIEs exist. */
-				break;
-			}
-			process_tu_func(module_name, file_name, die,
-					&subroutine_die, dwarf_process, data);
-		}
 	} while ((sib_ret = dwarf_siblingof(die, die)) == 0);
 
 	if (sib_ret == -1) {
@@ -3107,49 +3201,56 @@ static ctf_id_t lookup_ctf_type(const char *module_name, const char *file_name,
 
 /*
  * A CTF assembly filter function which excludes all types not at the global
- * scope (i.e. whose immediate parent is not a CU DIE) and which does not have a
- * structure or union as its ultimate dependent type. (All structures and
- * unions and everything dependent on them must be recorded, even inside
- * functions, because GCC may emit references to the opaque variants of those
- * types from file scope.)
+ * scope (i.e. whose immediate parent is not a CU DIE), and all types which
+ * reference a type which is not at the global scope (thus ruling out local type
+ * definitions for which the compiler is not consistently emitting all
+ * intermediate types at the local scope).
  */
-static int filter_ctf_file_scope(Dwarf_Die *die, Dwarf_Die *parent_die)
+static int filter_ctf_file_scope(const char *file_name, Dwarf *dwarf,
+				 Dwarf_Die *die, Dwarf_Die *parent_die)
 {
+	Dwarf_Die type_die;
+	GHashTable *parents;
+
 	/*
-	 * Find the ultimate parent of this DIE.
+	 * A type not dependent on another is acceptable iff it is at the global
+	 * scope.
 	 */
+	if (private_dwarf_type(die, &type_die) == NULL)
+		return (dwarf_tag(parent_die) == DW_TAG_compile_unit);
 
-	Dwarf_Die dependent_die;
-	Dwarf_Die *dependent_diep = private_dwarf_type(die, &dependent_die);
-
-	if (dependent_diep != NULL) {
-		Dwarf_Die *possible_depp = dependent_diep;
-		do {
-			Dwarf_Die possible_dep;
-			possible_depp = private_dwarf_type(possible_depp,
-							   &possible_dep);
+	/*
+	 * No type we reference may have a subprogram DIE as any of its parents.
+	 */
+	parents = g_hash_table_lookup(fn_to_die_to_parent,
+				      abs_file_name(file_name));
 
-			if (possible_depp != NULL)
-				dependent_die = possible_dep;
-		} while (possible_depp != NULL);
-	}
+	do {
+		Dwarf_Die parent = type_die;
+		Dwarf_Off parent_off = 0;
 
-	if (dependent_diep)
-		return (dwarf_tag(dependent_diep) == DW_TAG_structure_type ||
-			dwarf_tag(dependent_diep) == DW_TAG_union_type ||
-			dwarf_tag(dependent_diep) == DW_TAG_enumeration_type ||
-			dwarf_tag(parent_die) == DW_TAG_compile_unit);
-	else
-		return (dwarf_tag(parent_die) == DW_TAG_compile_unit);
+		do {
+			if (parent_off != 0 &&
+			    !dwarf_offdie(dwarf, parent_off, &parent))
+				break;
+			if (dwarf_tag(&parent) == DW_TAG_subprogram)
+				return 0;
+		} while ((parent_off = GPOINTER_TO_UINT(g_hash_table_lookup(parents,
+			    GUINT_TO_POINTER(dwarf_dieoffset(&parent)))))
+			 != 0);
+	} while (private_dwarf_type(&type_die, &type_die) != NULL);
+
+	return 1;
 }
 
 /*
  * A CTF assembly filter function which excludes all names not at the global
  * scope, and all names whose names are unlikely to be interesting. (DTrace
  * userspace contains a similar list, but the two lists need not be in sync.)
  */
-static int filter_ctf_uninteresting(Dwarf_Die *die,
-				    Dwarf_Die *parent_die)
+static int filter_ctf_uninteresting(const char *file_name __unused__,
+				    Dwarf *dwarf __unused__,
+				    Dwarf_Die *die, Dwarf_Die *parent_die)
 {
 	const char *sym_name = dwarf_diename(die);
 
@@ -4666,6 +4767,14 @@ static void private_per_module_free(void *per_module)
 	free(per_module);
 }
 
+/*
+ * Free a fn_to_die_to_parent subhash.
+ */
+static void private_fn_die_parent_free(void *ptr)
+{
+	g_hash_table_destroy((GHashTable *) ptr);
+}
+
 /*
  * Get a ctf_file out of the per_module hash for a given module.
  */
