// Copyright 2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! # Debug Info Module
//!
//! This module generates debug symbols. We use LLVM's
//! [source level debugging](http://llvm.org/docs/SourceLevelDebugging.html)
//! features for generating the debug information. The general principle is
//! this:
//!
//! Given the right metadata in the LLVM IR, the LLVM code generator is able to
//! create DWARF debug symbols for the given code. The
//! [metadata](http://llvm.org/docs/LangRef.html#metadata-type) is structured
//! much like DWARF *debugging information entries* (DIEs), representing type
//! information such as datatype layout, function signatures, block layout,
//! variable location and scope information, etc. It is the purpose of this
//! module to generate correct metadata and insert it into the LLVM IR.
//!
//! As the exact format of metadata trees may change between different LLVM
//! versions, we now use LLVM
//! [DIBuilder](http://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
//! to create metadata where possible. This will hopefully ease the adaptation
//! of this module to future LLVM versions.
//!
//! The public API of the module is a set of functions that will insert the
//! correct metadata into the LLVM IR when called with the right parameters.
//! The module is thus driven from an outside client with functions like
//! `debuginfo::create_local_var_metadata(bcx: block, local: &ast::local)`.
//!
//! Internally, the module tries to reuse already created metadata by means
//! of a cache. To get a shared metadata node when needed, just call the
//! corresponding function in this module:
//!
//! ```ignore
//! let file_metadata = file_metadata(crate_context, path);
//! ```
//!
//! The function takes care of probing the cache for an existing node for
//! that exact file path.
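To illustrate the caching pattern, here is a minimal sketch in plain Rust. It assumes a hypothetical `MetadataRef` handle in place of a real LLVM metadata node and a simplified `CrateDebugContext` with a hypothetical `created_files` map. It is not this module's actual implementation, which creates nodes through LLVM's DIBuilder.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a handle to an LLVM metadata node (e.g. a file node).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MetadataRef(usize);

// Simplified, hypothetical version of the per-crate debug context.
#[derive(Default)]
struct CrateDebugContext {
    created_files: HashMap<String, MetadataRef>, // file path -> cached node
    next_node: usize,                            // counter used to mint fake nodes
}

// Returns the node for `path`, creating it only on the first request.
fn file_metadata(cx: &mut CrateDebugContext, path: &str) -> MetadataRef {
    // Probe the cache first.
    if let Some(&node) = cx.created_files.get(path) {
        return node;
    }
    // Stand-in for the DIBuilder call that would create the real node.
    let node = MetadataRef(cx.next_node);
    cx.next_node += 1;
    cx.created_files.insert(path.to_string(), node);
    node
}
```

Calling `file_metadata` twice with the same path returns the same handle. Only the first call creates a node.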
//!
//! All private state used by the module is stored within either the
//! `CrateDebugContext` struct (owned by the `CrateContext`) or the
//! `FunctionDebugContext` (owned by the `FunctionContext`).
//!
//! This file consists of three conceptual sections:
//!
//! 1. The public interface of the module
//! 2. Module-internal metadata creation functions
//! 3. Minor utility functions
//!
//!
//! ## Recursive Types
//!
//! Some kinds of types, such as structs and enums, can be recursive. That means
//! that the type definition of some type X refers to some other type which in
//! turn (transitively) refers to X. This introduces cycles into the type
//! referral graph. A naive algorithm doing an on-demand, depth-first traversal
//! of this graph when describing types can get trapped in an endless loop
//! when it reaches such a cycle.
//!
//! For example, the following simple type for a singly-linked list...
//!
//! ```
//! struct List {
//!     value: i32,
//!     tail: Option<Box<List>>,
//! }
//! ```
//!
//! will generate the following call stack with a naive DFS algorithm:
//!
//! ```text
//! describe(t = List)
//!   describe(t = i32)
//!   describe(t = Option<Box<List>>)
//!     describe(t = Box<List>)
//!       describe(t = List) // at the beginning again...
//!       ...
//! ```
//!
//! To break cycles like these, we use "forward declarations". That is, when
//! the algorithm encounters a possibly recursive type (any struct or enum), it
//! immediately creates a type description node and inserts it into the cache
//! *before* describing the members of the type. This type description is just
//! a stub (as type members are not described and added to it yet), but it
//! allows the algorithm to already refer to the type. After the stub is
//! inserted into the cache, the algorithm continues as before. If it now
//! encounters a recursive reference, it will hit the cache and will not try to
//! describe the type anew.
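The forward-declaration idea can be sketched on a toy type graph. The `Ty` and `Node` enums and the `Describer` struct below are hypothetical stand-ins, not the compiler's types. Struct nodes are keyed by name and registered as stubs before their members are described, so the `List` cycle terminates.

```rust
use std::collections::HashMap;

// Hypothetical toy types: structs are referenced by name, which is what
// allows `List` to mention itself through `Option<Box<List>>`.
#[derive(Debug, Clone)]
enum Ty {
    Int,
    Struct(&'static str),
    Boxed(Box<Ty>),
    Optional(Box<Ty>),
}

// Hypothetical description nodes; children are indices into `nodes`.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Basic(&'static str),
    Pointer(usize),
    Optional(usize),
    // `members: None` marks a stub whose members are not described yet.
    Struct { name: &'static str, members: Option<Vec<usize>> },
}

struct Describer {
    defs: HashMap<&'static str, Vec<Ty>>, // struct name -> member types
    nodes: Vec<Node>,                     // all nodes created so far
    cache: HashMap<&'static str, usize>,  // struct name -> node index
}

impl Describer {
    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    fn describe(&mut self, t: &Ty) -> usize {
        match t {
            Ty::Int => self.push(Node::Basic("i32")),
            Ty::Boxed(inner) => {
                let i = self.describe(inner);
                self.push(Node::Pointer(i))
            }
            Ty::Optional(inner) => {
                let i = self.describe(inner);
                self.push(Node::Optional(i))
            }
            Ty::Struct(name) => {
                let name = *name;
                // A cache hit may return a stub: that is what breaks the cycle.
                if let Some(&idx) = self.cache.get(name) {
                    return idx;
                }
                // Forward declaration: register the stub *before* the members.
                let idx = self.push(Node::Struct { name, members: None });
                self.cache.insert(name, idx);
                let member_tys = self.defs[name].clone();
                let members: Vec<usize> =
                    member_tys.iter().map(|m| self.describe(m)).collect();
                // Replace the stub with the completed description.
                self.nodes[idx] = Node::Struct { name, members: Some(members) };
                idx
            }
        }
    }
}
```

When the traversal reaches `List` for the second time, it gets back the index of the stub instead of recursing again. The finished `List` node therefore points to itself through the pointer node.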
//!
//! This behaviour is encapsulated in the `RecursiveTypeDescription` enum,
//! which represents a kind of continuation, storing all state needed to
//! continue traversal at the type members after the type has been registered
//! with the cache. (This implementation approach might be a tad
//! over-engineered and may change in the future.)
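Here is a minimal sketch of the continuation idea. The helpers `Ctx`, `forward_declare` and `describe` are hypothetical, and the variant layout is only modeled on `RecursiveTypeDescription`. `forward_declare` registers the stub and returns the state needed to finish the type later. `finalize` then resumes traversal at the members.

```rust
use std::collections::HashMap;

type NodeId = usize;

// Hypothetical context: a cache keyed by unique type ID plus member lists.
#[derive(Default)]
struct Ctx {
    cache: HashMap<String, NodeId>,
    members: HashMap<NodeId, Vec<NodeId>>,
    next_id: NodeId,
}

impl Ctx {
    fn new_node(&mut self) -> NodeId {
        self.next_id += 1;
        self.next_id - 1
    }
}

// Modeled on `RecursiveTypeDescription`: either work remains to be done
// (the stub is registered, its members are not yet described) or it does not.
enum RecursiveTypeDescription {
    Unfinished { stub: NodeId, member_types: Vec<String> },
    Final(NodeId),
}

// Creates the stub and registers it in the cache *before* any member is visited.
fn forward_declare(cx: &mut Ctx, unique_id: &str, member_types: Vec<String>) -> RecursiveTypeDescription {
    let stub = cx.new_node();
    cx.cache.insert(unique_id.to_string(), stub);
    RecursiveTypeDescription::Unfinished { stub, member_types }
}

// Hypothetical lookup: reuse a cached node, otherwise create a leaf node.
fn describe(cx: &mut Ctx, unique_id: &str) -> NodeId {
    if let Some(&id) = cx.cache.get(unique_id) {
        return id;
    }
    let id = cx.new_node();
    cx.cache.insert(unique_id.to_string(), id);
    id
}

impl RecursiveTypeDescription {
    // Resumes the traversal at the members; a recursive member hits the cache.
    fn finalize(self, cx: &mut Ctx) -> NodeId {
        match self {
            RecursiveTypeDescription::Final(id) => id,
            RecursiveTypeDescription::Unfinished { stub, member_types } => {
                let ids: Vec<NodeId> = member_types.iter().map(|t| describe(cx, t)).collect();
                cx.members.insert(stub, ids);
                stub
            }
        }
    }
}
```

Because the continuation is an ordinary value, the caller decides when to resume it. The stub is guaranteed to be visible in the cache by then.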
//!
//!
//! ## Source Locations and Line Information
//!
//! In addition to data type descriptions, the debugging information must also
//! allow mapping machine code locations back to source code locations in order
//! to be useful. This functionality is also handled in this module. The
//! following functions control source mappings:
//!
//! + `set_source_location()`
//! + `clear_source_location()`
//! + `start_emitting_source_locations()`
//!
//! `set_source_location()` sets the current source location. All IR
//! instructions created after a call to this function will be linked to the
//! given source location, until another location is specified with
//! `set_source_location()` or the source location is cleared with
//! `clear_source_location()`. In the latter case, subsequent IR instructions
//! will not be linked to any source location. As you can see, this is a
//! stateful API (mimicking the one in LLVM), so be careful with source
//! locations set by previous calls. It's probably best not to rely on any
//! specific state being present at a given point in code.
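The stateful behaviour can be modeled with a small builder. This is a hypothetical sketch: `Loc`, `Instr` and `FnBuilder` are not types from this module or from LLVM. Each emitted instruction captures whatever location is current at the moment it is created.

```rust
// Hypothetical source position; real debuginfo would also carry a scope.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Loc {
    line: u32,
    col: u32,
}

// Hypothetical IR instruction that remembers the location it was linked to.
#[derive(Debug)]
struct Instr {
    name: &'static str,
    loc: Option<Loc>,
}

#[derive(Default)]
struct FnBuilder {
    current: Option<Loc>, // the implicit state shared by all later emissions
    instrs: Vec<Instr>,
}

impl FnBuilder {
    fn set_source_location(&mut self, loc: Loc) {
        self.current = Some(loc);
    }

    fn clear_source_location(&mut self) {
        self.current = None;
    }

    // Every instruction picks up whatever location is current right now.
    fn emit(&mut self, name: &'static str) {
        self.instrs.push(Instr { name, loc: self.current });
    }
}
```

Note that `emit` never takes a location argument. That implicit coupling is why code should not assume a particular location is still set.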
//!
//! One topic that deserves some extra attention is *function prologues*. At
//! the beginning of a function's machine code there are typically a few
//! instructions for loading argument values into allocas and checking if
//! there's enough stack space for the function to execute. This *prologue* is
//! not visible in the source code, and LLVM puts a special PROLOGUE END marker
//! into the line table at the first non-prologue instruction of the function.
//! In order to find out where the prologue ends, LLVM looks for the first
//! instruction in the function body that is linked to a source location. So,
//! when generating prologue instructions we have to make sure that we don't
//! emit source location information until the 'real' function body begins. For
//! this reason, source location emission is disabled by default for any new
//! function being translated and is only activated after a call to the third
//! function from the list above, `start_emitting_source_locations()`. This
//! function should be called right before regularly starting to translate the
//! top-level block of the given function.
//!
//! There is one exception to the above rule: `llvm.dbg.declare` instructions
//! must be linked to the source location of the variable being declared. For
//! function parameters, these `llvm.dbg.declare` instructions typically occur
//! in the middle of the prologue; however, they are ignored by LLVM's prologue
//! detection. The `create_argument_metadata()` and related functions take care
//! of linking the `llvm.dbg.declare` instructions to the correct source
//! locations even while source location emission is still disabled, so there
//! is no need to do anything special with source location handling here.
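The interplay between disabled location emission, `llvm.dbg.declare` and prologue detection can be modeled as follows. This is a toy sketch based on the rules stated in the two paragraphs above, not LLVM's actual prologue analysis. All types and functions are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Loc {
    line: u32,
}

#[derive(Debug)]
struct Instr {
    is_dbg_declare: bool,
    loc: Option<Loc>,
}

// Hypothetical per-function builder: location emission starts disabled.
struct FnBuilder {
    current: Option<Loc>,
    emitting: bool,
    instrs: Vec<Instr>,
}

impl FnBuilder {
    fn new() -> Self {
        FnBuilder { current: None, emitting: false, instrs: Vec::new() }
    }

    fn set_source_location(&mut self, loc: Loc) {
        self.current = Some(loc);
    }

    fn start_emitting_source_locations(&mut self) {
        self.emitting = true;
    }

    // Ordinary instructions only get a location once emission is enabled.
    fn emit(&mut self) {
        let loc = if self.emitting { self.current } else { None };
        self.instrs.push(Instr { is_dbg_declare: false, loc });
    }

    // `llvm.dbg.declare` always carries the declared variable's location.
    fn emit_dbg_declare(&mut self, var_loc: Loc) {
        self.instrs.push(Instr { is_dbg_declare: true, loc: Some(var_loc) });
    }
}

// Mirrors the rule described above: the prologue ends at the first
// located instruction, with `llvm.dbg.declare` ignored.
fn prologue_end(instrs: &[Instr]) -> Option<usize> {
    instrs.iter().position(|i| i.loc.is_some() && !i.is_dbg_declare)
}
```

The argument's `llvm.dbg.declare` carries a location while the prologue is being emitted. It still does not move the computed prologue end, because the search skips it.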
//!
//! ## Unique Type Identification
//!
//! In order for link-time optimization to work properly, LLVM needs a unique
//! type identifier that tells it across compilation units which types are the
//! same as others. This type identifier is created by
//! `TypeMap::get_unique_type_id_of_type()` using the following algorithm:
//!
//! 1. Primitive types have their name as ID.
//! 2. Structs, enums and traits have a multipart identifier:
//!    1. The first part is the SVH (strict version hash) of the crate they
//!       were originally defined in.
//!    2. The second part is the `ast::NodeId` of the definition in their
//!       original crate.
//!    3. The final part is a concatenation of the type IDs of their concrete
//!       type arguments if they are generic types.
//! 3. Tuple, pointer and function types are structurally identified, which
//!    means that they are equivalent if their component types are equivalent
//!    (i.e. `(i32, i32)` is the same regardless of which crate it is used in).
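The ID scheme above can be sketched with a hypothetical toy type representation. The concrete string format (`svh/node_id<args>`, `(a,b)`, `*T`, `fn(..)->R`) is an illustrative assumption, not the format `get_unique_type_id_of_type()` actually produces.

```rust
// Hypothetical toy type representation.
enum Ty {
    Prim(&'static str),
    // Identified by the defining crate's SVH, the NodeId there, and type arguments.
    Adt { crate_svh: &'static str, node_id: u32, args: Vec<Ty> },
    Tuple(Vec<Ty>),
    Ptr(Box<Ty>),
    Func(Vec<Ty>, Box<Ty>),
}

// Joins the IDs of a list of component types.
fn join_ids(tys: &[Ty]) -> String {
    tys.iter().map(unique_type_id).collect::<Vec<_>>().join(",")
}

fn unique_type_id(t: &Ty) -> String {
    match t {
        // (1) Primitive types: their name.
        Ty::Prim(name) => name.to_string(),
        // (2) Structs/enums: SVH + NodeId + IDs of the concrete type arguments.
        Ty::Adt { crate_svh, node_id, args } => {
            format!("{}/{}<{}>", crate_svh, node_id, join_ids(args))
        }
        // (3) Structural types: built only from their components' IDs.
        Ty::Tuple(elems) => format!("({})", join_ids(elems)),
        Ty::Ptr(inner) => format!("*{}", unique_type_id(inner)),
        Ty::Func(params, ret) => format!("fn({})->{}", join_ids(params), unique_type_id(ret)),
    }
}
```

The tuple ID does not depend on any crate, so `(i32, i32)` maps to the same string everywhere. A generic struct's ID changes with each distinct set of type arguments.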
//!
//! This algorithm also provides a stable ID for types that are defined in one
//! crate but instantiated from metadata within another crate. We just have to
//! take care to always map crate and node IDs back to the original crate
//! context.
//!
//! As a side effect, these unique type IDs also help to solve a problem arising
//! from lifetime parameters. Since lifetime parameters are completely omitted
//! in debuginfo, more than one `Ty` instance may map to the same debuginfo
//! type metadata. That is, some struct `Struct<'a>` may have N instantiations
//! with different concrete substitutions for `'a`, and thus there will be N
//! `Ty` instances for the type `Struct<'a>` even though it is not generic
//! otherwise. Unfortunately, this means that we cannot use `ty::type_id()` as a
//! cheap identifier for type metadata. We have done this in the past, but it
//! led to unnecessary metadata duplication in the best case and LLVM
//! assertions in the worst. However, the unique type ID as described above
//! *can* be used as an identifier. Since it is comparatively expensive to
//! construct, though, `ty::type_id()` is still used additionally as an
//! optimization for cases where the exact same type has been seen before
//! (which is most of the time).
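The two-level lookup described above might look roughly like this. In this hypothetical sketch, a `u64` stands in for the cheap `ty::type_id()` and a lazily computed `String` stands in for the unique type ID with lifetimes erased. The `TypeMap` here is a simplified stand-in, not the real one.

```rust
use std::collections::HashMap;

// Simplified, hypothetical type map with two caches.
#[derive(Default)]
struct TypeMap {
    by_ty_id: HashMap<u64, usize>,        // cheap key: one entry per `Ty` instance
    by_unique_id: HashMap<String, usize>, // expensive key: lifetimes erased
    nodes_created: usize,
    unique_id_computations: usize,
}

impl TypeMap {
    fn metadata_for(&mut self, ty_id: u64, compute_unique_id: impl FnOnce() -> String) -> usize {
        // Fast path: this exact `Ty` has been seen before.
        if let Some(&node) = self.by_ty_id.get(&ty_id) {
            return node;
        }
        // Slow path: compute the unique ID and look it up.
        self.unique_id_computations += 1;
        let next = self.nodes_created;
        let node = *self.by_unique_id.entry(compute_unique_id()).or_insert(next);
        if node == next {
            self.nodes_created += 1; // a genuinely new metadata node
        }
        // Remember the mapping so this `Ty` takes the fast path next time.
        self.by_ty_id.insert(ty_id, node);
        node
    }
}
```

Two `Ty`s that differ only in a lifetime get different `ty_id`s but the same unique ID, so they share one metadata node. Repeated lookups of the same `ty_id` skip the expensive computation entirely.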