Compact ImageMap Format
=======================

A process' address space contains (among other things) the set of
dynamically loaded images that have been mapped into that address
space. When generating crash logs or symbolicating backtraces, we
need to be able to capture, and potentially store, the list of images
that have been loaded, as well as some of the attributes of those
images, including each image's

- Path
- Build ID (aka UUID)
- Base address
- End-of-text address

Compact ImageMap Format (CIF) is a binary format for holding this
information.

### General Format

Compact ImageMap Format data is byte aligned and starts with an
information byte:

~~~
  7   6   5   4   3   2   1   0
┌───────────────────────┬───────┐
│        version        │ size  │
└───────────────────────┴───────┘
~~~

The `version` field identifies the version of CIF that is in use; this
document describes version `0`. The `size` field is encoded as
follows:

| `size` | Machine word size |
| :----: | :---------------- |
| 00     | 16-bit            |
| 01     | 32-bit            |
| 10     | 64-bit            |
| 11     | Reserved          |
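
For illustration, here is a minimal Python sketch of packing and unpacking the information byte. Python is chosen only for readability, and the helper names (`pack_info_byte`, `unpack_info_byte`) are hypothetical rather than part of any existing API.

```python
# Hypothetical helpers for the CIF information byte:
# bits 7..2 hold `version`, bits 1..0 hold `size`.
SIZE_CODE_FOR_WORD_BITS = {16: 0b00, 32: 0b01, 64: 0b10}  # 0b11 is reserved


def pack_info_byte(version: int, word_bits: int) -> int:
    """Build the information byte for a CIF version and machine word size."""
    if not 0 <= version <= 0x3F:
        raise ValueError("version must fit in six bits")
    return (version << 2) | SIZE_CODE_FOR_WORD_BITS[word_bits]


def unpack_info_byte(info: int) -> tuple[int, int]:
    """Return (version, machine word size in bits)."""
    version, size = info >> 2, info & 0b11
    if size == 0b11:
        raise ValueError("size code 11 is reserved")
    return version, 16 << size  # 00 -> 16, 01 -> 32, 10 -> 64
```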

This is followed immediately by a field encoding the number of images
in the image map. This field is encoded as a sequence of bytes, each
holding seven bits of data; the top bit is set on every byte except
the final one, where it is clear. The most significant byte comes
first. For example:

| `count` | Encoding    |
| ------: | :---------- |
|       0 | 00          |
|       1 | 01          |
|     127 | 7f          |
|     128 | 81 00       |
|     129 | 81 01       |
|     700 | 85 3c       |
|    1234 | 89 52       |
|   16384 | 81 80 00    |
|   65535 | 83 ff 7f    |
| 2097152 | 81 80 80 00 |
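
The count field is a big-endian base-128 varint: the same layout as a MIDI variable-length quantity, and the reverse group order of LEB128. The following Python sketch, with hypothetical function names, encodes and decodes it and reproduces every row of the table above.

```python
def encode_count(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first."""
    if value < 0:
        raise ValueError("count must be non-negative")
    groups = [value & 0x7F]                      # final byte: top bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))     # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))               # most significant group first


def decode_count(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just after the encoded count)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                      # top bit clear: last byte
            return value, pos
```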

This in turn is followed by the list of images, stored in order of
increasing base address. For each image, we start with a header byte:

~~~
  7   6   5   4   3   2   1   0
┌───┬───┬───────────┬───────────┐
│ r │ 0 │  acount   │  ecount   │
└───┴───┴───────────┴───────────┘
~~~

If `r` is set, the stored base address is relative to the previously
computed base address, that is, it is added to the previous image's
base address; otherwise it is absolute. Bit 6 is always `0`.

This byte is followed by `acount + 1` bytes of base address, then
`ecount + 1` bytes giving the offset from the base address to the end
of text.
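
The following Python sketch shows one way an encoder might choose between absolute and relative base addresses and size the two fields. The text above does not state the byte order of these fields, so big-endian is assumed here (matching the most-significant-first order of the count encoding); the rule "use relative encoding only when it is shorter" is likewise an assumption, as are the function names.

```python
from typing import Optional


def min_bytes(value: int) -> int:
    """Smallest byte count (at least 1) that can hold a non-negative value."""
    return max(1, (value.bit_length() + 7) // 8)


def encode_image_addresses(base: int, end_of_text: int,
                           prev_base: Optional[int]) -> bytes:
    """Header byte + base address + end-of-text offset (sketch)."""
    # ASSUMPTION: relative encoding is chosen only when it is shorter.
    relative = (prev_base is not None and base >= prev_base
                and min_bytes(base - prev_base) < min_bytes(base))
    addr = base - prev_base if relative else base
    offset = end_of_text - base
    acount, ecount = min_bytes(addr) - 1, min_bytes(offset) - 1
    if acount > 7 or ecount > 7:
        raise ValueError("field needs more than 8 bytes")
    header = (0x80 if relative else 0x00) | (acount << 3) | ecount  # bit 6 is 0
    # ASSUMPTION: multi-byte fields are big-endian.
    return (bytes([header]) + addr.to_bytes(acount + 1, "big")
            + offset.to_bytes(ecount + 1, "big"))


def decode_image_addresses(data: bytes, pos: int,
                           prev_base: int) -> tuple[int, int, int]:
    """Return (base, end_of_text, position after the fields)."""
    header = data[pos]
    pos += 1
    acount, ecount = (header >> 3) & 0x7, header & 0x7
    addr = int.from_bytes(data[pos:pos + acount + 1], "big")
    pos += acount + 1
    offset = int.from_bytes(data[pos:pos + ecount + 1], "big")
    pos += ecount + 1
    base = prev_base + addr if header & 0x80 else addr
    return base, base + offset, pos
```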

Following this is the number of bytes in the build ID, encoded using
the same 7-bit scheme as the image count, and after that come the
build ID bytes themselves.
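
Since the build ID length uses the same 7-bit scheme, a 16-byte Mach-O UUID or a typical 20-byte GNU build ID costs a single length byte. A minimal Python sketch with hypothetical names; it repeats the count encoder so that it stands alone.

```python
def encode_count(value: int) -> bytes:
    """7-bit groups, most significant first, top bit set on all but the last."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(groups))


def encode_build_id(build_id: bytes) -> bytes:
    """Length prefix followed by the raw build ID bytes."""
    return encode_count(len(build_id)) + build_id


def decode_build_id(data: bytes, pos: int) -> tuple[bytes, int]:
    """Return (build ID bytes, position just after them)."""
    length = 0
    while True:
        byte = data[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:
            break
    return data[pos:pos + length], pos + length
```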

Finally, we encode the path string using the scheme below.

### String Encoding

Image paths contain a good deal of redundancy, so they are encoded
using a prefix compression scheme. The basic idea is that, while
generating or reading the data, we maintain a mapping from small
integers to path prefixes.

The mapping is initialised with the following fixed entries, which
never need to be stored in CIF data:

| code | Path prefix                         |
| :--: | :---------------------------------- |
|  0   | `/lib`                              |
|  1   | `/usr/lib`                          |
|  2   | `/usr/local/lib`                    |
|  3   | `/opt/lib`                          |
|  4   | `/System/Library/Frameworks`        |
|  5   | `/System/Library/PrivateFrameworks` |
|  6   | `/System/iOSSupport`                |
|  7   | `/Library/Frameworks`               |
|  8   | `/System/Applications`              |
|  9   | `/Applications`                     |
|  10  | `C:\Windows\System32`               |
|  11  | `C:\Program Files\`                 |

Codes below 32 are reserved for future expansion of the fixed list,
so codes assigned while encoding or decoding start at 32.
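
An encoder or decoder might model the table as follows. This is a hypothetical Python sketch; the class and method names are illustrative only.

```python
FIXED_PREFIXES = [
    "/lib", "/usr/lib", "/usr/local/lib", "/opt/lib",
    "/System/Library/Frameworks", "/System/Library/PrivateFrameworks",
    "/System/iOSSupport", "/Library/Frameworks", "/System/Applications",
    "/Applications", "C:\\Windows\\System32", "C:\\Program Files\\",
]
FIRST_DYNAMIC_CODE = 32  # codes below 32 are reserved for the fixed list


class PrefixTable:
    """Hypothetical code -> prefix mapping shared by encoder and decoder."""

    def __init__(self) -> None:
        self.by_code = dict(enumerate(FIXED_PREFIXES))  # codes 0..11
        self.next_code = FIRST_DYNAMIC_CODE

    def add(self, prefix: str) -> int:
        """Assign the next available code to prefix and return it."""
        code = self.next_code
        self.by_code[code] = prefix
        self.next_code += 1
        return code

    def lookup(self, code: int) -> str:
        return self.by_code[code]
```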

Strings are encoded as a sequence of bytes, as follows:

| `opcode`   | Mnemonic  | Meaning                                   |
| :--------: | :-------- | :---------------------------------------- |
| `00000000` | `end`     | Marks the end of the string               |
| `00xxxxxx` | `str`     | Raw string data (non-zero `xxxxxx`)       |
| `01xxxxxx` | `framewk` | Names a framework                         |
| `1exxxxxx` | `expand`  | Identifies a prefix in the table          |
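
Because `00000000` also matches the `str` pattern `00xxxxxx`, a decoder has to test for `end` first. A small Python sketch of opcode classification (the function name is hypothetical):

```python
def classify(opcode: int) -> str:
    """Map an opcode byte to its mnemonic from the table above."""
    if opcode == 0x00:
        return "end"        # must be checked before `str`
    if opcode & 0x80:
        return "expand"     # 1exxxxxx
    if opcode & 0x40:
        return "framewk"    # 01xxxxxx
    return "str"            # 00xxxxxx with xxxxxx != 0
```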

#### `end`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────────────────────────────┐
│ 0   0   0   0   0   0   0   0 │ end
└───────────────────────────────┘
~~~

##### Meaning

Marks the end of the string.

#### `str`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────┬───────────────────────┐
│ 0   0 │         count         │ str
└───────┴───────────────────────┘
~~~

##### Meaning

The next `count` bytes are included in the string verbatim. Since
`00000000` is `end`, `count` is at least 1, so a single `str` opcode
carries between 1 and 63 bytes. Additionally, all path prefixes of
this string data are added to the current prefix table. For instance,
if the string data is `/swift/linux/x86_64/libfoo.so`, then the prefix
`/swift` is assigned the next available code, `/swift/linux` the code
after that, and `/swift/linux/x86_64` the code following that one.
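
The prefix rule can be sketched in Python as follows. This sketch assumes that a prefix ends just before each `/` in the chunk (other than a leading one), which matches the `/swift/linux/x86_64/libfoo.so` example; it does not attempt to handle `\` separators. Function names are hypothetical.

```python
def prefixes_of(chunk: str) -> list[str]:
    """Directory prefixes of a str chunk, in the order codes are assigned.

    ASSUMPTION: a prefix ends just before each '/' after the first
    character, so the final component (e.g. a file name) is never added.
    """
    return [chunk[:i] for i, ch in enumerate(chunk) if ch == "/" and i > 0]


def encode_str(chunk: bytes) -> bytes:
    """Emit a `str` opcode (00xxxxxx, xxxxxx = byte count) plus the bytes."""
    if not 1 <= len(chunk) <= 63:
        raise ValueError("a single str opcode carries 1..63 bytes")
    return bytes([len(chunk)]) + chunk
```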

#### `framewk`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───────┬───────────────────────┐
│ 0   1 │         count         │ framewk
└───────┴───────────────────────┘
~~~

##### Meaning

The next byte is a version character (normally `A`, although some
frameworks use later letters, such as `C`), after which there are
`count + 1` bytes of name.

This is expanded using the pattern
`/<name>.framework/Versions/<version>/<name>`. This opcode also marks
the end of the string.
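
A hypothetical Python sketch of encoding and decoding a `framewk` opcode; it assumes the name bytes are UTF-8, since the text above does not specify a character set.

```python
def encode_framewk(name: str, version: str = "A") -> bytes:
    """Opcode 01xxxxxx (xxxxxx = name length - 1), version char, then name."""
    raw = name.encode("utf-8")
    if not 1 <= len(raw) <= 64:
        raise ValueError("name must be 1..64 bytes")
    if len(version) != 1:
        raise ValueError("version is a single character")
    return bytes([0x40 | (len(raw) - 1)]) + version.encode("ascii") + raw


def decode_framewk(data: bytes, pos: int) -> tuple[str, int]:
    """Decode a framewk opcode at data[pos]; return (expansion, new pos)."""
    opcode = data[pos]
    if (opcode & 0xC0) != 0x40:
        raise ValueError("not a framewk opcode")
    count = opcode & 0x3F
    version = chr(data[pos + 1])
    name = data[pos + 2:pos + 3 + count].decode("utf-8")  # count + 1 bytes
    return f"/{name}.framework/Versions/{version}/{name}", pos + 3 + count
```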

#### `expand`

##### Encoding

~~~
  7   6   5   4   3   2   1   0
┌───┬───┬───────────────────────┐
│ 1 │ e │         code          │ expand
└───┴───┴───────────────────────┘
~~~

##### Meaning

If `e` is `0`, `code` is the index into the prefix table for the
prefix that should be appended to the string at this point.

If `e` is `1`, this opcode is followed by `code + 1` bytes that give
a value `v` such that `v + 64` is the index into the prefix table for
the prefix that should be appended to the string at this point.
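
Because a six-bit `code` can only name indices 0 to 63, the extended form starts counting at 64, so the one-byte form and the extended form never overlap. The Python sketch below assumes the `code + 1` extra bytes are big-endian, which the text above does not specify; the function names are hypothetical.

```python
def encode_expand(index: int) -> bytes:
    """Emit an `expand` opcode for a prefix-table index."""
    if index < 64:
        return bytes([0x80 | index])              # e = 0: index fits in code
    v = index - 64
    n = max(1, (v.bit_length() + 7) // 8)        # bytes needed for v
    # ASSUMPTION: the extra bytes are big-endian.
    return bytes([0xC0 | (n - 1)]) + v.to_bytes(n, "big")


def decode_expand(data: bytes, pos: int) -> tuple[int, int]:
    """Return (prefix-table index, position after the opcode)."""
    opcode = data[pos]
    if not opcode & 0x80:
        raise ValueError("not an expand opcode")
    code = opcode & 0x3F
    if not opcode & 0x40:                        # e = 0
        return code, pos + 1
    n = code + 1                                 # e = 1: n extra bytes
    v = int.from_bytes(data[pos + 1:pos + 1 + n], "big")
    return v + 64, pos + 1 + n
```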

#### Example

Suppose we wish to encode the following strings:

    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
    /System/Library/Frameworks/Photos.framework/Versions/A/Photos
    /usr/lib/libobjc.A.dylib
    /usr/lib/libz.1.dylib
    /usr/lib/swift/libswiftCore.dylib
    /usr/lib/libSystem.B.dylib
    /usr/lib/libc++.1.dylib

The first string is `expand` code 4 (`/System/Library/Frameworks`)
followed by a `framewk` opcode with a six-byte name:

    <84> <45> CAppKit <00>

We then follow with

    <84> <45> APhotos <00>

Next, each `/usr/lib` path is `expand` code 1 followed by a `str`
opcode carrying the rest of the path:

    <81> <10> /libobjc.A.dylib <00>
    <81> <0d> /libz.1.dylib <00>
    <81> <19> /swift/libswiftCore.dylib <00>

The last of these assigns code 32 to `/swift`. Then come

    <81> <12> /libSystem.B.dylib <00>
    <81> <0f> /libc++.1.dylib <00>

In total, the original data would have taken up 256 bytes (counting a
NUL terminator for each string). Instead, we have used 122 bytes, a
saving of over 50%.
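
As an end-to-end check, the following hypothetical Python decoder turns each of the example encodings above back into its path and lets us confirm the 122-byte and 256-byte totals. It assumes big-endian extended `expand` indices, UTF-8 text, and that a `str` chunk contributes a prefix ending just before each of its internal `/` characters. Each encoding is decoded separately, while a single decoder instance carries the prefix table from one string to the next.

```python
# Hypothetical CIF path-string decoder (a sketch, not a reference implementation).
FIXED_PREFIXES = [
    "/lib", "/usr/lib", "/usr/local/lib", "/opt/lib",
    "/System/Library/Frameworks", "/System/Library/PrivateFrameworks",
    "/System/iOSSupport", "/Library/Frameworks", "/System/Applications",
    "/Applications", "C:\\Windows\\System32", "C:\\Program Files\\",
]


class StringDecoder:
    def __init__(self) -> None:
        self.table = dict(enumerate(FIXED_PREFIXES))  # codes 0..11
        self.next_code = 32                           # first dynamic code

    def decode(self, data: bytes, pos: int = 0) -> tuple[str, int]:
        """Decode one path starting at data[pos]; return (path, new pos)."""
        out = []
        while True:
            op = data[pos]
            pos += 1
            if op == 0x00:                            # end
                return "".join(out), pos
            if op & 0x80:                             # expand
                code = op & 0x3F
                if op & 0x40:                         # e = 1
                    # ASSUMPTION: the extra bytes are big-endian.
                    v = int.from_bytes(data[pos:pos + code + 1], "big")
                    pos += code + 1
                    code = v + 64
                out.append(self.table[code])
            elif op & 0x40:                           # framewk
                count = op & 0x3F
                version = chr(data[pos])
                name = data[pos + 1:pos + 2 + count].decode("utf-8")
                pos += 2 + count
                out.append(f"/{name}.framework/Versions/{version}/{name}")
                return "".join(out), pos              # framewk ends the string
            else:                                     # str: op is the byte count
                chunk = data[pos:pos + op].decode("utf-8")
                pos += op
                # ASSUMPTION: a prefix ends just before each internal '/'.
                for i, ch in enumerate(chunk):
                    if ch == "/" and i > 0:
                        self.table[self.next_code] = chunk[:i]
                        self.next_code += 1
                out.append(chunk)


# The encodings from the example, byte for byte.
EXAMPLE = [
    b"\x84\x45CAppKit\x00",
    b"\x84\x45APhotos\x00",
    b"\x81\x10/libobjc.A.dylib\x00",
    b"\x81\x0d/libz.1.dylib\x00",
    b"\x81\x19/swift/libswiftCore.dylib\x00",
    b"\x81\x12/libSystem.B.dylib\x00",
    b"\x81\x0f/libc++.1.dylib\x00",
]

EXPECTED = [
    "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit",
    "/System/Library/Frameworks/Photos.framework/Versions/A/Photos",
    "/usr/lib/libobjc.A.dylib",
    "/usr/lib/libz.1.dylib",
    "/usr/lib/swift/libswiftCore.dylib",
    "/usr/lib/libSystem.B.dylib",
    "/usr/lib/libc++.1.dylib",
]

decoder = StringDecoder()
paths = [decoder.decode(encoded)[0] for encoded in EXAMPLE]
```

Summing `len(e)` over `EXAMPLE` gives 122, and summing `len(p) + 1` over `EXPECTED` (one NUL per string) gives 256.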