swiftlang
diff --git a/‎docs/Backtracing.rst
Lines changed: 3 additions & 0 deletions b/‎docs/Backtracing.rst
Lines changed: 3 additions & 0 deletions
diff --git a/‎docs/CompactBacktraceFormat.md
Lines changed: 1 addition & 1 deletion b/‎docs/CompactBacktraceFormat.md
Lines changed: 1 addition & 1 deletion
diff --git a/‎docs/CompactImageMapFormat.md
Lines changed: 226 additions & 0 deletions b/‎docs/CompactImageMapFormat.md
Lines changed: 226 additions & 0 deletions
@@ -327,3 +327,6 @@ Backtraces are stored internally in a format called :download:`Compact Backtrace
 Format <CompactBacktraceFormat.md>`.  This provides us with a way to store a
 large number of frames in a much smaller space than would otherwise be possible.
 
+Similarly, where we need to store address to image mappings, we
+use :download:`Compact ImageMap Format <CompactImageMapFormat.md>` to minimise
+storage requirements.
@@ -25,7 +25,7 @@ information byte:
 ~~~
 
 The `version` field identifies the version of CBF that is in use; this
-document describes version `0`. The `size` field is encoded as
+document describes version `0`. The `size` field is encqoded as
 follows:
 
 | `size` | Machine word size |
 
@@ -0,0 +1,226 @@
+Compact ImageMap Format
+=======================
+
+A process' address space contains (among other things) the set of
+dynamically loaded images that have been mapped into that address
+space.  When generating crash logs or symbolicating backtraces, we
+need to be able to capture and potentially store the list of images
+that has been loaded, as well as some of the attributes of those
+images, including each image's
+
+- Path
+- Build ID (aka UUID)
+- Base address
+- End-of-text address
+
+Compact ImageMap Format (CIF) is a binary format for holding this
+information.
+
+### General Format
+
+Compact ImageMap Format data is byte aligned and starts with an
+information byte:
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───────────────────────┬───────┐
+ │ version               │ size  │
+ └───────────────────────┴───────┘
+~~~
+
+The `version` field identifies the version of CIF that is in use; this
+document describes version `0`. The `size` field is encoded as
+follows:
+
+| `size` | Machine word size |
+| :----: | :---------------- |
+|   00   | 16-bit            |
+|   01   | 32-bit            |
+|   10   | 64-bit            |
+|   11   | Reserved          |
+
+This is followed immediately by a field encoding the number of images
+in the image map; this field is encoded as a sequence of bytes, each
+holding seven bits of data, with the top bit clear for the final byte.
+The most significant byte is the first.  e.g.
+
+| `count` | Encoding    |
+| ------: | :---------- |
+|       0 | 00          |
+|       1 | 01          |
+|     127 | 7f          |
+|     128 | 81 00       |
+|     129 | 81 01       |
+|     700 | 85 3c       |
+|    1234 | 89 52       |
+|   16384 | 81 80 00    |
+|   65535 | 83 ff 7f    |
+| 2097152 | 81 80 80 00 |
+
+This in turn is followed by the list of images, stored in order of
+increasing base address.  For each image, we start with a header byte:
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───┬───┬───────────┬───────────┐
+ │ r │ 0 │ acount    │ ecount    │
+ └───┴───┴───────────┴───────────┘
+~~~
+
+If `r` is set, then the base address is understood to be relative to
+the previously computed base address.
+
+This byte is followed by `acount + 1` bytes of base address, then
+`ecount + 1` bytes of offset to the end of text.
+
+Following this is an encoded count of bytes in the build ID,
+encoded using the 7-bit scheme we used to encode the image count, and
+then after that come the build ID bytes themselves.
+
+Finally, we encode the path string using the scheme below.
+
+### String Encoding
+
+Image paths contain a good deal of redundancy; paths are therefore
+encoded using a prefix compression scheme.  The basic idea here is
+that while generating or reading the data, we maintain a mapping from
+small integers to path prefix segments.
+
+The mapping is initialised with the following fixed list that never
+need to be stored in CIF data:
+
+| code | Path prefix                         |
+| :--: | :---------------------------------- |
+|   0  | `/lib`                              |
+|   1  | `/usr/lib`                          |
+|   2  | `/usr/local/lib`                    |
+|   3  | `/opt/lib`                          |
+|   4  | `/System/Library/Frameworks`        |
+|   5  | `/System/Library/PrivateFrameworks` |
+|   6  | `/System/iOSSupport`                |
+|   7  | `/Library/Frameworks`               |
+|   8  | `/System/Applications`              |
+|   9  | `/Applications`                     |
+|  10  | `C:\Windows\System32`               |
+|  11  | `C:\Program Files\`                 |
+
+Codes below 32 are reserved for future expansion of the fixed list.
+
+Strings are encoded as a sequence of bytes, as follows:
+
+|  `opcode`  | Mnemonic  | Meaning                                   |
+| :--------: | :-------- | :---------------------------------------- |
+| `00000000` | `end`     | Marks the end of the string               |
+| `00xxxxxx` | `str`     | Raw string data                           |
+| `01xxxxxx` | `framewk` | Names a framework                         |
+| `1exxxxxx` | `expand`  | Identifies a prefix in the table          |
+
+#### `end`
+
+##### Encoding
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───────────────────────────────┐
+ │ 0   0   0   0   0   0   0   0 │  end
+ └───────────────────────────────┘
+~~~
+
+#### Meaning
+
+Marks the end of the string
+
+#### `str`
+
+##### Encoding
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───────┬───────────────────────┐
+ │ 0   0 │ count                 │  str
+ └───────┴───────────────────────┘
+~~~
+
+##### Meaning
+
+The next `count` bytes are included in the string verbatim.
+Additionally, all path prefixes of this string data will be added to
+the current prefix table.  For instance, if the string data is
+`/swift/linux/x86_64/libfoo.so`, then the prefix `/swift` will be
+assigned the next available code, `/swift/linux` the code after that,
+and `/swift/linux/x86_64` the code following that one.
+
+#### `framewk`
+
+##### Encoding
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───────┬───────────────────────┐
+ │ 0   1 │ count                 │  framewk
+ └───────┴───────────────────────┘
+~~~
+
+##### Meaning
+
+The next byte is a version character (normally `A`, but some
+frameworks use higher characters), after which there are `count + 1`
+bytes of name.
+
+This is expanded using the pattern
+`/<name>.framework/Versions/<version>/<name>`.  This also marks the
+end of the string.
+
+#### `expand`
+
+##### Encoding
+
+~~~
+   7   6   5   4   3   2   1   0
+ ┌───┬───┬───────────────────────┐
+ │ 1 │ e │ code                  │  expand
+ └───┴───┴───────────────────────┘
+~~~
+
+##### Meaning
+
+If `e` is `0`, `code` is the index into the prefix table for the
+prefix that should be appended to the string at this point.
+
+If `e` is `1`, this opcode is followed by `code + 1` bytes that give
+a value `v` such that `v + 64` is the index into the prefix table for
+the prefix that should be appended to the string at this point.
+
+#### Example
+
+Let's say we wish to encode the following strings:
+
+    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
+    /System/Library/Frameworks/Photos.framework/Versions/A/Photos
+    /usr/lib/libobjc.A.dylib
+    /usr/lib/libz.1.dylib
+    /usr/lib/swift/libswiftCore.dylib
+    /usr/lib/libSystem.B.dylib
+    /usr/lib/libc++.1.dylib
+
+We would encode
+
+    <84> <45> CAppKit <00>
+
+We then follow with
+
+    <84> <45> APhotos <00>
+
+Next we have
+
+    <81> <10> /libobjc.A.dylib <00>
+    <81> <0d> /libz.1.dylib <00>
+    <81> <19> /swift/libswiftCore.dylib <00>
+
+assigning code 32 to `/swift`, then
+
+    <81> <12> /libSystem.B.dylib <00>
+    <81> <0f> /libc++.1.dylib <00>
+
+In total the original data would have taken up 256 bytes.  Instead, we
+have used 122 bytes, a saving of over 50%.