Commit d9e336d

supervisor translate: explain the dictionary
1 parent 9abfc51 commit d9e336d

1 file changed, +13 -0 lines changed

supervisor/shared/translate.h

Lines changed: 13 additions & 0 deletions
@@ -43,6 +43,19 @@
 // (building the huffman encoding on UTF-16 code points gave better
 // compression than building it on UTF-8 bytes)
 //
+// - code points starting at 128 (word_start) and potentially extending
+// to 255 (word_end) (but never interfering with the target
+// language's used code points) stand for dictionary entries in a
+// dictionary with size up to 256 code points. The dictionary entries
+// are computed with a heuristic based on frequent substrings of 2 to
+// 9 code points. These are called "words" but are not, grammatically
+// speaking, words. They're just spans of code points that frequently
+// occur together.
+//
+// - dictionary entries are non-overlapping, and the _ending_ index of each
+// entry is stored in an array. Since the index given is the ending
+// index, the array is called "wends".
+//
 // The "data" / "tail" construct is so that the struct's last member is a
 // "flexible array". However, the _only_ member is not permitted to be
 // a flexible member, so we have to declare the first byte as a separate
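
To illustrate the "wends" layout described in the added comment, here is a minimal sketch of a lookup, not the code in translate.h. It assumes the dictionary words are packed back to back in one array and that each `wends` entry is the plain exclusive ending index of its word. The sample data, the `WORD_START`/`WORD_END` values, and the `lookup_word` helper are all hypothetical; the real tables are generated at build time and may be stored differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range: code points 128..130 name three dictionary entries. */
enum { WORD_START = 128, WORD_END = 130 };

/* Hypothetical dictionary: "th", "ing", " the" packed with no separators. */
static const uint16_t words[] = {
    't', 'h',
    'i', 'n', 'g',
    ' ', 't', 'h', 'e',
};

/* wends[n] is the (exclusive) ending index of entry n inside words[]. */
static const uint8_t wends[] = { 2, 5, 9 };

/*
 * If `u` is a dictionary code point, point *out at the entry's first code
 * point and return the entry's length. Otherwise return 0 and leave *out
 * untouched.
 */
static size_t lookup_word(uint16_t u, const uint16_t **out) {
    if (u < WORD_START || u > WORD_END) {
        return 0;
    }
    size_t n = (size_t)(u - WORD_START);
    /* An entry starts where the previous one ended; entry 0 starts at 0. */
    size_t start = (n == 0) ? 0 : wends[n - 1];
    size_t end = wends[n];
    *out = &words[start];
    return end - start;
}
```

Because the entries do not overlap, storing only the ending indexes is enough: the start of entry n is the end of entry n - 1, so no separate start or length table is needed.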
