// (building the huffman encoding on UTF-16 code points gave better
// compression than building it on UTF-8 bytes)
//
+ // - code points starting at 128 (word_start) and potentially extending
+ // to 255 (word_end) (but never interfering with the target
+ // language's used code points) stand for dictionary entries in a
+ // dictionary with size up to 256 code points. The dictionary entries
+ // are computed with a heuristic based on frequent substrings of 2 to
+ // 9 code points. These are called "words" but are not, grammatically
+ // speaking, words. They're just spans of code points that frequently
+ // occur together.
+ //
+ // - dictionary entries are non-overlapping, and the _ending_ index of each
+ // entry is stored in an array. Since the index given is the ending
+ // index, the array is called "wends".
+ //
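Review note: the "wends" layout described in the two added bullets can be illustrated with a small lookup routine. This is only a sketch under stated assumptions, not the code from this change. The `words` table, its three sample entries, the `WORD_COUNT` macro, the `lookup_word` helper, the 16-bit element type, and the choice to treat each ending index as exclusive are all hypothetical; only `word_start` (128), `word_end`, and the name "wends" come from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample dictionary, not the generated tables. */
#define WORD_START 128u                       /* first code point that means "dictionary entry" */
#define WORD_COUNT 3u                         /* assumed number of entries in this sample */
#define WORD_END (WORD_START + WORD_COUNT - 1u)

/* Entries stored back to back without overlap: "th", "ing", " the". */
static const uint16_t words[] = {'t', 'h', 'i', 'n', 'g', ' ', 't', 'h', 'e'};

/* Ending index of each entry in words[]; assumed exclusive in this sketch. */
static const uint16_t wends[WORD_COUNT] = {2, 5, 9};

/* Stores a pointer to the entry's first code point in *out and
 * returns the entry's length in code points. */
static size_t lookup_word(uint16_t code_point, const uint16_t **out) {
    assert(code_point >= WORD_START && code_point <= WORD_END);
    size_t n = (size_t)(code_point - WORD_START);
    /* An entry starts where the previous one ended; the first starts at 0. */
    size_t start = (n == 0) ? 0 : wends[n - 1];
    *out = &words[start];
    return (size_t)wends[n] - start;
}
```

Storing only the ending indices is enough because the entries do not overlap: each entry's start is the previous entry's end, so one small array replaces a table of (offset, length) pairs.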
// The "data" / "tail" construct is so that the struct's last member is a
// "flexible array". However, the _only_ member is not permitted to be
// a flexible member, so we have to declare the first byte as a separate
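Review note: the "data" / "tail" construct mentioned in the context lines relies on the C99 rule that a flexible array member must be the last member of a struct that has at least one other named member. A minimal sketch of that pattern follows. The struct name `packed_bytes_t` and the helpers `packed_bytes_new` and `packed_byte_at` are hypothetical and chosen for illustration; only the member names `data` and `tail` come from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: one ordinary byte followed by a flexible array. */
typedef struct {
    uint8_t data;    /* first byte, declared as a regular member */
    uint8_t tail[];  /* remaining bytes (C99 flexible array member) */
} packed_bytes_t;

/* Allocates a packed_bytes_t holding a copy of len bytes from src.
 * Returns NULL if len is 0 or the allocation fails. */
static packed_bytes_t *packed_bytes_new(const uint8_t *src, size_t len) {
    assert(src != NULL);
    if (len == 0) {
        return NULL;
    }
    /* sizeof covers "data"; the extra len - 1 bytes back "tail". */
    packed_bytes_t *p = malloc(sizeof(packed_bytes_t) + (len - 1));
    if (p == NULL) {
        return NULL;
    }
    p->data = src[0];
    memcpy(p->tail, src + 1, len - 1);
    return p;
}

/* Reads byte i, treating data and tail as one logical sequence. */
static uint8_t packed_byte_at(const packed_bytes_t *p, size_t i) {
    return (i == 0) ? p->data : p->tail[i - 1];
}
```

A struct declared as just `uint8_t tail[];` would be rejected by a conforming compiler, which is why the first byte gets its own member.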