
Commit 0318eb3

makeqstrdata: Work around python3.6 compatibility problem
Discord user Folknology encountered a problem building with Python 3.6.9, `TypeError: ord() expected a character, but string of length 0 found`. I was able to reproduce the problem using Python3.5*, and discovered that the meaning of the regular expression `"|."` had changed in 3.7.

Before,
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', '', '', '', '', '']
```
After:
```
>>> [m.group(0) for m in re.finditer("|.", "hello")]
['', 'h', '', 'e', '', 'l', '', 'l', '', 'o', '']
```
Check if `words` is empty and if so use `"."` as the regular expression instead. This gives the same result on both versions:
```
['h', 'e', 'l', 'l', 'o']
```
and fixes the generation of the huffman dictionary.

Folknology verified that this fix worked for them.

* I could easily install 3.5 but not 3.6. 3.5 reproduced the same problem.
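The behaviour described in the message can be checked with a short script. This is a minimal sketch, not part of the commit: `split_chars` is a hypothetical helper, and the link between the empty matches and the `ord()` error is an assumption drawn from the error text quoted above.

```python
import re


def split_chars(text, pattern):
    """Return every match of `pattern` in `text`, in order (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.DOTALL)]


# "." can never match the empty string, so the result does not depend on
# how a given Python version treats empty matches.
chars = split_chars("hello", ".")
print(chars)

# "|." has an empty first alternative, so the result contains empty strings
# on the versions discussed above (only the number of them differs).
pieces = split_chars("hello", "|.")
has_empty = "" in pieces
print(has_empty)

# Assumption: the build failed because one of those empty strings reached
# ord(), which only accepts a string of length 1.
try:
    ord("")
    ord_failed = False
except TypeError:
    ord_failed = True
print(ord_failed)
```

Because `"."` never produces a zero-length match, any later code that calls `ord()` on each piece only ever sees single characters.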
1 parent 8eda917 commit 0318eb3

1 file changed: +5 -1 lines changed

py/makeqstrdata.py

@@ -109,7 +109,11 @@ class TextSplitter:
     def __init__(self, words):
         words.sort(key=lambda x: len(x), reverse=True)
         self.words = set(words)
-        self.pat = re.compile("|".join(re.escape(w) for w in words) + "|.", flags=re.DOTALL)
+        if words:
+            pat = "|".join(re.escape(w) for w in words) + "|."
+        else:
+            pat = "."
+        self.pat = re.compile(pat, flags=re.DOTALL)
 
     def iter_words(self, text):
         s = []
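
The pattern selection in the patched constructor can be tried in isolation. The following is a rough sketch under stated assumptions: `build_split_pattern` is a hypothetical standalone function mirroring the added lines, it uses `sorted()` rather than sorting the caller's list in place, and the sample words are made up for illustration.

```python
import re


def build_split_pattern(words):
    """Compile a splitter pattern the way the patched constructor does.

    Hypothetical helper: longest words come first so they win over their
    own prefixes, and "." is the fallback for any other character.
    """
    ordered = sorted(words, key=len, reverse=True)
    if ordered:
        pat = "|".join(re.escape(w) for w in ordered) + "|."
    else:
        # With no words, "" + "|." would leave an empty first alternative,
        # so fall back to a plain single-character match.
        pat = "."
    return re.compile(pat, flags=re.DOTALL)


# Non-empty word list: known words are matched whole, the rest per character.
with_words = [m.group(0) for m in build_split_pattern(["he", "hello"]).finditer("hello he")]
print(with_words)

# Empty word list: every match is exactly one character.
no_words = [m.group(0) for m in build_split_pattern([]).finditer("hi")]
print(no_words)
```

In both cases every alternative consumes at least one character, so no empty match can occur regardless of the Python version.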
