Releases: PyThaiNLP/pythainlp
PyThaiNLP 2.1.dev5
- Change from marisa-trie to a Trie implementation written in Python
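As a rough illustration of what a pure-Python trie can look like, here is a minimal sketch. The class name, method names, and the `prefixes()` helper are assumptions made for this example; they are not taken from the PyThaiNLP source.

```python
from typing import Dict, Iterable, List


class Trie:
    """Minimal prefix tree. Illustrative sketch only, not PyThaiNLP's class."""

    class _Node:
        __slots__ = ("children", "is_end")

        def __init__(self) -> None:
            self.children: Dict[str, "Trie._Node"] = {}
            self.is_end = False

    def __init__(self, words: Iterable[str] = ()) -> None:
        self._root = Trie._Node()
        self._size = 0
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        """Insert a word, creating one node per character as needed."""
        node = self._root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie._Node()
            node = node.children[ch]
        if not node.is_end:
            node.is_end = True
            self._size += 1

    def __contains__(self, word: str) -> bool:
        node = self._root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

    def __len__(self) -> int:
        return self._size

    def prefixes(self, text: str) -> List[str]:
        """Return every stored word that is a prefix of text, shortest first."""
        found: List[str] = []
        node = self._root
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_end:
                found.append(text[: i + 1])
        return found


trie = Trie(["กา", "การ", "การบ้าน"])
print("การ" in trie)                 # True
print(trie.prefixes("การบ้านเยอะ"))  # ['กา', 'การ', 'การบ้าน']
```

A dictionary-based tokenizer typically needs exactly this kind of prefix lookup, and a pure-Python version avoids a compiled dependency at the cost of some speed and memory.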
PyThaiNLP 2.1.dev4
Merge pull request #273 from PyThaiNLP/ner-tag: Add test cases for NER
PyThaiNLP 2.0.7
PyThaiNLP 2.0.7 Release
change log
- Bug fix: pythainlp.util.normalize now also covers the THANTHAKHAT, SARA U, and SARA UU cases (#244)
Upgrade: pip install -U pythainlp
Docs: https://thainlp.org/pythainlp/docs/2.0/
User guide: https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb
PyThaiNLP 2.1.dev2
Update Version
PyThaiNLP 2.0.6
- Fixed #230
- Newly trained ThaiNER model
PyThaiNLP 2.0.5
- Clean word lists in pythainlp.corpus (remove duplicates, etc.)
- Fix/add return type hinting for functions in pythainlp.corpus
- Fix deprecated inline flag for regular expression in pythainlp.corpus.tnc (Thai National Corpus)
- Bug fix: reorder condition checks in pythainlp.tokenize.dict_trie so it catches Trie before Iterable
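The ordering issue in the last item can be sketched as follows. If a trie object is itself iterable, a check for Iterable that runs first will match it too, and the trie gets rebuilt instead of reused. The `Trie` stand-in and the `to_trie()` helper below are hypothetical names for this illustration, not the library's code.

```python
import collections.abc
from typing import Iterable, Iterator, Union


class Trie:
    """Hypothetical stand-in: a word container that is itself iterable."""

    def __init__(self, words: Iterable[str]) -> None:
        self._words = set(words)

    def __iter__(self) -> Iterator[str]:
        return iter(self._words)


def to_trie(source: Union[Trie, Iterable[str]]) -> Trie:
    """Return a Trie for source, reusing it when it already is one."""
    # Most specific type first: a Trie also satisfies the Iterable check,
    # so testing Iterable first would needlessly rebuild it.
    if isinstance(source, Trie):
        return source
    if isinstance(source, collections.abc.Iterable):
        return Trie(source)
    raise TypeError("expected a Trie or an iterable of words")


existing = Trie(["ก", "ข"])
print(to_trie(existing) is existing)  # True: reused, not rebuilt
```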
PyThaiNLP 2.0.4
- word_tokenize()'s argument whitespaces is now keep_whitespace, to make it more explicit; the default behavior is to keep whitespace
- word_tokenize() can now take a custom dictionary through the custom_dict parameter
- dict_word_tokenize() will be deprecated soon
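To show what the two parameters mean in practice, here is a toy longest-matching tokenizer with the same parameter names. It is a sketch under stated assumptions, not the library's tokenizer: the function name `toy_word_tokenize`, the tiny built-in word set, and the choice that `custom_dict` replaces the default word set (rather than extending it) are all assumptions made for this example.

```python
from typing import Iterable, List, Optional

# Tiny built-in word set, for illustration only.
DEFAULT_WORDS = frozenset({"ฉัน", "รัก", "ภาษา", "ไทย"})


def toy_word_tokenize(
    text: str,
    custom_dict: Optional[Iterable[str]] = None,
    keep_whitespace: bool = True,
) -> List[str]:
    """Greedy longest-match tokenizer (hypothetical, for illustration)."""
    # Assumption: a custom dictionary replaces the default word set.
    words = set(custom_dict) if custom_dict is not None else DEFAULT_WORDS
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Group a run of whitespace into one token, or drop it.
            j = i
            while j < len(text) and text[j].isspace():
                j += 1
            if keep_whitespace:
                tokens.append(text[i:j])
            i = j
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                break
        else:
            size = 1
        tokens.append(text[i:i + size])
        i += size
    return tokens


text = "ฉันรัก ภาษาไทย"
print(toy_word_tokenize(text))
# ['ฉัน', 'รัก', ' ', 'ภาษา', 'ไทย']
print(toy_word_tokenize(text, keep_whitespace=False))
# ['ฉัน', 'รัก', 'ภาษา', 'ไทย']
print(toy_word_tokenize(text, custom_dict=["ฉัน", "รัก", "ภาษาไทย"]))
# ['ฉัน', 'รัก', ' ', 'ภาษาไทย']
```

With the real library, the corresponding call would pass keep_whitespace and custom_dict to word_tokenize() as described in the notes above.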
PyThaiNLP 2.0.3
- Fix the issue where the TTC (Thai Textbook Corpus) corpus always downloaded a new file
- Words and their frequencies from TTC (Thai Textbook Corpus) now have a local copy at ttc_freq.txt inside pythainlp.corpus
- Other refactoring and code improvements, including ones related to subword tokenization (Thai Character Cluster / TCC and ETCC), see #193
PyThaiNLP 2.0.2
- Fixed tree map
- Subword tokeniser documentation improvement #190