Releases: PyThaiNLP/pythainlp

PyThaiNLP 2.1.dev5

26 Sep 09:03
50a8e4e
Pre-release
  • Change from marisa-trie to a Trie implementation written in Python
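
For readers unfamiliar with the data structure, here is a minimal sketch of what a pure-Python trie can look like. The class name `SimpleTrie` and its methods are hypothetical and chosen for illustration; this is not the implementation shipped in PyThaiNLP.

```python
from typing import Dict, Iterable, List


class _Node:
    """One trie node: child nodes keyed by character, plus an end-of-word flag."""

    __slots__ = ("children", "end")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.end = False


class SimpleTrie:
    """Hypothetical minimal trie, for illustration only."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = _Node()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        # Walk (and create) one node per character, then mark the last one.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, _Node())
        node.end = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def prefixes(self, text: str) -> List[str]:
        """Return every stored word that is a prefix of `text`, shortest first."""
        found = []
        node = self.root
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.end:
                found.append(text[: i + 1])
        return found


trie = SimpleTrie(["แม", "แมว", "แมวน้ำ"])
print(trie.prefixes("แมวน้ำตัวใหญ่"))
```

A prefix lookup like this is what a dictionary-based tokenizer typically needs: at each position in the text it asks which dictionary words start there.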

PyThaiNLP 2.1.dev4

21 Sep 06:57
b9025aa
Pre-release
Merge pull request #273 from PyThaiNLP/ner-tag

Add test cases for NER

PyThaiNLP 2.0.7

16 Aug 04:47
ac77e21
change log

  • Bug fix: also cover the THANTHAKHAT and SARA U, SARA UU cases in pythainlp.util.normalize #244

Upgrade : pip install -U pythainlp
Docs : https://thainlp.org/pythainlp/docs/2.0/
User guide: https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb

PyThaiNLP 2.1.dev2

19 Jul 12:51
Pre-release
Update Version

PyThaiNLP 2.0.6

27 Jun 15:32
ed34e2c
  • Fixed #230
  • Newly trained ThaiNER model

PyThaiNLP 2.0.5

09 May 10:43
a5bf6b4
  • Clean word lists in pythainlp.corpus (remove duplicates, etc.)
  • Fix/add return type hinting for functions in pythainlp.corpus
  • Fix deprecated inline flag for regular expression in pythainlp.corpus.tnc (Thai National Corpus)
  • Bug fix: reorder condition checks in pythainlp.tokenize.dict_trie so it catches Trie before Iterable
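
The last item is an instance of a general pitfall: when a class is itself iterable, an `isinstance(x, Iterable)` check placed first will match it before the more specific check runs. The sketch below illustrates the ordering with a hypothetical stand-in class `WordTrie` and a hypothetical helper `to_trie`; neither is PyThaiNLP's actual code.

```python
from collections.abc import Iterable


class WordTrie:
    """Hypothetical stand-in for a trie type that is also iterable."""

    def __init__(self, words):
        self._words = set(words)

    def __iter__(self):
        return iter(self._words)


def to_trie(source):
    """Return `source` as a WordTrie, checking the specific type first."""
    # The specific check must come first: a WordTrie is also an Iterable,
    # so the generic branch would otherwise rebuild it needlessly.
    if isinstance(source, WordTrie):
        return source
    if isinstance(source, Iterable):
        return WordTrie(source)
    raise TypeError("expected a WordTrie or an iterable of words")


existing = WordTrie(["แมว", "หมา"])
print(isinstance(existing, Iterable))  # the trie also passes the generic check
print(to_trie(existing) is existing)   # returned unchanged, not rebuilt
```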

PyThaiNLP 2.0.4

20 Apr 23:15
5fb581f
  • word_tokenize()'s whitespaces argument is now keep_whitespace to make it more explicit; the default behavior is to keep whitespace
  • word_tokenize() can now take a custom dictionary through the custom_dict parameter
    • dict_word_tokenize() will be deprecated soon
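
To make the meaning of the two parameters concrete, here is a self-contained toy tokenizer that mimics them. The function `toy_word_tokenize` is hypothetical and uses naive greedy longest matching; it only borrows the parameter names `custom_dict` and `keep_whitespace` from the notes above and does not reflect how PyThaiNLP's word_tokenize() is implemented.

```python
from typing import Iterable, List


def toy_word_tokenize(
    text: str,
    custom_dict: Iterable[str],
    keep_whitespace: bool = True,
) -> List[str]:
    """Toy greedy longest-match tokenizer (illustration only)."""
    # Try longer dictionary words first so the longest match wins.
    words = sorted(set(custom_dict), key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Group a run of whitespace into one token, or drop it.
            j = i
            while j < len(text) and text[j].isspace():
                j += 1
            if keep_whitespace:
                tokens.append(text[i:j])
            i = j
            continue
        match = next((w for w in words if w and text.startswith(w, i)), None)
        if match is None:
            match = text[i]  # unknown character: emit it on its own
        tokens.append(match)
        i += len(match)
    return tokens


vocab = {"ฉัน", "รัก", "แมว"}
print(toy_word_tokenize("ฉันรัก แมว", vocab))
print(toy_word_tokenize("ฉันรัก แมว", vocab, keep_whitespace=False))
```

With the default `keep_whitespace=True` the space survives as its own token; with `keep_whitespace=False` it is dropped.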

PyThaiNLP 2.0.3

14 Apr 23:30
efc515f
  • Fix the issue where the TTC (Thai Textbook Corpus) corpus always downloaded a new file
  • Words and their frequencies from TTC (Thai Textbook Corpus) now have a local copy at ttc_freq.txt inside pythainlp.corpus.
  • Other refactoring and code improvements, including ones related to subword tokenization (Thai Character Cluster / TCC and ETCC), see #193

PyThaiNLP 2.0.2

11 Apr 13:23
  • Fixed tree map
  • Subword tokeniser documentation improvement #190

PyThaiNLP 2.0.1

11 Apr 08:31
5fe7ad7
  • Add Tokenizer class (pythainlp.tokenize.Tokenizer) 79432c2
  • NER fixes, code cleaning, and type hinting #186