Releases: PyThaiNLP/pythainlp

PyThaiNLP v5.0.0 Released!

10 Feb 05:31

We are excited to announce the latest release of PyThaiNLP: version 5.0! PyThaiNLP is a Python library for Thai natural language processing (NLP).

With PyThaiNLP 5.0, you can expect improved performance and accuracy for NLP tasks in Thai. We have also added new functions to make your NLP tasks even easier and more efficient.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp
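
After installing or upgrading, it can be useful to confirm which major version is actually present in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_major_version` is hypothetical and not part of PyThaiNLP.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package="pythainlp"):
    """Return the major version of an installed package, or None if it is absent.

    Hypothetical helper for illustration; it is not part of PyThaiNLP.
    """
    try:
        # A version string such as "5.0.0" yields 5.
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major_version()
    if major is None:
        print("pythainlp is not installed")
    elif major < 5:
        print("pythainlp is older than 5.0; consider: pip install -U pythainlp")
    else:
        print(f"pythainlp major version: {major}")
```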

See PyThaiNLP 5.0 Change Log: #788.

What is new?

License information

Deprecation and other API changes

  • Change default NER to thainer-v2 5e97e7c
  • Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
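
Code that has to run against both 4.x and 5.0 may need to cope with the `is_native_thai` move listed above. One possible approach, sketched below, tries the new location first and falls back to the old one. The helper name `resolve_is_native_thai` is hypothetical, and the sketch assumes only the two module paths named in the list.

```python
import importlib


def resolve_is_native_thai():
    """Return is_native_thai from whichever module provides it, or None.

    Hypothetical compatibility helper; only the two module paths come
    from the release notes.
    """
    # New location (5.0) first, then the pre-5.0 location.
    for module_name in ("pythainlp.morpheme", "pythainlp.util"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        func = getattr(module, "is_native_thai", None)
        if callable(func):
            return func
    return None


if __name__ == "__main__":
    is_native_thai = resolve_is_native_thai()
    if is_native_thai is None:
        print("is_native_thai is not available in this environment")
    else:
        print(is_native_thai("ทดสอบ"))
```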

Dependency

New API

Improve

  • Update code comments and clean up codes by @BLKSerene in #845
  • Improve the documentation by fixing typos and adding necessary details, code explanations, and missing information about models and examples by @Saharshjain78 in #850
  • Fix tests of khavee functions by @BLKSerene in #854
  • Update Git Actions versions by @bact in #878
  • Fix ruff args in workflow by @bact in #880
  • Revise ruff args in workflow by @bact in #881
  • Fix coref return type and add fallback by @bact in #883
  • Fix wrong/incompatible types, code readability by @bact in #884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
  • Add license info to /tests and README_TH.md by @bact in #886
  • phayathaibert, khavee, parse: Code clean up by @bact in #889
  • ruff: docstring-code-format = true by @bact in #892

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs #804
  • Add paragraph_threshold into paragraph_tokenize() function by @pavaris-pm in #806
  • Add 🪿 Han-solo by @wannaphong in #830
  • Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
  • Fix incorrect passing of flags to re.split by @hauntsaninja in #832
  • Add syllable_tokenize by @wannaphong in #834
  • Add wanchanberta_thai_grammarly by @wannaphong in #836
  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856

Tag

Chat

Translate

Transliterate

Corpus

  • Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
  • Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Uncyclopedia titles #869 by @konbraphat51
  • Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
  • Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
  • Rename Volubilis/Uncyclopedia corpus function names for consistency / Fix types by @bact in #882
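
The word lists above can be combined into a single vocabulary, for example to seed a custom dictionary. The sketch below is one way to do that under stated assumptions: the function names are taken from the list above, each is assumed to return an iterable of words, and any name missing from the installed version is skipped. The helper `load_word_lists` is hypothetical.

```python
import importlib

# Function names as given in the release notes above.
WORD_LIST_NAMES = (
    "thai_orst_words",
    "thai_wikipedia_titles",
    "thai_volubilis_words",
    "thai_icu_words",
)


def load_word_lists(names=WORD_LIST_NAMES):
    """Return {function name: frozenset of words} for each available list.

    Hypothetical helper; returns an empty dict if pythainlp is not installed.
    """
    try:
        corpus = importlib.import_module("pythainlp.corpus")
    except ImportError:
        return {}
    loaded = {}
    for name in names:
        loader = getattr(corpus, name, None)
        if callable(loader):
            loaded[name] = frozenset(loader())
    return loaded


if __name__ == "__main__":
    word_lists = load_word_lists()
    for name, words in word_lists.items():
        print(f"{name}: {len(words)} entries")
    # Union of every list that could be loaded.
    combined = frozenset().union(*word_lists.values())
    print(f"combined: {len(combined)} entries")
```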

Util

New Contributors

Full Changelog: v4.0.2...v5.0.0

Contributors

Thanks to all the contributors.

PyThaiNLP v5.0.0-beta1

05 Feb 05:37
Pre-release

Schedule

  • First Beta release: 5 February 2024
  • Production release: 10 February 2024

See 5.0 Milestone.

What is new?

License information

  • Use SPDX license identifier at the header of source code #876

Deprecation and other API changes

  • Change default NER to thainer-v2 5e97e7c
  • Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a

Dependency

New API

Improve

  • Update code comments and clean up codes by @BLKSerene in #845
  • Improve the documentation by fixing typos and adding necessary details, code explanations, and missing information about models and examples by @Saharshjain78 in #850
  • Fix tests of khavee functions by @BLKSerene in #854
  • Update Git Actions versions by @bact in #878
  • Fix ruff args in workflow by @bact in #880
  • Revise ruff args in workflow by @bact in #881
  • Fix coref return type and add fallback by @bact in #883
  • Fix wrong/incompatible types, code readability by @bact in #884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
  • Add license info to /tests and README_TH.md by @bact in #886
  • phayathaibert, khavee, parse: Code clean up by @bact in #889
  • ruff: docstring-code-format = true by @bact in #892

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs #804
  • Add paragraph_threshold into paragraph_tokenize() function by @pavaris-pm in #806
  • Add 🪿 Han-solo by @wannaphong in #830
  • Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
  • Fix incorrect passing of flags to re.split by @hauntsaninja in #832
  • Add syllable_tokenize by @wannaphong in #834
  • Add wanchanberta_thai_grammarly by @wannaphong in #836
  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856

Tag

Chat

Translate

Transliterate

Corpus

  • Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
  • Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Uncyclopedia titles #869 by @konbraphat51
  • Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
  • Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
  • Rename Volubilis/Uncyclopedia corpus function names for consistency / Fix types by @bact in #882

Util

New Contributors

PyThaiNLP v5.0.0-dev2

15 Jan 07:49
Pre-release

What's Changed

Full Changelog: v5.0.0-dev1...v5.0.0-dev2

PyThaiNLP v5.0.0-dev1

19 Dec 15:48
Pre-release

What's Changed

  • Add Thai word list from Volubilis dictionary by @konbraphat51 in #870
  • Add Thai word list from Thai Uncyclopedia titles by @konbraphat51 in #869
  • switch PyThaiNLP source code to SPDX license ID by @pavaris-pm in #876
  • Add pythainlp.util.to_idn by @wannaphong in #875
  • Update Git Actions versions by @bact in #878
  • Fix ruff args in workflow by @bact in #880
  • Revise ruff args in workflow by @bact in #881
  • Add Thai word list from ICU BreakIterator dictionary by @pavaris-pm in #879
  • Rename Volubilis/Uncyclopedia corpus function names for consistency / Fix types by @bact in #882
  • Fix coref return type and add fallback by @bact in #883
  • Fix wrong/incompatible types, code readability by @bact in #884
  • Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
  • Add license info to /tests and README_TH.md by @bact in #886
  • Add PhayaThaiBERT engine with new features [WIP] by @pavaris-pm in #873
  • phayathaibert, khavee, parse: Code clean up by @bact in #889
  • Add pythainlp.corpus.find_synonyms by @wannaphong in #890
  • ruff: docstring-code-format = true by @bact in #892
  • Add pythainlp.util.morse by @wannaphong in #891

Full Changelog: v5.0.0-dev0...v5.0.0-dev1

PyThaiNLP v5.0.0-dev0

26 Nov 09:22
abfbf02
Pre-release

What's Changed

  • Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
  • Update code comments and clean up codes by @BLKSerene in #845
  • Improve the documentation by fixing typos and adding necessary details, code explanations, and missing information about models and examples by @Saharshjain78 in #850
  • Fix ISO 11940 duplicate keys by @bact in #851
  • Add pythainlp.util.rhyme by @wannaphong in #849
  • Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
  • Fix tests of khavee functions by @BLKSerene in #854
  • Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
  • Add function for POS tagging with transformers by @MpolaarbearM in #857
  • Add: remove_trailing_repeat_consonants() by @konbraphat51 in #862
  • Update pos_tag_transformers function by @pavaris-pm in #865

New Contributors

Full Changelog: v4.1.0-beta5...v5.0.0-dev0

PyThaiNLP v4.1.0-beta5

24 Sep 08:57
Pre-release

Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What's Changed

Full Changelog: v4.1.0-beta4...v4.1.0-beta5

PyThaiNLP v4.1.0-beta4

05 Sep 04:30
Pre-release

Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What's Changed

New Contributors

Full Changelog: v4.1.0-beta3...v4.1.0-beta4

PyThaiNLP v4.1.0-beta3

04 Aug 06:02
Pre-release

What's Changed

Full Changelog: v4.1.0-beta2...v4.1.0-beta3

PyThaiNLP v4.1.0-beta2

27 Jul 13:54
Pre-release

What's Changed

Full Changelog: v4.1.0-beta1...v4.1.0-beta2

PyThaiNLP v4.1.0-beta1

24 Jul 05:02
da86fe2
Pre-release

Schedule

  • First Beta release: 24 July 2023

Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues

Install: pip install --pre pythainlp

See 4.1 Milestone.

What is new?

Deprecation and other API changes

  • 5e97e7c Change the default NER to thainer-v2

New API

  • Add pythainlp.coref for Thai coreference resolution #802
  • Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize function #806
  • Add word approximation to pythainlp.soundex.sound by @wannaphong in #809
  • Add pythainlp.wsd for Thai Word Sense Disambiguation by @wannaphong in #818
  • Add pythainlp.chat and WangChanGLM to pythainlp.generate by @wannaphong in #819
  • Add a param-free classification model (pythainlp.cls) by @c4n in #821
  • Add pythainlp.el by @wannaphong in #822
  • Add pythainlp.util.abbreviation_to_full_text by @wannaphong in #826

Tokenizer

  • Add wtpsplit engine to sentence_tokenize #804
  • New paragraph_tokenize function to split Thai text into paragraphs #804
  • Add paragraph_threshold into paragraph_tokenize function by @pavaris-pm in #806

Translate

Corpus

Util

New Contributors

Full Changelog: v4.0.0...v4.1.0-beta1