v1.0.0
Highlights
- Support for Nomic models
- Support for Flash Attention for Jina models
- Metal backend for M* users
- /tokenize route to directly access the internal TEI tokenizer
- /embed_all route to allow client level pooling
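
As a rough illustration of what "client level pooling" could look like, the sketch below averages per-token vectors into a single embedding on the client side. The request and response shapes are assumptions, not confirmed by these notes: it supposes /embed_all accepts a JSON body with an `inputs` field and returns one list of token vectors per input. The helper names `embed_all` and `mean_pool`, and the `http://localhost:8080` address, are hypothetical.

```python
import json
import urllib.request


def mean_pool(token_embeddings):
    """Average a list of per-token vectors into one vector (mean pooling)."""
    if not token_embeddings:
        raise ValueError("cannot pool an empty sequence")
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    for vec in token_embeddings:
        if len(vec) != dim:
            raise ValueError("all token vectors must have the same length")
        for i, x in enumerate(vec):
            sums[i] += x
    return [s / len(token_embeddings) for s in sums]


def embed_all(text, base_url="http://localhost:8080"):
    """Hypothetical client helper: POST text to /embed_all.

    Assumption: the body is {"inputs": ...} and the response is a JSON
    list with one entry per input, each entry a list of token vectors.
    """
    req = urllib.request.Request(
        f"{base_url}/embed_all",
        data=json.dumps({"inputs": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a running server, something like `mean_pool(embed_all("hello")[0])` would produce one pooled vector for the input, provided the assumed response shape holds. Other strategies (max pooling, weighting by token position) can be swapped in by replacing `mean_pool`.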
What's Changed
- fix: limit the number of buckets for prom metrics by @OlivierDehaene in #114
- feat: support flash attention for Jina by @OlivierDehaene in #119
- feat: add support for Metal by @OlivierDehaene in #120
- fix: fix turing for Jina and limit concurrency in docker build by @OlivierDehaene in #121
- fix(router): fix panics on partial_cmp and empty req.texts by @OlivierDehaene in #138
- feat(router): add /tokenize route by @OlivierDehaene in #139
- feat(backend): support classification for bert by @OlivierDehaene in #155
- feat: add embed_raw route to get all embeddings without pooling by @OlivierDehaene in #154
- added docs for OTLP_ENDPOINT around the defaults and format sent by @MarcusDunn in #157
- fix: use mimalloc to solve memory "leak" by @OlivierDehaene in #161
- fix: remove modif of tokenizer by @OlivierDehaene in #163
- fix: add cors_allow_origin to cli by @OlivierDehaene in #162
- fix: use st max_seq_length by @OlivierDehaene in #167
- feat: support nomic models by @OlivierDehaene in #166
New Contributors
- @MarcusDunn made their first contribution in #157
Full Changelog: v0.6.0...v1.0.0