Releases · huggingface/text-embeddings-inference
v1.7.1
What's Changed
- [Docs] Update quick tour by @NielsRogge in #574
- Update `README.md` and `supported_models.md` by @alvarobartt in #572
- Back with linting. by @Narsil in #577
- [Docs] Add cloud run example by @NielsRogge in #573
- Fixup by @Narsil in #578
- Fixing the tokenization routes token (offsets are in bytes, not in chars) by @Narsil in #576
- Removing requirements file. by @Narsil in #585
- Removing candle-extensions to live on crates.io by @Narsil in #583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in #586
- Optimize the performance of the FlashBert path for HPU by @kaixuanliu in #575
- Revert "Removing requirements file. (#585)" by @Narsil in #588
- Get opentelemetry trace id from request headers by @kozistr in #425
- Add argument for configuring Prometheus port by @kozistr in #589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in #591
- Fixing the CI (grpc path). by @Narsil in #593
- Fix XPU env issue where the right `libur_loader.so.0` could not be found by @kaixuanliu in #595
- Enable the flash Mistral model for the HPU device by @kaixuanliu in #594
- Remove the optimum-habana dependency by @kaixuanliu in #599
- Support NomicBert MoE by @kozistr in #596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
- Update `text-embeddings-router --help` output by @alvarobartt in #603
- Warmup padded models too. by @Narsil in #592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
- Gte diffs by @Narsil in #604
- Fix the weight name in GTEClassificationHead by @kozistr in #606
- Upgrade PyTorch and IPEX to version 2.7 by @kaixuanliu in #607
- Upgrade HPU FW to 1.21 and transformers to 4.51.3 by @kaixuanliu in #608
- Patch DistilBERT variants with different weight keys by @alvarobartt in #614
- Add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` pointing to another repository by @kaixuanliu in #612
- Add mean pooling strategy for the ModernBert classifier by @kwnath in #616
- Using serde for pool validation. by @Narsil in #620
- Preparing the update to 1.7.1 by @Narsil in #623
New Contributors
- @NielsRogge made their first contribution in #574
- @cebtenzzre made their first contribution in #602
- @kwnath made their first contribution in #616
Full Changelog: v1.7.0...v1.7.1
v1.7.0
Notable changes
- Upgrade dependencies heavily (candle 0.5 -> 0.8 and related)
- Added ModernBert support by @kozistr !
What's Changed
- Moving cublaslt into TEI extension for easier upgrade of candle globally by @Narsil in #542
- Upgrade candle2 by @Narsil in #543
- Upgrade candle3 by @Narsil in #545
- Fixing the static-linking. by @Narsil in #547
- Fix linking bis by @Narsil in #549
- Make `sliding_window` for `Qwen2` optional by @alvarobartt in #546
- Optimize the performance of FlashBert on HPU by using fast-mode softmax by @kaixuanliu in #555
- Fixing cudarc to the latest unified bindings. by @Narsil in #558
- Fix typos / formatting in CLI args in Markdown files by @alvarobartt in #552
- Use custom `serde` deserializer for JinaBERT models by @alvarobartt in #559
- Implement the `ModernBert` model by @kozistr in #459
- Fixing FlashAttention ModernBert. by @Narsil in #560
- Enable ModernBert on metal by @ivarflakstad in #562
- Fix `{Bert,DistilBert}SpladeHead` when loading from Safetensors by @alvarobartt in #564
- Add related docs for Intel CPU/XPU/HPU containers by @kaixuanliu in #550
- Update the doc for submodule. by @Narsil in #567
- Update `docs/source/en/custom_container.md` by @alvarobartt in #568
- Preparing for release 1.7.0 (candle update + ModernBert). by @Narsil in #570
New Contributors
- @ivarflakstad made their first contribution in #562
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's Changed
- Enable Intel devices CPU/XPU/HPU for the Python backend by @yuanwu2017 in #245
- Add reranker model support for the Python backend by @kaixuanliu in #386
- (FIX): CI Security Fix - branchname injection by @glegendre01 in #479
- Upgrade TEI. by @Narsil in #501
- Pin `cargo-chef` installation to 0.1.62 by @alvarobartt in #469
- Add `TRUST_REMOTE_CODE` param to the Python backend by @kaixuanliu in #485
- Enable splade embeddings for the Python backend by @pi314ever in #493
- HPU bucketing by @kaixuanliu in #489
- Optimize the FlashBert path for the HPU device by @kaixuanliu in #509
- Upgrade IPEX to version 2.6 for CPU/XPU by @kaixuanliu in #510
- Fix a bug in the `MaskedLanguageModel` class by @kaixuanliu in #513
- Fix double incrementing of the `te_request_count` metric by @kozistr in #486
- Add Intel-based images to the CI by @baptistecolle in #518
- Fix typo on intel docker image by @baptistecolle in #529
- chore: Upgrade to tokenizers 0.21.0 by @lightsofapollo in #512
- feat: add support for "model_type": "gte" by @anton-pt in #519
- Update `README.md` to include ONNX by @alvarobartt in #507
- Fusing both Gte Configs. by @Narsil in #530
- Add `HF_HUB_USER_AGENT_ORIGIN` by @alvarobartt in #534
- Use `--hf-token` instead of `--hf-api-token` by @alvarobartt in #535
- Fixing the tests. by @Narsil in #531
- Support classification head for DistilBERT by @kozistr in #487
- Add CLI flag `disable-spans` to toggle span trace logging by @obloomfield in #481
- feat: support the HF_ENDPOINT environment variable when downloading models by @StrayDragon in #505
- Small fixup. by @Narsil in #537
- Fix `VarBuilder` handling in GTE, e.g. `gte-multilingual-reranker-base`, by @Narsil in #538
- Add a workaround in case a BERT model does not have a `safetensors` file by @kaixuanliu in #515
- Add missing `match` on `onnx/model.onnx` download by @alvarobartt in #472
- Fixing the impure flake devShell to be able to run python code. by @Narsil in #539
- Prepare for release. by @Narsil in #540
New Contributors
- @yuanwu2017 made their first contribution in #245
- @kaixuanliu made their first contribution in #386
- @Narsil made their first contribution in #501
- @pi314ever made their first contribution in #493
- @baptistecolle made their first contribution in #518
- @lightsofapollo made their first contribution in #512
- @anton-pt made their first contribution in #519
- @obloomfield made their first contribution in #481
- @StrayDragon made their first contribution in #505
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- feat: support multiple backends at the same time by @OlivierDehaene in #440
- feat: GTE classification head by @kozistr in #441
- feat: Implement GTE model to support the non-flash-attn version by @kozistr in #446
- feat: Implement MPNet model (#363) by @kozistr in #447
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support ONNX IR version 10 by @kozistr in #361
- Adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use `num_cpus::get` since `get_physical` does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable Changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route (see the sketch below)
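
A minimal sketch of calling the new route, assuming a TEI instance listening on `localhost:8080`; the `source_sentence` / `sentences` payload shape mirrors the Hugging Face sentence-similarity task and is an assumption here, so check your deployment's OpenAPI docs for the exact schema:

```python
# Hypothetical sketch: query the /similarity route added in v1.5.0.
# Assumes a TEI instance on localhost:8080; the payload shape may
# differ per version — verify against your server's OpenAPI docs.
import json
from urllib import request

payload = {
    "inputs": {
        "source_sentence": "What is Deep Learning?",
        "sentences": [
            "Deep Learning is a subfield of machine learning.",
            "The weather is nice today.",
        ],
    }
}

req = request.Request(
    "http://localhost:8080/similarity",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    # Expected response: one similarity score per candidate sentence.
    print(json.load(resp))
```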
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
New Contributors
- @ErikKaum made their first contribution in #324
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Notable Changes
- Cuda support for the Qwen2 model architecture
What's Changed
- feat(candle): support Qwen2 on Cuda by @OlivierDehaene in #316
- fix(candle): fix last token pooling
Full Changelog: v1.3.0...v1.4.0
v1.3.0
Notable changes
- New truncation direction parameter
- Cuda support for JinaCode model architecture
- Cuda support for Mistral model architecture
- Cuda support for Alibaba GTE model architecture
- New prompt name parameter: you can now add a prompt name to the body of your request to prepend a pre-prompt to your input, based on the Sentence Transformers configuration. You can also set a default prompt / prompt name so that a pre-prompt is always added to your requests (see the sketch below).
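
A minimal sketch of the new request parameters, assuming a TEI instance on `localhost:8080` serving a model whose Sentence Transformers configuration defines a `query` prompt; the field names and the `"Left"`/`"Right"` values follow these release notes but are assumptions, so verify them against your server's OpenAPI docs:

```python
# Hypothetical sketch: pass truncation_direction and prompt_name on /embed.
import json
from urllib import request

payload = {
    "inputs": "What is Deep Learning?",
    "truncate": True,
    "truncation_direction": "Left",  # truncate from the left instead of the right
    "prompt_name": "query",          # prepend the "query" pre-prompt, if configured
}

req = request.Request(
    "http://localhost:8080/embed",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    embedding = json.load(resp)[0]  # one embedding vector per input
    print(len(embedding))
```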
What's Changed
- CI migration to K8s by @glegendre01 in #269
- chore: map compute_cap from GPU name by @haixiw in #276
- chore: cover Nvidia T4/L4 GPU by @haixiw in #284
- feat(ci): add trufflehog secrets detection by @McPatate in #286
- Community contribution code of conduct by @LysandreJik in #291
- Update README.md by @michaelfeil in #277
- Upgrade tokenizers to 0.19.1 to deal with a breaking change in tokenizers by @scriptator in #266
- Add env for OTLP service name by @kozistr in #285
- Fix CI build timeout by @fxmarty in #296
- fix(router): payload limit was not correctly applied by @OlivierDehaene in #298
- feat(candle): better cuda error by @OlivierDehaene in #300
- feat(router): add truncation direction parameter by @OlivierDehaene in #299
- Support for Jina Code model by @patricebechard in #292
- feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in #301
- fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in #302
- fix: use malloc_trim to cleanup pages by @OlivierDehaene in #307
- feat(candle): add FlashMistral by @OlivierDehaene in #308
- feat(candle): add flash gte by @OlivierDehaene in #310
- feat: add default prompts by @OlivierDehaene in #312
- Add optional CORS allow-any option value in the HTTP server CLI by @kir-gadjello in #260
- Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README by @kevinhu in #263
- v1.3.0 by @OlivierDehaene in #313
New Contributors
- @haixiw made their first contribution in #276
- @McPatate made their first contribution in #286
- @LysandreJik made their first contribution in #291
- @michaelfeil made their first contribution in #277
- @scriptator made their first contribution in #266
- @fxmarty made their first contribution in #296
- @patricebechard made their first contribution in #292
- @kir-gadjello made their first contribution in #260
- @kevinhu made their first contribution in #263
Full Changelog: v1.2.3...v1.3.0
v1.2.3
What's Changed
- fix: limit peak memory to build cuda-all docker image by @OlivierDehaene in #246
Full Changelog: v1.2.2...v1.2.3
v1.2.2
What's Changed
- fix(gke): accept null values for vertex env vars by @OlivierDehaene in #243
- fix: fix cpu image to not default to the sagemaker entrypoint
Full Changelog: v1.2.1...v1.2.2