Releases · huggingface/text-embeddings-inference
v1.7.1
What's Changed
- [Docs] Update quick tour by @NielsRogge in #574
- Update `README.md` and `supported_models.md` by @alvarobartt in #572
- Back with linting. by @Narsil in #577
- [Docs] Add cloud run example by @NielsRogge in #573
- Fixup by @Narsil in #578
- Fixing the tokenization routes token (offsets are in bytes, not in chars) by @Narsil in #576
- Removing requirements file. by @Narsil in #585
- Removing candle-extensions to live on crates.io by @Narsil in #583
- Bump `sccache` to 0.10.0 and `sccache-action` to 0.0.9 by @alvarobartt in #586
- Optimize the performance of the FlashBert path for HPU by @kaixuanliu in #575
- Revert "Removing requirements file. (#585)" by @Narsil in #588
- Get opentelemetry trace id from request headers by @kozistr in #425
- Add argument for configuring Prometheus port by @kozistr in #589
- Adding missing `head.` prefix in the weight name in `ModernBertClassificationHead` by @kozistr in #591
- Fixing the CI (grpc path). by @Narsil in #593
- Fix XPU env issue where the right `libur_loader.so.0` could not be found by @kaixuanliu in #595
- Enable the flash Mistral model for the HPU device by @kaixuanliu in #594
- Remove the optimum-habana dependency by @kaixuanliu in #599
- Support NomicBert MoE by @kozistr in #596
- Remove duplicate short option '-p' to fix router executable by @cebtenzzre in #602
- Update `text-embeddings-router --help` output by @alvarobartt in #603
- Warmup padded models too. by @Narsil in #592
- Add support for JinaAI Re-Rankers V1 by @alvarobartt in #582
- Gte diffs by @Narsil in #604
- Fix the weight name in GTEClassificationHead by @kozistr in #606
- Upgrade PyTorch and IPEX to version 2.7 by @kaixuanliu in #607
- Upgrade HPU FW to 1.21 and transformers to 4.51.3 by @kaixuanliu in #608
- Patch DistilBERT variants with different weight keys by @alvarobartt in #614
- Add offline modeling for model `jinaai/jina-embeddings-v2-base-code` to avoid `auto_map` pointing to another repository by @kaixuanliu in #612
- Add mean pooling strategy for the ModernBert classifier by @kwnath in #616
- Using serde for pool validation. by @Narsil in #620
- Preparing the update to 1.7.1 by @Narsil in #623
New Contributors
- @NielsRogge made their first contribution in #574
- @cebtenzzre made their first contribution in #602
- @kwnath made their first contribution in #616
Full Changelog: v1.7.0...v1.7.1
v1.7.0
Notable changes
- Upgrade dependencies heavily (candle 0.5 -> 0.8 and related)
- Added ModernBert support by @kozistr !
What's Changed
- Moving cublaslt into TEI extension for easier upgrade of candle globally by @Narsil in #542
- Upgrade candle2 by @Narsil in #543
- Upgrade candle3 by @Narsil in #545
- Fixing the static-linking. by @Narsil in #547
- Fix linking bis by @Narsil in #549
- Make `sliding_window` for `Qwen2` optional by @alvarobartt in #546
- Optimize the performance of FlashBert on HPU by using fast-mode softmax by @kaixuanliu in #555
- Fixing cudarc to the latest unified bindings. by @Narsil in #558
- Fix typos / formatting in CLI args in Markdown files by @alvarobartt in #552
- Use custom `serde` deserializer for JinaBERT models by @alvarobartt in #559
- Implement the `ModernBert` model by @kozistr in #459
- Fixing FlashAttention ModernBert. by @Narsil in #560
- Enable ModernBert on metal by @ivarflakstad in #562
- Fix `{Bert,DistilBert}SpladeHead` when loading from Safetensors by @alvarobartt in #564
- Add related docs for Intel CPU/XPU/HPU containers by @kaixuanliu in #550
- Update the doc for submodule. by @Narsil in #567
- Update `docs/source/en/custom_container.md` by @alvarobartt in #568
- Preparing for release 1.7.0 (candle update + ModernBert). by @Narsil in #570
New Contributors
- @ivarflakstad made their first contribution in #562
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's Changed
- Enable Intel devices CPU/XPU/HPU for the Python backend by @yuanwu2017 in #245
- Add reranker model support for the Python backend by @kaixuanliu in #386
- (FIX): CI Security Fix - branchname injection by @glegendre01 in #479
- Upgrade TEI. by @Narsil in #501
- Pin `cargo-chef` installation to 0.1.62 by @alvarobartt in #469
- Add `TRUST_REMOTE_CODE` param to the Python backend by @kaixuanliu in #485
- Enable splade embeddings for the Python backend by @pi314ever in #493
- HPU bucketing by @kaixuanliu in #489
- Optimize the FlashBert path for the HPU device by @kaixuanliu in #509
- Upgrade IPEX to version 2.6 for CPU/XPU by @kaixuanliu in #510
- Fix a bug in the `MaskedLanguageModel` class by @kaixuanliu in #513
- Fix double incrementing of the `te_request_count` metric by @kozistr in #486
- Add Intel-based images to the CI by @baptistecolle in #518
- Fix typo on intel docker image by @baptistecolle in #529
- chore: Upgrade to tokenizers 0.21.0 by @lightsofapollo in #512
- feat: add support for "model_type": "gte" by @anton-pt in #519
- Update `README.md` to include ONNX by @alvarobartt in #507
- Fusing both Gte Configs. by @Narsil in #530
- Add `HF_HUB_USER_AGENT_ORIGIN` by @alvarobartt in #534
- Use `--hf-token` instead of `--hf-api-token` by @alvarobartt in #535
- Fixing the tests. by @Narsil in #531
- Support classification head for DistilBERT by @kozistr in #487
- Add CLI flag `disable-spans` to toggle span trace logging by @obloomfield in #481
- feat: support the HF_ENDPOINT environment variable when downloading models by @StrayDragon in #505
- Small fixup. by @Narsil in #537
- Fix `VarBuilder` handling in GTE, e.g. `gte-multilingual-reranker-base`, by @Narsil in #538
- Add a workaround in case a BERT model does not have a `safetensors` file by @kaixuanliu in #515
- Add missing `match` on `onnx/model.onnx` download by @alvarobartt in #472
- Fixing the impure flake devShell to be able to run python code. by @Narsil in #539
- Prepare for release. by @Narsil in #540
New Contributors
- @yuanwu2017 made their first contribution in #245
- @kaixuanliu made their first contribution in #386
- @Narsil made their first contribution in #501
- @pi314ever made their first contribution in #493
- @baptistecolle made their first contribution in #518
- @lightsofapollo made their first contribution in #512
- @anton-pt made their first contribution in #519
- @obloomfield made their first contribution in #481
- @StrayDragon made their first contribution in #505
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- feat: support multiple backends at the same time by @OlivierDehaene in #440
- feat: GTE classification head by @kozistr in #441
- feat: Implement GTE model to support the non-flash-attn version by @kozistr in #446
- feat: Implement MPNet model (#363) by @kozistr in #447
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support ONNX IR version 10 by @kozistr in #361
- Adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use `num_cpus::get` since `get_physical` does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable Changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route (see the sketch below)
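
A minimal sketch of calling the new route, assuming a TEI instance listening on `localhost:8080`; the `source_sentence` / `sentences` payload shape mirrors the Hugging Face sentence-similarity task and is an assumption here, so check your deployment's OpenAPI docs for the exact schema:

```python
# Hypothetical sketch: query the /similarity route added in v1.5.0.
# Assumes a TEI instance on localhost:8080; the payload shape may
# differ per version — verify against your server's OpenAPI docs.
import json
from urllib import request

payload = {
    "inputs": {
        "source_sentence": "What is Deep Learning?",
        "sentences": [
            "Deep Learning is a subfield of machine learning.",
            "The weather is nice today.",
        ],
    }
}

req = request.Request(
    "http://localhost:8080/similarity",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    # Expected response: one similarity score per candidate sentence.
    print(json.load(resp))
```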
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
New Contributors
- @ErikKaum made their first contribution in #324
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Notable Changes
- Cuda support for the Qwen2 model architecture
What's Changed
- feat(candle): support Qwen2 on Cuda by @OlivierDehaene in #316
- fix(candle): fix last token pooling
Full Changelog: v1.3.0...v1.4.0
v1.3.0
Notable changes
- New truncation direction parameter
- Cuda support for JinaCode model architecture
- Cuda support for Mistral model architecture
- Cuda support for Alibaba GTE model architecture
- New prompt name parameter: you can now add a prompt name to the body of your request to prepend a pre-prompt to your input, based on the Sentence Transformers configuration. You can also set a default prompt / prompt name so that a pre-prompt is always added to your requests (see the sketch below).
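
A minimal sketch of the new request parameters, assuming a TEI instance on `localhost:8080` serving a model whose Sentence Transformers configuration defines a `query` prompt; the field names and the `"Left"`/`"Right"` values follow these release notes but are assumptions, so verify them against your server's OpenAPI docs:

```python
# Hypothetical sketch: pass truncation_direction and prompt_name on /embed.
import json
from urllib import request

payload = {
    "inputs": "What is Deep Learning?",
    "truncate": True,
    "truncation_direction": "Left",  # truncate from the left instead of the right
    "prompt_name": "query",          # prepend the "query" pre-prompt, if configured
}

req = request.Request(
    "http://localhost:8080/embed",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    embedding = json.load(resp)[0]  # one embedding vector per input
    print(len(embedding))
```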
What's Changed
- CI migration to K8s by @glegendre01 in #269
- chore: map compute_cap from GPU name by @haixiw in #276
- chore: cover Nvidia T4/L4 GPU by @haixiw in #284
- feat(ci): add trufflehog secrets detection by @McPatate in #286
- Community contribution code of conduct by @LysandreJik in #291
- Update README.md by @michaelfeil in #277
- Upgrade tokenizers to 0.19.1 to deal with a breaking change in tokenizers by @scriptator in #266
- Add env for OTLP service name by @kozistr in #285
- Fix CI build timeout by @fxmarty in #296
- fix(router): payload limit was not correctly applied by @OlivierDehaene in #298
- feat(candle): better cuda error by @OlivierDehaene in #300
- feat(router): add truncation direction parameter by @OlivierDehaene in #299
- Support for Jina Code model by @patricebechard in #292
- feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in #301
- fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in #302
- fix: use malloc_trim to cleanup pages by @OlivierDehaene in #307
- feat(candle): add FlashMistral by @OlivierDehaene in #308
- feat(candle): add flash gte by @OlivierDehaene in #310
- feat: add default prompts by @OlivierDehaene in #312
- Add optional CORS allow-any option value in the HTTP server CLI by @kir-gadjello in #260
- Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README by @kevinhu in #263
- v1.3.0 by @OlivierDehaene in #313
New Contributors
- @haixiw made their first contribution in #276
- @McPatate made their first contribution in #286
- @LysandreJik made their first contribution in #291
- @michaelfeil made their first contribution in #277
- @scriptator made their first contribution in #266
- @fxmarty made their first contribution in #296
- @patricebechard made their first contribution in #292
- @kir-gadjello made their first contribution in #260
- @kevinhu made their first contribution in #263
Full Changelog: v1.2.3...v1.3.0
v1.2.3
What's Changed
- fix: limit peak memory to build cuda-all docker image by @OlivierDehaene in #246
Full Changelog: v1.2.2...v1.2.3
v1.2.2
What's Changed
- fix(gke): accept null values for vertex env vars by @OlivierDehaene in #243
- fix: fix cpu image to not default to the sagemaker entrypoint
Full Changelog: v1.2.1...v1.2.2