
Releases: huggingface/text-embeddings-inference

v0.2.0

18 Oct 11:40

What's Changed

  • Add support for XLM-RoBERTa in #5
  • Get the number of tokenization workers from the number of CPU cores in #8
  • Prefetch the next batch in #10
  • Support loading weights from `.pth` files in #12
  • Add a `--pooling` argument in #14 (see the pooling sketch after this list)
  • Fix CUDA compute capability matching in #21
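
As a rough illustration of what the `--pooling` argument selects between, the sketch below contrasts CLS pooling and mean pooling over per-token embeddings. The function names, the assumed `cls`/`mean` choices, and the plain `Vec<f32>` representation are illustrative only, not the server's implementation.

```rust
// Conceptual sketch of two common pooling strategies an embedding server can
// apply to per-token hidden states; not text-embeddings-inference's actual code.
// `hidden` holds one sequence's token embeddings, shape [tokens][dim].

fn cls_pool(hidden: &[Vec<f32>]) -> Vec<f32> {
    // CLS pooling: take the embedding of the first ([CLS]) token.
    hidden[0].clone()
}

fn mean_pool(hidden: &[Vec<f32>]) -> Vec<f32> {
    // Mean pooling: average the embeddings of all tokens.
    let dim = hidden[0].len();
    let mut out = vec![0.0f32; dim];
    for token in hidden {
        for (o, v) in out.iter_mut().zip(token) {
            *o += *v;
        }
    }
    let n = hidden.len() as f32;
    out.iter_mut().for_each(|v| *v /= n);
    out
}

fn main() {
    let hidden = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    println!("cls  -> {:?}", cls_pool(&hidden));  // [1.0, 2.0]
    println!("mean -> {:?}", mean_pool(&hidden)); // [3.0, 4.0]
}
```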

Full Changelog: v0.1.0...v0.2.0

v0.1.0

13 Oct 13:46
  • No compilation step
  • Dynamic shapes
  • Small Docker images and fast boot times. Get ready for true serverless!
  • Token-based dynamic batching (see the batching sketch after this list)
  • Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
  • Safetensors weight loading
  • Production-ready (distributed tracing with OpenTelemetry, Prometheus metrics)
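
To illustrate the token-based dynamic batching idea, here is a minimal Rust sketch of a scheduler that groups queued requests until a token budget is reached, so batch size adapts to sequence length rather than request count. The `Request` struct, the `next_batch` function, and the 512-token budget are assumptions for illustration, not the server's actual scheduler.

```rust
// Conceptual sketch of token-based dynamic batching: pull requests from a queue
// and group them until adding the next one would exceed a token budget.

#[derive(Debug)]
struct Request {
    id: usize,
    num_tokens: usize,
}

fn next_batch(queue: &mut Vec<Request>, max_batch_tokens: usize) -> Vec<Request> {
    let mut batch = Vec::new();
    let mut tokens_in_batch = 0;

    while let Some(req) = queue.first() {
        let cost = req.num_tokens;
        // Stop once the next request would overflow the budget, but always
        // admit at least one request so oversized requests are not starved.
        if !batch.is_empty() && tokens_in_batch + cost > max_batch_tokens {
            break;
        }
        tokens_in_batch += cost;
        batch.push(queue.remove(0));
    }
    batch
}

fn main() {
    let mut queue = vec![
        Request { id: 0, num_tokens: 120 },
        Request { id: 1, num_tokens: 300 },
        Request { id: 2, num_tokens: 200 },
    ];
    // With a 512-token budget the first two requests form one batch (420 tokens)
    // and the third waits for the next batch.
    let batch = next_batch(&mut queue, 512);
    println!("batched: {:?}", batch);
    println!("pending: {:?}", queue);
}
```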