v0.1.0

@OlivierDehaene OlivierDehaene released this 13 Oct 13:46
  • No compilation step
  • Dynamic shapes
  • Small Docker images and fast boot times. Get ready for true serverless!
  • Token-based dynamic batching
  • Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
  • Safetensors weight loading
  • Production-ready (distributed tracing with OpenTelemetry, Prometheus metrics)
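
To illustrate what token-based dynamic batching means in general, here is a minimal sketch in Python. It is not this project's implementation: the function name `batch_by_token_budget` and the parameter `max_batch_tokens` are hypothetical, chosen for illustration. The idea is to cap each batch by its total token count rather than by a fixed number of requests, so many short inputs can share a batch while long inputs get smaller batches.

```python
from typing import List


def batch_by_token_budget(lengths: List[int], max_batch_tokens: int) -> List[List[int]]:
    """Group request indices so each batch's total token count fits the budget.

    `lengths[i]` is the token count of request i. Both names are hypothetical
    and used for illustration only.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0

    for idx, n_tokens in enumerate(lengths):
        if n_tokens > max_batch_tokens:
            # A single request larger than the budget can never be scheduled.
            raise ValueError(f"request {idx} has {n_tokens} tokens, budget is {max_batch_tokens}")
        # Close the current batch if adding this request would exceed the budget.
        if current and current_tokens + n_tokens > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n_tokens

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Five queued requests with these token counts, and a budget of 8 tokens per batch.
    print(batch_by_token_budget([5, 3, 4, 8, 1], max_batch_tokens=8))
```

This greedy version preserves arrival order. A real server would typically also bound how long a request may wait before its batch is dispatched.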