v0.5.0

@OlivierDehaene released this 11 Apr 18:32
6f0f1d7

Features

  • server: add flash-attention based version of Llama
  • server: add flash-attention based version of Santacoder
  • server: support OPT models
  • router: make router input validation optional
  • docker: improve layer caching

Fix

  • server: improve token streaming decoding
  • server: fix escape characters in stop sequences
  • router: fix NCCL desync issues
  • router: use buckets for metrics histograms