
Releases: huggingface/text-generation-inference

v0.2.1

07 Feb 14:41
2fe5e1b

Fix

  • server: fix bug with repetition penalty when using GPUs and inference mode (see the sketch below)
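
For context, the repetition penalty in question typically follows the CTRL-style transform: tokens that already appear in the sequence get their logits scaled down before sampling. The sketch below is a hypothetical illustration of that transform (the function name and tensor shapes are assumptions, not the server's actual code); writing the result out of place is one way to stay compatible with torch.inference_mode() on GPU tensors.

```python
# Hypothetical sketch of the standard (CTRL-style) repetition-penalty transform,
# not the project's exact implementation.
import torch

def apply_repetition_penalty(logits: torch.Tensor, input_ids: torch.Tensor, penalty: float) -> torch.Tensor:
    # logits: [vocab_size] scores for the next token
    # input_ids: [seq_len] tokens already in the sequence
    score = torch.gather(logits, 0, input_ids)
    # Penalize previously seen tokens: shrink positive scores, push negative scores further down.
    score = torch.where(score > 0, score / penalty, score * penalty)
    # scatter (out of place) returns a new tensor instead of mutating an inference-mode tensor.
    return logits.scatter(0, input_ids, score)
```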

v0.2.0

03 Feb 11:56
20c3c59

Features

  • router: support token streaming using Server-Sent Events (see the usage sketch after this list)
  • router: support seeding
  • server: support gpt-neox
  • server: support santacoder
  • server: support repetition penalty
  • server: allow the server to use a local weight cache
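
A minimal sketch of how a client might exercise the new streaming, seeding, and repetition-penalty features. The endpoint path, port, payload field names, and event shape are assumptions based on later versions of the API, not guaranteed for this release:

```python
# Hedged sketch of consuming the token stream over Server-Sent Events.
import json
import requests

resp = requests.post(
    "http://localhost:3000/generate_stream",   # assumed streaming route and default router port
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {
            "max_new_tokens": 20,
            "repetition_penalty": 1.2,          # new in this release
            "seed": 42,                         # new in this release
        },
    },
    stream=True,
)
for line in resp.iter_lines():
    # Server-Sent Events frames are prefixed with "data:".
    if line and line.startswith(b"data:"):
        event = json.loads(line[len(b"data:"):])
        # Field names assumed from later API versions.
        print(event["token"]["text"], end="", flush=True)
```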

Breaking changes

  • router: refactor Token API
  • router: modify the /generate API to only return generated text (see the sketch below)
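
A minimal sketch of calling the modified /generate endpoint; the port and the exact response shape are assumptions:

```python
# Hedged sketch of the non-streaming endpoint after the breaking change.
import requests

resp = requests.post(
    "http://localhost:3000/generate",           # assumed default router port
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}},
)
payload = resp.json()
# Expected to contain just the generated text, e.g. a "generated_text" field
# (field name assumed), rather than the fuller per-token output returned before.
print(payload)
```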

Misc

  • router: use a background task to manage the request queue
  • ci: docker build/push on update