Releases: huggingface/text-generation-inference
v0.2.1
Fix
- server: fix bug with repetition penalty when using GPUs and inference mode
v0.2.0
Features
- router: support token streaming using Server-Sent Events (SSE)
- router: support seeding
- server: support gpt-neox
- server: support santacoder
- server: support repetition penalty
- server: allow the server to use a local weight cache
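As a sketch of how a client might consume the new SSE token stream: each event arrives as a `data:` line carrying a JSON payload. The exact payload shape (a `token` object with a `text` field) is an assumption based on the streaming feature above, not a documented schema.

```python
import json

def parse_sse_tokens(raw_stream):
    """Yield the text of each streamed token from a raw SSE body.

    Assumes each event is a `data:` line whose JSON payload has a
    `token.text` field (payload shape is an assumption).
    """
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = json.loads(line[len("data:"):].strip())
        yield payload["token"]["text"]

# Canned stream for illustration (wire format assumed, not verified):
raw = (
    'data: {"token": {"id": 1, "text": "Hello"}}\n'
    'data: {"token": {"id": 2, "text": " world"}}\n'
)
print("".join(parse_sse_tokens(raw)))  # -> Hello world
```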
Breaking changes
- router: refactor Token API
- router: modify /generate API to only return generated text
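A minimal sketch of what a request/response round trip might look like after these changes, tying in the new `seed` and `repetition_penalty` features. The field names (`inputs`, `parameters`, `seed`, `repetition_penalty`, `generated_text`) are assumptions inferred from the release notes, not a documented API schema.

```python
import json

def build_generate_request(prompt, seed=None, repetition_penalty=None):
    """Build a hypothetical /generate request body.

    Parameter names are assumptions sketched from the release notes.
    """
    parameters = {}
    if seed is not None:
        parameters["seed"] = seed  # reproducible sampling (v0.2.0 seeding feature)
    if repetition_penalty is not None:
        parameters["repetition_penalty"] = repetition_penalty
    return {"inputs": prompt, "parameters": parameters}

request = build_generate_request("Deep learning is", seed=42, repetition_penalty=1.2)
print(json.dumps(request))

# After the breaking change, the response is assumed to carry only the
# generated text rather than the full token details:
canned_response = '{"generated_text": " a subfield of machine learning."}'
print(json.loads(canned_response)["generated_text"])
```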
Misc
- router: use background task to manage request queue
- ci: docker build/push on update