v0.2.0
Features
- router: support token streaming using Server-Sent Events
- router: support seeding
- server: support gpt-neox
- server: support santacoder
- server: support repetition penalty
- server: allow the server to use a local weight cache
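
For clients picking up the streaming feature, here is a minimal sketch of reading a Server-Sent Events stream. It follows the generic SSE wire format only (`data:` fields, blank-line event terminators, `:` comment lines). These notes do not specify the router's payload, so it is treated as an opaque string; the function name `iter_sse_data` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    Hypothetical helper: handles only the `data` field of the generic
    SSE format and makes no assumption about what the payload contains.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line (commonly used as a keep-alive); ignore it.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event that is never terminated by a blank line is dropped.


# Made-up sample lines, as they might be read from a streaming HTTP response.
sample = [
    "data: hello\n",
    "\n",
    ": keep-alive\n",
    "data: a\n",
    "data: b\n",
    "\n",
    "data: dangling\n",
]
events = list(iter_sse_data(sample))
```

Multiple `data:` lines within one event are joined with newlines, and the trailing unterminated event is discarded, which matches how SSE clients are expected to behave.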
Breaking changes
- router: refactor Token API
- router: modify /generate API to only return generated text
Misc
- router: use background task to manage request queue
- ci: build and push the Docker image on update