v0.3.0
Features
- server: support T5 models
- router: add max_total_tokens and empty_input validation
- launcher: add option to disable custom CUDA kernels
- server: add automatic safetensors conversion
- router: add Prometheus scrape endpoint
- server, router: add distributed tracing
Fixes
- launcher: copy current env vars to subprocesses
- docker: add note about shared memory