v3.3.1
This release updates TGI to Torch 2.7 and CUDA 12.8.
What's Changed
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #3217
- adjust the `round_up_seq` logic to align with prefill warmup phase on… by @kaixuanliu in #3224
- Update to Torch 2.7.0 by @danieldk in #3221
- Enable Llama4 for gaudi backend by @yuanwu2017 in #3223
- fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all by @drbh in #3230
- Deepseek r1 by @sywangyi in #3211
- Refine warmup and upgrade to synapse AI 1.21.0 by @sywangyi in #3234
- fix the crash in default ATTENTION path by @sywangyi in #3235
- Switch to punica-sgmv kernel from the Hub by @danieldk in #3236
- move input_ids to hpu and remove disposal of adapter_meta by @sywangyi in #3237
- Prepare for 3.3.1 by @danieldk in #3238
New Contributors
- @kaixuanliu made their first contribution in #3217
Full Changelog: v3.3.0...v3.3.1