v3.3.1

danieldk released this 22 May 07:49

767a652

This release updates TGI to Torch 2.7 and CUDA 12.8.

What's Changed

change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #3217
adjust the round_up_seq logic to align with prefill warmup phase on… by @kaixuanliu in #3224
Update to Torch 2.7.0 by @danieldk in #3221
Enable Llama4 for gaudi backend by @yuanwu2017 in #3223
fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all by @drbh in #3230
Deepseek r1 by @sywangyi in #3211
Refine warmup and upgrade to synapse AI 1.21.0 by @sywangyi in #3234
fix the crash in default ATTENTION path by @sywangyi in #3235
Switch to punica-sgmv kernel from the Hub by @danieldk in #3236
move input_ids to hpu and remove disposal of adapter_meta by @sywangyi in #3237
Prepare for 3.3.1 by @danieldk in #3238
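
The two HPU warmup entries above (#3217 and #3224) describe sequence lengths that grow exponentially during warmup and a round-up step aligned with those lengths. The sketch below only illustrates that general idea and is not the code from those PRs: the function names `warmup_seq_lengths` and `round_up_to_bucket`, the doubling factor, and the example sizes are all assumptions made for illustration.

```python
from typing import List


def warmup_seq_lengths(min_len: int, max_len: int) -> List[int]:
    """Return sequence lengths that double from min_len and end at max_len.

    Hypothetical helper: the doubling factor is an assumption, not taken
    from the TGI source.
    """
    if min_len <= 0 or max_len < min_len:
        raise ValueError("expected 0 < min_len <= max_len")
    lengths = []
    current = min_len
    while current < max_len:
        lengths.append(current)
        current *= 2
    # Always include the largest shape so the longest requests are covered.
    lengths.append(max_len)
    return lengths


def round_up_to_bucket(seq_len: int, buckets: List[int]) -> int:
    """Round seq_len up to the smallest bucket that can hold it.

    `buckets` must be sorted in ascending order, as produced by
    warmup_seq_lengths.
    """
    for bucket in buckets:
        if seq_len <= bucket:
            return bucket
    raise ValueError(f"seq_len {seq_len} exceeds the largest bucket {buckets[-1]}")


if __name__ == "__main__":
    buckets = warmup_seq_lengths(128, 1024)
    print(buckets)
    print(round_up_to_bucket(300, buckets))
```

Rounding a request up to a length that was already seen during warmup means the runtime can reuse a shape it has prepared instead of handling a new one at serving time, at the cost of some padding.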

New Contributors

@kaixuanliu made their first contribution in #3217

Full Changelog: v3.3.0...v3.3.1

