Pull requests: NVIDIA/TensorRT-LLM

test: strictly constrain disaggregated serving llama4 to H200
#5085 opened Jun 10, 2025 by StanleySun639

test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test
#5083 opened Jun 10, 2025 by ruodil

fix: fix cuda graph max batch size for spec decoding cases
#5076 opened Jun 10, 2025 by lfr-0531

[fix]: Fall back to HMAC to Avoid IPC Serialization Churn
#5074 opened Jun 10, 2025 by yibinl-nvidia (Draft)

fix: remove duplicate trust_remote_code from serve command
#5072 opened Jun 10, 2025 by yechank-nvidia

test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨
#5066 opened Jun 10, 2025 by venkywonka (4 tasks done)

[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test
#5065 opened Jun 10, 2025 by ChristinaZ

bugfix [AutoDeploy]: Correct usage of pytorch_config in autodeploy integration of trtllm-bench
#5059 opened Jun 10, 2025 by suyoggupta

[https://nvbugs/5277592][fix] fix cuda graph padding for spec decoding (only for 0.20)
#5058 opened Jun 10, 2025 by lfr-0531

test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (pyt, fp8)
#5057 opened Jun 10, 2025 by venkywonka (Draft)