Pull requests: NVIDIA/TensorRT-LLM
- fix: fix cuda graph max batch size for spec decoding cases. (#5076, opened Jun 10, 2025 by lfr-0531)
- [fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074, opened Jun 10, 2025 by yibinl-nvidia) [Draft]
- fix: remove duplicate trust_remote_code from serve command (#5072, opened Jun 10, 2025 by yechank-nvidia)
- test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066, opened Jun 10, 2025 by venkywonka) [4 tasks done]
- [https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test (#5065, opened Jun 10, 2025 by ChristinaZ)
- bugfix [AutoDeploy]: Correct usage of pytorch_config in autodeploy integration of trtllm-bench (#5059, opened Jun 10, 2025 by suyoggupta)
- [https://nvbugs/5277592][fix] fix cuda graph padding for spec decoding (only for 0.20) (#5058, opened Jun 10, 2025 by lfr-0531)
- test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (pyt, fp8) (#5057, opened Jun 10, 2025 by venkywonka) [Draft]
- chore: Include prompt_token_ids only for context-only disagg requests (#5055, opened Jun 10, 2025 by pcastonguay)
- chore: Merge remaining changes from feat/large-ep branch to main (#5039, opened Jun 9, 2025 by syuoni)
Filtered to pull requests updated in the last three days (updated:>2025-06-07).