fix: fix cuda graph max batch size for spec decoding cases. #5076
base: main
Thanks for fixing! This should unblock perf testing on trunk with trtllm-bench.
Signed-off-by: Fanrong Li <[email protected]>
Description
This PR fixes the calculation of `max_cuda_graph_bs`. Before this fix, if we set a small `max_num_tokens`, then `max_cuda_graph_bs` would be set equal to `max_num_tokens`. But when running models with speculative decoding, the real input length, `(1 + max_draft_len) * max_cuda_graph_bs`, would exceed `max_num_tokens`.
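The fix described above amounts to capping the CUDA-graph batch size by the token budget. Here is a minimal, hypothetical sketch of that clamp; the names `max_num_tokens`, `max_draft_len`, and `max_cuda_graph_bs` follow the PR description, but the function itself is illustrative and not the PR's actual code:

```python
def compute_max_cuda_graph_bs(max_num_tokens: int,
                              max_batch_size: int,
                              max_draft_len: int = 0) -> int:
    # With spec decoding, each request contributes (1 + max_draft_len) tokens
    # per step, so the CUDA-graph batch size must be scaled down so that
    # (1 + max_draft_len) * max_cuda_graph_bs never exceeds max_num_tokens.
    tokens_per_request = 1 + max_draft_len
    return min(max_batch_size, max_num_tokens // tokens_per_request)

# Without spec decoding (max_draft_len=0) the cap is max_num_tokens itself;
# with max_num_tokens=64 and max_draft_len=3 it drops to 64 // 4 = 16.
```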