fix: fix cuda graph max batch size for spec decoding cases. #5076

Open · lfr-0531 wants to merge 1 commit into main from user/fanrongl/fix_max_cuda_graph_bs

Conversation

lfr-0531 (Collaborator)

Description

  • Fix max_cuda_graph_bs.
  • Add assertions that output more useful information.

Before this fix, if max_num_tokens was set to a small value, max_cuda_graph_bs would be set equal to max_num_tokens. But when running models with speculative decoding, each request in a CUDA-graph batch contributes (1 + max_draft_len) input tokens, so the real number of input tokens, (1 + max_draft_len) * max_cuda_graph_bs, would exceed max_num_tokens.
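
Below is a minimal sketch of the clamping logic described above. It is not the actual TensorRT-LLM code; the helper name `compute_max_cuda_graph_bs` and the `requested_bs` parameter are hypothetical, only the `max_num_tokens` / `max_draft_len` relationship follows the description:

```python
def compute_max_cuda_graph_bs(requested_bs: int,
                              max_num_tokens: int,
                              max_draft_len: int = 0) -> int:
    """Clamp the CUDA graph batch size so the captured batch fits in max_num_tokens.

    With speculative decoding, each request in a CUDA-graph generation batch
    carries (1 + max_draft_len) input tokens, so the batch size must be capped
    at max_num_tokens // (1 + max_draft_len) rather than at max_num_tokens itself.
    (Hypothetical helper, for illustration only.)
    """
    tokens_per_request = 1 + max_draft_len
    max_bs_from_tokens = max_num_tokens // tokens_per_request
    max_cuda_graph_bs = min(requested_bs, max_bs_from_tokens)

    # Fail early with an informative message instead of overflowing the
    # token budget later, mirroring the "add assertions" part of the fix.
    assert max_cuda_graph_bs * tokens_per_request <= max_num_tokens, (
        f"CUDA graph batch size {max_cuda_graph_bs} needs "
        f"{max_cuda_graph_bs * tokens_per_request} input tokens, which exceeds "
        f"max_num_tokens={max_num_tokens}")
    return max_cuda_graph_bs
```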

@lfr-0531 lfr-0531 requested a review from mikeiovine June 10, 2025 07:58
@lfr-0531 lfr-0531 requested a review from a team as a code owner June 10, 2025 07:58
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_max_cuda_graph_bs branch from fbb3b43 to b14159b on June 10, 2025 07:59
@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8243 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8243 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5974 completed with status: 'FAILURE'

@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8293 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8293 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6005 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_max_cuda_graph_bs branch from b14159b to 69ef4e4 on June 10, 2025 15:41
@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8318 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8318 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6023 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_max_cuda_graph_bs branch from 69ef4e4 to 4820210 on June 11, 2025 02:27
@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8375 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8375 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6067 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_max_cuda_graph_bs branch from 4820210 to f08e2ac on June 11, 2025 07:51
@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8446 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #8446 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6118 completed with status: 'FAILURE'

@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8513 [ run ] triggered by Bot

@mikeiovine (Collaborator) left a comment

Thanks for fixing! This should unblock perf testing on trunk with trtllm-bench.

@tensorrt-cicd (Collaborator)

PR_Github #8513 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6172 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_max_cuda_graph_bs branch from f08e2ac to 47d05a3 on June 11, 2025 23:26
@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #8556 [ run ] triggered by Bot

Labels: None yet
Projects: None yet
3 participants