[https://nvbugs/5277592][fix] fix cuda graph padding for spec decoding (only for 0.20) #5058

lfr-0531 · 2025-06-10T01:57:25Z

Description

Skip the outputs produced for CUDA graph padding requests in mtp_sampler
Add tests that combine CUDA graph padding with MTP and attention DP
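
To illustrate the first item, here is a minimal sketch of the idea, not the actual mtp_sampler code. It assumes the batch is padded with dummy requests up to the batch size a CUDA graph was captured for, and that the sampler must drop whatever the model produced for those dummy slots. The names `Request`, `is_padding`, `pad_batch`, and `update_requests` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    request_id: int
    is_padding: bool = False  # hypothetical flag marking a dummy (padding) request
    tokens: List[int] = field(default_factory=list)


def pad_batch(requests: List[Request], graph_batch_size: int) -> List[Request]:
    """Append dummy requests so the batch matches the captured graph batch size."""
    num_padding = graph_batch_size - len(requests)
    if num_padding < 0:
        raise ValueError("batch is larger than the captured graph batch size")
    padding = [Request(request_id=-1, is_padding=True) for _ in range(num_padding)]
    return requests + padding


def update_requests(batch: List[Request],
                    new_tokens: List[List[int]],
                    num_accepted: List[int]) -> None:
    """Append accepted tokens to each real request and skip padded slots.

    With MTP and nextn draft tokens, each slot holds up to nextn + 1
    candidate tokens, of which the first num_accepted[i] are kept.
    """
    for request, candidates, accepted in zip(batch, new_tokens, num_accepted):
        if request.is_padding:
            continue  # results for padding slots are meaningless, so drop them
        request.tokens.extend(candidates[:accepted])
```

For example, two real requests padded to a graph batch size of 4 with nextn=2 give four rows of three candidate tokens, and only the first two rows are written back.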

Test Coverage

accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding[mtp_nextn=2]
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=0]
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=2]

Signed-off-by: Fanrong Li <[email protected]>

lfr-0531 · 2025-06-10T01:57:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-10T02:04:02Z

PR_Github #8182 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-10T18:14:12Z

PR_Github #8182 [ run ] completed with state SUCCESS
/LLM/release-0.20/L0_MergeRequest_PR pipeline #203 completed with status: 'SUCCESS'

fix cuda graph padding for spec decoding (only for 0.20).

262d299

Signed-off-by: Fanrong Li <[email protected]>

lfr-0531 requested a review from a team as a code owner June 10, 2025 01:57

lfr-0531 mentioned this pull request Jun 10, 2025

Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4853 #4997

Closed

litaotju approved these changes Jun 10, 2025

lfr-0531 enabled auto-merge (squash) June 10, 2025 08:05

lfr-0531 merged commit bfa3b59 into NVIDIA:release/0.20 Jun 10, 2025
3 checks passed
