base: main
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ #5066
Conversation
Signed-off-by: Venky Ganesh <[email protected]>
Pull Request Overview
This PR adds additional performance tests for Llama-Nemotron models, refining test harness logic and updating configuration mappings to support FP8 backends for nano, super, and ultra variants.
- Added FP8 prequantized performance tests for nano, super, and ultra models in the YAML test list.
- Updated test mapping in test_perf.py to include FP8 variants.
- Enhanced pattern-based model configuration in pytorch_model_config.py to support the new FP8 tests and disable attention_dp for certain models (a minimal sketch of the idea follows).
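As a rough illustration of what such a pattern-based override table can look like, here is a minimal sketch; the names (`PATTERN_CONFIGS`, `get_model_config`) and the glob pattern are hypothetical, not the actual contents of pytorch_model_config.py:

```python
# Minimal sketch of pattern-based config overrides; all names here are
# illustrative and not taken from pytorch_model_config.py.
from fnmatch import fnmatch

# Glob patterns over perf-test names, mapped to PyTorch-backend overrides.
PATTERN_CONFIGS = {
    "*nemotron*fp8*": {
        # Disabled because enabling it caused hangs (tracked by several bugs;
        # a ticket ID in this comment would help future maintainers).
        "enable_attention_dp": False,
    },
}

def get_model_config(test_name: str, base_config: dict) -> dict:
    """Return a copy of base_config with the first matching override applied."""
    config = dict(base_config)
    for pattern, overrides in PATTERN_CONFIGS.items():
        if fnmatch(test_name, pattern):
            config.update(overrides)
            break
    return config
```

With this shape, `get_model_config("llama_v3.3_nemotron_super_49b_fp8", {})` would return `{"enable_attention_dp": False}`, and new FP8 variants pick up the override without per-model entries.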
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/integration/test_lists/qa/trt_llm_release_perf_test.yml | Added new FP8 and extended performance test cases with clarifying comments. |
| tests/integration/defs/perf/test_perf.py | Updated model mapping to include FP8 variants for performance tests. |
| tests/integration/defs/perf/pytorch_model_config.py | Introduced pattern-based configuration updates and adjustments for FP8 model tests. |
Comments suppressed due to low confidence (3)
tests/integration/defs/perf/pytorch_model_config.py:82
- Consider adding a reference, such as a ticket or issue ID, in the comment regarding the hang issue to provide better context for future maintainability.
'enable_attention_dp': False,
tests/integration/defs/perf/test_perf.py:59
- [nitpick] Ensure that the updated FP8 model naming is consistent across all mapping dictionaries to avoid potential confusion during usage (a naming sketch follows this list).
"llama_v3.1_nemotron_nano_8b_fp8": "Llama-3.1-Nemotron-Nano-8B-v1-FP8",
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml:294
- [nitpick] Consider expanding the inline comments to provide more context on the test categories, which can improve clarity and maintainability.
# pyt
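To make the naming-consistency concern concrete, here is a hypothetical sketch; the dict name `MODEL_PATH` and the helper are illustrative, and only the FP8 entry quoted above comes from the PR:

```python
# Hypothetical sketch: every dict keyed by perf-test names should share the
# same "_fp8" suffix convention, or FP8 lookups will silently miss.
MODEL_PATH = {
    "llama_v3.1_nemotron_nano_8b": "Llama-3.1-Nemotron-Nano-8B-v1",  # assumed entry
    "llama_v3.1_nemotron_nano_8b_fp8": "Llama-3.1-Nemotron-Nano-8B-v1-FP8",
}

def resolve_model_dir(test_name: str) -> str:
    """Fail loudly on a missing key instead of masking a naming mismatch."""
    if test_name not in MODEL_PATH:
        raise KeyError(
            f"No model path registered for {test_name!r}; check that the "
            "FP8 keys use the same suffix in every mapping dictionary."
        )
    return MODEL_PATH[test_name]
```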
/bot run --disable-fail-fast
PR_Github #8197 [ run ] triggered by Bot
Signed-off-by: Venky Ganesh <[email protected]>
/bot run
PR_Github #8207 [ run ] triggered by Bot
PR_Github #8197 [ run ] completed with state
Signed-off-by: Venky Ganesh <[email protected]>
/bot run
PR_Github #8209 [ run ] triggered by Bot
PR_Github #8207 [ run ] completed with state
Signed-off-by: Venky Ganesh <[email protected]>
/bot run --disable-fail-fast
PR_Github #8216 [ run ] triggered by Bot
PR_Github #8209 [ run ] completed with state
PR_Github #8216 [ run ] completed with state
Signed-off-by: Venky <[email protected]>
/bot reuse-pipeline --number 5959
PR_Github #8539 [ reuse-pipeline ] triggered by Bot
PR_Github #8539 [ reuse-pipeline ] completed with state
Description
Cleans up the `extra-llm-api-args` handling that was previously verbose for including exceptions.

Performance Summary
Llama v3.1 Nemotron Nano 8B
Invariants:
`llama_v3.1_nemotron_nano_8b_fp8`
Llama v3.3 Nemotron Super 49B (FP8)
Invariants:
`llama_v3.3_nemotron_super_49b_fp8`
Llama v3.3 Nemotron Super 49B (BF16)
Invariants:
`llama_v3.3_nemotron_super_49b`
Llama v3.1 Nemotron Ultra 253B
Invariants:
`llama_v3.1_nemotron_ultra_253b_fp8`
NOTES
These tests apply an `enable_attention_dp=False` override. (Enabling it was causing hangs that are tracked by several bugs.)
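A small, hypothetical guard can encode this invariant in the harness so the override is not accidentally dropped; none of these names come from the PR:

```python
# Hypothetical guard: fail fast if a Nemotron perf config re-enables
# attention_dp before the tracked hang bugs are fixed.
NEMOTRON_PREFIXES = ("llama_v3.1_nemotron", "llama_v3.3_nemotron")

def check_attention_dp_disabled(test_name: str, config: dict) -> None:
    if test_name.startswith(NEMOTRON_PREFIXES) and config.get("enable_attention_dp"):
        raise ValueError(
            f"{test_name}: enable_attention_dp must stay False until the "
            "tracked hang bugs are resolved."
        )
```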