This repository was archived by the owner on Aug 7, 2024. It is now read-only.
[9/x]: make dynamic scaling default in Float8Linear #300
Closed
Conversation
Summary:
1. makes dynamic scaling default in Float8Linear for an easier migration of callsites which currently use Float8DynamicLinear. Fixes tests as needed.
2. updates the README to reference Float8Linear for dynamic scaling

Test Plan:
```
./test/test_everything.sh
```

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
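As a quick illustration of what the new default means for a callsite, here is a minimal sketch of the module swap. It assumes `float8_experimental` exposes `Float8Linear` and the `swap_linear_with_float8_linear` helper at the import paths shown; treat those paths and the helper's signature as assumptions and verify them against the repo.

```
# Minimal sketch, not the PR's code: swap a model's nn.Linear layers for
# Float8Linear. Import paths and the helper's signature are assumed.
import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear
from float8_experimental.float8_linear_utils import swap_linear_with_float8_linear

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# After this PR, Float8Linear defaults to dynamic scaling, so no extra
# configuration is needed to match the old Float8DynamicLinear behavior.
swap_linear_with_float8_linear(model, Float8Linear)

# The model's Linear layers are now Float8Linear modules using dynamic scaling.
```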
This was referenced Jul 2, 2024
vkuzo added a commit that referenced this pull request on Jul 2, 2024
Summary:
1. makes dynamic scaling default in Float8Linear for an easier migration of callsites which currently use Float8DynamicLinear. Fixes tests as needed.
2. updates the README to reference Float8Linear for dynamic scaling

Test Plan:
```
./test/test_everything.sh
```

Reviewers:
Subscribers:
Tasks:
Tags:

ghstack-source-id: bb605c2
Pull Request resolved: #300
drisspg approved these changes on Jul 2, 2024
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request has been merged in d4cf2ad.
vkuzo added a commit to pytorch/torchtitan that referenced this pull request on Jul 3, 2024
Summary:
In the stack ending in pytorch-labs/float8_experimental#300 in float8_experimental, we are unifying `Float8DynamicLinear` and `Float8Linear`, with a future PR being planned to delete the `Float8DynamicLinear` object. After pytorch-labs/float8_experimental#300, `Float8Linear` with default settings is equivalent to `Float8DynamicLinear`. This PR changes `torchtitan` to use `Float8Linear`.

To support the new UX of `float8_experimental` better, I also switched the `fp8_linear` configuration to be a boolean on whether to swap the linears or not. In the future we can add new options on how to configure each linear (scaling type, scaling granularity, etc) - saving that for a future PR.

Test Plan:
```
// run baseline (Float8DynamicLinear) for llama3_8b for 50 iterations on 4 GPUs,
// verify performance and loss values do not change meaningfully between
// baseline and this PR

// baseline (before this PR)
// 1. compile, bf16
// 2. compile, float8
// 3. compile, float8, fdsp_fp8_allgather=True
// 4. compile, float8, fdsp_fp8_allgather=True, tp=2
// logs: https://gist.github.com/vkuzo/e6d5f3b15349862bfad3706baad8c9ce

// experiment (this PR): repeat all of the above, but with Float8Linear
// logs: https://gist.github.com/vkuzo/a4d6754358facffa64df931654459631
```

Reviewers:
Subscribers:
Tasks:
Tags:
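The boolean `fp8_linear` toggle described above roughly amounts to the sketch below. This is not the actual torchtitan code; the config field and helper names here are assumptions for illustration only.

```
# Hypothetical sketch of a boolean fp8_linear toggle; torchtitan's real
# config plumbing and field names may differ.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TrainingConfig:
    # previously a string selecting a float8 linear variant, now just on/off
    fp8_linear: bool = False


def maybe_swap_to_float8(model: nn.Module, cfg: TrainingConfig) -> nn.Module:
    if cfg.fp8_linear:
        # Float8Linear defaults to dynamic scaling after
        # pytorch-labs/float8_experimental#300, so no per-linear scaling
        # options need to be passed here.
        from float8_experimental.float8_linear import Float8Linear
        from float8_experimental.float8_linear_utils import (
            swap_linear_with_float8_linear,
        )

        swap_linear_with_float8_linear(model, Float8Linear)
    return model
```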
vkuzo added a commit that referenced this pull request on Jul 3, 2024
Summary:
We are standardizing on `Float8Linear` as the only float8 linear object:
1. the stack ending with #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. this PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Test Plan:
```
// all tests pass
./test_everything.sh

// also run all benchmarks and verify correctness
```

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request on Jul 3, 2024
Summary:
We are standardizing on `Float8Linear` as the only float8 linear object:
1. the stack ending with #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. this PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Test Plan:
```
// all tests pass
./test_everything.sh

// also run all benchmarks and verify correctness
```

Reviewers:
Subscribers:
Tasks:
Tags:

ghstack-source-id: 8ab4833
Pull Request resolved: #304
facebook-github-bot pushed a commit that referenced this pull request on Jul 5, 2024
Summary:
Pull Request resolved: #304

We are standardizing on `Float8Linear` as the only float8 linear object:
1. the stack ending with #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. this PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Reviewed By: drisspg

Differential Revision: D59342767

fbshipit-source-id: cfb09dd5f6517cfbf41d8b46eb6d7d6a5266006a
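For callsites being patched as part of this cleanup, the change is roughly the before/after sketch below. The `from_float` constructor is taken from the library's module-swap flow, but treat its exact signature and defaults as assumptions to verify against the repo.

```
# Sketch of migrating a direct callsite off Float8DynamicLinear; verify
# from_float's current arguments in the repo before relying on this.
import torch.nn as nn

from float8_experimental.float8_linear import Float8Linear

lin = nn.Linear(1024, 1024, bias=False)

# Before this stack:
#   from float8_experimental.float8_dynamic_linear import Float8DynamicLinear
#   fp8_lin = Float8DynamicLinear.from_float(lin)
#
# After this stack, Float8DynamicLinear is deleted; Float8Linear's default
# settings (dynamic scaling) give equivalent behavior:
fp8_lin = Float8Linear.from_float(lin)
```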
vkuzo added a commit to pytorch/torchtitan that referenced this pull request on Jul 8, 2024
Summary:
After pytorch-labs/float8_experimental#300, `Float8Linear` with default settings is equivalent to `Float8DynamicLinear`. This PR changes `torchtitan` to use `Float8Linear`.

To support the new UX of `float8_experimental` better, I also switched the `fp8_linear` configuration to be a boolean on whether to swap the linears or not. In the future we can add new options on how to configure each linear (scaling type, scaling granularity, etc) - saving that for a future PR.

Test Plan:
```
// run baseline (Float8DynamicLinear) for llama3_8b for 50 iterations on 4 GPUs,
// verify performance and loss values do not change meaningfully between
// baseline and this PR

// baseline (before this PR)
// 1. compile, bf16
// 2. compile, float8
// 3. compile, float8, fdsp_fp8_allgather=True
// 4. compile, float8, fdsp_fp8_allgather=True, tp=2
// logs: https://gist.github.com/vkuzo/e6d5f3b15349862bfad3706baad8c9ce

// experiment (this PR): repeat all of the above, but with Float8Linear
// logs: https://gist.github.com/vkuzo/a4d6754358facffa64df931654459631
```

Reviewers:
Subscribers:
Tasks:
Tags:
vkuzo added a commit that referenced this pull request on Jul 9, 2024
Summary: missed this in #300 Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
tianyu-l added a commit to tianyu-l/torchtitan_intern24 that referenced this pull request on Jul 11, 2024
* Set `record_shapes=True` for profiler
  ghstack-source-id: 6f1ed49, Pull Request resolved: pytorch#419
* Improved `repeat_kv` eager perf
  ghstack-source-id: 39e4849, Pull Request resolved: pytorch#418
* Adding FSDP Memory Tracking and Estimation
  ghstack-source-id: c8ed20f, Pull Request resolved: pytorch#425
* Adding integration test for FSDP Memory Tracking and Estimation
  ghstack-source-id: cc224db, Pull Request resolved: pytorch#426
* by default disable heavy memory profiling
  ghstack-source-id: cad7b3c, Pull Request resolved: pytorch#430
* Add the option to turn on async-TP
  ghstack-source-id: 0a03379, Pull Request resolved: pytorch#429
* Modifying memory estimation options and minor changes
  ghstack-source-id: 5f09824, Pull Request resolved: pytorch#435
* add comment pointing to Sequence Parallel optimization example
  ghstack-source-id: 6fa0dcd, Pull Request resolved: pytorch#438
* switch float8 logic from Float8DynamicLinear to Float8Linear (pytorch#436)

  Summary:
  After pytorch-labs/float8_experimental#300, `Float8Linear` with default settings is equivalent to `Float8DynamicLinear`. This PR changes `torchtitan` to use `Float8Linear`.

  To support the new UX of `float8_experimental` better, I also switched the `fp8_linear` configuration to be a boolean on whether to swap the linears or not. In the future we can add new options on how to configure each linear (scaling type, scaling granularity, etc) - saving that for a future PR.

  Test Plan:
  ```
  // run baseline (Float8DynamicLinear) for llama3_8b for 50 iterations on 4 GPUs,
  // verify performance and loss values do not change meaningfully between
  // baseline and this PR
  // baseline (before this PR)
  // 1. compile, bf16
  // 2. compile, float8
  // 3. compile, float8, fdsp_fp8_allgather=True
  // 4. compile, float8, fdsp_fp8_allgather=True, tp=2
  // logs: https://gist.github.com/vkuzo/e6d5f3b15349862bfad3706baad8c9ce
  // experiment (this PR): repeat all of the above, but with Float8Linear
  // logs: https://gist.github.com/vkuzo/a4d6754358facffa64df931654459631
  ```

  Reviewers: Subscribers: Tasks: Tags:
* Removed `_experimental_support_context_fn_in_torch_utils_checkpoint`
  ghstack-source-id: 50b2d0c, Pull Request resolved: pytorch#444
* Reordered TP parallel plan to follow execution order
  ghstack-source-id: b492495, Pull Request resolved: pytorch#445
* Made some stylistic changes to `apply_dp`
  ghstack-source-id: fb78e9e, Pull Request resolved: pytorch#446
* Refactored activation checkpointing
  ghstack-source-id: 785c7e4, Pull Request resolved: pytorch#447
* compiled RMSNorm
  ghstack-source-id: c4efb81, Pull Request resolved: pytorch#442
* Renamed parallel styles for transformer block weights
  ghstack-source-id: 5fb0bf3, Pull Request resolved: pytorch#448
* Added type annotations and more stylistic changes
  ghstack-source-id: 1bd5b9d, Pull Request resolved: pytorch#449

---------

Co-authored-by: Andrew Gu <[email protected]>
Co-authored-by: Sanket Jayant Purandare <[email protected]>
Co-authored-by: Yifu Wang <[email protected]>
Co-authored-by: Vasiliy Kuznetsov <[email protected]>
vkuzo added a commit that referenced this pull request on Jul 12, 2024
Summary: missed this in #300 Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request on Jul 12, 2024
Labels: CLA Signed
Merged
Stack from ghstack (oldest at bottom):
Summary:
1. makes dynamic scaling default in Float8Linear for an easier migration of callsites which currently use Float8DynamicLinear. Fixes tests as needed.
2. updates the README to reference Float8Linear for dynamic scaling

Test Plan:
```
./test/test_everything.sh
```

Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D59305790