This repository was archived by the owner on Aug 7, 2024. It is now read-only.

[FSDP2] set vocab_size=32 to avoid must be divisible by 16 error #264

Merged
merged 2 commits into from
May 20, 2024

Conversation

weifengpy
Contributor

@weifengpy weifengpy commented May 20, 2024

Stack from ghstack (oldest at bottom):

```
E             File "/home/weif/local/pytorch-official/pytorch/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 205, in forward
E               output = self.output(h).float()
E             File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
E               return self._call_impl(*args, **kwargs)
E             File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1541, in _call_impl
E               return forward_call(*args, **kwargs)
E             File "/data/users/weif/float8_experimental/float8_experimental/float8_dynamic_linear.py", line 71, in forward
E               y = torch.nn.functional.linear(x_fp8, w_fp8, self.bias)
E             File "/data/users/weif/float8_experimental/float8_experimental/float8_tensor.py", line 297, in __torch_dispatch__
E               return FLOAT8_OPS_TABLE[func](func, args, kwargs)
E             File "/data/users/weif/float8_experimental/float8_experimental/float8_ops.py", line 151, in float8_mm
E               tensor_out, amax = addmm_float8_unwrapped(
E             File "/data/users/weif/float8_experimental/float8_experimental/float8_python_api.py", line 55, in addmm_float8_unwrapped
E               output, output_amax = torch._scaled_mm(
E           RuntimeError: mat2 shape (768x8 must be divisible by 16
E           Exception raised from _scaled_mm_out_cuda at /data/users/weif/pytorch-official/pytorch/aten/src/ATen/native/cuda/Blas.cpp:874 (most recent call first):
```
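The failure comes from the alignment requirement that `_scaled_mm_out_cuda` enforces on FP8 GEMM operands: the dimensions of `mat2` must be multiples of 16. Here the test model's output projection weight has shape `(768, vocab_size=8)`, so the 8 fails the check; bumping `vocab_size` to 32 satisfies it. A minimal sketch of the divisibility check (a hypothetical helper illustrating the constraint, not the actual PyTorch code):

```python
def scaled_mm_mat2_dims_ok(mat2_shape: tuple[int, int]) -> bool:
    """Mirror the alignment constraint behind the RuntimeError above:
    both dimensions of mat2 must be divisible by 16."""
    rows, cols = mat2_shape
    return rows % 16 == 0 and cols % 16 == 0

# Failing case from the traceback: weight of shape (768, 8).
print(scaled_mm_mat2_dims_ok((768, 8)))   # False: 8 % 16 != 0
# This PR's fix: vocab_size=32 gives a legal shape.
print(scaled_mm_mat2_dims_ok((768, 32)))  # True
```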

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request May 20, 2024
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 9bc32b0
Pull Request resolved: #264
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 20, 2024
@weifengpy weifengpy requested a review from awgu May 20, 2024 23:33

@awgu awgu left a comment


@awgu

awgu commented May 20, 2024

Check lint before landing :)

weifengpy added a commit that referenced this pull request May 20, 2024
ghstack-source-id: 6e80e7c
Pull Request resolved: #264
@weifengpy weifengpy merged commit a73ad63 into gh/weifengpy/1/base May 20, 2024
3 checks passed
@awgu

awgu commented May 20, 2024

@weifengpy I think you merged it into your base branch, not the actual main branch.

I think for float8_experimental, you need to import to diff in fbcode and land from fbcode.

@weifengpy
Contributor Author

> @weifengpy I think you merged it into your base branch, not the actual main branch.
>
> I think for float8_experimental, you need to import to diff in fbcode and land from fbcode.

no wonder everyone is importing to fbcode. good to learn it. will open another PR

@weifengpy
Contributor Author

> @weifengpy I think you merged it into your base branch, not the actual main branch.
> I think for float8_experimental, you need to import to diff in fbcode and land from fbcode.
>
> no wonder everyone is importing to fbcode. good to learn it. will open another PR

opened #265 for review and imported it into fbcode
