[FSDP2] set vocab_size=32 to avoid `must be divisible by 16` error #264
Conversation
Thanks!
I think PyTorch was changed (https://github.com/pytorch/pytorch/pull/122997/files#diff-ec1bf5cce4e0a9e8c7ad286df0886127bc5b2ef73d183b270be8ece3eb719fe6R95), and that change broke this.
Check lint before landing :)
… 16` error"
```
E File "/home/weif/local/pytorch-official/pytorch/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 205, in forward
E   output = self.output(h).float()
E File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
E   return self._call_impl(*args, **kwargs)
E File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1541, in _call_impl
E   return forward_call(*args, **kwargs)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_dynamic_linear.py", line 71, in forward
E   y = torch.nn.functional.linear(x_fp8, w_fp8, self.bias)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_tensor.py", line 297, in __torch_dispatch__
E   return FLOAT8_OPS_TABLE[func](func, args, kwargs)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_ops.py", line 151, in float8_mm
E   tensor_out, amax = addmm_float8_unwrapped(
E File "/data/users/weif/float8_experimental/float8_experimental/float8_python_api.py", line 55, in addmm_float8_unwrapped
E   output, output_amax = torch._scaled_mm(
E RuntimeError: mat2 shape (768x8 must be divisible by 16
E Exception raised from _scaled_mm_out_cuda at /data/users/weif/pytorch-official/pytorch/aten/src/ATen/native/cuda/Blas.cpp:874 (most recent call first):
```
[ghstack-poisoned]
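For context on the failure above: the CUDA fp8 kernel behind `torch._scaled_mm` rejects operand dimensions that are not multiples of 16, which is why a `(768x8)` weight (from a too-small `vocab_size`) fails while `vocab_size=32` passes. The following is a minimal standalone sketch of that shape constraint, not the actual PyTorch code; the helper name `check_scaled_mm_shapes` is invented for illustration.

```python
# Hypothetical re-creation of the shape guard that _scaled_mm_out_cuda
# enforces (see the RuntimeError in the traceback above). Pure Python,
# no torch required; names here are illustrative, not PyTorch APIs.

def check_scaled_mm_shapes(mat1_shape, mat2_shape, multiple=16):
    """Raise the same kind of error _scaled_mm raises when mat2's
    dimensions are not multiples of `multiple`."""
    k, n = mat2_shape
    if k % multiple != 0 or n % multiple != 0:
        raise RuntimeError(
            f"mat2 shape ({k}x{n}) must be divisible by {multiple}"
        )
    if mat1_shape[1] != k:
        raise ValueError("inner dimensions of mat1 and mat2 must match")


# vocab_size=8 gives an output-projection weight seen as (768, 8): rejected.
# vocab_size=32 gives (768, 32): both dims are multiples of 16, so it passes.
```

This mirrors why the fix in this PR simply bumps `vocab_size` to 32 in the test model rather than changing the fp8 matmul itself.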
@weifengpy I think you merged it into your base branch, not the actual main branch. I think for float8_experimental, you need to import the PR as a diff in fbcode and land it from fbcode.
No wonder everyone is importing to fbcode; good to learn. Will open another PR.
Opened #265 for review and imported it into fbcode.
Stack from ghstack (oldest at bottom):
- [FSDP2] set vocab_size=32 to avoid `must be divisible by 16` error #264