[FSDP2] set vocab_size=32 to avoid `must be divisible by 16` error #264
Conversation
Thanks!
I think PyTorch was changed (https://github.com/pytorch/pytorch/pull/122997/files#diff-ec1bf5cce4e0a9e8c7ad286df0886127bc5b2ef73d183b270be8ece3eb719fe6R95), and that change broke this.
Check lint before landing :)
… 16` error"
```
E File "/home/weif/local/pytorch-official/pytorch/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 205, in forward
E   output = self.output(h).float()
E File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
E   return self._call_impl(*args, **kwargs)
E File "/home/weif/local/pytorch-official/pytorch/torch/nn/modules/module.py", line 1541, in _call_impl
E   return forward_call(*args, **kwargs)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_dynamic_linear.py", line 71, in forward
E   y = torch.nn.functional.linear(x_fp8, w_fp8, self.bias)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_tensor.py", line 297, in __torch_dispatch__
E   return FLOAT8_OPS_TABLE[func](func, args, kwargs)
E File "/data/users/weif/float8_experimental/float8_experimental/float8_ops.py", line 151, in float8_mm
E   tensor_out, amax = addmm_float8_unwrapped(
E File "/data/users/weif/float8_experimental/float8_experimental/float8_python_api.py", line 55, in addmm_float8_unwrapped
E   output, output_amax = torch._scaled_mm(
E RuntimeError: mat2 shape (768x8 must be divisible by 16
E Exception raised from _scaled_mm_out_cuda at /data/users/weif/pytorch-official/pytorch/aten/src/ATen/native/cuda/Blas.cpp:874 (most recent call first):
```
[ghstack-poisoned]
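For context on the failure above: the CUDA fp8 kernel behind `torch._scaled_mm` rejects operand dimensions that are not multiples of 16, which is why a `(768x8)` weight (from a too-small `vocab_size`) fails while `vocab_size=32` passes. The following is a minimal standalone sketch of that shape constraint, not the actual PyTorch code; the helper name `check_scaled_mm_shapes` is invented for illustration.

```python
# Hypothetical re-creation of the shape guard that _scaled_mm_out_cuda
# enforces (see the RuntimeError in the traceback above). Pure Python,
# no torch required; names here are illustrative, not PyTorch APIs.

def check_scaled_mm_shapes(mat1_shape, mat2_shape, multiple=16):
    """Raise the same kind of error _scaled_mm raises when mat2's
    dimensions are not multiples of `multiple`."""
    k, n = mat2_shape
    if k % multiple != 0 or n % multiple != 0:
        raise RuntimeError(
            f"mat2 shape ({k}x{n}) must be divisible by {multiple}"
        )
    if mat1_shape[1] != k:
        raise ValueError("inner dimensions of mat1 and mat2 must match")


# vocab_size=8 gives an output-projection weight seen as (768, 8): rejected.
# vocab_size=32 gives (768, 32): both dims are multiples of 16, so it passes.
```

This mirrors why the fix in this PR simply bumps `vocab_size` to 32 in the test model rather than changing the fp8 matmul itself.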
@weifengpy I think you merged it into your base branch, not the actual main branch. I think for float8_experimental, you need to import the PR as a diff in fbcode and land it from fbcode.
No wonder everyone is importing to fbcode; good to learn. Will open another PR.
Opened #265 for review and imported it into fbcode.
Stack from ghstack (oldest at bottom):
- [FSDP2] set vocab_size=32 to avoid `must be divisible by 16` error #264