This repository was archived by the owner on Aug 7, 2024. It is now read-only.

[FSDP2] set vocab_size=32 to avoid must be divisible by 16 error #264

Merged: 2 commits on May 20, 2024
Changes from 1 commit
2 changes: 1 addition & 1 deletion test/test_fsdp2/test_fsdp2_eager.py
@@ -57,7 +57,7 @@ def init_multi_module(self) -> nn.Module:
     def init_transformer(self, weight_tying: bool) -> nn.Module:
         torch.manual_seed(42)
         args = ModelArgs(
-            n_layers=3, dim=768, n_heads=12, dropout_p=0.0, weight_tying=weight_tying
+            n_layers=3, dim=768, n_heads=12, dropout_p=0.0, weight_tying=weight_tying, vocab_size=32,
         )
         module = Transformer(args).cuda()
         self.broadcast_module(module)
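For context, the "must be divisible by 16" error in the PR title comes from the float8 matmul path, which generally requires the GEMM dimensions to be multiples of 16. Pinning `vocab_size=32` keeps the transformer's output projection (`dim x vocab_size`) compatible with that constraint. Below is a minimal sketch of the shape check being worked around; the helper name `check_fp8_gemm_dims` and the exact set of dimensions checked are illustrative assumptions, not code from this repo.

```python
# Minimal sketch (not repo code): the shape constraint behind the
# "must be divisible by 16" error that this PR avoids.
def check_fp8_gemm_dims(m: int, k: int, n: int) -> None:
    # float8 matmul kernels typically require each dimension to be a
    # multiple of 16; otherwise they raise a divisibility error.
    for name, size in (("m", m), ("k", k), ("n", n)):
        if size % 16 != 0:
            raise ValueError(f"{name}={size} must be divisible by 16 for float8 matmul")

dim = 768  # matches the test's ModelArgs(dim=768)
for vocab_size in (30, 32):  # a non-multiple of 16 vs. the value this PR sets
    try:
        # The output projection is a (tokens x dim) @ (dim x vocab_size) matmul.
        check_fp8_gemm_dims(16, dim, vocab_size)
        print(f"vocab_size={vocab_size}: ok")
    except ValueError as err:
        print(f"vocab_size={vocab_size}: {err}")
```

With `vocab_size=32`, both the model dimension (768) and the vocabulary dimension are multiples of 16, so the float8-cast projection in the FSDP2 eager test no longer trips this check.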