Add tests for qwen + allow uninitialized weights in Llama model #8552
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8552
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending as of commit 8b51959 with merge base 2859e47.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 5ef3fef to c58edc5
Force-pushed from c58edc5 to 955b991
try:
    # assign=True: load params/buffers by assignment instead of performing an in-place copy.
    # Because we are using device="meta", tensors do not have memory associated with them
    # and an in-place copy is a no-op. Use assign=True in load_state_dict for this scenario.
    missing, unexpected = self.model_.load_state_dict(
        checkpoint,
        strict=False,
        assign=True,
    )  # self.model_ = Transformer(gptconf)
except RuntimeError as e:
Why is this needed?
So it doesn't error out when loading examples/models/llama/params/demo_rand_params.pth or any checkpoint that is incompatible with the model architecture. We also have no way to not specify a checkpoint; I looked into removing the default value for that arg, but it's going to take some work since it's relied on internally in a lot of places.
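For context, here is a minimal standalone sketch of the loading pattern discussed above, assuming a toy TinyModel stand-in rather than the actual ExecuTorch Transformer: parameters are created on the "meta" device, load_state_dict uses assign=True with strict=False, and a RuntimeError from an incompatible checkpoint falls back to default-initialized (random) weights. The model class, shapes, and fallback are illustrative assumptions, not the PR's code.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):  # hypothetical stand-in for Transformer(gptconf)
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)


with torch.device("meta"):
    model = TinyModel()  # parameters have no storage yet

# Stand-in for a random/incompatible params file such as demo_rand_params.pth:
# the shape does not match the architecture above.
checkpoint = {"proj.weight": torch.randn(4, 4)}

try:
    # assign=True replaces the meta tensors with the checkpoint tensors instead
    # of copying into them (an in-place copy into a meta tensor is a no-op).
    missing, unexpected = model.load_state_dict(checkpoint, strict=False, assign=True)
    print("missing:", missing, "unexpected:", unexpected)
except RuntimeError as e:
    # Shape mismatches raise even with strict=False; fall back to a freshly
    # constructed model with its default (random) initialization.
    print(f"incompatible checkpoint ({e}); keeping uninitialized weights")
    model = TinyModel()
```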
EXECUTORCH_DEFINED_MODELS = [
    "stories110m",
    "llama2",
    "llama3",
    "llama3_1",
    "llama3_2",
    "static_llama",
    "qwen2_5",
Sorry, I accidentally deleted the original comment about ordering, but I was going to say that I think it's clearer to list all the llama models first.
Force-pushed from 47bed4c to 9b5516b
I'm OK with the changes, but I'm really concerned about llama/model.py and I think we should clean it up. I'll create a separate issue.
Summary
Add a basic CI test for the Qwen model. This requires some changes to llama/model.py to allow uninitialized (random) weights.
Test plan
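As a rough illustration of the kind of check this enables (not the actual test added in this PR), here is a hypothetical smoke test that builds a model with no checkpoint at all, materializes random weights, and runs a forward pass. TinyModel, the dimensions, and the test name are assumptions made for the sketch; the real test drives the llama/model.py entry point instead.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):  # hypothetical stand-in for the llama Transformer
    def __init__(self, vocab: int = 32, dim: int = 8):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.proj(self.emb(tokens))


def test_forward_with_uninitialized_weights():
    with torch.device("meta"):
        model = TinyModel()               # no checkpoint is ever loaded
    model = model.to_empty(device="cpu")  # allocate real (uninitialized) storage
    for m in model.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()          # random but valid weights
    out = model(torch.zeros(1, 4, dtype=torch.long))
    assert out.shape == (1, 4, 32)


test_forward_with_uninitialized_weights()
```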