llama : add support for GPT2, Bloom and CodeShell tied word embeddings #12456


Merged

ngxson merged 5 commits into ggml-org:master from tied-word-embeddings on Mar 19, 2025

Conversation

CISC (Collaborator) commented Mar 18, 2025

Also remove weight duplication from said models on conversion.

I've converted and tested the following models, confirming that their checkpoints do not ship output weights (except for CodeShell, see below) but instead rely on the word embeddings and output weights being tied together at runtime (see the sketch after the list):

  • openai-community/gpt2
  • bigscience/bloomz-560m
  • WisdomShell/CodeShell-7B-Chat
  • WisdomShell/Shell-7B-Chat
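
For reference, here is a quick way to confirm the tie from Python (a minimal sketch using the Hugging Face transformers API; the `transformer.wte` attribute path is specific to GPT-2-style models):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# When weights are tied, lm_head and the input embedding share the same
# storage, so no separate output tensor exists in the checkpoint.
print(model.config.tie_word_embeddings)  # True
print(model.lm_head.weight.data_ptr() ==
      model.transformer.wte.weight.data_ptr())  # True: same underlying tensor
```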

For some reason CodeShell has the tie inverted: the output weights are provided in the bin/safetensors files, but the word embeddings are not, even though our conversion code seems to imply otherwise.
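
In conversion terms, the inverted case roughly amounts to letting the output tensor double as the embedding (a sketch under assumed GGUF tensor naming, not the exact convert_hf_to_gguf.py logic):

```python
# Sketch: CodeShell ships lm_head.weight but no transformer.wte.weight,
# so whichever of the two tensors actually exists becomes the word
# embeddings; llama.cpp then ties the output to token_embd.weight at
# load time instead of storing a duplicate.
def map_codeshell_tensor(hf_name: str) -> str | None:
    if hf_name in ("lm_head.weight", "transformer.wte.weight"):
        return "token_embd.weight"
    return None  # everything else goes through the generic mapping
```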

Added a workaround for transformer.wte.weight being listed in the CodeShell weight map even though it's not present in the tensor file(s), which would otherwise cause a conversion error unless you edited the .index.json file by hand.
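
Before this workaround, the manual fix was to delete the stale entry from the index yourself, roughly like this (a sketch, assuming the standard sharded-safetensors index layout):

```python
import json

# Drop the transformer.wte.weight entry from the safetensors index so the
# converter doesn't go looking for a tensor that isn't stored in any shard.
path = "model.safetensors.index.json"
with open(path) as f:
    index = json.load(f)

index["weight_map"].pop("transformer.wte.weight", None)

with open(path, "w") as f:
    json.dump(index, f, indent=2)
```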

@CISC CISC requested a review from ngxson March 18, 2025 19:48
@github-actions github-actions bot added the python python script changes label Mar 18, 2025
CISC added 3 commits March 18, 2025 23:44
It appears transformer.wte.weight is in the weight map even though the weights are not there; remove it if output weights are encountered first.
@ngxson ngxson merged commit 108e53c into ggml-org:master Mar 19, 2025
50 checks passed
@CISC CISC deleted the tied-word-embeddings branch March 19, 2025 08:09