finetune: rename feed-forward tensors (w1/w2/w3) #4839


Merged: 2 commits merged into ggml-org:master on Feb 13, 2024

Conversation

@danbev (Collaborator) commented Jan 9, 2024

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate, ffn_down and ffn_up respectively.

The motivation for this change is to make the purpose of the tensors easier to understand. This also appears to be in line with the names used in the llama_layer struct in llama.cpp.
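For illustration only, here is a simplified sketch (not actual llama.cpp code) of a SwiGLU-style feed-forward block, which is the role these tensors play in the LLaMA architecture and why "gate", "down" and "up" describe them better than w1/w2/w3. The struct, function names and toy matrix types below are hypothetical, chosen just for this example:

```cpp
// Sketch of a SwiGLU-style feed-forward block (illustrative, not llama.cpp):
//   w1 -> ffn_gate : gating projection      [n_ff][n_embd]
//   w2 -> ffn_down : down-projection        [n_embd][n_ff]
//   w3 -> ffn_up   : up-projection          [n_ff][n_embd]
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<float>>; // row-major [rows][cols]

struct ffn_weights {
    matrix ffn_gate; // was w1
    matrix ffn_down; // was w2
    matrix ffn_up;   // was w3
};

static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// y = W * x for a row-major matrix W
static std::vector<float> matvec(const matrix &w, const std::vector<float> &x) {
    std::vector<float> y(w.size(), 0.0f);
    for (std::size_t i = 0; i < w.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += w[i][j] * x[j];
    return y;
}

// out = ffn_down( silu(ffn_gate(x)) * ffn_up(x) )
static std::vector<float> ffn_forward(const ffn_weights &w,
                                      const std::vector<float> &x) {
    std::vector<float> gate = matvec(w.ffn_gate, x);
    std::vector<float> up   = matvec(w.ffn_up, x);
    for (std::size_t i = 0; i < gate.size(); ++i)
        gate[i] = silu(gate[i]) * up[i]; // element-wise gating
    return matvec(w.ffn_down, gate);
}
```

With the old names, `ffn_down(silu(ffn_gate(x)) * ffn_up(x))` reads as the much more opaque `w2(silu(w1(x)) * w3(x))`, which is the readability gain this PR is after.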

@ggerganov ggerganov requested a review from xaedes January 9, 2024 14:24
@xaedes (Collaborator) left a comment


Looks good to me.
For consistency, can you do the same for examples/train-text-from-scratch.cpp inside this PR?

@danbev (Collaborator, Author) commented Jan 10, 2024

> For consistency, can you do the same for examples/train-text-from-scratch.cpp inside this PR?

Absolutely, I'll take a look at that example as well 👍

@danbev (Collaborator, Author) commented Jan 10, 2024

The CI failure does not look related to this PR as far as I can tell.
Would someone with the correct permissions be able to re-run the job in question?

@ggerganov (Member) left a comment


Yeah, sometimes this job fails - restarted it

@ggerganov ggerganov requested a review from xaedes January 11, 2024 21:24
@danbev (Collaborator, Author) commented Jan 15, 2024

@xaedes Would you be able to take a look at the changes to train-text-from-scratch.cpp? Thanks

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be in line with the names
used in the llama_layer struct in llama.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be in line with the names
used in the llama_layer struct in llama.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>
@danbev danbev force-pushed the finetune-ff-tensor-names branch from 77cfcb4 to 35670e7 Compare February 13, 2024 11:27
@ggerganov ggerganov merged commit 2639789 into ggml-org:master Feb 13, 2024
@danbev danbev deleted the finetune-ff-tensor-names branch February 16, 2024 10:47
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* finetune: rename feed-forward tensors (w1/w2/w3)

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be in line with the names
used in the llama_layer struct in llama.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>

* train-text-from-scratch: rename ff tensors

This commit renames the feed-forward tensors w1, w2 and w3 to ffn_gate,
ffn_down and ffn_up respectively.

The motivation for this change is to make it easier to understand the
purpose of the tensors. This also seems to be in line with the names
used in the llama_layer struct in llama.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
3 participants