
convert: remove most of the n_mult usage in convert.py #3098


Merged · 3 commits · Sep 10, 2023

Conversation

Green-Sky (Collaborator)

A little bit of cleanup of n_mult: it is only used to calculate n_ff, and the formula only works for LLaMA and close derivatives (e.g. not for Falcon).
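For context, the n_ff that n_mult feeds into is the LLaMA feed-forward hidden size. A minimal sketch of that calculation, assuming the parameter names from the original LLaMA reference code (the exact convert.py wiring may differ):

```python
def llama_n_ff(n_embd, multiple_of, ffn_dim_multiplier=None):
    """LLaMA-style feed-forward hidden size (n_ff).

    multiple_of corresponds to what convert.py called n_mult.
    """
    hidden = 4 * n_embd
    hidden = int(2 * hidden / 3)
    if ffn_dim_multiplier is not None:
        hidden = int(ffn_dim_multiplier * hidden)
    # round up to the next multiple of multiple_of
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

# e.g. LLaMA-7B: n_embd=4096, multiple_of=256 -> n_ff=11008
```

Falcon does not follow this formula, which is why the n_mult-based derivation only holds for LLaMA-like models.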

@Green-Sky Green-Sky changed the title convert: remove most n_mult usage convert: remove most of the n_mult usage in convert.py Sep 9, 2023
@Green-Sky Green-Sky force-pushed the convert_reduce_n_mult branch from f862b9e to ecd7bed Compare September 9, 2023 15:51
@Green-Sky Green-Sky requested a review from cebtenzzre September 9, 2023 15:53
KerfuffleV2 (Collaborator)

Any reason not to remove the find_n_mult function? Nothing uses it after your changes.

Green-Sky (Collaborator, Author)

Any reason not to remove the find_n_mult function? Nothing uses it after your changes.

hm true. ... time to remove my hack then :)
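For reference, the removed helper brute-forced an n_mult that reproduces a known n_ff from n_embd; a hedged reconstruction of that kind of search (the exact range and constants in convert.py may differ):

```python
def find_n_mult(n_ff, n_embd):
    # try candidate multiples from large to small until one rounds
    # (8 * n_embd) // 3 up to exactly n_ff
    for n_mult in range(8192, 1, -1):
        calc_ff = (((8 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult
        if calc_ff == n_ff:
            return n_mult
    raise ValueError(f"failed to find n_mult for n_ff={n_ff}, n_embd={n_embd}")
```

Because several candidates can reproduce the same n_ff, the value found is not necessarily the original multiple_of, which is part of what made this a hack.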

goerch (Collaborator) commented Sep 9, 2023

Hm, a layman's view: I know of multiple_of as an input for computing the hidden dimensions; is this mapped to ffn_dim_multiplier now? If you are going to support different model classes, I'd find it easier to follow if you back-referenced the original models.

@Green-Sky Green-Sky force-pushed the convert_reduce_n_mult branch from bda804a to 2f50a58 Compare September 10, 2023 11:52
Green-Sky (Collaborator, Author)

Hm, a layman's view: I know of multiple_of as an input for computing the hidden dimensions; is this mapped to ffn_dim_multiplier now? If you are going to support different model classes, I'd find it easier to follow if you back-referenced the original models.

I am not sure where they come from, but yes, they AND the formulas describe a relationship in the architecture. But multiple_of could be a different value too; there are always multiple solutions. They also seem to be specific to LLaMA: Falcon, for example, just uses 4 * hidden_size. Hugging Face models don't contain this information at all; they just have the intermediate_size param.
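The "multiple solutions" point is easy to demonstrate. A small sketch using LLaMA-7B numbers (the helper name is hypothetical, not convert.py code):

```python
def n_ff_from(n_embd, multiple_of):
    # LLaMA-style rounding of the 2/3 * 4 * n_embd hidden size
    hidden = int(2 * (4 * n_embd) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

# Which multiple_of values reproduce LLaMA-7B's n_ff of 11008?
solutions = [m for m in range(1, 512) if n_ff_from(4096, m) == 11008]
# several distinct values work, so multiple_of cannot be recovered
# uniquely from n_ff; Falcon, by contrast, just uses n_ff = 4 * hidden_size
```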

@cebtenzzre cebtenzzre merged commit 6eeb4d9 into ggml-org:master Sep 10, 2023