llama : fix Gemma-2 Query scaling factors #8473

Merged: 2 commits, Jul 14, 2024

Conversation

ggerganov
Member

Continuation of #8444.

danielhanchen and others added 2 commits July 13, 2024 18:41
See google/gemma_pytorch@03e6575

Gemma 9b should use 256, not 224 (the value given by self.config.hidden_size // self.config.num_attention_heads)
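
The arithmetic behind the commit message can be sketched as follows. This is a minimal illustration, not code from the PR: the configuration values (hidden size 3584, 16 attention heads, head dimension 256) are assumptions about the Gemma-2 9B config and are not stated in this thread, and `query_scale` is a hypothetical helper that models the query scaling as 1/sqrt(query_pre_attn_scalar).

```python
import math

# Assumed Gemma-2 9B configuration values (not stated in this thread).
HIDDEN_SIZE = 3584
NUM_ATTENTION_HEADS = 16
HEAD_DIM = 256


def query_scale(query_pre_attn_scalar: int) -> float:
    """Hypothetical helper: factor multiplied into the queries before attention."""
    return 1.0 / math.sqrt(query_pre_attn_scalar)


# Value the commit message calls wrong for 9B: hidden_size // num_attention_heads.
old_scalar = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 224
# Value the commit message says 9B should use.
new_scalar = HEAD_DIM  # 256

print(f"old: scalar={old_scalar}, scale={query_scale(old_scalar):.6f}")
print(f"new: scalar={new_scalar}, scale={query_scale(new_scalar):.6f}")
```

Under these assumptions, the old scalar yields a query scale roughly 7% larger than the corrected one, so every attention logit would be inflated by the same factor.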
github-actions bot added the "python" label (python script changes) Jul 13, 2024
mofosyne added the "Review Complexity : Low" label (Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix) Jul 14, 2024
@ggerganov ggerganov merged commit 73cf442 into master Jul 14, 2024
63 checks passed
@ggerganov ggerganov deleted the gg/gemma-2-fix-q-scale branch July 14, 2024 11:05
@danielhanchen
Contributor

@ggerganov Thanks for continuing the PR - sorry was out of town for a while so couldn't be of much help - great work as always!

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 15, 2024
* 9B - query_pre_attn_scalar = 256 not 224

See google/gemma_pytorch@03e6575

Gemma 9b should use 256, not 224 (the value given by self.config.hidden_size // self.config.num_attention_heads)

* llama : fix Gemma-2 Query scaling factor

ggml-ci

---------

Co-authored-by: Daniel Han <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
3 participants