Skip to content

llama : deci : support ffn-free with attention #13296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 7, 2025

Conversation

CISC
Copy link
Collaborator

@CISC CISC commented May 4, 2025

Fix partially overlooked change request, sorry for premature merge.

cc/ @ymcki

@CISC CISC requested a review from ggerganov May 4, 2025 10:42
@ymcki
Copy link
Contributor

ymcki commented May 4, 2025

I have yet seen any deci layer that is ffn free yet have attention. Whether that will make a viable model is also unknown. I think it is better to make this change when there is actually a real model with such a layer.

@CISC
Copy link
Collaborator Author

CISC commented May 4, 2025

I have yet seen any deci layer that is ffn free yet have attention. Whether that will make a viable model is also unknown. I think it is better to make this change when there is actually a real model with such a layer.

While probably true I think the issue is that if this model were to exist it would cause the code to proceed to build_ffn for a perfectly preventable reason. :)

@ymcki
Copy link
Contributor

ymcki commented May 7, 2025

While studying the architectures of the major models, I have yet to see a layer that has attention but no feed forward, I think it shouldn't hurt to make this change. However, someone needs to do the testing to make sure it doesn't break things. As I don't have the resource to run the 253B model, let's see if someone can help you on this.

@CISC
Copy link
Collaborator Author

CISC commented May 7, 2025

Should be less likely to break anything than before, and certainly not 253B, we'll deal with any issues that may crop up as/if they do.

@CISC CISC merged commit bc4e112 into master May 7, 2025
51 checks passed
@CISC CISC deleted the cisc/deci-ffn-free-with-attn branch May 7, 2025 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants