add 128k yarn context for Qwen #10698
Conversation
Thanks to @ggerganov, I think I've verified that my change is working.
and then
Note: it is a shame that it hid it so early. On a previous run using only half the context, it was clearly past the 32k original context window:
I've converted this from draft for review :)
* add 128k yarn context for Qwen
* added property for model tensors
* removing useless line
Hi @robbiemu, does this PR mean the 128k YaRN context will be enabled by default when running with Qwen2.5 7B models? When testing with
Thanks a lot.
No, you need to follow the instructions the Qwen team specified, adding a section to config.json when generating your GGUF, to enable the YaRN long context.
see: discussion
@bartowski1182 -- can I ask you to try this if you have a 7B+ Qwen2.5 handy? I don't mind testing it, but I thought it would be nice if a third party did it.
Quick instructions (correct me if I'm wrong):
Add rope scaling to config.json, something like:
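A minimal sketch of the `rope_scaling` block, assuming the values the Qwen team publishes for the 32k-base Qwen2.5 models (the factor, original window, and `yarn` type below are taken from their model card; check your model's own documentation for the exact numbers):

```json
"rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}
```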
Change max_position_embeddings to factor * original_max_position_embeddings:
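Assuming the 4.0 factor and 32768 original window from the sketch above, that works out to 4.0 × 32768 = 131072:

```json
"max_position_embeddings": 131072
```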