llama: support Qwen3 #12501

Closed
wants to merge 3 commits into from

CISC
Collaborator

@CISC CISC commented Mar 21, 2025

Initial draft based on huggingface/transformers#36878

In case models are released before I can have a look at them this weekend:

TODO

  • Set type for all layer sizes in llama_model::load_hparams
  • Test conversion and inference on all models

@github-actions github-actions bot added the python python script changes label Mar 21, 2025
@x0wllaar

Are you planning to add MoE support?

@CISC
Collaborator Author

CISC commented Mar 21, 2025

> Are you planning to add MoE support?

I'm focusing on non-MoE for now, so if someone wants to work on Qwen3MoE in the meantime they are more than welcome to. :)

@x0wllaar

Thank you! I'm not sure I'm up to the task though lol

@ngxson
Collaborator

ngxson commented Mar 21, 2025

I had a look at the Qwen3 MoE Python code; it's not much different from Qwen2 MoE. The differences are:

  • Shared experts are removed
  • Added k_norm and q_norm (similar to qwen3 dense)
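To make the second difference concrete, here is a minimal sketch of what per-head q_norm/k_norm could look like. It assumes these are RMS norms applied over the head dimension of the query and key projections, which the thread does not spell out; `rms_norm`, `norm_heads`, the toy sizes, and the all-ones weight are hypothetical names and values chosen for illustration, not code from transformers or llama.cpp.

```python
import math

def rms_norm(vec, weight, eps=1e-6):
    """RMS-normalize one vector and apply a per-channel scale."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(vec, weight)]

def norm_heads(proj, n_heads, head_dim, weight):
    """Split a flat projection into heads and normalize each head separately.

    Assumption: one weight vector of length head_dim is shared by all heads.
    """
    assert len(proj) == n_heads * head_dim
    return [
        rms_norm(proj[h * head_dim:(h + 1) * head_dim], weight)
        for h in range(n_heads)
    ]

# Toy sizes (hypothetical): 2 heads of dimension 4.
q = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
q_norm_weight = [1.0] * 4  # stands in for a learned q_norm weight
q_heads = norm_heads(q, n_heads=2, head_dim=4, weight=q_norm_weight)
```

In this toy input the second head is the first one scaled by 10, so after normalization both heads come out nearly identical. The same helper would be applied to the key projection with its own k_norm weight.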

@CISC
Collaborator Author

CISC commented Mar 21, 2025

> I had a look at the Qwen3 MoE Python code; it's not much different from Qwen2 MoE.

That was my initial impression too. I can have a stab at it if no one else volunteers; I just didn't want to bite off too much at once (esp. given the flustercuck 57B-A14B was). :)

@CISC
Collaborator Author

CISC commented Apr 8, 2025

Superseded by #12828

@CISC CISC closed this Apr 8, 2025
@CISC CISC deleted the qwen3 branch April 8, 2025 20:32