llama: Add support for RWKV v7 architecture(v2) #12412

Merged: 8 commits merged into ggml-org:master on Mar 17, 2025

Conversation

MollySophia
Collaborator

@MollySophia MollySophia commented Mar 16, 2025

@BlinkDL's explanation of RWKV v7:
RWKV-7 as a meta-in-context learner
There are also plenty of tests on trained models posted on his X account.

Currently available RWKV v7 model repos in HF format:

Base models:

https://huggingface.co/fla-hub/rwkv7-191M-world
https://huggingface.co/fla-hub/rwkv7-0.4B-world
https://huggingface.co/fla-hub/rwkv7-1.5B-world
https://huggingface.co/fla-hub/rwkv7-2.9B-world
https://huggingface.co/fla-hub/rwkv7-0.1B-g1 (The option to enable its capability has not been added yet.)

Distilled models:

https://huggingface.co/RWKV-Red-Team/ARWKV-R1-1B5
https://huggingface.co/RWKV-Red-Team/ARWKV-R1-7B
https://huggingface.co/RWKV-Red-Team/ARWKV_7B_R1_16K

This PR contains:

  • GGML_OP_L2_NORM, which applies PyTorch-style L2 normalization along the rows. Tested with the CPU, CUDA, SYCL, Vulkan, and Metal backends.
  • GGML_OP_RWKV_WKV7, which is the core of the RWKV v7 architecture. The naive recurrent wkv7 kernel is implemented for CPU, CUDA, SYCL, Vulkan, and Metal.
  • Inference support for RWKV7 and ARWKV7 models.
  • A simple Metal kernel for the old WKV6.
  • Skipping unused tokens in the last layer's FFN computation for RWKV models.
  • A fix for inference with RWKV6Qwen2.
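
For readers unfamiliar with the two new ops, here is a minimal pure-Python sketch of what they compute. It is not the code from this PR. `l2_norm_rows` follows the PyTorch `F.normalize` convention (divide each row by `max(norm, eps)`), and `wkv7_step` is one token of a naive single-head recurrence written from my understanding of the RWKV-7 formulation. The function names, the `eps` default, and the `state[i][j]` layout (value index `i`, key index `j`) are all assumptions made for illustration.

```python
import math


def l2_norm_rows(rows, eps=1e-12):
    """Scale every row to unit L2 norm; eps guards against all-zero rows."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        scale = 1.0 / max(norm, eps)  # PyTorch-style clamp of the denominator
        out.append([x * scale for x in row])
    return out


def wkv7_step(state, r, w, k, v, a, b):
    """One token of a naive recurrent update for a single head (assumed form).

    state is an n x n list of lists; r, w, k, v, a, b are length-n lists.
    Returns (output vector, new state) without modifying the input state.
    """
    n = len(r)
    out = [0.0] * n
    new_state = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Projection of the previous state row onto a.
        sa = sum(a[j] * state[i][j] for j in range(n))
        for j in range(n):
            # Decay the old state, add the new key/value outer product,
            # then apply the rank-one correction sa * b.
            s = state[i][j] * w[j] + v[i] * k[j] + sa * b[j]
            new_state[i][j] = s
            out[i] += s * r[j]  # read out with the receptance vector
    return out, new_state


if __name__ == "__main__":
    print(l2_norm_rows([[3.0, 4.0], [0.0, 0.0]]))
    zero_state = [[0.0, 0.0], [0.0, 0.0]]
    y, s1 = wkv7_step(zero_state, r=[1.0, 0.0], w=[0.5, 0.5],
                      k=[1.0, 2.0], v=[3.0, 4.0], a=[0.0, 0.0], b=[0.0, 0.0])
    print(y, s1)
```

A real kernel would run this per head and per sequence, carrying the state across tokens. The two functions are shown in one block only for brevity.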

TODO:

  • llama-parallel seems broken with all RWKV models. I will check what's wrong and try to fix it tomorrow. (Inference is fixed, but the output seems to be mixed between the parallel sequences. I haven't figured out what's wrong yet.)
  • Why is the MUSA build failing? (There seem to be some bugs in their vectorization code. Removing a #pragma unroll in wkv.cu fixes the build.)

@github-actions bot added the labels testing, Nvidia GPU, Vulkan, python, ggml, SYCL, and Apple Metal on Mar 16, 2025
@MollySophia MollySophia requested a review from ggerganov March 17, 2025 07:02
Collaborator

@Rbiessy Rbiessy left a comment

No concerns with the SYCL changes, thanks.

@MollySophia MollySophia merged commit 7dfad38 into ggml-org:master Mar 17, 2025
50 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <[email protected]>

* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <[email protected]>

* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <[email protected]>

* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <[email protected]>

* Apply code-format changes

Signed-off-by: Molly Sophia <[email protected]>

* fix MUSA build

Signed-off-by: Molly Sophia <[email protected]>

* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
@heredos heredos mentioned this pull request Mar 26, 2025
3 participants