[SYCL] Update SYCL-Rope op and Refactor #8157
Conversation
This is my first PR to SYCL :). @airMeng, @luoyu-intel, could you please take a look? Do I need to add other tests to verify it?
With this PR, the DeepSeek-Coder-V2-Lite-Instruct model works perfectly.
Force-pushed from 3284c9c to 0ea9ccb
I apologize for the false feedback. I just discovered that I was building the branch with the wrong flag, so the build didn't use GPU offload. I didn't notice this because the DeepSeek V2 Lite model runs very quickly. With the correct flag and GPU acceleration enabled, llama-server crashes with `GGML_ASSERT: S:/LLM/SYCL/llama.cpp/ggml/src/ggml-sycl.cpp:3226: dim == 2`, just like on the main branch.
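For reference, here is a minimal sketch of a SYCL build where GPU offload is actually enabled, assuming the `GGML_SYCL` CMake option and the oneAPI `icx`/`icpx` compilers; the exact option name and compilers are assumptions, not the reporter's actual command:

```sh
# Hypothetical build recipe; GGML_SYCL and the compiler names are assumptions.
source /opt/intel/oneapi/setvars.sh   # load the oneAPI toolchain into the shell
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```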
@characharm do you mean https://github.com/zhentaoyu/llama.cpp/blob/0ea9ccbdda9ce342ef7e800cce3606fca1ff1225/ggml/src/ggml-sycl.cpp#L3014?
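Since the linked line is a hard assert, a small self-contained sketch of how a `GGML_ASSERT`-style guard behaves may help readers recognize the failure mode; the macro below is a hypothetical stand-in, not the actual ggml implementation:

```cpp
// Hypothetical stand-in for ggml's GGML_ASSERT: print the failing condition
// with file and line, then abort -- the behavior seen in the reported crash.
#include <cstdio>
#include <cstdlib>

#define ASSERT_SKETCH(x)                                                         \
    do {                                                                         \
        if (!(x)) {                                                              \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            abort();                                                             \
        }                                                                        \
    } while (0)

int main() {
    int dim = 3;               // e.g. a rope dimension the op does not handle
    ASSERT_SKETCH(dim == 2);   // mirrors "GGML_ASSERT: ... dim == 2" and aborts
    return 0;
}
```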
Signed-off-by: Yu Zhentao <[email protected]>
Force-pushed from 0ea9ccb to 43aa0d3
* align with rope.cu and move sycl-op to a single file

modifications: align with rope.cu and move the rope kernels out of ggml-sycl.cpp into their own file under the ggml-sycl folder.

UT:
ONEAPI_DEVICE_SELECTOR=level_zero:gpu7 ./build/bin/test-backend-ops -b SYCL7 -o ROPE

before:
after: all contiguous src0 UT cases pass.
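As a usage note, the same check can be pointed at another device by changing the selector and backend index. This sketch assumes upstream `test-backend-ops` flag semantics (`-b` selects the backend under test, `-o` filters which op's cases run) and uses an example device index:

```sh
# Example invocation (device index 0 is an assumption):
# ONEAPI_DEVICE_SELECTOR restricts the oneAPI runtime to the listed device(s);
# -b selects the backend under test, -o runs only the ROPE test cases.
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./build/bin/test-backend-ops -b SYCL0 -o ROPE
```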