
[SYCL] Update SYCL-Rope op and Refactor #8157


Merged: 3 commits merged into ggml-org:master on Jul 1, 2024

Conversation

@zhentaoyu (Contributor) commented Jun 27, 2024

Modifications:

* align with rope.cu and move the SYCL ROPE op into a single file (this also enables freq_factors for the non-NeoX path; see the UT below)

UT:
ONEAPI_DEVICE_SELECTOR=level_zero:gpu ./build/bin/test-backend-ops -b SYCL7 -o ROPE
before:

[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 8 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 1| [level_zero:gpu:1]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 2| [level_zero:gpu:2]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 3| [level_zero:gpu:3]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 4| [level_zero:gpu:4]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 5| [level_zero:gpu:5]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 6| [level_zero:gpu:6]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 7| [level_zero:gpu:7]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
  Backend name: SYCL7
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,40,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,52,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,64,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,1,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,71,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,8,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=20,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=32,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): GGML_ASSERT: /home/marvin/yzt/llama.cpp/ggml/src/ggml-sycl.cpp:3514: src2 == nullptr && "TODO: freq_factors not implemented for !is_neox"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
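The failing case is the first ff=1 entry with mode=0, i.e. the non-NeoX path with a freq_factors tensor attached: the old SYCL kernel only consumed freq_factors in the NeoX branch and asserted on src2 otherwise. Conceptually, supporting it just means dividing the per-pair base angle by the corresponding factor before the YaRN correction, as rope.cu does. A minimal C++ sketch of that idea (names mirror the CUDA kernel; this is illustrative, not the PR's code):

```cpp
#include <cmath>

// Illustrative per-pair angle computation for the non-NeoX ("norm") RoPE
// path. freq_factors may be null, which corresponds to ff=0 in the UT.
static void rope_norm_angles(
        int           i0,           // even channel index within the head
        int           pos,          // token position, read from src1
        float         theta_scale,  // powf(freq_base, -2.0f/n_dims), precomputed
        const float * freq_factors, // optional per-pair divisors (ff=1)
        float * cos_theta, float * sin_theta) {
    const float theta_base  = pos * std::pow(theta_scale, i0 / 2.0f);
    const float freq_factor = freq_factors ? freq_factors[i0 / 2] : 1.0f;
    const float theta       = theta_base / freq_factor; // the conceptual fix
    // the real kernels pass theta through the YaRN helper before sin/cos
    *cos_theta = std::cos(theta);
    *sin_theta = std::sin(theta);
}
```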

after:

[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 8 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 1| [level_zero:gpu:1]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 2| [level_zero:gpu:2]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 3| [level_zero:gpu:3]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 4| [level_zero:gpu:4]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 5| [level_zero:gpu:5]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 6| [level_zero:gpu:6]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
| 7| [level_zero:gpu:7]|                Intel Arc A770 Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.27191|
  Backend name: SYCL7
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,40,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,52,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,64,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,1,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,71,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,8,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=20,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=32,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,40,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,52,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,64,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,1,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,71,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,8,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=20,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[80,32,10,1],n_dims=32,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,40,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,52,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,64,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,1,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,71,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,8,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[80,32,10,1],n_dims=20,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[80,32,10,1],n_dims=32,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,40,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,52,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,64,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,1,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,71,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,8,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[80,32,10,1],n_dims=20,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[80,32,10,1],n_dims=32,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f32,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f32,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): OK
  ROPE(type=f16,ne_a=[128,32,10,1],n_dims=128,mode=0,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ROPE(type=f16,ne_a=[64,128,10,1],n_dims=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): OK
  ......

All contiguous src0 UT cases pass.
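For reference, the fields in each UT line map onto the arguments of ggml_rope_ext: fs is freq_scale, ef is ext_factor, af is attn_factor, ff indicates whether a freq_factors tensor is supplied, and v requests a non-contiguous view of src0. A hedged sketch of how one such case would be built through the public API (argument order follows my reading of ggml.h around this time; verify against the header):

```cpp
#include "ggml.h"

// Sketch of the ROPE(type=f32, ne_a=[128,32,10,1], n_dims=128, mode=0, ff=1)
// case. The exact ggml_rope_ext argument order is an assumption.
static ggml_tensor * build_rope_case(ggml_context * ctx) {
    ggml_tensor * a    = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 128, 32, 10, 1);
    ggml_tensor * pos  = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 10);       // one entry per position (ne_a[2])
    ggml_tensor * freq = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 128 / 2);  // freq_factors; nullptr when ff=0
    return ggml_rope_ext(ctx, a, pos, freq,
            /*n_dims*/ 128, /*mode*/ 0, /*n_ctx_orig*/ 512,
            /*freq_base*/ 10000.0f, /*freq_scale (fs)*/ 1.0f,
            /*ext_factor (ef)*/ 0.0f, /*attn_factor (af)*/ 1.0f,
            /*beta_fast*/ 32.0f, /*beta_slow*/ 1.0f);
}
```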

@zhentaoyu (Contributor, Author):
This is my first PR to the SYCL backend :). @airMeng, @luoyu-intel, could you please take a look? Do I need any other tests to verify it?

@github-actions bot added the ggml and SYCL labels on Jun 27, 2024
@mofosyne added the "Review Complexity : Low" label on Jun 27, 2024
@characharm (Contributor):
With this PR, the DeepSeek-Coder-V2-Lite-Instruct model is working perfectly.

@zhentaoyu force-pushed the sycl_rope_refactor branch from 3284c9c to 0ea9ccb on June 28, 2024 at 02:55
@characharm (Contributor):

> With this PR, the DeepSeek-Coder-V2-Lite-Instruct model is working perfectly.

I apologize for the false feedback. I just discovered that I was building the branch with the wrong flag, so the build didn't use GPU offload. I didn't notice because the DeepSeek V2 Lite model runs very quickly. With the correct flag and GPU acceleration enabled, llama-server crashes with `GGML_ASSERT: S:/LLM/SYCL/llama.cpp/ggml/src/ggml-sycl.cpp:3226: dim == 2`, just like the main branch.

@airMeng (Collaborator) commented Jun 28, 2024

@characharm do you mean https://github.com/zhentaoyu/llama.cpp/blob/0ea9ccbdda9ce342ef7e800cce3606fca1ff1225/ggml/src/ggml-sycl.cpp#L3014?
If yes, we can add support for that later.

@zhentaoyu force-pushed the sycl_rope_refactor branch from 0ea9ccb to 43aa0d3 on July 1, 2024 at 01:27
@airMeng merged commit 197fe6c into ggml-org:master on Jul 1, 2024 (53 checks passed)
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
* align with rope.cu and move sycl-op to a single file
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 2, 2024
* align with rope.cu and move sycl-op to a single file
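The refactor half of the commit message above ("move sycl-op to a single file") follows the pattern the SYCL backend has been converging on: each op gets its own translation unit under ggml/src/ggml-sycl/, with a single entry point called from the dispatcher in ggml-sycl.cpp. A rough sketch of what such a header looks like (file name and signature are assumptions, not copied from the diff):

```cpp
// ggml/src/ggml-sycl/rope.hpp (assumed layout, for illustration)
#ifndef GGML_SYCL_ROPE_HPP
#define GGML_SYCL_ROPE_HPP

#include "common.hpp" // shared SYCL backend declarations

// Entry point invoked by the op dispatcher in ggml-sycl.cpp; the actual
// SYCL kernels stay private to rope.cpp.
void ggml_sycl_op_rope(
        ggml_backend_sycl_context & ctx,
        const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst,
        const float * src0_dd, const float * src1_dd, float * dst_dd,
        const queue_ptr & main_stream);

#endif // GGML_SYCL_ROPE_HPP
```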