[SYCL] Revert "use the correct SYCL context for host USM allocations" #7858

Closed
AidanBeltonS wants to merge 1 commit into ggerganov/llama.cpp from revert-7777-host-usm-context-fix

Conversation

AidanBeltonS
Contributor

Reverts #7777. That PR broke llama-bench and main: when pinned memory is allocated during model creation, the backend has not yet been initialized, so g_sycl_gpu_mgr is not constructed with the relevant devices. This causes a segfault because no devices exist within the manager.

I think we should try to reintroduce #7777 in a more suitable way that addresses this issue.
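
For illustration, below is a minimal, hypothetical C++/SYCL sketch of the failure mode described above. The names ggml_sycl_host_malloc and g_sycl_gpu_mgr come from the real code and backtrace, but the structure shown here is simplified, and the guard is only an assumption about how the crash could be avoided, not the actual llama.cpp implementation.

```cpp
// Hypothetical sketch only: simplified from the failure mode described above,
// not the actual llama.cpp code.
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

// In the real backend, the device manager is populated by ggml_init_sycl;
// until then it holds no devices or queues.
struct sycl_gpu_mgr_sketch {
    std::vector<sycl::device> devices;
    std::vector<sycl::queue>  queues;
};
static sycl_gpu_mgr_sketch* g_mgr = nullptr; // stands in for g_sycl_gpu_mgr

// Pinned-memory (host USM) allocation with the kind of guard the crash
// suggests is missing: if the manager was never initialized there is no valid
// queue/context to allocate against, so bail out instead of segfaulting in
// sycl::malloc_host -> queue::get_context().
void* host_malloc_sketch(size_t size) {
    if (g_mgr == nullptr || g_mgr->queues.empty()) {
        std::fprintf(stderr, "SYCL backend not initialized; cannot allocate pinned memory\n");
        return nullptr;
    }
    return sycl::malloc_host(size, g_mgr->queues.front());
}
```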

@github-actions bot added the SYCL label (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Jun 10, 2024
@AidanBeltonS
Contributor Author

Ping @bashbaug, @joeatodd

@bashbaug
Contributor

Sorry about that, how can I reproduce this issue?

@abhilash1910
Collaborator

LGTM !

@OuadiElfarouki
Contributor

OuadiElfarouki commented Jun 11, 2024

> Sorry about that, how can I reproduce this issue?

We've encountered this on Nvidia GPUs for both llama-bench & main. Instructions for building the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

@abhilash1910
Collaborator

@AidanBeltonS could you rebase to fix CI? Thanks

@mofosyne added the Review Complexity : Low label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) on Jun 12, 2024
@airMeng
Collaborator

airMeng commented Jun 12, 2024

> We've encountered this on Nvidia GPUs for both llama-bench & main. Instructions for building the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

I can't reproduce this on an Intel GPU. Could you take a deeper look into why the issue only occurs on NVIDIA GPUs? Maybe an issue for the Intel SYCL team would be more appropriate.

cc some SYCL mates @Nuullll

@AidanBeltonS force-pushed the revert-7777-host-usm-context-fix branch from 4e4ff76 to a9cae48 on June 12, 2024 15:08
@AidanBeltonS reopened this Jun 12, 2024
@AidanBeltonS
Contributor Author

> We've encountered this on Nvidia GPUs for both llama-bench & main. Instructions for building the SYCL backend for Nvidia devices can be found here: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#nvidia-gpu

> I can't reproduce this on an Intel GPU. Could you take a deeper look into why the issue only occurs on NVIDIA GPUs? Maybe an issue for the Intel SYCL team would be more appropriate.

> cc some SYCL mates @Nuullll

Currently working on a reproducer. It requires a model which uses pinned memory; it should not be a backend/hardware-specific problem.

@AidanBeltonS
Contributor Author

@airMeng the problem also affects Intel devices. I have reproduced the error on a Data Center GPU Max 1100.

To reproduce:
./bin/llama-bench -m ~/llama_models/Llama-2-7b-chat-Q4_K.gguf -ngl 77 --mmap 0

Backtrace:

| model                          |       size |     params | backend    | ngl | mmap |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | ---------------: |
[SYCL] call ggml_init_sycl
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: yes
found 4 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|         Intel Data Center GPU Max 1100|    1.3|    448|    1024|   32| 51539M|            1.3.29138|
| 1|     [opencl:gpu:0]|         Intel Data Center GPU Max 1100|    3.0|    448|    1024|   32| 48946M|       24.13.29138.29|
| 2|     [opencl:cpu:0]|                  Intel Xeon Gold 5418Y|    3.0|      2|    8192|   64|201419M|2024.17.3.0.08_160000|
| 3|     [opencl:acc:0]|            Intel FPGA Emulation Device|    1.2|      2|67108864|   64|201419M|2024.17.3.0.08_160000|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:448
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory

Thread 1 "llama-bench" received signal SIGSEGV, Segmentation fault.
0x00007fffead4e644 in sycl::_V1::queue::get_context() const () from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
(gdb) bt
#0  0x00007fffead4e644 in sycl::_V1::queue::get_context() const () from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
#1  0x00007fffeacfd46e in sycl::_V1::malloc_host(unsigned long, sycl::_V1::queue const&, sycl::_V1::detail::code_location const&) ()
   from /opt/slurm/intel/oneapi/2024.1.0.596/compiler/2024.1/lib/libsycl.so.7
#2  0x000000000055587a in ggml_sycl_host_malloc(unsigned long) ()
#3  0x00000000005e7f42 in ggml_backend_sycl_host_buffer_type_alloc_buffer(ggml_backend_buffer_type*, unsigned long) ()
#4  0x00000000006eeafa in alloc_tensor_range ()
#5  0x00000000006eea40 in ggml_backend_alloc_ctx_tensors_from_buft ()
#6  0x00000000006669bf in llm_load_tensors(llama_model_loader&, llama_model&, int, llama_split_mode, int, float const*, bool, bool (*)(float, void*), void*) ()
#7  0x0000000000636eb2 in llama_load_model_from_file ()
#8  0x000000000043768d in main ()
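
To make the failing call concrete, here is a small standalone SYCL snippet (not taken from the PR) showing the dependency the backtrace points at: sycl::malloc_host takes a queue and internally pulls the context out of it, which is the queue::get_context() frame that crashes when the queue was never backed by a properly initialized device.

```cpp
// Standalone illustration (not llama.cpp code): host USM allocation is tied to
// the context behind the queue, so sycl::malloc_host needs a queue created
// from a real, initialized device.
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q{sycl::default_selector_v};   // a valid queue with a live context
    void* pinned = sycl::malloc_host(1024, q); // the same call as frame #1 above
    std::printf("host USM allocated at %p on %s\n", pinned,
                q.get_device().get_info<sycl::info::device::name>().c_str());
    sycl::free(pinned, q);
    return 0;
}
```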

@bashbaug
Contributor

> the problem also affects Intel devices. I have reproduced the error on a Data Center GPU Max 1100.

Thanks, I can reproduce the error with these steps on an A750 also. Looking now...

@bashbaug
Contributor

I suspect this change will fix the problem: #7909.

To be clear: I'm fine merging this PR (to revert #7777) if needed to get things moving again, especially if it's going to take some time to review #7909 - thanks!

joeatodd added a commit that referenced this pull request Jun 13, 2024
@airMeng closed this Jun 17, 2024
Alcpz pushed a commit to Alcpz/llama.cpp that referenced this pull request Jun 20, 2024