SYCL: set extras only on GGML_TYPE_Q4_0 #12366

Merged
merged 2 commits into from
Mar 17, 2025

qnixsynapse
Collaborator

@qnixsynapse qnixsynapse commented Mar 13, 2025

Commit 08d5986 optimized Q4_0 tensors on Intel GPUs by reordering the Q4_0 blocks so that the quantized weights and the dequantization scales are stored separately.
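To illustrate what such a reorder means, here is a minimal sketch that turns an array of blocks into one run of packed weights and one run of scales. The names `BlockQ4`, `ReorderedQ4` and `reorder_q4` are hypothetical and not taken from the commit. The "weights first, scales second" ordering is also an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one Q4_0 block: 32 weights packed as 4-bit
// values (16 bytes) plus one 16-bit scale. Names are illustrative only.
constexpr std::size_t kBlockWeights = 32;
constexpr std::size_t kPackedBytes  = kBlockWeights / 2;

struct BlockQ4 {
    std::uint16_t scale_bits;        // raw fp16 scale, kept as bits here
    std::uint8_t  qs[kPackedBytes];  // two 4-bit weights per byte
};

// Reordered layout (assumed): all packed weights, then all scales.
struct ReorderedQ4 {
    std::vector<std::uint8_t>  qs;
    std::vector<std::uint16_t> scales;
};

// Split interleaved blocks into separate weight and scale arrays.
ReorderedQ4 reorder_q4(const std::vector<BlockQ4> & blocks) {
    ReorderedQ4 out;
    out.qs.reserve(blocks.size() * kPackedBytes);
    out.scales.reserve(blocks.size());
    for (const BlockQ4 & b : blocks) {
        out.qs.insert(out.qs.end(), b.qs, b.qs + kPackedBytes);
        out.scales.push_back(b.scale_bits);
    }
    return out;
}
```

After the split, block `i` has its weights at `qs[i * 16 .. i * 16 + 15]` and its scale at `scales[i]`. The sketch only shows the layout change and makes no claim about how the SYCL kernels consume it.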

However, that commit sets extras in the init_tensor function without checking whether the tensor type is actually Q4_0, which resulted in a memory leak.

This change adds a type check so that extras are set only for GGML_TYPE_Q4_0 tensors, which prevents the leak.
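The shape of the guard can be sketched as follows. This is a simplified, self-contained model and not the actual SYCL backend code: `TensorType`, `TensorExtra`, `Tensor`, `BufferContext` and this `init_tensor` signature are all hypothetical stand-ins.

```cpp
#include <memory>
#include <vector>

enum class TensorType { F16, Q4_0, Q8_0 };  // illustrative subset of types

struct TensorExtra { bool reordered = false; };  // hypothetical per-tensor state

struct Tensor {
    TensorType    type  = TensorType::F16;
    TensorExtra * extra = nullptr;  // non-owning; the buffer context owns it
};

struct BufferContext {
    // Owns every extra it hands out, so they are freed with the context.
    std::vector<std::unique_ptr<TensorExtra>> extras;
};

// Allocate an extra only for Q4_0 tensors; other types are left untouched.
void init_tensor(BufferContext & ctx, Tensor & t) {
    if (t.type != TensorType::Q4_0) {
        return;
    }
    ctx.extras.push_back(std::make_unique<TensorExtra>());
    t.extra = ctx.extras.back().get();
}
```

With the guard in place, a non-Q4_0 tensor never causes an allocation, so nothing is left behind for it.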

P.S. This is not a permanent solution. We should stop setting extras inside the init_tensor function altogether.

Tested with both non-Q4_0 and Q4_0 models.

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Mar 13, 2025
@slaren
Member

slaren commented Mar 13, 2025

The extras should also be freed in the reset function of the buffer interface, otherwise this will still leak extras when Q4_0 tensors are allocated in a compute buffer (e.g. for KV quantization).
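A minimal sketch of this suggestion, assuming a buffer that keeps a list of the extras it allocated. `Extra`, `ComputeBuffer`, `add_extra` and the `g_live_extras` counter are hypothetical names used only to make the leak observable. They are not the backend's real types.

```cpp
#include <vector>

int g_live_extras = 0;  // instrumentation for this example only

struct Extra {
    Extra()  { ++g_live_extras; }
    ~Extra() { --g_live_extras; }
};

struct ComputeBuffer {
    std::vector<Extra *> extras;  // extras allocated since the last reset

    ComputeBuffer() = default;
    ComputeBuffer(const ComputeBuffer &) = delete;             // avoid double free
    ComputeBuffer & operator=(const ComputeBuffer &) = delete;

    // Called when a tensor that needs an extra is initialized.
    Extra * add_extra() {
        Extra * e = new Extra();
        extras.push_back(e);
        return e;
    }

    // Release every extra before the buffer is reused.
    void reset() {
        for (Extra * e : extras) {
            delete e;
        }
        extras.clear();
    }

    ~ComputeBuffer() { reset(); }
};
```

If `reset()` only cleared the list, or did nothing, each reuse of the buffer would allocate new extras while the old ones stayed alive until the buffer itself was destroyed.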

@qnixsynapse
Collaborator Author

@NeoZhangJianyu Can you review this PR please?

Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment

It's great!
Thank you!

@NeoZhangJianyu NeoZhangJianyu merged commit b3c9a65 into ggml-org:master Mar 17, 2025
47 checks passed
@qnixsynapse qnixsynapse deleted the fix/memory_leak branch March 17, 2025 02:28
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface