SYCL: set extras only on GGML_TYPE_Q4_0 #12366
Merged
Commit 08d5986 optimized Q4_0 tensors on Intel GPUs by reordering each Q4_0 block so that the quantized weights and the dequantization scales are stored separately.
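To illustrate what "separating quantized weights and scales" means, here is a minimal C++ sketch. It assumes a simplified block (`BlockQ4_0`, holding a `float` scale instead of ggml's fp16) and hypothetical helpers `reorder_q4_0` and `dequant`; it is not the SYCL backend's actual kernel. The nibble order is assumed to follow ggml's reference Q4_0 dequantization: the low nibbles hold the first half of the block and the high nibbles hold the second half, each decoded as `(q - 8) * scale`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified Q4_0 block: 32 weights packed as 4-bit values
// plus one per-block scale (float here only to keep the sketch self-contained).
constexpr int kBlockSize = 32;
struct BlockQ4_0 {
    float scale;
    std::uint8_t qs[kBlockSize / 2];
};

// Reordered layout: all packed quants of all blocks first, then all scales.
struct ReorderedQ4_0 {
    std::vector<std::uint8_t> qs;
    std::vector<float> scales;
};

// Copy interleaved {scale, qs} blocks into two contiguous arrays.
ReorderedQ4_0 reorder_q4_0(const std::vector<BlockQ4_0>& blocks) {
    ReorderedQ4_0 out;
    out.qs.resize(blocks.size() * (kBlockSize / 2));
    out.scales.resize(blocks.size());
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        std::memcpy(out.qs.data() + b * (kBlockSize / 2), blocks[b].qs, kBlockSize / 2);
        out.scales[b] = blocks[b].scale;
    }
    return out;
}

// Dequantize weight i directly from the reordered layout.
float dequant(const ReorderedQ4_0& r, std::size_t i) {
    std::size_t b = i / kBlockSize;  // which block
    std::size_t j = i % kBlockSize;  // position inside the block
    std::uint8_t byte = r.qs[b * (kBlockSize / 2) + (j % (kBlockSize / 2))];
    int q = (j < kBlockSize / 2) ? (byte & 0x0F) : (byte >> 4);
    return (q - 8) * r.scales[b];
}
```

With the scales in their own contiguous array, a kernel can read a run of packed weights and a run of scales with simple strided loads instead of stepping over interleaved structs.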
However, because the reorder requires setting extras in the init_tensor function, that commit set them without checking whether the tensor is actually of type Q4_0, which resulted in a memory leak.
This change adds a type check so that extras are set only for GGML_TYPE_Q4_0 tensors, which prevents the leak.
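A minimal sketch of the guard, using hypothetical stand-in types (`Tensor`, `TensorExtra`, `Buffer`, `TensorType`) rather than the real ggml/SYCL structs. The ownership scheme here (the buffer keeps every extra it allocates and frees them with itself) is an assumption made for illustration, not how the SYCL backend manages extras.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for ggml types; not the real ggml structs.
enum class TensorType { F32, F16, Q4_0, Q8_0 };

// Per-tensor bookkeeping used only by the Q4_0 reorder path.
struct TensorExtra {
    bool reordered = false;
};

struct Tensor {
    TensorType type;
    TensorExtra* extra = nullptr;
};

// Hypothetical buffer that owns the extras it hands out.
struct Buffer {
    std::vector<std::unique_ptr<TensorExtra>> extras;

    void init_tensor(Tensor& t) {
        // The guard: only Q4_0 tensors get an extra; every other type
        // keeps extra == nullptr and triggers no allocation.
        if (t.type == TensorType::Q4_0) {
            extras.push_back(std::make_unique<TensorExtra>());
            t.extra = extras.back().get();
        }
    }
};
```

With the check in place, initializing an F32 tensor leaves `extra` as `nullptr`, and only the Q4_0 tensor adds an entry to `extras`.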
P.S. This is not a permanent solution; setting extras inside the init_tensor function should eventually be removed.
Tested with both Q4_0 and non-Q4_0 models.