imatrix: fix oob writes if src1 is not contiguous #13286

Merged

JohannesGaessler (Collaborator)

The imatrix code implicitly assumes that src1 is contiguous when copying data from a backend to host memory. As a result, the vector to which the data is written can end up being resized to a size that is smaller than the amount of data that ggml_backend_tensor_get writes, resulting in out-of-bounds writes.

This PR makes it so that the host buffer always has exactly the same size as the amount of data being copied. Also, if src1 is not contiguous, this is now taken into account when calculating the byte addresses of matrix rows; a sketch of the resulting pattern follows below.

Unless I'm misunderstanding the code, the cases ne12 > 1 and ne13 > 1 are also going to result in unexpected behavior, but I don't know what the correct fix would be.
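
For reference, here is a minimal sketch of the resulting copy-and-indexing pattern. It is paraphrased, not the PR diff: `copy_and_walk_rows` and `host_buf` are hypothetical names, while `ggml_nbytes`, `ggml_backend_tensor_get`, and `ggml_backend_buffer_is_host` are the actual ggml APIs involved.

```cpp
// Minimal sketch of the fixed host-copy logic -- paraphrased, not the PR diff.
#include "ggml.h"
#include "ggml-backend.h"

#include <cstdint>
#include <vector>

static void copy_and_walk_rows(const struct ggml_tensor * src1) {
    std::vector<char> host_buf; // hypothetical staging buffer
    const char * data;

    if (ggml_backend_buffer_is_host(src1->buffer)) {
        data = (const char *) src1->data;
    } else {
        // Size the buffer to exactly the number of bytes that
        // ggml_backend_tensor_get writes; deriving the size from ne[] under a
        // contiguity assumption can undercount and cause out-of-bounds writes.
        host_buf.resize(ggml_nbytes(src1));
        ggml_backend_tensor_get(src1, host_buf.data(), 0, ggml_nbytes(src1));
        data = host_buf.data();
    }

    // Address rows via the byte stride nb[1] instead of assuming ne[0]
    // contiguous floats per row.
    for (int64_t row = 0; row < src1->ne[1]; ++row) {
        const float * x = (const float *) (data + row * src1->nb[1]);
        (void) x; // ... accumulate x[j]*x[j] into the importance matrix ...
    }
}
```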

```diff
@@ -180,7 +182,7 @@ bool IMatrixCollector::collect_imatrix(struct ggml_tensor * t, bool ask, void *
             ++e.ncall;
             LOG_DBGV(2, "%s[%d]: %32s, %s, %5d x %5d, %d\n", __func__, m_last_call, wname.c_str(), ggml_op_name(t->op), (int)src1->ne[0], (int)src1->ne[1], (int)src1->type);
             for (int row = 0; row < (int)src1->ne[1]; ++row) {
-                const float * x = data + row * src1->ne[0];
+                const float * x = (const float *) (data + row * src1->nb[1]);
```
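
The cast matters because `data` is now treated as a byte pointer: for a contiguous F32 tensor both expressions land on the same address, since nb[1] equals ne[0] * sizeof(float); they diverge only when src1 has a larger row stride. A small sanity check of that equivalence (a hypothetical standalone helper, not part of the PR):

```cpp
#include <cassert>
#include <cstddef>

// For a contiguous F32 tensor with ne0 elements per row, byte-stride
// indexing (nb[1]) and element indexing (ne[0]) yield the same address.
void check_row_address(const char * data, size_t ne0, size_t row) {
    const size_t nb1 = ne0 * sizeof(float); // contiguous row stride in bytes
    const float * by_stride  = (const float *) (data + row * nb1);
    const float * by_element = (const float *) data + row * ne0;
    assert(by_stride == by_element); // holds for contiguous layouts only
}
```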
Collaborator

Excuse my ignorance if I'm asking silly questions, but why nb[1] instead of nb[0]?

JohannesGaessler (Collaborator, Author)

nb[0] is the byte offset when changing dimension 0 by 1; nb[1] is the byte offset when changing dimension 1 by 1.
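
In other words, ggml addresses elements by byte strides. A sketch of the convention (`elem_addr` is a hypothetical helper; `nb`, `ne`, and `data` are the actual fields from ggml.h):

```cpp
#include "ggml.h"
#include <cstdint>

// Byte address of element (i0, i1) of a 2D tensor, using ggml's strides.
static const void * elem_addr(const struct ggml_tensor * t, int64_t i0, int64_t i1) {
    return (const char *) t->data + i0*t->nb[0] + i1*t->nb[1];
}
// For a contiguous F32 tensor, nb[0] == sizeof(float) and nb[1] == nb[0]*ne[0],
// so the old ne[0]-based indexing happened to work; a non-contiguous view has
// a larger nb[1], which is why the row loop must step by nb[1].
```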

JohannesGaessler merged commit 3e959f0 into ggml-org:master on May 3, 2025
95 of 96 checks passed
CISC (Collaborator) commented on May 12, 2025:

@JohannesGaessler To save you from having to look at the above linked PR: the solution is to loop over each attention head and compute and store the per-head statistics consecutively, meaning that values/counts must be resized to head_dim * n_head.
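
A rough sketch of that layout (hypothetical: `Stats`, `accumulate_per_head`, `head_dim`, `n_head`, and the assumption that heads lie consecutively within x are inferred from the comment, not taken from the linked PR):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-head accumulation: statistics for each attention head are
// computed and stored consecutively, so values/counts hold head_dim * n_head
// entries instead of head_dim.
struct Stats {
    std::vector<float> values;
    std::vector<int>   counts;
};

static void accumulate_per_head(Stats & e, const float * x,
                                int64_t head_dim, int64_t n_head) {
    e.values.resize(head_dim*n_head);
    e.counts.resize(head_dim*n_head);
    for (int64_t h = 0; h < n_head; ++h) {
        for (int64_t j = 0; j < head_dim; ++j) {
            const float v = x[h*head_dim + j];
            e.values[h*head_dim + j] += v*v; // imatrix accumulates squared activations
            e.counts[h*head_dim + j] += 1;
        }
    }
}
```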
