GGUF: backend support, fixed-width I/O, misc fixes #10655
Conversation
ggml/src/ggml.c (outdated)
    if (kv->value.arr.n >= SIZE_MAX/gguf_type_size(kv->value.arr.type)) {
        fprintf(stderr, "%s: array size is too large (%" PRIu64 ")\n", __func__, kv->value.arr.n);
        fclose(file);
        gguf_free(ctx);
        return NULL;
    }

-   kv->value.arr.data = calloc(kv->value.arr.n, gguf_type_size(kv->value.arr.type));
+   const size_t nbytes = kv->value.arr.n * gguf_type_size(kv->value.arr.type);
+   kv->value.arr.data = malloc(nbytes);
Do not change calloc to malloc, it is done that way to reduce the chance of security issues.
Should we also add a function like ggml_crealloc (since to my knowledge no equivalent C function exists)?
I think what we should do is rewrite this code in C++. This code is too sensitive to security issues, and C does not provide the necessary abstractions to be able to reason about this code such that we can be reasonably confident that it is memory safe.
To answer your question, we should return a failure rather than crashing when a memory allocation (or anything else) fails in the GGUF loader. If that's not possible, I guess crashing is better than nothing, but not really by much.
I've already made myself familiar with how the code works anyway, so I would be fine with rewriting it in C++. Are there any particular things that would be important to consider?
The key aspect would be using RAII to handle all the resource allocations. The objects should always be in a consistent state, such that the destructor can be called at any time to destroy them if necessary. Use exceptions if fatal errors in inner functions cannot be propagated safely in any other way. Generally, the code should use C++ containers and never rely on manual memory management. All memory accesses should be bounds-checked as well, e.g. there are some known issues because the KV array reading functions do not check bounds.
It would be very useful to be able to run the ggml-ci on changes like this one that have a broad impact. However, to do that, the branch must be in the llama.cpp repository.
I don't think it's possible to change the source branch of a PR, so I will make a new one once I have the C++ rewrite.
@@ -135,9 +133,11 @@ static bool gguf_ex_read_0(const std::string & fname) {

    for (int i = 0; i < n_tensors; ++i) {
        const char * name = gguf_get_tensor_name (ctx, i);
        const size_t size = gguf_get_tensor_size (ctx, i);
        // const size_t size = 0;
Leftover?
// const size_t size = 0;
@@ -182,9 +182,11 @@ static bool gguf_ex_read_1(const std::string & fname, bool check_data) {

    for (int i = 0; i < n_tensors; ++i) {
        const char * name = gguf_get_tensor_name (ctx, i);
        const size_t size = gguf_get_tensor_size (ctx, i);
        // const size_t size = 0;
// const size_t size = 0;
    if (sizeof(float) != 4) {
        GGML_ABORT("support for floats with != 32 bits not implemented");
    }
    if (sizeof(double) != 8) {
        GGML_ABORT("support for doubles with != 64 bits not implemented");
    }
Strictly speaking this is not enough, because floats may be represented in a different format even if the size matches. In C there is __STDC_IEC_559__ to check this, but it seems that not every compiler sets it, so it cannot be used reliably. In C++ it should be possible to use std::numeric_limits<T>::is_iec559 to check that both float and double are compliant.
It should probably be a static_assert somewhere; too much code assumes IEEE-754 floats anyway, it's not just the GGUF loader.
This PR refactors the GGUF code. Summary of the functional changes:

- Changed gguf_tensor_info to directly hold a ggml_tensor instead of fields duplicated from ggml_tensor. This enables the use of ggml_backend_tensor_get to write data from backends other than the CPU. The size field has been removed since it can be calculated from the tensor dimensions and type and was therefore duplicated information.
- Previously, after calling gguf_set_tensor_type the GGUF context entered an inconsistent state where the sizes and offsets of the tensors don't match the tensor dimensions and types. This was only resolved with another call to gguf_set_tensor_data, which then recalculates the offset, and there were no assertions that the tensor dimensions, types, sizes, and offsets are then actually consistent. This PR makes it so that the sizes and offsets are instead recalculated immediately after calling gguf_set_tensor_type. The user no longer explicitly passes a size to gguf_set_tensor_data; the size is calculated automatically.
- gguf_get_tensor_name now returns const char * instead of char *, since the name should not be modifiable via the getter.
- Added gguf_get_tensor_size to assert that the buffers in user code have the expected size.
- Fixed an issue when calling gguf_set_array_data with the same key.

What could maybe also be done:

- Internally the code uses uint64_t for indexing, but externally the functions return int. The C standard therefore strictly speaking only guarantees ~32k usable values. Would it make sense to instead internally use size_t and make that the type that is being returned? The value returned upon not finding something is -1; checks against this exact value would still work, but not checks for whether the returned value is < 0.
- gguf_get_data returns a void * to the potentially internally allocated blob. I don't understand what the use case for this function is and I don't see it in use anywhere; if the goal is to calculate a pointer to the tensor data, I think it would make more sense to have a dedicated function like gguf_get_tensor_data. At the very least I think it should return const void *.