GGUF: backend support, fixed-width I/O, misc fixes #10655


Closed

Conversation

JohannesGaessler (Collaborator) commented Dec 4, 2024

This PR refactors the GGUF code. Summary of the functional changes:

  • Refactor gguf_tensor_info to directly hold a ggml_tensor instead of fields duplicated from ggml_tensor. This enables the use of ggml_backend_tensor_get to write data from backends other than the CPU. The size field has been removed since it can be calculated from the tensor dimensions and type and was therefore duplicated information.
  • Use fixed-width types for storing enums and boolean KV data. On most modern platforms booleans are stored as 1-byte integers and enums as 4-byte integers; these are the types now used on all platforms for storing, reading, and writing the data, which should make GGUF files more portable (see the sketch after this list). I don't know how to properly check that floats and doubles are portable since (from what I can tell) there isn't e.g. a single macro that I can check to assert that the data layout is compliant with IEEE 754. Right now it is simply asserted that floats and doubles have the expected size when initializing the GGUF context; quite honestly, if those types are somehow implemented differently, GGML is probably going to be subtly broken in dozens of other places anyway.
  • On master, calling gguf_set_tensor_type puts the GGUF context into an inconsistent state where the sizes and offsets of the tensors don't match the tensor dimensions and types. This is only resolved by another call to gguf_set_tensor_data, which then recalculates the offset, and there are no assertions that the tensor dimensions, types, sizes, and offsets are actually consistent afterwards. With this PR the sizes and offsets are instead recalculated immediately after calling gguf_set_tensor_type. The user no longer explicitly passes a size to gguf_set_tensor_data; the size is instead calculated automatically.
  • gguf_get_tensor_name now returns const char * instead of char * since the name should not be modifiable via the getter.
  • New function gguf_get_tensor_size to assert that the buffers in user code have the expected size.
  • Fixed or added detection for various memory leaks, in particular when repeatedly calling gguf_set_array_data with the same key.
  • Stricter asserts for the data, e.g. that the row size rather than the total number of elements is divisible by the block size.
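
To illustrate the fixed-width I/O described in the second bullet, here is a minimal sketch; the helper names are made up for illustration and are not the PR's actual functions:

#include <cstdint>
#include <cstdio>

// Sketch: booleans are written as int8_t and enum values as int32_t so the
// on-disk layout no longer depends on the platform's sizeof(bool) or the
// compiler's choice of underlying enum type.
static void write_gguf_bool(FILE * f, bool val) {
    const int8_t v = val ? 1 : 0;
    fwrite(&v, sizeof(v), 1, f);
}

static void write_gguf_enum(FILE * f, int val) {
    const int32_t v = (int32_t) val;
    fwrite(&v, sizeof(v), 1, f);
}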

What could maybe also be done:

  • Currently KV and tensor indices internally use uint64_t for indexing but externally they return int, for which the C standard strictly speaking only guarantees ~32k usable values. Would it make sense to instead use size_t internally and make that the returned type? The value returned when something is not found is -1; with an unsigned return type, checks against this exact value would still work, but checks for whether the returned value is < 0 would not.
  • gguf_get_data returns a void * to the potentially internally allocated blob. I don't understand what the use case for this function is, and I don't see it in use anywhere. If the goal is to calculate a pointer to the tensor data, I think it would make more sense to have a dedicated function like gguf_get_tensor_data (a sketch of such an accessor follows this list). At the very least I think it should return const void *.
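
For illustration, a minimal sketch of what such a hypothetical gguf_get_tensor_data accessor could look like, assuming it is built on the existing gguf_get_data and gguf_get_tensor_offset getters (the function itself is not part of this PR):

// Hypothetical accessor: resolve a tensor's data pointer from the start of
// the data blob plus the tensor's offset within that blob.
static const void * gguf_get_tensor_data(const struct gguf_context * ctx, int i) {
    return (const char *) gguf_get_data(ctx) + gguf_get_tensor_offset(ctx, i);
}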

github-actions bot added the examples and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 4, 2024
ggml/src/ggml.c (review comment on an outdated diff)

        if (kv->value.arr.n >= SIZE_MAX/gguf_type_size(kv->value.arr.type)) {
            fprintf(stderr, "%s: array size is too large (%" PRIu64 ")\n", __func__, kv->value.arr.n);
            fclose(file);
            gguf_free(ctx);
            return NULL;
        }

-       kv->value.arr.data = calloc(kv->value.arr.n, gguf_type_size(kv->value.arr.type));
+       const size_t nbytes = kv->value.arr.n * gguf_type_size(kv->value.arr.type);
+       kv->value.arr.data = malloc(nbytes);
slaren (Member):

Do not change calloc to malloc; it is done that way to reduce the chance of security issues.

JohannesGaessler (Collaborator, author):

Should we also add a function like ggml_crealloc (since to my knowledge no equivalent C function exists)?
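
For context, a minimal sketch of what such a hypothetical ggml_crealloc could look like; the signature and behavior here are assumptions, not part of the PR:

#include <cstdlib>
#include <cstring>

// Hypothetical ggml_crealloc: like realloc, but zero-initializes any newly
// grown region, analogous to calloc. Sizes are in bytes. On failure it
// returns NULL and leaves ptr untouched (the caller still owns it).
static void * ggml_crealloc(void * ptr, size_t old_size, size_t new_size) {
    void * ret = realloc(ptr, new_size);
    if (ret != NULL && new_size > old_size) {
        memset((char *) ret + old_size, 0, new_size - old_size);
    }
    return ret;
}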

slaren (Member):

I think what we should do is rewrite this code in C++. This code is too sensitive to security issues, and C does not provide the necessary abstractions to be able to reason about this code such that we can be reasonably confident that it is memory safe.

slaren (Member):

To answer your question, we should return a failure rather than crashing when a memory allocation (or anything else) fails in the GGUF loader. If that's not possible, I guess crashing is better than nothing, but not really by much.
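
Applied to the hunk above, failing gracefully could look like this (a sketch following the existing error paths already present in the loader):

kv->value.arr.data = calloc(kv->value.arr.n, gguf_type_size(kv->value.arr.type));
if (kv->value.arr.data == NULL) {
    fprintf(stderr, "%s: failed to allocate memory for array\n", __func__);
    fclose(file);
    gguf_free(ctx);
    return NULL;
}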

JohannesGaessler (Collaborator, author):

I've already made myself familiar with how the code works anyway, so I would be fine with rewriting it in C++. Are there things in particular that would be important to consider?

slaren (Member):

The key aspect would be using RAII to handle all the resource allocations. The objects should always be in a consistent state, such that the destructor can be called at any time to destroy them if necessary. Use exceptions if fatal errors in inner functions cannot be propagated safely in any other way. Generally, the code should use C++ containers and never rely on manual memory management. All memory accesses should be bounds-checked as well; e.g. there are some known issues because the KV array reading functions do not check bounds.
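
A minimal sketch of the direction described above; the type names are placeholders, not the actual rewrite:

#include <cstdint>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Containers own the memory and a custom deleter owns the FILE handle, so
// any early return or exception destroys everything in a consistent state.
struct file_closer {
    void operator()(FILE * f) const { if (f) { fclose(f); } }
};
using file_ptr = std::unique_ptr<FILE, file_closer>;

struct gguf_kv_cpp {
    std::string key;
    std::vector<int8_t> data; // use .at() where bounds-checked access matters
};

// Usage: file_ptr f(fopen(fname, "rb")); // fclose runs automatically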

slaren (Member) commented Dec 4, 2024

It would be very useful to be able to run the ggml-ci on changes like this one that have a broad impact. However, to do that, the branch must be in the llama.cpp repository.

JohannesGaessler (Collaborator, author):

I don't think it's possible to change the source branch of a PR, so I will make a new one once I have the C++ rewrite.

@@ -135,9 +133,11 @@ static bool gguf_ex_read_0(const std::string & fname) {

for (int i = 0; i < n_tensors; ++i) {
const char * name = gguf_get_tensor_name (ctx, i);
const size_t size = gguf_get_tensor_size (ctx, i);
// const size_t size = 0;
slaren (Member):

Leftover?

Suggested change: delete the line // const size_t size = 0;

@@ -182,9 +182,11 @@ static bool gguf_ex_read_1(const std::string & fname, bool check_data) {

for (int i = 0; i < n_tensors; ++i) {
const char * name = gguf_get_tensor_name (ctx, i);
const size_t size = gguf_get_tensor_size (ctx, i);
// const size_t size = 0;
slaren (Member):

Suggested change: delete the line // const size_t size = 0;

Comment on lines +6465 to +6470
if (sizeof(float) != 4) {
GGML_ABORT("support for floats with != 32 bits not implemented");
}
if (sizeof(double) != 8) {
GGML_ABORT("support for doubles with != 64 bits not implemented");
}
slaren (Member):

Strictly speaking this is not enough, because floats may be represented in a different format even if the size matches. In C there is __STDC_IEC_559__ to check this, but it seems that not every compiler sets this, so it cannot be used reliably. In C++ it should be possible to use std::numeric_limits<T>::is_iec559 to check that both float and double are compliant.

It should probably be a static_assert somewhere; too much code assumes IEEE-754 floats anyway, not just the GGUF loader.
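
For reference, the check suggested above could look something like this:

#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,  "float is not IEEE-754 binary32");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE-754 binary64");
static_assert(sizeof(float)  == 4, "unexpected sizeof(float)");
static_assert(sizeof(double) == 8, "unexpected sizeof(double)");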
