GGUF: backend support, fixed-width I/O, misc fixes #10655


Closed

Conversation

JohannesGaessler (Collaborator) commented Dec 4, 2024

This PR refactors the GGUF code. Summary of the functional changes:

  • Refactor gguf_tensor_info to directly hold a ggml_tensor instead of fields duplicated from ggml_tensor. This enables the use of ggml_backend_tensor_get to write data from backends other than the CPU. The size field has been removed since it can be calculated from the tensor dimensions and type and was therefore duplicated information.
  • Use fixed-width types for storing enums and boolean KV data. On most modern platforms booleans are stored as 1-byte integers and enums as 4-byte integers; these are the types now used on all platforms for storing, reading, and writing the data, which should make GGUF files more portable (see the sketch after this list). I don't know how to properly check that floats and doubles are portable since (from what I can tell) there isn't e.g. a single macro that I can check to assert that the data layout is compliant with IEEE 754. Right now it is simply asserted that floats and doubles have the expected size when initializing the GGUF context; quite honestly, if those types are somehow implemented differently, GGML is probably going to be subtly broken in dozens of other places anyway.
  • On master, calling gguf_set_tensor_type puts the GGUF context into an inconsistent state where the sizes and offsets of the tensors don't match the tensor dimensions and types. This is only resolved by another call to gguf_set_tensor_data, which then recalculates the offset, and there are no assertions that the tensor dimensions, types, sizes, and offsets are actually consistent afterwards. With this PR the sizes and offsets are instead recalculated immediately after calling gguf_set_tensor_type. The user no longer explicitly passes a size to gguf_set_tensor_data; the size is instead calculated automatically.
  • gguf_get_tensor_name now returns const char * instead of char * since the name should not be modifiable via the getter.
  • New function gguf_get_tensor_size to assert that the buffers in user code have the expected size.
  • Fixed or added detection for various memory leaks, in particular when repeatedly calling gguf_set_array_data with the same key.
  • Stricter asserts for the data, e.g. that the row size rather than the total number of elements is divisible by the block size.
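
To illustrate the fixed-width I/O described in the second bullet, here is a minimal sketch; the helper names are made up for illustration and are not the PR's actual functions:

#include <cstdint>
#include <cstdio>

// Sketch: booleans are written as int8_t and enum values as int32_t so the
// on-disk layout no longer depends on the platform's sizeof(bool) or the
// compiler's choice of underlying enum type.
static void write_gguf_bool(FILE * f, bool val) {
    const int8_t v = val ? 1 : 0;
    fwrite(&v, sizeof(v), 1, f);
}

static void write_gguf_enum(FILE * f, int val) {
    const int32_t v = (int32_t) val;
    fwrite(&v, sizeof(v), 1, f);
}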

What could maybe also be done:

  • Currently KV and tensor indices internally use uint64_t for indexing but externally they return int, for which the C standard strictly speaking only guarantees ~32k usable values. Would it make sense to instead use size_t internally and make that the returned type? The value returned when something is not found is -1; with an unsigned return type, checks against this exact value would still work, but checks for whether the returned value is < 0 would not.
  • gguf_get_data returns a void * to the potentially internally allocated blob. I don't understand what the use case for this function is, and I don't see it in use anywhere. If the goal is to calculate a pointer to the tensor data, I think it would make more sense to have a dedicated function like gguf_get_tensor_data (a sketch of such an accessor follows this list). At the very least I think it should return const void *.
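
For illustration, a minimal sketch of what such a hypothetical gguf_get_tensor_data accessor could look like, assuming it is built on the existing gguf_get_data and gguf_get_tensor_offset getters (the function itself is not part of this PR):

// Hypothetical accessor: resolve a tensor's data pointer from the start of
// the data blob plus the tensor's offset within that blob.
static const void * gguf_get_tensor_data(const struct gguf_context * ctx, int i) {
    return (const char *) gguf_get_data(ctx) + gguf_get_tensor_offset(ctx, i);
}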

github-actions bot added the examples and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 4, 2024
ggml/src/ggml.c (review comment on an outdated diff)

        if (kv->value.arr.n >= SIZE_MAX/gguf_type_size(kv->value.arr.type)) {
            fprintf(stderr, "%s: array size is too large (%" PRIu64 ")\n", __func__, kv->value.arr.n);
            fclose(file);
            gguf_free(ctx);
            return NULL;
        }

-       kv->value.arr.data = calloc(kv->value.arr.n, gguf_type_size(kv->value.arr.type));
+       const size_t nbytes = kv->value.arr.n * gguf_type_size(kv->value.arr.type);
+       kv->value.arr.data = malloc(nbytes);
slaren (Member):

Do not change calloc to malloc; it is done that way to reduce the chance of security issues.

JohannesGaessler (Collaborator, author):

Should we also add a function like ggml_crealloc (since to my knowledge no equivalent C function exists)?
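
For context, a minimal sketch of what such a hypothetical ggml_crealloc could look like; the signature and behavior here are assumptions, not part of the PR:

#include <cstdlib>
#include <cstring>

// Hypothetical ggml_crealloc: like realloc, but zero-initializes any newly
// grown region, analogous to calloc. Sizes are in bytes. On failure it
// returns NULL and leaves ptr untouched (the caller still owns it).
static void * ggml_crealloc(void * ptr, size_t old_size, size_t new_size) {
    void * ret = realloc(ptr, new_size);
    if (ret != NULL && new_size > old_size) {
        memset((char *) ret + old_size, 0, new_size - old_size);
    }
    return ret;
}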

slaren (Member):

I think what we should do is rewrite this code in C++. This code is too sensitive to security issues, and C does not provide the necessary abstractions to be able to reason about this code such that we can be reasonably confident that it is memory safe.

slaren (Member):

To answer your question, we should return a failure rather than crashing when a memory allocation (or anything else) fails in the GGUF loader. If that's not possible, I guess crashing is better than nothing, but not really by much.
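
Applied to the hunk above, failing gracefully could look like this (a sketch following the existing error paths already present in the loader):

kv->value.arr.data = calloc(kv->value.arr.n, gguf_type_size(kv->value.arr.type));
if (kv->value.arr.data == NULL) {
    fprintf(stderr, "%s: failed to allocate memory for array\n", __func__);
    fclose(file);
    gguf_free(ctx);
    return NULL;
}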

JohannesGaessler (Collaborator, author):

I've already made myself familiar with how the code works anyway, so I would be fine with rewriting it in C++. Are there things in particular that would be important to consider?

slaren (Member):

The key aspect would be using RAII to handle all the resource allocations. The objects should always be in a consistent state, such that the destructor can be called at any time to destroy them if necessary. Use exceptions if fatal errors in inner functions cannot be propagated safely in any other way. Generally, the code should use C++ containers and never rely on manual memory management. All memory accesses should be bounds-checked as well; e.g. there are some known issues because the KV array reading functions do not check bounds.
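
A minimal sketch of the direction described above; the type names are placeholders, not the actual rewrite:

#include <cstdint>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Containers own the memory and a custom deleter owns the FILE handle, so
// any early return or exception destroys everything in a consistent state.
struct file_closer {
    void operator()(FILE * f) const { if (f) { fclose(f); } }
};
using file_ptr = std::unique_ptr<FILE, file_closer>;

struct gguf_kv_cpp {
    std::string key;
    std::vector<int8_t> data; // use .at() where bounds-checked access matters
};

// Usage: file_ptr f(fopen(fname, "rb")); // fclose runs automatically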

slaren (Member) commented Dec 4, 2024

It would be very useful to be able to run the ggml-ci on changes like this one that have a broad impact. However, to do that, the branch must be in the llama.cpp repository.

JohannesGaessler (Collaborator, author):

I don't think it's possible to change the source branch of a PR, so I will make a new one once I have the C++ rewrite.

@@ -135,9 +133,11 @@ static bool gguf_ex_read_0(const std::string & fname) {

for (int i = 0; i < n_tensors; ++i) {
const char * name = gguf_get_tensor_name (ctx, i);
const size_t size = gguf_get_tensor_size (ctx, i);
// const size_t size = 0;
slaren (Member):

Leftover?

Suggested change: delete the line // const size_t size = 0;

@@ -182,9 +182,11 @@ static bool gguf_ex_read_1(const std::string & fname, bool check_data) {

for (int i = 0; i < n_tensors; ++i) {
const char * name = gguf_get_tensor_name (ctx, i);
const size_t size = gguf_get_tensor_size (ctx, i);
// const size_t size = 0;
slaren (Member):

Suggested change: delete the line // const size_t size = 0;

Comment on lines +6465 to +6470
if (sizeof(float) != 4) {
GGML_ABORT("support for floats with != 32 bits not implemented");
}
if (sizeof(double) != 8) {
GGML_ABORT("support for doubles with != 64 bits not implemented");
}
slaren (Member):

Strictly speaking this is not enough, because floats may be represented in a different format even if the size matches. In C there is __STDC_IEC_559__ to check this, but it seems that not every compiler sets this, so it cannot be used reliably. In C++ it should be possible to use std::numeric_limits<T>::is_iec559 to check that both float and double are compliant.

It should probably be a static_assert somewhere; too much code assumes IEEE-754 floats anyway, not just the GGUF loader.
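
For reference, the check suggested above could look something like this:

#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,  "float is not IEEE-754 binary32");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE-754 binary64");
static_assert(sizeof(float)  == 4, "unexpected sizeof(float)");
static_assert(sizeof(double) == 8, "unexpected sizeof(double)");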
