
llama : add llama_model_load_from_splits #11255


Merged
2 commits merged into ggml-org:master on Jan 16, 2025

Conversation

@ngxson (Collaborator) commented on Jan 15, 2025

Some downstream programs may want to use non-conventional file names. For example, ollama uses SHA256 hashes as file names. This makes adding support for multi-split GGUF files tricky.

This PR adds a new API, llama_model_load_from_splits, that allows the user to manually specify a list of GGUF files:

    // Load the model from multiple splits (support custom naming scheme)
    // The paths must be in the correct order
    LLAMA_API struct llama_model * llama_model_load_from_splits(
                             const char ** paths,
                                 size_t    n_paths,
              struct llama_model_params    params);
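
For illustration, a minimal usage sketch (not part of the PR): the file names below are hypothetical placeholders, and the sketch assumes the usual llama_model_default_params / llama_model_free lifecycle helpers from llama.h.

    // Minimal usage sketch (hypothetical file names, error handling kept short).
    #include "llama.h"

    int main() {
        // splits must be listed in the correct order: first split first
        const char * paths[] = {
            "/models/sha256-aaaa",  // split 1 (hypothetical, ollama-style name)
            "/models/sha256-bbbb",  // split 2 (hypothetical, ollama-style name)
        };
        const size_t n_paths = sizeof(paths) / sizeof(paths[0]);

        llama_model_params params = llama_model_default_params();

        llama_model * model = llama_model_load_from_splits(paths, n_paths, params);
        if (model == nullptr) {
            return 1; // loading failed
        }

        // ... use the model ...

        llama_model_free(model);
        return 0;
    }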

@ngxson requested a review from ggerganov on January 15, 2025 at 16:59
Comment on lines 169 to 171
// return a list of splits for a given path
// for example, given "<name>-00002-of-00004.gguf", returns list of all 4 splits
std::vector<std::string> llama_get_list_splits(const std::string & path, const int n_split);
A maintainer (Member) commented:

This can be a static function in the source file only - no need to add it to the header.

There is also an existing llama_split_ prefix which seems suitable for this function: llama_split_get_list()
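
For context, a rough sketch of what such a file-local helper could look like under the conventional -%05d-of-%05d.gguf naming. This is an illustrative assumption only, not the implementation that was merged:

    // Illustrative sketch (not the merged code): rebuild all split paths from one
    // path that follows the "<name>-%05d-of-%05d.gguf" convention.
    #include <cstdio>
    #include <regex>
    #include <stdexcept>
    #include <string>
    #include <vector>

    static std::vector<std::string> llama_split_get_list(const std::string & path, const int n_split) {
        // match the conventional "-00002-of-00004.gguf" style suffix
        static const std::regex suffix_re("-\\d{5}-of-\\d{5}\\.gguf$");
        std::smatch m;
        if (!std::regex_search(path, m, suffix_re)) {
            throw std::runtime_error("invalid split file name: " + path);
        }
        const std::string prefix = path.substr(0, m.position(0));

        // rebuild the path of every split, 1-based indices as in the file names
        std::vector<std::string> paths;
        paths.reserve(n_split);
        for (int i = 0; i < n_split; i++) {
            char suffix[32];
            std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", i + 1, n_split);
            paths.push_back(prefix + suffix);
        }
        return paths;
    }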

@ngxson (Collaborator, Author) replied:

Ah yeah, I wanted to use this in llama.cpp but decided not to do that in the end. Forgot to delete it in the header file.

It should be fixed with 49822ba

@ngxson merged commit 681149c into ggml-org:master on Jan 16, 2025
48 checks passed
@jianlinshi commented:

How can this be used from the command line?

@ngxson (Collaborator, Author) commented on Jan 17, 2025

I didn't add this to the CLI for now, as it is only useful for downstream users.

For llama-cli we simply support the <name>-0000x-of-0000x.gguf file naming for now. I'm not sure in which cases you would want different naming?

@jianlinshi replied:

It would be very helpful to be able to start llama-server with just the first split file (-0001-of-000x.gguf).
I often had issues with merging, and that process needs extra space. I am working on a controlled device, where space is often an issue.
Thank you!

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
* llama : add `llama_model_load_from_splits`

* update
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
* llama : add `llama_model_load_from_splits`

* update
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
* llama : add `llama_model_load_from_splits`

* update