
mtmd : add C public API #13184


Merged
merged 18 commits into from
May 4, 2025

Conversation

ngxson
Collaborator

@ngxson ngxson commented Apr 29, 2025

Fix #13124

/**
 * libmtmd: A library for multimodal support in llama.cpp.
 *
 * WARNING: This API is experimental and subject to many BREAKING CHANGES.
 *          Issues related to API usage may receive lower priority support.
 *
 * For the usage, see an example in mtmd-cli.cpp
 */

The overall idea of this PR is to make a C-only wrapper around the C++ types. Think of it as a manually transpiled version of libmtmd from C++ to C.

The design of this PR is as follows:

  • All structs containing C++ types are converted to opaque pointers
  • Opaque (private) types need setter/getter functions to interact with them
  • Convenient C++ wrappers are added to avoid manual free() calls in C++ code; they are grouped into a namespace

For example:

// old code
mtmd_input_chunks chunks = ...;
for (auto & chunk : chunks) {
    mtmd_input_chunk_type type = chunk.type;
    size_t n_tokens = chunk.tokens_text.size();
    ...
}

// new code
mtmd_input_chunks * chunks = mtmd_input_chunks_init();
int32_t res_code = mtmd_tokenize(..., chunks, ...);
size_t n_chunks = mtmd_input_chunks_size(chunks);
for (size_t i = 0; i < n_chunks; i++) {
    mtmd_input_chunk * chunk = mtmd_input_chunks_get(chunks, i);
    mtmd_input_chunk_type type = mtmd_input_chunk_get_type(chunk);
    size_t n_tokens;
    mtmd_input_chunk_get_tokens_text(chunk, &n_tokens);
    ...
}
mtmd_input_chunks_free(chunks);

// or, with c++ wrapper
mtmd::input_chunks chunks;
int32_t res_code = mtmd_tokenize(..., chunks.ptr.get(), ...);
size_t n_chunks = mtmd_input_chunks_size(chunks);
for (size_t i = 0; i < n_chunks; i++) {
    // (same as above)
    ...
}

@ngxson ngxson requested review from ggerganov and slaren April 29, 2025 14:29
@ngxson ngxson changed the title from "mtmd : add C-only public API" to "mtmd : add C public API" Apr 29, 2025
@github-actions github-actions bot added the "testing" (Everything test related) label Apr 29, 2025
@ggerganov
Member

Wouldn't having to maintain a C API for libmtmd make the development harder without benefits?

My understanding is that libmtmd allows us to prototype the multi-modality functionality in parallel to the changes that are needed in libllama to support this. And eventually, the multi-modality should be supported directly from libllama in order to reuse the existing infra for model and context management. If we provide a libmtmd C API now, we would have to maintain it and deal with a lot of breaking changes in the future. So I don't think I see a reason to add the C API now. Maybe I am missing something?

@ngxson
Collaborator Author

ngxson commented May 2, 2025

The key benefit I was thinking about is to allow users to use libmtmd in their downstream projects. This will let us gather more feedback about the internal design and the API.

Also, this is quite necessary: as more small vision models become available, people want to use them in their mobile applications. While they can wait for proper support to come to libllama, it's not guaranteed to come in the next 1-2 months. And when it does come, my vision is to provide a simple way to convert most of the API from libmtmd to libllama.

And finally, that's also why the C API is needed in libmtmd: it's part of the experiment of how we design a C API that deals with multimodal input.

Re. your point about breaking changes, this is a valid concern, but I think I'm following the trajectory of libllama in its early days. IIRC we didn't have a stable API until a certain version because things were changing quite fast. libmtmd is in early development, and I think breaking changes are expected in either the C or the C++ API. To make it clear, I will state in the header file that libmtmd is experimental and that breaking changes are expected.

Member

@ggerganov ggerganov left a comment

I still think it's early to expose an API and invite third-party projects to interface with it. But if you want to give this a try, that's OK. I like the momentum that you've created with the implementation so far. Just take some steps to inform developers that this will be very unstable and that issues will be handled with low priority.

IMO the most important feedback that we will get is from integrating libmtmd in llama-server and figuring out how to support all the necessary features.

It's difficult for me to provide a comprehensive review at this point. The C APIs are hard to get right and usually I take an approach to implement some examples that exercise them in order to understand what is needed.

Unless there are some additional concerns, I think we can merge this. @slaren Curious if you have any thoughts too.

// you can move the chunk ownership to your own code
// this will release the chunk from the list of input chunks
// remember to free the chunk when you are done with it
MTMD_API mtmd_input_chunk * mtmd_input_chunk_release(mtmd_input_chunk * chunk);
Member

I am not sure about the usage of this function, but maybe a better name would be mtmd_input_chunk_take. But generally it seems like something that should not be needed.

Collaborator Author

@ngxson ngxson May 2, 2025

Indeed, this API is inspired by unique_ptr::release(), which transfers the ownership of a chunk from its container to user code.

For the usage, it's actually related to this discussion on llama-server. Basically, the idea is to have a mapping std::map<llama_pos, mtmd_input_chunk *> to map each image chunk to the correct position in the llama_tokens array. And since I've been playing around with the same idea in wllama, I'm pretty sure this is what we want to have.

Member

Since it is making a copy rather than really releasing it, would mtmd_input_chunk_dup or copy be more accurate?

Member

Ignore that, I see that it is moved and only a small struct is allocated. Wouldn't this leave the mtmd_input_chunks that used to own this chunk in a bad state?

Collaborator Author

@ngxson ngxson May 2, 2025

Yes, it will leave that chunk's position in mtmd_input_chunks in an invalid state (actually not invalid, but it will become a text chunk with 0 tokens).

Your idea of mtmd_input_chunk_dup sounds better though. I'll implement it. I think the cost of copying some images is negligible for now, as we have not yet designed this API to accept video input (which is essentially a sequence of images). If more models support video input in the future, we can introduce another API specifically for tokenizing video.

Collaborator Author

@ngxson ngxson May 2, 2025

I implemented this in 6bc7a30

Edit: some structs may also need clone() function, I added it here: 4d842eb

@ngxson
Collaborator Author

ngxson commented May 2, 2025

Thanks for the feedback. I'll add some comments stating that libmtmd is under active development and that breaking changes are expected.

At this stage, I think not many developers are even aware of the existence of this library, so I assume we won't get many reports. If we start to get more reported issues about the usage of libmtmd in downstream projects, I'll add a dedicated issue template to inform them that such issues will be low priority.

And yes, I would love to hear your thoughts on this proposal @slaren

Comment on lines 203 to 221
MTMD_API int32_t mtmd_helper_eval_chunks(mtmd_context * ctx,
struct llama_context * lctx,
mtmd_input_chunks * chunks,
llama_pos n_past,
llama_seq_id seq_id,
int32_t n_batch,
bool logits_last,
llama_pos * new_n_past);

// works like mtmd_helper_eval_chunks(), but only for a single chunk
// this function is NOT thread-safe
MTMD_API int32_t mtmd_helper_eval_chunk_single(mtmd_context * ctx,
struct llama_context * lctx,
mtmd_input_chunk * chunk,
llama_pos n_past,
llama_seq_id seq_id,
int32_t n_batch,
bool logits_last,
llama_pos * new_n_past);
Member

const chunks?

Collaborator Author

Yup, thanks, I added const in various places. Places without const are:

  • *_init()
  • setters (e.g. mtmd_bitmap_set_id)
  • *_free()
  • mtmd_input_chunk_copy --> roughly equivalent to _init()
  • mtmd_tokenize --> because it takes mtmd_input_chunks * as output and modifies it

I'm merging this PR once the CI is green

@ngxson
Collaborator Author

ngxson commented May 4, 2025

For visibility, I added this comment, which should be enough to communicate the state of libmtmd:

/**
 * libmtmd: A library for multimodal support in llama.cpp.
 *
 * WARNING: This API is experimental and subject to many BREAKING CHANGES.
 *          Issues related to API usage may receive lower priority support.
 *
 * For the usage, see an example in mtmd-cli.cpp
 */

@ngxson
Collaborator Author

ngxson commented May 4, 2025

I implemented this API in #12898 and it works well; token caching also works correctly. Should be good to merge.

@ngxson ngxson merged commit 27aa259 into ggml-org:master May 4, 2025
45 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...
@zhouwg
Contributor

zhouwg commented May 23, 2025

> The key benefit I was thinking about is to allow users to use libmtmd in their downstream projects. This will let us gather more feedback about the internal design and the API. [...]

ngxson, thanks for your MTMD API. It's very helpful in my downstream project: https://github.com/kantv-ai/kantv/blob/master/core/ggml/jni/realtime-video-recognition.cpp#L304.

I have two tech questions here:

  • could we use multiple (e.g. 2) libllama instances in a single process?
  • could we use multiple (e.g. 2) libmtmd instances in a single process?

Thanks for your time.
