I'm starting to implement LLaVA.cpp, but where? #1996

monatis · 2023-06-25T19:37:02Z

monatis
Jun 25, 2023
Collaborator

Now that monatis/clip.cpp is working, I'd like to take the next step and implement multimodal generation models on top of it. I'm thinking of starting with LLaVA and then extending to InstructBLIP.

I'd like to get the opinion of the maintainers and the community on where to implement it:

1. As an example in ggml.
2. As a fork of llama.cpp, e.g., llava.cpp, or even more ambitiously multimodal.cpp, with future InstructBLIP support in mind.
3. As a feature in llama.cpp.
4. Any other suggestions.

The 3rd option has the potential to reach the largest audience, but I know it might make things unnecessarily complicated and slow the pace of innovation here. What would the ideal path be?

Green-Sky · 2023-06-25T21:10:45Z

Green-Sky
Jun 25, 2023
Collaborator

How about 3-ish?
Integrating with #1910 and replacing the .py that PR depends on would be nice.


monatis · 2023-06-26T08:04:30Z

monatis
Jun 26, 2023
Collaborator Author

Good idea! Maybe I should write in that PR.

