Replies: 2 comments
-
How about 3-ish?
-
Good idea! Maybe I should write in that PR.
-
Now that monatis/clip.cpp is working, I'd like to take the next step and implement multimodal generation models on top of it. I'm thinking of starting with LLaVA and then extending to InstructBLIP.
I'd like to get the opinion of the maintainers and the community on where to implement it:
The 3rd option has the potential to reach the largest audience, but I know that it might make things unnecessarily complicated and slow the pace of innovation here. What would the ideal path be?