-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Handle images in chat api #1828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
1a609ac
to
4d040b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good minimal approach.
1- I think we can make better use of serde's typings to get cleaner errors (for instance here you don't have checks in place that either image_url or text is set and you're using unwraps invalidly imho.
2- I think we should keep that proper structure and send it all the way to the Python level so that real markdown can still be parsed as real markdown by the python layer.
3- Can we add some tests too ?
router/src/lib.rs
Outdated
"text" => Ok(content.text.unwrap_or_default()), | ||
"image_url" => { | ||
if let Some(url) = content.image_url { | ||
Ok(format!("\n", url.url)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra \n
is shouldn't be here I think.
#[derive(Clone, Deserialize, Serialize, ToSchema, Default, Debug)] | ||
pub(crate) struct Content { | ||
pub r#type: String, | ||
#[serde(default, skip_serializing_if = "Option::is_none")] | ||
pub text: Option<String>, | ||
#[serde(default, skip_serializing_if = "Option::is_none")] | ||
pub image_url: Option<ImageUrl>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use something like
#[serde(tag = "type")]
enum ContentChunk{
Text(Text{ text: String}),
Image(Image {image_url: String })
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving since this is functional (with the deletion of the newline) and usable by users.
Let's refactor this though I think (especially to pass around real payloads to avoid the markdown issue).
This PR allows for messages to be formatted as simple strings, or as an array of objects including image urls. This is done by formatting content arrays into a simple string. Example using `llava-hf/llava-v1.6-mistral-7b-hf` ```bash curl localhost: 3000/v1/chat/completions \ -X POST \ -H 'Content-Type: application/json' \ -d '{ "model": "tgi", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Whats in this image?" }, { "type": "image_url", "image_url": { "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png" } } ] } ], "stream": false, "max_tokens": 20, "seed": 42 }' ``` is equivlant to this more simple request ```bash curl localhost: 3000/v1/chat/completions \ -X POST \ -H 'Content-Type: application/json' \ -d '{ "model": "tgi", "messages": [ { "role": "user", "content": "Whats in this image?\n" } ], "stream": false, "max_tokens": 20, "seed": 42 }' ``` output ``` # {"id":"","object":"text_completion","created":1714406985,"model":"llava-hf/llava-v1.6-mistral-7b-hf","system_fingerprint":"2.0.1-native","choices":[{"index":0,"message":{"role":"assistant","content":" This is an illustration of an anthropomorphic rabbit in a spacesuit, standing on what"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":2945,"completion_tokens":20,"total_tokens":2965}}% ``` --------- Co-authored-by: Nicolas Patry <[email protected]>
This PR allows for messages to be formatted as simple strings, or as an array of objects including image urls. This is done by formatting content arrays into a simple string. Example using `llava-hf/llava-v1.6-mistral-7b-hf` ```bash curl localhost: 3000/v1/chat/completions \ -X POST \ -H 'Content-Type: application/json' \ -d '{ "model": "tgi", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Whats in this image?" }, { "type": "image_url", "image_url": { "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png" } } ] } ], "stream": false, "max_tokens": 20, "seed": 42 }' ``` is equivlant to this more simple request ```bash curl localhost: 3000/v1/chat/completions \ -X POST \ -H 'Content-Type: application/json' \ -d '{ "model": "tgi", "messages": [ { "role": "user", "content": "Whats in this image?\n" } ], "stream": false, "max_tokens": 20, "seed": 42 }' ``` output ``` # {"id":"","object":"text_completion","created":1714406985,"model":"llava-hf/llava-v1.6-mistral-7b-hf","system_fingerprint":"2.0.1-native","choices":[{"index":0,"message":{"role":"assistant","content":" This is an illustration of an anthropomorphic rabbit in a spacesuit, standing on what"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":2945,"completion_tokens":20,"total_tokens":2965}}% ``` --------- Co-authored-by: Nicolas Patry <[email protected]>
This PR allows for messages to be formatted as simple strings, or as an array of objects including image urls. This is done by formatting content arrays into a simple string.
Example using
llava-hf/llava-v1.6-mistral-7b-hf
is equivlant to this more simple request
output