
Handle images in chat api #1828


Merged: 6 commits into main on Apr 30, 2024
Conversation

drbh (Collaborator) commented Apr 29, 2024

This PR allows messages to be formatted either as plain strings or as arrays of content objects that include image URLs; content arrays are flattened into a single string.

Example using llava-hf/llava-v1.6-mistral-7b-hf

curl localhost:3000/v1/chat/completions \
-X POST \
-H 'Content-Type: application/json' \
-d '{
    "model": "tgi",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Whats in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                    }
                }
            ]
        }
    ],
    "stream": false,
    "max_tokens": 20,
    "seed": 42
}'

is equivalent to this simpler request:

curl localhost:3000/v1/chat/completions \
-X POST \
-H 'Content-Type: application/json' \
-d '{
    "model": "tgi",
    "messages": [
        {
            "role": "user",
            "content": "Whats in this image?\n![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)"
        }
    ],
    "stream": false,
    "max_tokens": 20,
    "seed": 42
}'

Output:

# {"id":"","object":"text_completion","created":1714406985,"model":"llava-hf/llava-v1.6-mistral-7b-hf","system_fingerprint":"2.0.1-native","choices":[{"index":0,"message":{"role":"assistant","content":" This is an illustration of an anthropomorphic rabbit in a spacesuit, standing on what"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":2945,"completion_tokens":20,"total_tokens":2965}}%


drbh force-pushed the handle-images-in-chat-api branch from 1a609ac to 4d040b3 on April 29, 2024 at 16:46
Narsil (Collaborator) left a comment
Good minimal approach.

1- I think we can make better use of serde's typings to get cleaner errors (for instance, here you don't have checks in place that either image_url or text is set, and you're using unwraps incorrectly imho).
2- I think we should keep the proper structure and send it all the way to the Python level, so that real markdown can still be parsed as real markdown by the Python layer.
3- Can we add some tests too?

"text" => Ok(content.text.unwrap_or_default()),
"image_url" => {
if let Some(url) = content.image_url {
Ok(format!("\n![]({})", url.url))
Why the extra `\n`? I think it shouldn't be here.

Comment on lines +899 to +906
#[derive(Clone, Deserialize, Serialize, ToSchema, Default, Debug)]
pub(crate) struct Content {
pub r#type: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub text: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub image_url: Option<ImageUrl>,
}

Why not use something like

#[serde(tag = "type", rename_all = "snake_case")]
enum ContentChunk {
    Text { text: String },
    ImageUrl { image_url: String },
}

?
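
For illustration, a rough sketch of how such an internally tagged enum could deserialize the chat payload; the `rename_all` attribute and the field shapes are assumptions made so the snippet compiles, not code from this PR:

```rust
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct ImageUrl {
    url: String,
}

// Internally tagged on `type`: serde picks the variant from the JSON "type"
// field and reports a clear error when the payload matches neither variant.
#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
enum ContentChunk {
    Text { text: String },
    ImageUrl { image_url: ImageUrl },
}

fn main() {
    let raw = r#"[
        {"type": "text", "text": "Whats in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/rabbit.png"}}
    ]"#;
    let chunks: Vec<ContentChunk> = serde_json::from_str(raw).expect("valid content chunks");
    println!("{chunks:?}");
}
```

Compared to a single struct with two `Option` fields, a mismatch between `type` and the actual field becomes a deserialization error instead of an unwrap at formatting time.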

Narsil (Collaborator) left a comment
Approving since this is functional (once the extra newline is removed) and usable by users.

Let's refactor this though I think (especially to pass around real payloads to avoid the markdown issue).

Narsil merged commit c99ecd7 into main on Apr 30, 2024
4 checks passed
Narsil deleted the handle-images-in-chat-api branch on April 30, 2024 at 10:18
Nilabhra pushed a commit to TII-AI-Research-Center/text-generation-inference that referenced this pull request May 14, 2024
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request Jun 10, 2024