feat: concat the adapter id to the model id in chat response #2779

drbh · 2024-11-25T14:51:28Z

This PR concatenates the adapter id to the model id when an adapter id is specified in the chat request.

The following request

curl http://localhost:3000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "google-cloud-partnership/gemma-2-2b-it-lora-jap-en",
    "messages": [
        {
            "content": "たくさんお金を稼ぎましたか？ Translate: ",
            "role": "user"
        }
    ],
    "max_tokens": 10,
    "stream": false
}' | jq

currently returns

{
    "object": "chat.completion",
    "id": "",
    "created": 1732545871,
    "model": "google/gemma-2-2b-it",
    "system_fingerprint": "2.4.2-dev0-native",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Have you made a fortune?"
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 7,
        "total_tokens": 26
    }
}

and with the changes returns an updated model id

{
    "object": "chat.completion",
    "id": "",
    "created": 1732545871,
    "model": "google-cloud-partnership/gemma-2-2b-it-lora-jap-en",
    "system_fingerprint": "2.4.2-dev0-native",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Have you made a fortune?"
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 7,
        "total_tokens": 26
    }
}

philschmid · 2024-11-25T15:55:32Z

router/src/server.rs

+    // extract model id from request if specified
+    let model_id = match model.as_deref() {
+        Some("tgi") | None => info.model_id.clone(),
+        Some(m_id) => format!("{}+{}", info.model_id, m_id),


IMO it should only be google-cloud-partnership/gemma-2-2b-it-lora-jap-en

@philschmid that case is much more concise; however, it leaves out the current model id, which could hide information from the caller. Since the generation is a combination of the model and the adapter, it seems reasonable to include both ids.

I'm happy to change the id to just the adapter if that makes the most sense; however, for the reasons above I think it's best to include both.

But it's not what OpenAI or vLLM does. They return the adapter or the "real" model. google-cloud-partnership/gemma-2-2b-it-lora-jap-en defines the base model in its config.json, so I would only go with the adapter id.

I would try to stay in sync with what others are doing.

Ah, that's a great reason to only use the adapter id. Just updated in the latest commit.
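
For reference, a minimal sketch of the behavior agreed on in this thread, assuming the handler has the request's optional `model` field and the server's base model id available. The function name `response_model_id` and its parameters are hypothetical and are not taken from router/src/server.rs; the "tgi" placeholder case mirrors the diff excerpt above, and the sketch assumes any other value names an adapter.

```rust
/// Hypothetical helper (not the actual TGI code): choose the `model`
/// string reported in a chat response.
fn response_model_id(requested: Option<&str>, base_model_id: &str) -> String {
    match requested {
        // No model given, or the generic "tgi" placeholder: report the base model.
        Some("tgi") | None => base_model_id.to_string(),
        // Otherwise assume the request named an adapter: report the adapter id alone,
        // rather than the earlier "{base}+{adapter}" concatenation.
        Some(adapter_id) => adapter_id.to_string(),
    }
}
```

Under these assumptions, a request with "model": "google-cloud-partnership/gemma-2-2b-it-lora-jap-en" would report that adapter id, while a request that omits the field or passes "tgi" would report the base model id.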

drbh · 2024-11-25T17:36:26Z

merging as the failing CI is unrelated - and issues have been resolved/approved before the string change

feat: concat the adapter id to the model id in chat response

651a039

OlivierDehaene previously approved these changes Nov 25, 2024


philschmid reviewed Nov 25, 2024


fix: updated to include only the adapter id in chat response

6082146

drbh dismissed OlivierDehaene’s stale review via 6082146 November 25, 2024 16:23

drbh merged commit c637d68 into main Nov 25, 2024
10 of 12 checks passed

drbh deleted the include-adapter-id-with-model-id-in-repsonse branch November 25, 2024 17:36
