Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API #1470

EndlessReform · 2024-01-23T05:13:39Z

This PR makes some minor tweaks to the new OpenAI-compatible chat endpoint #1427 in GenerateParameters:

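- Disables `decoder_input_details` when streaming is enabled. Previously, every streaming chat request failed, because [`decoder_input_details` == true is not supported when streaming tokens](https://github.com/huggingface/text-generation-inference/blob/98e5faff9daec6170cc2b0f963f2d73cf846b341/router/src/validation.rs#L406).
- Passes the `temperature` and `top_p` hyperparameters from the API request through to `GenerateParameters`.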
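As a rough illustration of the two changes listed above, the mapping from a chat request to generation parameters might look like the sketch below. The struct definitions and the `to_generate_parameters` helper are simplified, hypothetical stand-ins rather than the router's actual types, and the sketch assumes non-streaming requests keep `decoder_input_details` enabled.

```rust
// Simplified, hypothetical stand-in for the OpenAI-compatible chat request.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatRequest {
    pub stream: bool,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

// Simplified, hypothetical stand-in for the router's generation parameters.
#[derive(Debug, Clone, PartialEq)]
pub struct GenerateParameters {
    pub decoder_input_details: bool,
    pub max_new_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

// Hypothetical helper: build generation parameters from a chat request.
pub fn to_generate_parameters(req: &ChatRequest) -> GenerateParameters {
    GenerateParameters {
        // Input (prefill) details are rejected by validation when streaming,
        // so request them only for non-streaming calls (assumption).
        decoder_input_details: !req.stream,
        max_new_tokens: req.max_tokens,
        // Pass the sampling hyperparameters through unchanged.
        temperature: req.temperature,
        top_p: req.top_p,
    }
}
```

With this shape, a request that sets `"stream": true` yields `decoder_input_details: false`, while any `temperature` and `top_p` supplied by the client are forwarded as given.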

Testing

curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": true,
  "max_tokens": 20
}' \
    -H 'Content-Type: application/json'

With this change, the request above should stream a response. Without it, the most recent build from `main` returns this error:

data:{"error":"Input validation error: `decoder_input_details` == true is not supported when streaming tokens","error_type":"validation"}
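For context, the error above comes from a validation rule that rejects the combination of streaming and `decoder_input_details`. A minimal sketch of such a rule follows; the `ValidationError` enum and `check_stream_compat` function here are hypothetical names for illustration, not the router's actual code, and only the message text is taken from the error shown above.

```rust
use std::fmt;

// Hypothetical error type carrying the message seen in the response above.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    PrefillDetailsStream,
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::PrefillDetailsStream => write!(
                f,
                "`decoder_input_details` == true is not supported when streaming tokens"
            ),
        }
    }
}

// Hypothetical check: input details cannot be combined with streaming.
pub fn check_stream_compat(
    decoder_input_details: bool,
    stream: bool,
) -> Result<(), ValidationError> {
    if decoder_input_details && stream {
        return Err(ValidationError::PrefillDetailsStream);
    }
    Ok(())
}
```

Under this rule, a streaming request passes validation only when `decoder_input_details` is false, which is why the chat endpoint has to turn the flag off for streaming calls.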

It's my first time contributing to this project, so I could be missing something. Would especially appreciate @drbh's eyes on this one

drbh · 2024-01-23T14:53:55Z

Hi @EndlessReform, thank you for this contribution. The code looks good, and thanks for the helpful comments. I just tested locally and it seems to work 🙏

drbh

lgtm

EndlessReform added 2 commits January 22, 2024 22:14

Disable decoder_input_details for streaming requests

4347960

Transparently pass through temp and top_p

d805612

drbh approved these changes Jan 23, 2024

drbh merged commit 82f87ad into huggingface:main Jan 23, 2024

EndlessReform deleted the fix-oai-stream branch January 24, 2024 04:45
