Add Mistral AI Chat Completion support to Inference Plugin #128538


Conversation

Contributor

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic commented May 27, 2025

This change extends the existing Mistral AI provider integration so that completion (both streaming and non-streaming) and chat_completion (streaming only) can be executed through the inference API.
The changes were tested against the following models:

  • mistral-large-latest
  • mistral-small-latest

Notes:

  • Mistral returns at least 5 different formats of non-streaming errors, although only 1 type is mentioned in its documentation. MistralErrorEntity handles the error formats of the 4 most common errors: Unauthorized, Bad Request, Not Found, and Unprocessable Entity.
  • The format of streaming errors is not defined in the Mistral documentation and was not encountered during testing, so mid-stream errors are assumed to follow the OpenAI format.
  • Changes were made to the common OpenAI response handler, allowing it to handle more error codes. This might affect behavior for other providers, but it is a better solution than duplicating the handler.
  • Mistral AI doesn't recognize the "stream_options" field despite it being present in the OpenAI schema, so I added a provider-based solution that omits it for Mistral.
  • Task settings being passed as parameters without being used is a pre-existing pattern that might need improvement, e.g. by not passing them as parameters at all.
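The provider-based stream_options handling can be sketched roughly as follows; the class and method names here are illustrative stand-ins, not the actual plugin code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build the request payload and skip "stream_options"
// for providers that reject it, such as Mistral.
public class StreamOptionsSketch {
    static Map<String, Object> buildPayload(String provider, boolean stream) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("model", "mistral-small-latest");
        payload.put("stream", stream);
        // Mistral rejects the OpenAI-schema "stream_options" field, so only
        // include it for providers known to accept it.
        if (stream && !"mistral".equals(provider)) {
            payload.put("stream_options", Map.of("include_usage", true));
        }
        return payload;
    }

    public static void main(String[] args) {
        System.out.println(buildPayload("mistral", true).containsKey("stream_options")); // false
        System.out.println(buildPayload("openai", true).containsKey("stream_options"));  // true
    }
}
```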

Examples of RQ/RS from local testing:

Create Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-completion",
    "task_type": "completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/completion/mistral-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "wrong-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [Invalid model: wrong-model-name]"
        }
    },
    "status": 400
}
Perform Non-Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "completion": [
        {
            "result": "The sentence you've provided is the opening line of William Gibson's seminal cyberpunk novel *Neuromancer*. This vivid and evocative description sets the tone for the dystopian, high-tech, low-life world that the novel explores. The imagery of a \"dead channel\" on a television screen suggests a sense of emptiness, static, and the absence of clear signals or information, which can be seen as a metaphor for the fragmented and often chaotic nature of the future Gibson envisions.\n\nThe use of such a striking opening line is characteristic of Gibson's style, which often blends technological and cultural references to create a rich, immersive atmosphere. *Neuromancer* is known for its influence on the cyberpunk genre and its prescient exploration of themes related to artificial intelligence, virtual reality, and the digital age."
        }
    ]
}

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"
    },
    "status": 404
}
Perform Streaming Completion

Success:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: message
data: {"completion":[{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sentence"},{"delta":" you"}]}

event: message
data: {"completion":[{"delta":"'ve"}]}

event: message
data: {"completion":[{"delta":" provided"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" William"}]}

event: message
data: {"completion":[{"delta":" Gibson"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" seminal"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" *"}]}

event: message
data: {"completion":[{"delta":"Ne"}]}

event: message
data: {"completion":[{"delta":"u"}]}

event: message
data: {"completion":[{"delta":"rom"}]}

event: message
data: {"completion":[{"delta":"ancer"}]}

event: message
data: {"completion":[{"delta":"*."}]}

event: message
data: {"completion":[{"delta":" This"}]}

event: message
data: {"completion":[{"delta":" vivid"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" evoc"}]}

event: message
data: {"completion":[{"delta":"ative"}]}

event: message
data: {"completion":[{"delta":" description"}]}

event: message
data: {"completion":[{"delta":" sets"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" tone"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" dyst"}]}

event: message
data: {"completion":[{"delta":"op"}]}

event: message
data: {"completion":[{"delta":"ian"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" high"}]}

event: message
data: {"completion":[{"delta":"-tech"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" low"}]}

event: message
data: {"completion":[{"delta":"-life"}]}

event: message
data: {"completion":[{"delta":" world"}]}

event: message
data: {"completion":[{"delta":" that"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" novel"}]}

event: message
data: {"completion":[{"delta":" explores"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"dead"}]}

event: message
data: {"completion":[{"delta":" channel"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" on"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" television"}]}

event: message
data: {"completion":[{"delta":" screen"}]}

event: message
data: {"completion":[{"delta":" suggests"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" empt"}]}

event: message
data: {"completion":[{"delta":"iness"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" static"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" absence"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" clear"}]}

event: message
data: {"completion":[{"delta":" signals"}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" information"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" can"}]}

event: message
data: {"completion":[{"delta":" be"}]}

event: message
data: {"completion":[{"delta":" seen"}]}

event: message
data: {"completion":[{"delta":" as"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" metaphor"}]}

event: message
data: {"completion":[{"delta":" for"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" fragmented"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" confusing"},{"delta":" reality"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" characters"}]}

event: message
data: {"completion":[{"delta":" in"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" story"}]}

event: message
data: {"completion":[{"delta":".\n\n"}]}

event: message
data: {"completion":[{"delta":"Gib"}]}

event: message
data: {"completion":[{"delta":"son"}]}

event: message
data: {"completion":[{"delta":"'s"}]}

event: message
data: {"completion":[{"delta":" use"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" such"}]}

event: message
data: {"completion":[{"delta":" imagery"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" characteristic"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" cyber"}]}

event: message
data: {"completion":[{"delta":"punk"}]}

event: message
data: {"completion":[{"delta":" genre"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" often"}]}

event: message
data: {"completion":[{"delta":" blends"}]}

event: message
data: {"completion":[{"delta":" advanced"}]}

event: message
data: {"completion":[{"delta":" technology"}]}

event: message
data: {"completion":[{"delta":" with"}]}

event: message
data: {"completion":[{"delta":" social"}]}

event: message
data: {"completion":[{"delta":" decay"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" sense"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" alien"}]}

event: message
data: {"completion":[{"delta":"ation"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":" The"}]}

event: message
data: {"completion":[{"delta":" \""}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":"\""}]}

event: message
data: {"completion":[{"delta":" mentioned"}]}

event: message
data: {"completion":[{"delta":" could"}]}

event: message
data: {"completion":[{"delta":" refer"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" space"}]}

event: message
data: {"completion":[{"delta":"port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" se"}]}

event: message
data: {"completion":[{"delta":"aport"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" even"}]}

event: message
data: {"completion":[{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" data"}]}

event: message
data: {"completion":[{"delta":" port"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" adding"}]}

event: message
data: {"completion":[{"delta":" to"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" ambiguity"}]}

event: message
data: {"completion":[{"delta":" and"}]}

event: message
data: {"completion":[{"delta":" futur"}]}

event: message
data: {"completion":[{"delta":"istic"}]}

event: message
data: {"completion":[{"delta":" feel"}]}

event: message
data: {"completion":[{"delta":" of"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" setting"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: [DONE]

Not Found:

RQ:
POST {{base-url}}/_inference/completion/mistral-completion/_stream
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS:
event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [Not Found]"},"status":404}
Create Chat Completion Endpoint

Success:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "inference_id": "mistral-chat-completion",
    "task_type": "chat_completion",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-small-latest",
        "rate_limit": {
            "requests_per_minute": 240
        }
    }
}

Unauthorized:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{invalid-mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [Unauthorized]"
        }
    },
    "status": 400
}

Not Found:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "mistral-small-latest"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]"
        }
    },
    "status": 400
}

Invalid Model:

RQ:
PUT {{base-url}}/_inference/chat_completion/mistral-chat-completion
{
    "service": "mistral",
    "service_settings": {
        "api_key": "{{mistral-api-key}}",
        "model": "invalid-model-name"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]"
        }
    },
    "status": 400
}
Perform Streaming Chat Completion

Success:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk"}

event: message
data: {"id":"c79e758e9d1a4e89866c9165701496f2","choices":[{"delta":{"content":" learning"},"finish_reason":"length","index":0}],"model":"mistral-small-latest","object":"chat.completion.chunk","usage":{"completion_tokens":2,"prompt_tokens":8,"total_tokens":10}}

event: message
data: [DONE]
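For reference, the streamed responses above follow the SSE convention of `data:` lines terminated by a `[DONE]` sentinel. A minimal client-side sketch of consuming such a body (illustrative only, not plugin code) might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect the JSON payloads from an SSE response body,
// stopping at the literal [DONE] end-of-stream sentinel.
public class SseSketch {
    static List<String> dataPayloads(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.startsWith("data: ")) continue;   // skip "event:" lines and blanks
            String payload = line.substring("data: ".length());
            if (payload.equals("[DONE]")) break;        // end-of-stream sentinel
            payloads.add(payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        String body = "event: message\ndata: {\"completion\":[{\"delta\":\"Deep\"}]}\n\n"
            + "event: message\ndata: {\"completion\":[{\"delta\":\" learning\"}]}\n\n"
            + "event: message\ndata: [DONE]\n";
        System.out.println(dataPayloads(body).size()); // 2
    }
}
```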

Invalid Model:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "invalid-model-name",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [Invalid model: invalid-model-name]","type":"mistral_error"}}

Negative Max Tokens:
RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": -1
}
RS:
event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Not Found:

RQ:
POST {{base-url}}/_inference/chat_completion/mistral-chat-completion/_stream
{
    "model": "mistral-small-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 2
}
RS:
event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [Not Found]","type":"mistral_error"}}
  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@elasticsearchmachine elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 27, 2025
Contributor

@jonathan-buttner jonathan-buttner left a comment


Looking good. I left a few suggestions.

Could you update the description of the PR so that the example requests are formatted?

Let's also wrap them in code blocks using the three backticks. Thanks for making them collapsible sections though!

builder.startObject(STREAM_OPTIONS_FIELD);
builder.field(INCLUDE_USAGE_FIELD, true);
builder.endObject();
fillStreamOptionsFields(builder);
Contributor

Just an FYI we have some inflight changes that'll affect how we do this: #128592

Contributor Author

Thank you for the heads up! I inspected the changes in the attached PR. They don't affect my changes. We're good.

Contributor

Sorry what I meant is that we can use the same approach here. Let's leverage the Params class to determine whether to serialize the stream options, the same way that the PR I listed is doing. That way we don't need to subclass.
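The Params-based approach being suggested might look roughly like this sketch; the flag name and string-building serializer are hypothetical stand-ins for the actual XContent machinery:

```java
import java.util.Map;

// Hypothetical sketch: the caller passes a flag through the serialization
// params, and the shared serializer consults it, so no subclass is needed.
public class ParamsSketch {
    static final String INCLUDE_STREAM_OPTIONS = "include_stream_options";

    static String serialize(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{\"stream\":true");
        // Default to including stream_options unless the provider opted out.
        if (Boolean.parseBoolean(params.getOrDefault(INCLUDE_STREAM_OPTIONS, "true"))) {
            sb.append(",\"stream_options\":{\"include_usage\":true}");
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize(Map.of()));                               // includes stream_options
        System.out.println(serialize(Map.of(INCLUDE_STREAM_OPTIONS, "false"))); // omits stream_options
    }
}
```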

Contributor Author

Done.

import org.elasticsearch.xpack.inference.services.mistral.response.MistralErrorResponseEntity;

/**
* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.
Contributor

Suggested change
* Handles non-streaming chat completion responses for Mistral models, extending the OpenAI chat completion response handler.
* Handles non-streaming completion responses for Mistral models, extending the OpenAI chat completion response handler.

Contributor Author

Fixed. Thanks.

@@ -0,0 +1,51 @@
/*
Contributor

@prwhelan any suggestions on how to intentionally encounter a midstream error while testing?

Contributor

To followup, I don't think we have a great way to test this.

@Jan-Kazlouski-elastic can you try initiating a streaming request and disable your internet (or some sort of similar failure) in the middle of the response being streamed

Contributor Author

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic Jun 2, 2025

I’ve tried this several times and consistently received read ECONNRESET errors; these are client-side errors coming from Postman, not server-side ones. By shutting off the client’s internet, we’re preventing it from receiving any further stream data, including both the remaining valid chunks and any potential error payload.
If Mistral does construct an error response, it would still need to reach the client, but in this case the client is no longer able to receive anything.
All errors returned by Mistral so far are strictly non-streaming. The only scenario where Mistral might return an error mid-stream would be a genuine server-side malfunction on their part, or a rate limit; but testing that would be too expensive, and even then the error would probably be non-streaming.

Maybe we could contact Mistral to clarify that?

Contributor

Maybe we could contact Mistral to clarify that?

Yeah let's bring this up at the next meeting and see if we can get Serena to follow up.

* Handles streaming chat completion responses and error parsing for Mistral inference endpoints.
* Adapts the OpenAI handler to support Mistral's simpler error schema with fields like "message" and "http_status_code".
*/
public class MistralUnifiedChatCompletionResponseHandler extends OpenAiUnifiedChatCompletionResponseHandler {
Contributor

If the midstream errors are in the same format as openai, how about we refactor the OpenAiUnifiedChatCompletionResponseHandler so that we can replace some of the strings that reference openai specifically? I think it's just the name of the parser:

private static final ConstructingObjectParser<OpenAiErrorResponse, Void> ERROR_BODY_PARSER = new ConstructingObjectParser<>(
    "open_ai_error",
    true,
    args -> Optional.ofNullable((OpenAiErrorResponse) args[0])
);

Maybe we could extract those classes and rename them to be more generic?
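A provider-agnostic error handler along those lines might look roughly like this; the regex extraction below is a stand-in for the real ConstructingObjectParser, and the class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: instead of a parser hard-coded as "open_ai_error",
// the handler is constructed with a provider name and shares one
// message-extraction routine across OpenAI-style providers.
public class GenericErrorParser {
    private static final Pattern MESSAGE = Pattern.compile("\"message\"\\s*:\\s*\"([^\"]*)\"");
    private final String providerName;

    GenericErrorParser(String providerName) {
        this.providerName = providerName;
    }

    String parse(String errorJson) {
        Matcher m = MESSAGE.matcher(errorJson);
        String message = m.find() ? m.group(1) : "unknown error";
        return providerName + "_error: " + message;
    }

    public static void main(String[] args) {
        GenericErrorParser mistral = new GenericErrorParser("mistral");
        System.out.println(mistral.parse("{\"message\":\"Unauthorized\",\"request_id\":\"abc\"}"));
        // → mistral_error: Unauthorized
    }
}
```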

Contributor Author

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic May 29, 2025

Done.
But we're not positive that mid-stream errors are in the same format as OpenAI's; that is just assumed from the fact that Mistral uses an OpenAI-style API.

@Jan-Kazlouski-elastic
Contributor Author

Hi @jonathan-buttner
I addressed the comments. All fixes are done. I replied to the one comment related to the OpenAI handlers hierarchy #128538 (comment) to discuss it a bit further.

Also, I updated the section related to testing. Three backticks didn't work for me when I was creating the initial PR comment, but when I applied them afterwards they worked perfectly. Not sure why, but I will remember that for future PR creation.

@Jan-Kazlouski-elastic
Contributor Author

New error format:

Create Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"a580d263fb1521778782b22104efb415\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: wrong-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}

Perform Non-Streaming Completion

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
    },
    "status": 404
}

Perform Streaming Completion

Not Found:

event: error
data: {"error":{"root_cause":[{"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"}],"type":"status_exception","reason":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"},"status":404}

Create Chat Completion Endpoint

Not Found:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Unauthorized:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [mistral-chat-completion] status [401]. Error message: [{\n  \"message\":\"Unauthorized\",\n  \"request_id\":\"409ddf538d3f1a55bfe4b7324fe01676\"\n}]"
        }
    },
    "status": 400
}

Invalid Model:

{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]"
        }
    },
    "status": 400
}

Perform Streaming Chat Completion

Not Found:

event: error
data: {"error":{"code":"not_found","message":"Resource not found at [https://api.mistral.ai/v1/chat/completions] for request from inference entity id [mistral-chat-completion] status [404]. Error message: [{\"detail\":\"Not Found\"}]","type":"mistral_error"}}

Negative Max Tokens:

event: error
data: {"error":{"code":"unprocessable_entity","message":"Received an input validation error response for request from inference entity id [mistral-chat-completion] status [422]. Error message: [Input should be greater than or equal to 0]","type":"mistral_error"}}

Invalid Model:

event: error
data: {"error":{"code":"bad_request","message":"Received a bad request status code for request from inference entity id [mistral-chat-completion] status [400]. Error message: [{\"object\":\"error\",\"message\":\"Invalid model: invalid-model-name\",\"type\":\"invalid_model\",\"param\":null,\"code\":\"1500\"}]","type":"mistral_error"}}
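
The mid-stream errors above arrive as server-sent events: an `event: error` line followed by a `data:` line carrying the JSON payload. As a rough illustration (not the actual Elasticsearch client code), a consumer might split such a frame like this:

```java
// Hypothetical sketch of splitting the SSE error frames shown above into their
// event name and JSON payload; the real parsing in Elasticsearch differs.
public class SseErrorFrameSketch {
    record Frame(String event, String data) {}

    static Frame parseFrame(String raw) {
        String event = null;
        String data = null;
        for (String line : raw.split("\n")) {
            if (line.startsWith("event: ")) {
                event = line.substring("event: ".length()).trim();
            } else if (line.startsWith("data: ")) {
                data = line.substring("data: ".length()).trim();
            }
        }
        return new Frame(event, data);
    }

    public static void main(String[] args) {
        Frame f = parseFrame("event: error\ndata: {\"error\":{\"code\":\"not_found\",\"type\":\"mistral_error\"}}");
        System.out.println(f.event()); // error
        System.out.println(f.data());  // the JSON payload after "data: "
    }
}
```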

@jonathan-buttner (Contributor) left a comment:

Thanks for the changes, left a few suggestions.

builder.startObject(STREAM_OPTIONS_FIELD);
builder.field(INCLUDE_USAGE_FIELD, true);
builder.endObject();
fillStreamOptionsFields(builder);
Contributor:

Sorry what I meant is that we can use the same approach here. Let's leverage the Params class to determine whether to serialize the stream options, the same way that the PR I listed is doing. That way we don't need to subclass.

if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
MistralChatCompletionServiceSettings that = (MistralChatCompletionServiceSettings) o;
return Objects.equals(modelId, that.modelId);
Contributor:

I think we need to include rateLimitSettings here

Contributor Author:

Missed that. Thank you. Fixed.


@Override
public int hashCode() {
return Objects.hash(modelId);
Contributor:

I think we need to include rateLimitSettings here

Contributor Author:

Missed that. Thank you. Fixed.
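
The fix this thread describes — including rateLimitSettings in both equals and hashCode so the two stay consistent — can be sketched as follows. The RateLimitSettings record here is a stand-in for illustration; the real Elasticsearch class has a different shape.

```java
import java.util.Objects;

public class EqualsHashCodeSketch {
    // Stand-in for the real RateLimitSettings class.
    record RateLimitSettings(long requestsPerTimeUnit) {}

    static final class MistralChatCompletionServiceSettings {
        private final String modelId;
        private final RateLimitSettings rateLimitSettings;

        MistralChatCompletionServiceSettings(String modelId, RateLimitSettings rateLimitSettings) {
            this.modelId = modelId;
            this.rateLimitSettings = rateLimitSettings;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            MistralChatCompletionServiceSettings that = (MistralChatCompletionServiceSettings) o;
            // Both fields participate, so settings differing only in rate limits are not equal.
            return Objects.equals(modelId, that.modelId)
                && Objects.equals(rateLimitSettings, that.rateLimitSettings);
        }

        @Override
        public int hashCode() {
            // Must hash the same fields equals() compares to keep the contract.
            return Objects.hash(modelId, rateLimitSettings);
        }
    }

    public static void main(String[] args) {
        var a = new MistralChatCompletionServiceSettings("mistral-small-latest", new RateLimitSettings(240));
        var b = new MistralChatCompletionServiceSettings("mistral-small-latest", new RateLimitSettings(240));
        var c = new MistralChatCompletionServiceSettings("mistral-small-latest", new RateLimitSettings(10));
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false: only the rate limit differs
    }
}
```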

}

@Override
protected Exception buildError(String message, Request request, HttpResult result, ErrorResponse errorResponse) {
Contributor:

I believe this block is nearly identical for Hugging Face, Mistral, and OpenAI. Could you take a shot at refactoring it to remove the duplication? Maybe we can lift the instanceof check out and somehow pass in the MISTRAL_ERROR type field as it seems like those are the unique parts.

Could we do that in a separate PR?

@Jan-Kazlouski-elastic (Contributor Author), Jun 1, 2025:

Will do that in separate PR and share the link here.

Contributor Author:

Sharing the PR:
#128923
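
The shape of the refactoring discussed above — lifting the shared logic out and passing the provider-specific error type in as a parameter — might look roughly like this. All names here are illustrative assumptions; see #128923 for the actual change.

```java
public class BuildErrorSketch {
    // Minimal stand-in for the parsed error response.
    record ErrorResponse(boolean found, String message) {}

    // Shared across providers: only the errorType string varies per provider
    // ("mistral_error", "hugging_face_error", ...), so each handler no longer
    // needs its own near-identical buildError implementation.
    static String buildErrorMessage(String errorType, int statusCode, ErrorResponse response, String inferenceEntityId) {
        String detail = (response != null && response.found()) ? response.message() : "unknown error";
        return "Received an error of type [" + errorType + "] for request from inference entity id ["
            + inferenceEntityId + "] status [" + statusCode + "]. Error message: [" + detail + "]";
    }

    public static void main(String[] args) {
        System.out.println(buildErrorMessage("mistral_error", 404, new ErrorResponse(true, "Not Found"), "mistral-chat-completion"));
        System.out.println(buildErrorMessage("hugging_face_error", 400, null, "hf-endpoint"));
    }
}
```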

import java.util.Objects;
import java.util.Optional;

public class StreamingErrorResponse extends ErrorResponse {
Contributor:

Can you add a comment with an example error message that this would parse? Let's also add a TODO noting that ErrorMessageResponseEntity https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/response/ErrorMessageResponseEntity.java is nearly identical (it just doesn't parse as many fields) and that we should remove the duplication.

@Jan-Kazlouski-elastic (Contributor Author), Jun 1, 2025:

Done.


public MistralChatCompletionRequestEntity(UnifiedChatInput unifiedChatInput, MistralChatCompletionModel model) {
this.unifiedChatInput = Objects.requireNonNull(unifiedChatInput);
this.unifiedRequestEntity = new MistralUnifiedChatCompletionRequestEntity(unifiedChatInput);
Contributor:

Contributor Author:

Done.


import java.nio.charset.StandardCharsets;

/**
Contributor:

Can you add a few examples of what the error format looks like, or maybe just add them as tests. If you go the test route, just add a comment saying look at the tests for the expected error formats that we're aware of.

Contributor Author:

Made the change. I also removed the unifiedChatInput for Mistral; for OpenAI it had been left unused along with other fields:


    private static final String MODEL_FIELD = "model";
    private static final String MAX_COMPLETION_TOKENS_FIELD = "max_completion_tokens";

    private final UnifiedChatInput unifiedChatInput;

I know that refactoring is heavily discouraged by CONTRIBUTING.md, but perhaps I could clean it up for OpenAI as well? Seems like a pretty easy fix.

Contributor:

Thanks!

I know that refactoring is heavily discouraged by CONTRIBUTING.md, but perhaps I could clean it up for OpenAI as well? Seems like a pretty easy fix.

Yeah, feel free to remove those unused variables. I think that improves the quality of the code. If it's small changes like you mentioned, I think it's fine. In future PRs, if you encounter this situation, I would just leave a GitHub review comment saying that these variables weren't used so you're removing them, or something like that, to make it clear to the reviewer why they're being removed.

@Jan-Kazlouski-elastic (Contributor Author), Jun 2, 2025:

Fix for OpenAI refactoring is delivered!
BTW I meant to discuss this in thread for MistralRequestEntity. Misclicked.

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic marked this pull request as ready for review June 2, 2025 00:01
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 2, 2025
@jonathan-buttner (Contributor) left a comment:

Looks good! Left a few suggestions. I think this PR is in a good spot, just ping me when you have the other PR for the refactoring finished so we can merge that one first.

if (stream) {
fillStreamOptionsFields(builder);
// If request is streamed and skip stream options parameter is not true, include stream options in the request.
if (stream == true && params.paramAsBoolean(SKIP_STREAM_OPTIONS_PARAM, false) == false) {
Contributor:

nit: How about we reverse the naming, skip and false seems closer to a double negative to me so maybe:

if (stream && params.paramAsBoolean(INCLUDE_STREAM_OPTIONS_PARAM, true) == true) {

@Jan-Kazlouski-elastic (Contributor Author), Jun 2, 2025:

Good thinking. Defaulting the boolean to true means we don't have to set it for every other provider. I took another look at CONTRIBUTING.md; according to it, we should use an == check on boolean values only when checking for "false". So I replaced it with:
stream && params.paramAsBoolean(INCLUDE_STREAM_OPTIONS_PARAM, true)
I also extended the javadoc for INCLUDE_STREAM_OPTIONS_PARAM.
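
The resulting check can be illustrated with a small self-contained sketch. Here a plain Map stands in for ToXContent.Params, and the default of true means providers that say nothing still get stream options; only Mistral opts out by setting the param to false.

```java
import java.util.Map;

public class StreamOptionsSketch {
    static final String INCLUDE_STREAM_OPTIONS_PARAM = "include_stream_options";

    // Mimics a Params#paramAsBoolean lookup with a default value.
    static boolean paramAsBoolean(Map<String, String> params, String key, boolean defaultValue) {
        String value = params.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    // stream_options is serialized only for streaming requests that have not opted out.
    static boolean shouldIncludeStreamOptions(boolean stream, Map<String, String> params) {
        return stream && paramAsBoolean(params, INCLUDE_STREAM_OPTIONS_PARAM, true);
    }

    public static void main(String[] args) {
        System.out.println(shouldIncludeStreamOptions(true, Map.of()));  // true: default for streaming
        System.out.println(shouldIncludeStreamOptions(true, Map.of(INCLUDE_STREAM_OPTIONS_PARAM, "false"))); // false: Mistral opts out
        System.out.println(shouldIncludeStreamOptions(false, Map.of())); // false: non-streaming
    }
}
```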

import java.io.IOException;
import java.util.ArrayList;

import static org.elasticsearch.xpack.inference.Utils.assertJsonEquals;
Contributor:

Where possible let's switch to using XContentHelper.stripWhitespace

Contributor Author:

Done.


@nielsbauman nielsbauman removed the needs:triage Requires assignment of a team area label label Jun 4, 2025
@nielsbauman nielsbauman added :ml Machine learning Team:ML Meta label for the ML team labels Jun 4, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/ml-core (Team:ML)

…completion-integration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@jonathan-buttner jonathan-buttner self-assigned this Jun 4, 2025
@jonathan-buttner jonathan-buttner added v8.19.0 auto-backport Automatically create backport pull requests when merged >enhancement labels Jun 4, 2025
…completion-integration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@jonathan-buttner (Contributor):

Looking good, looks like CI is failing with:


REPRODUCE WITH: ./gradlew ":x-pack:plugin:inference:qa:inference-service-tests:javaRestTest" --tests "org.elasticsearch.xpack.inference.InferenceGetServicesIT.testGetServicesWithCompletionTaskType" -Dtests.seed=FB54236D6191C4C3 -Dtests.locale=dz-Tibt-BT -Dtests.timezone=America/Nassau -Druntime.java=24

InferenceGetServicesIT > testGetServicesWithCompletionTaskType FAILED
java.lang.AssertionError:
Expected: <13>
but: was <14>
at __randomizedtesting.SeedInfo.seed([FB54236D6191C4C3:78524A5E8D7A2437]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2653)
at org.elasticsearch.xpack.inference.InferenceGetServicesIT.testGetServicesWithCompletionTaskType(InferenceGetServicesIT.java:137)

@jonathan-buttner (Contributor) left a comment:

Thanks for the changes! I did some testing and it looks good.

@jonathan-buttner jonathan-buttner merged commit 767d53f into elastic:main Jun 4, 2025
19 of 20 checks passed
@elasticsearchmachine (Collaborator):

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128538

@Jan-Kazlouski-elastic (Contributor Author):

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

jonathan-buttner pushed a commit that referenced this pull request Jun 9, 2025
…28538) (#128947)

* Change versions for Mistral Chat Completion version

* Refactor model handling in MistralService to use instanceof for cleaner code

* Update Mistral Chat Completion 8.19 Version
Labels: auto-backport (Automatically create backport pull requests when merged), backport pending, >enhancement, external-contributor (Pull request authored by a developer outside the Elasticsearch team), :ml (Machine learning), Team:ML (Meta label for the ML team), v8.19.0, v9.1.0

4 participants