Merge Openai api version route to main #1021
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1021
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures as of commit 716485c with merge base f384d4f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
README.md (Outdated)
@@ -200,6 +200,10 @@ streamlit run torchchat.py -- browser llama3.1
<details>
<summary>This mode gives a REST API that matches the OpenAI API spec for interacting with a model</summary>

The server follows the [OpenAI API specification](https://platform.openai.com/docs/api-reference/chat) for chat completions.
Since this feature is under active development, it's possible not every parameter is consumed. See api/api.py for details on
Suggested change:
- Since this feature is under active development, it's possible not every parameter is consumed. See api/api.py for details on
+ Since this feature is under active development, not every parameter is consumed. See api/api.py for details on
@@ -270,7 +271,13 @@ def chunked_completion(self, completion_request: CompletionRequest):
        )
        generator_args = GeneratorArgs(
            completion_request.messages[-1].get("content"),
            max_new_tokens=(
                int(completion_request.max_tokens)
why cast?
The request object was being populated from JSON, which means the value of every field arrives as a str despite the dataclass type hints.
These casts were added to get things working, but yes, this is a bad solution and we should have better type enforcement here (and possibly repo-wide).
Based on what I can find, we can resolve this by casting the types at the object's init time for each field (kinda tedious) or by using something like pydantic (yet another package to manage).
Should we attack this in another issue/PR or try to resolve it here?
Casting at init time makes sense, since downstream users shouldn't need to think about it
We can make it a separate PR or push the casting to init just for max_tokens and temperature
Created #1025 to track this, will merge this PR for now.
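For reference, the "casting at init time" approach discussed above can be sketched with a dataclass `__post_init__` hook. This is a hypothetical illustration, not torchchat's actual `CompletionRequest` definition; the field names mirror the two fields mentioned in this thread, but the real class in api/api.py has more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionRequest:
    # Illustrative subset of fields; values deserialized from JSON
    # may arrive as strings despite these type hints.
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

    def __post_init__(self) -> None:
        # Coerce string values to the annotated numeric types once,
        # so downstream code never needs to cast.
        if self.max_tokens is not None:
            self.max_tokens = int(self.max_tokens)
        if self.temperature is not None:
            self.temperature = float(self.temperature)


# Fields populated from JSON as strings are normalized at construction:
req = CompletionRequest(max_tokens="256", temperature="0.7")
print(req.max_tokens, req.temperature)  # → 256 0.7
```

With this in place, the `int(...)` and `float(...)` casts in `chunked_completion` would become unnecessary. A library like pydantic would do the same coercion (and validation) automatically, at the cost of an extra dependency.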
            encoded_prompt=encoded,
            temperature=float(completion_request.temperature),
ditto
See previous re: typecasting for dataclasses.
Force-pushed from 01bc092 to d1a0da8.
#1016 mistakenly got merged into this development branch instead of main.