
Commit 3851d6b

ggerganov and hazelnutcloud authored and committed
server : remove api_like_OAI.py proxy script (ggml-org#5808)
1 parent dba99b0 commit 3851d6b

File tree

3 files changed: +3 −243 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ### Hot topics
 
+- The `api_like_OAI.py` script has been removed - use `server` instead ([#5766](https://github.com/ggerganov/llama.cpp/issues/5766#issuecomment-1969037761))
 - Support for chat templates: [Uncyclo (contributions welcome)](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
 - Support for Gemma models: https://github.com/ggerganov/llama.cpp/pull/5631
 - Non-linear quantization IQ4_NL: https://github.com/ggerganov/llama.cpp/pull/5590

examples/server/README.md

Lines changed: 2 additions & 15 deletions
@@ -326,7 +326,7 @@ Notice that each `probs` is an array of length `n_probs`.
 - `default_generation_settings` - the default generation settings for the `/completion` endpoint, has the same fields as the `generation_settings` response object from the `/completion` endpoint.
 - `total_slots` - the total number of slots for process requests (defined by `--parallel` option)
 
-- **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted json description in `messages`, it returns the predicted completion. Both synchronous and streaming mode are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with OpenAI API spec is being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc can be used with this endpoint. Compared to `api_like_OAI.py` this API implementation does not require a wrapper to be served.
+- **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted json description in `messages`, it returns the predicted completion. Both synchronous and streaming mode are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with OpenAI API spec is being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc can be used with this endpoint.
 
 *Options:*
 
@@ -528,20 +528,7 @@ bash chat.sh
 
 ### API like OAI
 
-API example using Python Flask: [api_like_OAI.py](api_like_OAI.py)
-This example must be used with server.cpp
-
-```sh
-python api_like_OAI.py
-```
-
-After running the API server, you can use it in Python by setting the API base URL.
-
-```python
-openai.api_base = "http://<Your api-server IP>:port"
-```
-
-Then you can utilize llama.cpp as an OpenAI's **chat.completion** or **text_completion** API
+The HTTP server supports OAI-like API
 
 ### Extending or building alternative Web Front End
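With the proxy script gone, a client talks to the built-in server's OpenAI-compatible endpoint directly. Below is a minimal stdlib-only sketch of such a client; the server address is a placeholder for wherever the `server` binary is listening, and the request body includes only the `messages` and `stream` fields described in the server README (any other fields are out of scope here).

```python
import json
import urllib.request

# Assumed server address: adjust to wherever the llama.cpp `server`
# binary is listening. This value is a placeholder, not from the commit.
BASE_URL = "http://localhost:8080"


def build_chat_request(messages, stream=False):
    """Build the JSON body for POST /v1/chat/completions.

    `messages` follows the OpenAI chat format: a list of dicts with
    "role" and "content" keys.
    """
    return {"messages": messages, "stream": stream}


def chat_completion(messages):
    """POST a chat request to a running llama.cpp server and parse the reply."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a server running, `chat_completion([{"role": "user", "content": "Hello"}])` returns the parsed response; in the OpenAI-style schema the completion text sits under `choices[0]["message"]["content"]`. No wrapper process is needed, which is exactly what the removed sentence in the endpoint description pointed out.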

examples/server/api_like_OAI.py

Lines changed: 0 additions & 228 deletions
This file was deleted.

0 commit comments
