server: docs - refresh and tease a little bit more the http server #5718
Conversation
@ggerganov Happy to have your point of view on the matter: are we developing for playground or production systems? :)
Heh, I think it's a big responsibility to declare the server production-ready :)
The main goal of the examples is to demonstrate usage of the llama.cpp library. But given enough interest and help from the community, the server could get there eventually. In any case, with the new test framework that we now have, I think the implementation could be rapidly improved 👍
Thanks, I understand and agree with your position. The docs hopefully do not state (yet?) that it's production-ready. I am happy to help pursue that goal. Opening the PR for review then.
I share the same opinion as @ggerganov: declaring the server production-ready is a big responsibility to take on. Personally, I don't even use the server example. FYI, my personal project is an imaginary character that users can chat with. The problem is that the hardware it runs on is so slow (it can evaluate merely 5 tokens per second) that I need to rely on heavy KV cache management to make it usable. That's why using the server example without any modifications is impossible for me.
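For illustration, the kind of manual KV-cache management referred to here might look like the sketch below (not the commenter's actual code). It mirrors the context-shifting approach used in the main example: the helper name `shift_context` is hypothetical, and the `llama_kv_cache_seq_rm` / `llama_kv_cache_seq_shift` calls match the llama.h API as it existed around the time of this discussion; they have been renamed in newer versions of the library.

```cpp
// Minimal sketch of manual KV-cache management with llama.h, assuming
// the early-2024 sequence API (llama_kv_cache_seq_rm / _seq_shift).
#include "llama.h"

// When the context window fills up, keep the first n_keep tokens (the
// prompt), discard half of the remaining tokens, and shift the rest
// back so generation can continue without re-evaluating the prompt.
static void shift_context(llama_context * ctx, int n_keep, int n_past) {
    const int n_discard = (n_past - n_keep) / 2;

    // drop the oldest generated tokens from sequence 0
    llama_kv_cache_seq_rm(ctx, 0, n_keep, n_keep + n_discard);

    // slide the remaining tokens back over the freed positions
    llama_kv_cache_seq_shift(ctx, 0, n_keep + n_discard, n_past, -n_discard);
}
```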
Let's be clear, I have not declared that the server is production ready. But asking if I am contributing for a playground or for real users. Software without production users is out of interest IMHO. I am not convinced about ollama.com as I have the feeling they will quickly switch to a paid/SAS version. I was not aware of nitro, I will give it a try, thanks. Probably, vLLM is also a serious challenger. Thanks for your feedback, let's merge this small doc improvment. |
I think it's depends on how you define who is your user. For the moment, our "real" users are mostly developers who adapt the code from our examples into their products. Therefore, I believe that the examples play a big role too, even if it will never become production-ready. Anw, the contributions that you made recently is impressive personally say, I'd very appreciate. Keep it up man! |
server: docs - refresh and tease a little bit more the http server (ggml-org#5718)

* server: docs - refresh and tease a little bit more the http server
* Rephrase README.md server doc
* Update examples/server/README.md
* Update README.md

Co-authored-by: Georgi Gerganov <[email protected]>
Motivation
The server is gaining visibility from the homepage; it is legitimate to see all our recent efforts highlighted 🥇
Although we are still under active development, our code base is fast, secure, lightweight, and well tested,
and it deserves more promotion compared to other LLM inference servers, which are mainly written in Go or Python.
Changes
Refresh the server README, and add a link to it from the main page.