server: docs - refresh and tease a little bit more the http server #5718

Merged: 5 commits into master, Feb 25, 2024

Conversation

phymbert (Collaborator)

Motivation

The server has little visibility on the homepage;
it would be legitimate to see all our recent efforts highlighted 🥇

Although we are still under active development, our code base is fast, secure, lightweight, and well tested,
and it deserves more promotion compared to other LLM inference servers, which are mainly written in Go or Python.

Changes

Refresh the server homepage, and add a link from the main page.

@phymbert phymbert requested review from ggerganov and ngxson February 25, 2024 18:38
@phymbert (Collaborator, Author)

@ggerganov Happy to have your point of view on the matter: are we developing for a playground or for production systems? :)

@ggerganov (Member)

Heh, I think it's a big responsibility to declare the server to be "production ready". We can definitely give it more visibility in the docs, though.

> are we developing for a playground or for production systems? :)

The main goal of the examples is to demonstrate usage of the llama.cpp library. I consider the library to be usable for production systems, while the examples around it - not so much.

But given enough interest and help from the community, the server can become more than an example. The implementation of the server has been almost entirely contributed and it feels a bit alien to me - i.e. I'm not super comfortable adding new stuff to it. That makes it difficult for me to set a more detailed roadmap. But I think there are already many good points in #4216 that we can try to follow.

In any case, with the new test framework that we now have, I think the implementation could be rapidly improved 👍
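
For a flavour of what such tests automate, here is a minimal standalone sketch; the port, payload fields, and function names are assumptions modelled on the server README, not the actual test framework:

```python
# Minimal sketch of the kind of checks a server test framework automates.
# Assumptions: a llama.cpp server is already running on localhost:8080 and
# exposes the /health and /completion endpoints described in its README.
import requests

BASE_URL = "http://localhost:8080"

def check_health() -> None:
    # The server should report that it is up and ready to serve requests.
    resp = requests.get(f"{BASE_URL}/health")
    assert resp.status_code == 200, f"health check failed: {resp.status_code}"

def check_completion() -> None:
    # A small completion request should come back with generated text.
    resp = requests.post(
        f"{BASE_URL}/completion",
        json={"prompt": "Building a website can be done in", "n_predict": 8},
    )
    assert resp.status_code == 200
    assert "content" in resp.json()

if __name__ == "__main__":
    check_health()
    check_completion()
    print("all server checks passed")
```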

@phymbert (Collaborator, Author) commented Feb 25, 2024

> Heh, I think it's a big responsibility to declare the server to be "production ready". We can definitely give it more visibility in the docs, though.
>
> > are we developing for a playground or for production systems? :)
>
> The main goal of the examples is to demonstrate usage of the llama.cpp library. I consider the library to be usable for production systems, while the examples around it - not so much.
>
> But given enough interest and help from the community, the server can become more than an example. The implementation of the server has been almost entirely contributed and it feels a bit alien to me - i.e. I'm not super comfortable adding new stuff to it. That makes it difficult for me to set a more detailed roadmap. But I think there are already many good points in #4216 that we can try to follow.
>
> In any case, with the new test framework that we now have, I think the implementation could be rapidly improved 👍

Thanks, I understand and agree with your position. The docs hopefully do not state (yet?) that it's production ready. I am happy to help pursue that goal: the server becoming more than an example.

Opening the PR for review then

@phymbert phymbert marked this pull request as ready for review February 25, 2024 19:43
@ngxson (Collaborator) commented Feb 25, 2024

I share the same opinion as @ggerganov: by declaring the server example to be production-ready, we also take on more responsibility (than before) to maintain it. I prefer to let it be a "cutting-edge" example in which we can easily implement the latest experimental features that third-party products can learn from. Two such products that I know of are janhq/nitro and ollama.

Personally, I don't even use the server example in any of my projects or at work (my job is in the cybersecurity sector, and I'm just learning about NLP for fun). However, I have a personal project for my petite artistic blog that benefits from the work on the server example; it doesn't use the server directly, but I did reuse some of the logic here.

FYI, my personal project is an imaginary character that users can chat with. The problem is that the hardware it runs on is so slow (it evaluates merely 5 tokens per second) that I need to rely on heavy KV cache management to make it usable. That's why using the server example without any modifications is impossible for me.
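
As an illustration of the kind of KV-cache reliance described above, here is a minimal client-side sketch. It is an assumption-laden example, not the actual project: it presumes a llama.cpp server on localhost:8080 and the `cache_prompt` completion option from the server README, which keeps the evaluated prompt prefix cached so that only the new tokens of each turn need to be processed.

```python
# Hypothetical chat client that leans on the server's prompt cache.
# Assumptions: a llama.cpp server on localhost:8080 exposing /completion,
# and the cache_prompt option keeping the shared prefix in the KV cache.
import requests

BASE_URL = "http://localhost:8080"

history = "You are an imaginary character chatting with a user.\n"

def chat(user_message: str) -> str:
    global history
    history += f"User: {user_message}\nCharacter:"
    resp = requests.post(
        f"{BASE_URL}/completion",
        json={
            "prompt": history,     # full transcript; the shared prefix grows each turn
            "n_predict": 64,
            "cache_prompt": True,  # reuse the KV cache for the common prefix
            "stop": ["User:"],     # stop before the next user turn
        },
    )
    resp.raise_for_status()
    reply = resp.json()["content"]
    history += reply + "\n"
    return reply

print(chat("Hello, who are you?"))
print(chat("What do you like to do?"))
```

On hardware that evaluates only a handful of tokens per second, skipping re-evaluation of the ever-growing transcript prefix is what makes a multi-turn chat like this responsive at all.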

@phymbert (Collaborator, Author)

> I share the same opinion as @ggerganov: by declaring the server example to be production-ready, we also take on more responsibility (than before) to maintain it. I prefer to let it be a "cutting-edge" example in which we can easily implement the latest experimental features that third-party products can learn from. Two such products that I know of are janhq/nitro and ollama.
>
> Personally, I don't even use the server example in any of my projects or at work (my job is in the cybersecurity sector, and I'm just learning about NLP for fun). However, I have a personal project for my petite artistic blog that benefits from the work on the server example; it doesn't use the server directly, but I did reuse some of the logic here.
>
> FYI, my personal project is an imaginary character that users can chat with. The problem is that the hardware it runs on is so slow (it evaluates merely 5 tokens per second) that I need to rely on heavy KV cache management to make it usable. That's why using the server example without any modifications is impossible for me.

Let's be clear: I have not declared that the server is production ready. I was only asking whether I am contributing to a playground or to systems with real users. Software without production users is of no interest, IMHO.

I am not convinced about ollama.com, as I have the feeling they will quickly switch to a paid/SaaS version. I was not aware of nitro; I will give it a try, thanks. vLLM is probably also a serious challenger.

Thanks for your feedback, let's merge this small doc improvement.

@phymbert phymbert merged commit 8b35035 into master Feb 25, 2024
@phymbert phymbert deleted the doc/server-refresh-documentation branch February 25, 2024 20:46
@ngxson (Collaborator) commented Feb 25, 2024

I think it depends on how you define who your users are. For the moment, our "real" users are mostly developers who adapt the code from our examples into their products. Therefore, I believe the examples play a big role too, even if they never become production-ready.

Anyway, the contributions you've made recently are impressive, personally speaking; I really appreciate them. Keep it up, man!

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
…gml-org#5718)

* server: docs - refresh and tease a little bit more the http server

* Rephrase README.md server doc

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…gml-org#5718)
