server: docs - refresh and tease a little bit more the http server #5718

Merged: 5 commits into master, Feb 25, 2024

Conversation

phymbert (Collaborator)

Motivation

The server has little visibility on the homepage;
it would be legitimate to see all our recent efforts highlighted 🥇

Although we are still under active development, our code base is fast, secure, lightweight, and well tested,
and it deserves more promotion compared to other LLM inference servers, which are mainly written in Go or Python.

Changes

Refresh the server homepage, and add a link from the main page.

@phymbert phymbert requested review from ggerganov and ngxson February 25, 2024 18:38
@phymbert (Collaborator, Author)

@ggerganov Happy to have your point of view on the matter: are we developing for a playground or for production systems? :)

@ggerganov (Member)

Heh, I think it's a big responsibility to declare the server to be "production ready". We can definitely give it more visibility in the docs, though.

> are we developing for a playground or for production systems? :)

The main goal of the examples is to demonstrate usage of the llama.cpp library. I consider the library to be usable for production systems, while the examples around it - not so much.

But given enough interest and help from the community, the server can become more than an example. The implementation of the server has been almost entirely contributed and it feels a bit alien to me - i.e. I'm not super comfortable adding new stuff to it. That makes it difficult for me to set a more detailed roadmap. But I think there are already many good points in #4216 that we can try to follow.

In any case, with the new test framework that we now have, I think the implementation could be rapidly improved 👍
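
For a flavour of what such tests automate, here is a minimal standalone sketch; the port, payload fields, and function names are assumptions modelled on the server README, not the actual test framework:

```python
# Minimal sketch of the kind of checks a server test framework automates.
# Assumptions: a llama.cpp server is already running on localhost:8080 and
# exposes the /health and /completion endpoints described in its README.
import requests

BASE_URL = "http://localhost:8080"

def check_health() -> None:
    # The server should report that it is up and ready to serve requests.
    resp = requests.get(f"{BASE_URL}/health")
    assert resp.status_code == 200, f"health check failed: {resp.status_code}"

def check_completion() -> None:
    # A small completion request should come back with generated text.
    resp = requests.post(
        f"{BASE_URL}/completion",
        json={"prompt": "Building a website can be done in", "n_predict": 8},
    )
    assert resp.status_code == 200
    assert "content" in resp.json()

if __name__ == "__main__":
    check_health()
    check_completion()
    print("all server checks passed")
```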

@phymbert (Collaborator, Author) commented Feb 25, 2024

> Heh, I think it's a big responsibility to declare the server to be "production ready". We can definitely give it more visibility in the docs, though.
>
> > are we developing for a playground or for production systems? :)
>
> The main goal of the examples is to demonstrate usage of the llama.cpp library. I consider the library to be usable for production systems, while the examples around it - not so much.
>
> But given enough interest and help from the community, the server can become more than an example. The implementation of the server has been almost entirely contributed and it feels a bit alien to me - i.e. I'm not super comfortable adding new stuff to it. That makes it difficult for me to set a more detailed roadmap. But I think there are already many good points in #4216 that we can try to follow.
>
> In any case, with the new test framework that we now have, I think the implementation could be rapidly improved 👍

Thanks, I understand and agree with your position. The docs hopefully do not state (yet?) that it's production ready. I am happy to help pursue that goal: the server becoming more than an example.

Opening the PR for review then

@phymbert phymbert marked this pull request as ready for review February 25, 2024 19:43
@ngxson (Collaborator) commented Feb 25, 2024

I share the same opinion as @ggerganov: by declaring the server example to be production-ready, we also take on more responsibility (than before) to maintain it. I prefer to let it be a "cutting-edge" example in which we can easily implement the latest experimental features that third-party products can learn from. Two such products that I know of are janhq/nitro and ollama.

Personally, I don't even use the server example in any of my projects or at work (my job is in the cybersecurity sector, and I'm just learning about NLP for fun). However, I have a personal project for my petite artistic blog that benefits from the work on the server example; it doesn't use the server directly, but I did reuse some of the logic here.

FYI, my personal project is an imaginary character that users can chat with. The problem is that the hardware it runs on is so slow (it evaluates merely 5 tokens per second) that I need to rely on heavy KV cache management to make it usable. That's why using the server example without any modifications is impossible for me.
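
As an illustration of the kind of KV-cache reliance described above, here is a minimal client-side sketch. It is an assumption-laden example, not the actual project: it presumes a llama.cpp server on localhost:8080 and the `cache_prompt` completion option from the server README, which keeps the evaluated prompt prefix cached so that only the new tokens of each turn need to be processed.

```python
# Hypothetical chat client that leans on the server's prompt cache.
# Assumptions: a llama.cpp server on localhost:8080 exposing /completion,
# and the cache_prompt option keeping the shared prefix in the KV cache.
import requests

BASE_URL = "http://localhost:8080"

history = "You are an imaginary character chatting with a user.\n"

def chat(user_message: str) -> str:
    global history
    history += f"User: {user_message}\nCharacter:"
    resp = requests.post(
        f"{BASE_URL}/completion",
        json={
            "prompt": history,     # full transcript; the shared prefix grows each turn
            "n_predict": 64,
            "cache_prompt": True,  # reuse the KV cache for the common prefix
            "stop": ["User:"],     # stop before the next user turn
        },
    )
    resp.raise_for_status()
    reply = resp.json()["content"]
    history += reply + "\n"
    return reply

print(chat("Hello, who are you?"))
print(chat("What do you like to do?"))
```

On hardware that evaluates only a handful of tokens per second, skipping re-evaluation of the ever-growing transcript prefix is what makes a multi-turn chat like this responsive at all.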

@phymbert (Collaborator, Author)

> I share the same opinion as @ggerganov: by declaring the server example to be production-ready, we also take on more responsibility (than before) to maintain it. I prefer to let it be a "cutting-edge" example in which we can easily implement the latest experimental features that third-party products can learn from. Two such products that I know of are janhq/nitro and ollama.
>
> Personally, I don't even use the server example in any of my projects or at work (my job is in the cybersecurity sector, and I'm just learning about NLP for fun). However, I have a personal project for my petite artistic blog that benefits from the work on the server example; it doesn't use the server directly, but I did reuse some of the logic here.
>
> FYI, my personal project is an imaginary character that users can chat with. The problem is that the hardware it runs on is so slow (it evaluates merely 5 tokens per second) that I need to rely on heavy KV cache management to make it usable. That's why using the server example without any modifications is impossible for me.

Let's be clear: I have not declared that the server is production ready. I was only asking whether I am contributing to a playground or to systems with real users. Software without production users is of no interest, IMHO.

I am not convinced about ollama.com, as I have the feeling they will quickly switch to a paid/SaaS version. I was not aware of nitro; I will give it a try, thanks. vLLM is probably also a serious challenger.

Thanks for your feedback, let's merge this small doc improvement.

@phymbert phymbert merged commit 8b35035 into master Feb 25, 2024
@phymbert phymbert deleted the doc/server-refresh-documentation branch February 25, 2024 20:46
@ngxson (Collaborator) commented Feb 25, 2024

I think it depends on how you define who your users are. For the moment, our "real" users are mostly developers who adapt the code from our examples into their products. Therefore, I believe the examples play a big role too, even if they never become production-ready.

Anyway, the contributions you've made recently are impressive, personally speaking; I really appreciate them. Keep it up, man!

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
…gml-org#5718)

* server: docs - refresh and tease a little bit more the http server

* Rephrase README.md server doc

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…gml-org#5718)
