Created a Server example #1025
Please fix:
Also you shouldn't make such massive changes to main.cpp, otherwise this will never get merged and you'll keep fixing the conflicts forever. It's better to just add a server.cpp example that is independent from main.cpp (at the cost of code duplication).
@prusnak Sorry, I should've marked this as a draft; I meant to fix all that.
@prusnak Could you help me with getting the Windows build (windows-latest-cmake) to work? I'm not sure how to get Boost to install there.
- Add a `LLAMA_BOOST` CMake option here which is `OFF` by default:
- Make this example build only if `LLAMA_BOOST` is `ON`
- Make a separate CI job just for `LLAMA_BOOST`. Only one OS is enough. We don't want to install / link Boost in all CI jobs
@zrthxn Do we even need Boost? We are on C++11 and lots of Boost features are being moved to std::. Doesn't Crow work with plain C++11?
Answering myself: yes, it seems Crow still needs Boost :-/
Wondering whether we shouldn't use a different C++ header-only HTTP(S) server library which does not require Boost, such as https://github.com/yhirose/cpp-httplib
@prusnak This lib does look much better. Boost comes with many drawbacks, hence the request to put this behind a CMake option. We can either rework the PR to use the proposed lib, or merge it like this and later when someone implements an example using
@ggerganov I think it won't be too hard to use another library. I'm only using very minimal functionality from Crow, and I only have 1 or 2 endpoints. So I'll rework this to use cpp-httplib.
So is cpp-httplib
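(For reference, a bare-bones cpp-httplib server needs only the single httplib.h header and no Boost. The sketch below is illustrative, not code from this PR; the route and port are made up.)

```cpp
// Illustrative sketch: a minimal cpp-httplib server. Single header, no Boost.
#include "httplib.h"

int main() {
    httplib::Server svr;

    // Hypothetical endpoint, just to show the handler shape.
    svr.Get("/health", [](const httplib::Request &, httplib::Response &res) {
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```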
@ggerganov @prusnak By the way, there is a slight issue that I've come across with serving the model. If an incoming request is cancelled, i.e. the client disconnects, the eval loop keeps running, consuming CPU resources. My guess is that, at least with Crow, it's because the endpoint handler that you write as a lambda expression is executed in a separate thread, and that doesn't get stopped/killed when the client disconnects.
Let's see how cpp-httplib deals with that.
In the C++ world there is no way to terminate a thread once it has started, except to join it (i.e. wait for it to finish).
@ggerganov One way of implementing aborting could be that in the eval loop, before writing a token to the output stream (stdout or a file stream), we check for some special character or sequence like
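(A sketch of one possible approach with cpp-httplib's streaming API, assuming its chunked content provider and DataSink interface. The llama-side helpers are hypothetical placeholders, and DataSink's exact signatures differ a bit between cpp-httplib versions.)

```cpp
// Sketch of one way to stop generation when the client disconnects, using
// cpp-httplib's chunked streaming API. Illustrative only: generate_next_token()
// and generation_finished() are hypothetical placeholders for the eval loop.
#include "httplib.h"
#include <string>

std::string generate_next_token();  // placeholder for one eval-loop step
bool generation_finished();         // placeholder end-of-generation check

int main() {
    httplib::Server svr;

    svr.Get("/completion", [](const httplib::Request &, httplib::Response &res) {
        res.set_chunked_content_provider("text/plain",
            [](size_t /*offset*/, httplib::DataSink &sink) {
                while (!generation_finished()) {
                    std::string token = generate_next_token();
                    // write() fails once the client has gone away, so the
                    // eval loop stops instead of burning CPU in the background.
                    if (!sink.write(token.data(), token.size())) {
                        return false;  // abort this handler
                    }
                }
                sink.done();  // close the chunked response
                return true;
            });
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```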
Hello everyone, I have a version of llama.cpp with a cpp-httplib server here. It doesn't require external dependencies. Limitations:
**Usage**

**Get Code**

```bash
git clone https://github.com/FSSRepo/llama.cpp.git
cd llama.cpp
```

**Build**

```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

**Run**

Model tested: Vicuna

```bash
server -m ggml-vicuna-7b-q4_0.bin --keep -1 --ctx_size 2048
```

**Node.js: test the endpoints**

You need to have Node.js installed.

```bash
mkdir llama-client
cd llama-client
npm init
npm install axios
```

Create an `index.js` file and put this inside it:

```js
const axios = require('axios');
async function Test() {
let result = await axios.post("http://127.0.0.1:8080/setting-context", {
context: [
{ role: "system", content: "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." },
{ role: "user", content: "Hello, Assistant." },
{ role: "assistant", content: "Hello. How may I help you today?" },
{ role: "user", content: "Please tell me the largest city in Europe." },
{ role: "assistant", content: "Sure. The largest city in Europe is Moscow, the capital of Russia." }
],
batch_size: 64,
temperature: 0.2,
top_k: 40,
top_p: 0.9,
n_predict: 2048,
threads: 5
});
result = await axios.post("http://127.0.0.1:8080/set-message", {
message: ' What is linux?'
});
if(result.data.can_inference) {
result = await axios.get("http://127.0.0.1:8080/completion?stream=true", { responseType: 'stream' });
result.data.on('data', (data) => {
// token by token completion
let dat = JSON.parse(data.toString());
process.stdout.write(dat.content);
});
}
}
Test();
```

And run it:

```bash
node .
```

Sorry for my bad English and my C++ practices :(
@FSSRepo Hello, I tried running
and it gives errors on my Mac mini M2 Pro; the
works. Can you help? Thanks
@x4080 You can detail the error in the Issues tab on my fork, please.
I wanted to before this, but there's no issue hehe. I'll do it now, see you there, and thank you very much man.
@ggerganov I tried this API and I think I love it; I can't wait to get it integrated into llama.cpp. @FSSRepo good work man.
Closing in favor of https://github.com/FSSRepo/llama.cpp
@zrthxn So it won't be merged into llama.cpp?
Oh, I thought you had the same merge request as @FSSRepo. My mistake then.
I've created an example which provides an HTTP interface to LLaMA using Crow. This comes as a single header file which I've committed. Also, this library depends on Boost, so to build this example one needs to install Boost (for macOS it's `brew install boost`).
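(For context, a minimal Crow endpoint looks roughly like the sketch below. This is illustrative only, not the actual example code from this PR; the route and port are assumptions, and the single-header name follows Crow's amalgamated distribution.)

```cpp
// Illustrative sketch only: a bare-bones Crow server, not the PR's actual code.
// Crow ships as a single amalgamated header but pulls in Boost, which is why
// `brew install boost` is needed on macOS.
#include "crow_all.h"

int main() {
    crow::SimpleApp app;

    // Hypothetical route, just to show the handler shape.
    CROW_ROUTE(app, "/")([]() {
        return "llama.cpp server example";
    });

    app.port(8080).multithreaded().run();
    return 0;
}
```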