
Commit e474ef1

Author: Olivier Chafik (committed)
update llama-rpc-server bin name + doc
1 parent ee3a086 commit e474ef1

File tree

3 files changed: +20 −23 lines


Makefile

Lines changed: 2 additions & 2 deletions
@@ -106,7 +106,7 @@ ifeq ($(UNAME_S),Darwin)
 endif

 ifdef LLAMA_RPC
-BUILD_TARGETS += rpc-server
+BUILD_TARGETS += llama-rpc-server
 endif

 default: $(BUILD_TARGETS)
@@ -699,7 +699,7 @@ ggml-rpc.o: ggml-rpc.cpp ggml-rpc.h
 rpc-server.o: examples/rpc/rpc-server.cpp ggml-rpc.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@

-rpc-server: rpc-server.o ggml.o llama.o $(COMMON_DEPS) $(OBJS)
+llama-rpc-server: rpc-server.o ggml.o llama.o $(COMMON_DEPS) $(OBJS)
 	$(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
 endif # LLAMA_RPC
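
For reference, with the Makefile change above the renamed binary would presumably be built by defining `LLAMA_RPC` (any non-empty value satisfies the `ifdef`) and invoking the new target name. A minimal sketch, assuming the repository root as the working directory; not part of the commit itself:

```bash
# Hypothetical sketch for the Makefile path shown above.
# LLAMA_RPC=1 satisfies the `ifdef LLAMA_RPC` guard, so the llama-rpc-server target exists.
make LLAMA_RPC=1 llama-rpc-server

# The old target name is gone after this commit; `make LLAMA_RPC=1 rpc-server`
# would fail with "No rule to make target 'rpc-server'".

# Start the renamed server binary (port flag as documented in the README below).
./llama-rpc-server -p 50052
```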

examples/rpc/CMakeLists.txt

Lines changed: 3 additions & 2 deletions
@@ -1,2 +1,3 @@
-add_executable(rpc-server rpc-server.cpp)
-target_link_libraries(rpc-server PRIVATE ggml llama)
+set(TARGET llama-rpc-server)
+add_executable(${TARGET} rpc-server.cpp)
+target_link_libraries(${TARGET} PRIVATE ggml llama)
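
Since the CMake target itself is renamed, the executable can presumably also be built in isolation with CMake's `--target` flag. A minimal sketch, reusing the `build-rpc-cuda` directory from the README below; the output path is an assumption:

```bash
# Hypothetical: configure with RPC enabled, then build only the renamed target.
cmake -B build-rpc-cuda -DLLAMA_CUDA=ON -DLLAMA_RPC=ON
cmake --build build-rpc-cuda --config Release --target llama-rpc-server
# The binary lands under the build tree, e.g. build-rpc-cuda/bin/llama-rpc-server.
```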

examples/rpc/README.md

Lines changed: 15 additions & 19 deletions
@@ -1,7 +1,7 @@
 ## Overview

-The `rpc-server` allows running `ggml` backend on a remote host.
-The RPC backend communicates with one or several instances of `rpc-server` and offloads computations to them.
+`llama-rpc-server` allows running `ggml` backend on a remote host.
+The RPC backend communicates with one or several instances of `llama-rpc-server` and offloads computations to them.
 This can be used for distributed LLM inference with `llama.cpp` in the following way:

 ```mermaid
@@ -10,13 +10,13 @@ flowchart TD
 rpcb---|TCP|srvb
 rpcb-.-|TCP|srvn
 subgraph hostn[Host N]
-srvn[rpc-server]-.-backend3["Backend (CUDA,Metal,etc.)"]
+srvn[llama-rpc-server]-.-backend3["Backend (CUDA,Metal,etc.)"]
 end
 subgraph hostb[Host B]
-srvb[rpc-server]---backend2["Backend (CUDA,Metal,etc.)"]
+srvb[llama-rpc-server]---backend2["Backend (CUDA,Metal,etc.)"]
 end
 subgraph hosta[Host A]
-srva[rpc-server]---backend["Backend (CUDA,Metal,etc.)"]
+srva[llama-rpc-server]---backend["Backend (CUDA,Metal,etc.)"]
 end
 subgraph host[Main Host]
 ggml[llama.cpp]---rpcb[RPC backend]
@@ -25,24 +25,22 @@ flowchart TD
 ```

 Each host can run a different backend, e.g. one with CUDA and another with Metal.
-You can also run multiple `rpc-server` instances on the same host, each with a different backend.
+You can also run multiple `llama-rpc-server` instances on the same host, each with a different backend.

 ## Usage

 On each host, build the corresponding backend with `cmake` and add `-DLLAMA_RPC=ON` to the build options.
 For example, to build the CUDA backend with RPC support:

 ```bash
-mkdir build-rpc-cuda
-cd build-rpc-cuda
-cmake .. -DLLAMA_CUDA=ON -DLLAMA_RPC=ON
-cmake --build . --config Release
+cmake -B build-rpc-cuda -DLLAMA_CUDA=ON -DLLAMA_RPC=ON
+cmake --build build-rpc-cuda --config Release
 ```

-Then, start the `rpc-server` with the backend:
+Then, start `llama-rpc-server` with the backend:

 ```bash
-$ bin/rpc-server -p 50052
+$ bin/llama-rpc-server -p 50052
 create_backend: using CUDA backend
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
@@ -53,21 +51,19 @@ Starting RPC server on 0.0.0.0:50052

 When using the CUDA backend, you can specify the device with the `CUDA_VISIBLE_DEVICES` environment variable, e.g.:
 ```bash
-$ CUDA_VISIBLE_DEVICES=0 bin/rpc-server -p 50052
+$ CUDA_VISIBLE_DEVICES=0 bin/llama-rpc-server -p 50052
 ```
-This way you can run multiple `rpc-server` instances on the same host, each with a different CUDA device.
+This way you can run multiple `llama-rpc-server` instances on the same host, each with a different CUDA device.


 On the main host build `llama.cpp` only with `-DLLAMA_RPC=ON`:

 ```bash
-mkdir build-rpc
-cd build-rpc
-cmake .. -DLLAMA_RPC=ON
-cmake --build . --config Release
+cmake -B build-rpc -DLLAMA_RPC=ON
+cmake --build build-rpc --config Release -t -j
 ```

-Finally, use the `--rpc` option to specify the host and port of each `rpc-server`:
+Finally, use the `--rpc` option to specify the host and port of each `llama-rpc-server`:

 ```bash
 $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name is" --repeat-penalty 1.0 -n 64 --rpc 192.168.88.10:50052,192.168.88.11:50052 -ngl 99
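
Putting the renamed pieces together, the multi-instance setup the README describes could look roughly like this on a single CUDA host. A minimal sketch with hypothetical device numbers, ports, and addresses; not part of the committed documentation:

```bash
# Hypothetical: two llama-rpc-server instances on one host, one per CUDA device.
CUDA_VISIBLE_DEVICES=0 bin/llama-rpc-server -p 50052 &
CUDA_VISIBLE_DEVICES=1 bin/llama-rpc-server -p 50053 &

# On the main host, point llama-cli at both workers via --rpc (comma-separated host:port list).
bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf \
  -p "Hello, my name is" -n 64 \
  --rpc 192.168.88.10:50052,192.168.88.10:50053 -ngl 99
```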
