Commit a99d38b: "Add docs" (1 parent: 1158fee)

4 files changed: +42 -10 lines

README.md

Lines changed: 34 additions & 1 deletion
@@ -197,14 +197,47 @@ starting from 23.10 release.
 
 You can use `pip install ...` within the container to upgrade vLLM version.
 
-
 ## Running Multiple Instances of Triton Server
 
 If you are running multiple instances of Triton server with a Python-based backend,
 you need to specify a different `shm-region-prefix-name` for each server. See
 [here](https://github.com/triton-inference-server/python_backend#running-multiple-instances-of-triton-server)
 for more information.
 
+## Triton Metrics
+Starting with the 24.08 release of Triton, users can now obtain partial
+vLLM metrics by querying the Triton metrics endpoint (see the complete vLLM metrics
+[here](https://docs.vllm.ai/en/latest/serving/metrics.html)). This can be
+accomplished by launching a Triton server in any of the ways described above
+(ensuring the build code / container is 24.08 or later) and querying the server.
+Upon receiving a successful response, you can query the metrics endpoint by entering
+the following:
+```bash
+curl localhost:8002/metrics
+```
+vLLM stats are reported by the metrics endpoint in fields that
+are prefixed with `vllm:`. Your output for these fields should look
+similar to the following:
+```bash
+# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
+# TYPE vllm:prompt_tokens_total counter
+vllm:prompt_tokens_total{model="vllm_model",version="1"} 10
+# HELP vllm:generation_tokens_total Number of generation tokens processed.
+# TYPE vllm:generation_tokens_total counter
+vllm:generation_tokens_total{model="vllm_model",version="1"} 16
+```
+*Note:* vLLM metrics reporting is disabled by default due to potential
+performance slowdowns. To enable a vLLM model's metrics reporting, please add
+the following lines to its config.pbtxt:
+```bash
+parameters: {
+  key: "REPORT_CUSTOM_METRICS"
+  value: {
+    string_value:"yes"
+  }
+}
+```
+
 ## Referencing the Tutorial
 
 You can read further in the
ci/L0_backend_vllm/metrics_test/test.sh

Lines changed: 5 additions & 5 deletions
@@ -49,7 +49,7 @@ sed -i 's/"gpu_memory_utilization": 0.5/"gpu_memory_utilization": 0.4/' models/v
 
 RET=0
 
-# Test disabling vLLM metrics reporting without parameter "REPORT_METRICS" in config.pbtxt
+# Test disabling vLLM metrics reporting without parameter "REPORT_CUSTOM_METRICS" in config.pbtxt
 run_server
 if [ "$SERVER_PID" == "0" ]; then
 cat $SERVER_LOG
@@ -77,10 +77,10 @@ set -e
 kill $SERVER_PID
 wait $SERVER_PID
 
-# Test disabling vLLM metrics reporting with parameter "REPORT_METRICS" set to "no" in config.pbtxt
+# Test disabling vLLM metrics reporting with parameter "REPORT_CUSTOM_METRICS" set to "no" in config.pbtxt
 echo -e "
 parameters: {
-key: \"REPORT_METRICS\"
+key: \"REPORT_CUSTOM_METRICS\"
 value: {
 string_value:\"no\"
 }
@@ -114,11 +114,11 @@ set -e
 kill $SERVER_PID
 wait $SERVER_PID
 
-# Test vLLM metrics reporting with parameter "REPORT_METRICS" set to "yes" in config.pbtxt
+# Test vLLM metrics reporting with parameter "REPORT_CUSTOM_METRICS" set to "yes" in config.pbtxt
 cp ${SAMPLE_MODELS_REPO}/vllm_model/config.pbtxt models/vllm_opt
 echo -e "
 parameters: {
-key: \"REPORT_METRICS\"
+key: \"REPORT_CUSTOM_METRICS\"
 value: {
 string_value:\"yes\"
 }

samples/model_repository/vllm_model/1/model.json

Lines changed: 1 addition & 2 deletions
@@ -2,6 +2,5 @@
 "model":"facebook/opt-125m",
 "disable_log_requests": true,
 "gpu_memory_utilization": 0.5,
-"enforce_eager": true,
-"disable_log_stats": false
+"enforce_eager": true
 }

src/model.py

Lines changed: 2 additions & 2 deletions
@@ -162,8 +162,8 @@ def init_engine(self):
 
         # Create vLLM custom metrics
         if (
-            "REPORT_METRICS" in self.model_config["parameters"]
-            and self.model_config["parameters"]["REPORT_METRICS"]["string_value"]
+            "REPORT_CUSTOM_METRICS" in self.model_config["parameters"]
+            and self.model_config["parameters"]["REPORT_CUSTOM_METRICS"]["string_value"]
             == "yes"
         ):
             try: