
Commit 3829366

feat: Add vLLM counter metrics access through Triton (#53)
Report vLLM counter metrics through Triton server
1 parent 843cbdd commit 3829366


8 files changed: +572 / -9 lines changed


README.md

Lines changed: 41 additions & 2 deletions
@@ -111,7 +111,8 @@ container with the following commands:

```
mkdir -p /opt/tritonserver/backends/vllm
-wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
+git clone https://github.com/triton-inference-server/vllm_backend.git /tmp/vllm_backend
+cp -r /tmp/vllm_backend/src/* /opt/tritonserver/backends/vllm
```

## Using the vLLM Backend
@@ -194,14 +195,52 @@ starting from 23.10 release.
You can use `pip install ...` within the container to upgrade vLLM version.

## Running Multiple Instances of Triton Server

If you are running multiple instances of Triton server with a Python-based backend,
you need to specify a different `shm-region-prefix-name` for each server. See
[here](https://github.com/triton-inference-server/python_backend#running-multiple-instances-of-triton-server)
for more information.

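For example, two servers sharing a machine can be started with distinct prefixes. This is a sketch based on the linked python_backend documentation; the model repository paths and prefix names below are placeholders:

```bash
# Triton instance 1 (illustrative paths and prefix names)
tritonserver --model-repository=/models1 --backend-config=python,shm-region-prefix-name=prefix1

# Triton instance 2
tritonserver --model-repository=/models2 --backend-config=python,shm-region-prefix-name=prefix2
```
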
## Triton Metrics
Starting with the 24.08 release of Triton, users can now obtain partial
vLLM metrics by querying the Triton metrics endpoint (see the complete list of vLLM metrics
[here](https://docs.vllm.ai/en/latest/serving/metrics.html)). This can be
accomplished by launching a Triton server in any of the ways described above
(ensuring the build code / container is 24.08 or later). Once the server is
running successfully, you can query the metrics endpoint by entering the following:
```bash
curl localhost:8002/metrics
```
vLLM stats are reported by the metrics endpoint in fields that
are prefixed with `vllm:`. Your output for these fields should look
similar to the following:
```bash
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model="vllm_model",version="1"} 10
# HELP vllm:generation_tokens_total Number of generation tokens processed.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model="vllm_model",version="1"} 16
```
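Because the metrics endpoint returns all Triton metrics, it can be convenient to filter for the vLLM fields only. A minimal sketch using standard shell tools, assuming the server runs locally with the default metrics port 8002:

```bash
# Print only the vLLM-related lines (the HELP/TYPE comment lines also contain "vllm:").
curl -s localhost:8002/metrics | grep "vllm:"
```
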
To enable the vLLM engine to collect metrics, the "disable_log_stats" option needs to be either
set to false or left unset (false by default) in [model.json](https://github.com/triton-inference-server/vllm_backend/blob/main/samples/model_repository/vllm_model/1/model.json).
```json
"disable_log_stats": false
```
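If you prefer to set this field from the command line instead of editing the file by hand, a small `jq` edit (the same approach the CI test script in this commit uses) works. The model repository path here is only an example; adjust it to your layout:

```bash
# Hypothetical path; point this at your own model.json.
MODEL_JSON=model_repository/vllm_model/1/model.json
# Explicitly set "disable_log_stats" to false so the vLLM engine collects stats.
jq '. += {"disable_log_stats": false}' "$MODEL_JSON" > temp.json && mv temp.json "$MODEL_JSON"
```
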
*Note:* vLLM metrics are not reported to the Triton metrics server by default
due to potential performance slowdowns. To enable metrics reporting for a vLLM model,
please add the following lines to its config.pbtxt as well.
```
parameters: {
  key: "REPORT_CUSTOM_METRICS"
  value: {
    string_value: "yes"
  }
}
```
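The same block can be appended from the shell, which is essentially what the CI test script added by this commit does. The config.pbtxt path below is illustrative:

```bash
# Hypothetical path; adjust to your model repository layout.
cat <<'EOF' >> model_repository/vllm_model/config.pbtxt
parameters: {
  key: "REPORT_CUSTOM_METRICS"
  value: {
    string_value: "yes"
  }
}
EOF
```
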
## Referencing the Tutorial

You can read further in the
Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
#!/bin/bash
# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

source ../../common/util.sh

TRITON_DIR=${TRITON_DIR:="/opt/tritonserver"}
SERVER=${TRITON_DIR}/bin/tritonserver
BACKEND_DIR=${TRITON_DIR}/backends
SERVER_ARGS="--model-repository=$(pwd)/models --backend-directory=${BACKEND_DIR} --model-control-mode=explicit --load-model=vllm_opt --log-verbose=1"
SERVER_LOG="./vllm_metrics_server.log"
CLIENT_LOG="./vllm_metrics_client.log"
TEST_RESULT_FILE='test_results.txt'
CLIENT_PY="./vllm_metrics_test.py"
SAMPLE_MODELS_REPO="../../../samples/model_repository"
EXPECTED_NUM_TESTS=1

# Helpers =======================================
function copy_model_repository {
    rm -rf models && mkdir -p models
    cp -r ${SAMPLE_MODELS_REPO}/vllm_model models/vllm_opt
    # `vllm_opt` model will be loaded on server start and stay loaded throughout
    # unit testing. To ensure that vllm's memory profiler will not error out
    # on `vllm_load_test` load, we reduce "gpu_memory_utilization" for `vllm_opt`,
    # so that at least 60% of GPU memory is available for other models.
    sed -i 's/"gpu_memory_utilization": 0.5/"gpu_memory_utilization": 0.4/' models/vllm_opt/1/model.json
}

RET=0

# Test disabling vLLM metrics reporting without parameter "REPORT_CUSTOM_METRICS" in config.pbtxt
copy_model_repository
run_server
if [ "$SERVER_PID" == "0" ]; then
    cat $SERVER_LOG
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    exit 1
fi

set +e
python3 $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled -v > $CLIENT_LOG 2>&1

if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Running $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled FAILED. \n***"
    RET=1
else
    check_test_results $TEST_RESULT_FILE $EXPECTED_NUM_TESTS
    if [ $? -ne 0 ]; then
        cat $CLIENT_LOG
        echo -e "\n***\n*** Test Result Verification FAILED.\n***"
        RET=1
    fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

# Test disabling vLLM metrics reporting with parameter "REPORT_CUSTOM_METRICS" set to "no" in config.pbtxt
copy_model_repository
echo -e "
parameters: {
  key: \"REPORT_CUSTOM_METRICS\"
  value: {
    string_value:\"no\"
  }
}
" >> models/vllm_opt/config.pbtxt

run_server
if [ "$SERVER_PID" == "0" ]; then
    cat $SERVER_LOG
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    exit 1
fi

set +e
python3 $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled -v > $CLIENT_LOG 2>&1

if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Running $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled FAILED. \n***"
    RET=1
else
    check_test_results $TEST_RESULT_FILE $EXPECTED_NUM_TESTS
    if [ $? -ne 0 ]; then
        cat $CLIENT_LOG
        echo -e "\n***\n*** Test Result Verification FAILED.\n***"
        RET=1
    fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

# Test vLLM metrics reporting with parameter "REPORT_CUSTOM_METRICS" set to "yes" in config.pbtxt
copy_model_repository
cp ${SAMPLE_MODELS_REPO}/vllm_model/config.pbtxt models/vllm_opt
echo -e "
parameters: {
  key: \"REPORT_CUSTOM_METRICS\"
  value: {
    string_value:\"yes\"
  }
}
" >> models/vllm_opt/config.pbtxt

run_server
if [ "$SERVER_PID" == "0" ]; then
    cat $SERVER_LOG
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    exit 1
fi

set +e
python3 $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics -v > $CLIENT_LOG 2>&1

if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Running $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics FAILED. \n***"
    RET=1
else
    check_test_results $TEST_RESULT_FILE $EXPECTED_NUM_TESTS
    if [ $? -ne 0 ]; then
        cat $CLIENT_LOG
        echo -e "\n***\n*** Test Result Verification FAILED.\n***"
        RET=1
    fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

# Test enabling vLLM metrics reporting in config.pbtxt but disabling in model.json
copy_model_repository
jq '. += {"disable_log_stats" : true}' models/vllm_opt/1/model.json > "temp.json"
mv temp.json models/vllm_opt/1/model.json
echo -e "
parameters: {
  key: \"REPORT_CUSTOM_METRICS\"
  value: {
    string_value:\"yes\"
  }
}
" >> models/vllm_opt/config.pbtxt

run_server
if [ "$SERVER_PID" == "0" ]; then
    cat $SERVER_LOG
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    exit 1
fi

set +e
python3 $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled -v > $CLIENT_LOG 2>&1

if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Running $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_disabled FAILED. \n***"
    RET=1
else
    check_test_results $TEST_RESULT_FILE $EXPECTED_NUM_TESTS
    if [ $? -ne 0 ]; then
        cat $CLIENT_LOG
        echo -e "\n***\n*** Test Result Verification FAILED.\n***"
        RET=1
    fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

# Test enabling vLLM metrics reporting in config.pbtxt while disabling in server option
copy_model_repository
echo -e "
parameters: {
  key: \"REPORT_CUSTOM_METRICS\"
  value: {
    string_value:\"yes\"
  }
}
" >> models/vllm_opt/config.pbtxt
SERVER_ARGS="${SERVER_ARGS} --allow-metrics=false"
run_server
if [ "$SERVER_PID" == "0" ]; then
    cat $SERVER_LOG
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    exit 1
fi

set +e
python3 $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_refused -v > $CLIENT_LOG 2>&1

if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Running $CLIENT_PY VLLMTritonMetricsTest.test_vllm_metrics_refused FAILED. \n***"
    RET=1
else
    check_test_results $TEST_RESULT_FILE $EXPECTED_NUM_TESTS
    if [ $? -ne 0 ]; then
        cat $CLIENT_LOG
        echo -e "\n***\n*** Test Result Verification FAILED.\n***"
        RET=1
    fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID
rm -rf "./models" "temp.json"

if [ $RET -eq 1 ]; then
    cat $CLIENT_LOG
    cat $SERVER_LOG
    echo -e "\n***\n*** vLLM test FAILED. \n***"
else
    echo -e "\n***\n*** vLLM test PASSED. \n***"
fi

collect_artifacts_from_subdir
exit $RET
