
Commit 150e983

docs: Add note about ignore_eos for MTP (#1475)
1 parent 227a0e7 commit 150e983

File tree

1 file changed (+7, -5 lines)

examples/tensorrt_llm/README.md

Lines changed: 7 additions & 5 deletions
@@ -129,14 +129,15 @@ cd /workspace/examples/tensorrt_llm
 dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
 ```

-#### Aggregated serving with Multi-Token Prediction(MTP) and DeepSeek R1
+#### Aggregated serving with Multi-Token Prediction (MTP) and DeepSeek R1
 ```bash
 cd /workspace/examples/tensorrt_llm
 dynamo serve graphs.agg:Frontend -f configs/deepseek_r1/mtp/mtp_agg.yaml
 ```
+
 Notes:
 - There is a noticeable latency for the first two inference requests. Please send warm-up requests before starting the benchmark.
-- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking
+- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking. Additionally, `ignore_eos` should generally be omitted or set to `false` when using MTP to avoid speculating garbage outputs and getting unrealistic acceptance rates.

 #### Multi-Node Disaggregated Serving

@@ -233,7 +234,7 @@ Notes:
 unset SLURM_JOBID SLURM_JOB_ID SLURM_NODELIST
 ```

-#### Multi-Node Disaggregated Serving with Multi-Token Prediction(MTP) and DeepSeek R1
+#### Multi-Node Disaggregated Serving with Multi-Token Prediction (MTP) and DeepSeek R1

 Most of the steps remain the same as the above example, but this time we will have `dynamo serve` point to different config files that contain the MTP configurations.

@@ -268,8 +269,9 @@ dynamo serve components.prefill_worker:TensorRTLLMPrefillWorker -f configs/deeps
 ```

 Notes:
-- There is a noticeable latency for the first four inference requests. Please send warm-up requests before starting the benchmark.
-- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking
+- There is a noticeable latency for the first two inference requests. Please send warm-up requests before starting the benchmark.
+- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking. Additionally, `ignore_eos` should generally be omitted or set to `false` when using MTP to avoid speculating garbage outputs and getting unrealistic acceptance rates.
+

 ### Client
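The `ignore_eos` guidance this commit adds can be sketched as a benchmark request body. A minimal illustration in Python, assuming an OpenAI-style completions payload; the model name, prompt, and `max_tokens` value are placeholders for illustration, not taken from the commit:

```python
import json

# Example request body for benchmarking an MTP-enabled deployment.
# Per the note added in this commit: omit `ignore_eos` (or set it to
# false) so generation stops at the real end-of-sequence token. Forcing
# generation past EOS makes the model speculate on garbage output and
# skews the measured acceptance rate of predicted tokens.
payload = {
    "model": "deepseek-r1",  # placeholder model name
    "prompt": "Explain multi-token prediction in one sentence.",
    "max_tokens": 128,
    # "ignore_eos": True,    # avoid this when benchmarking MTP
    "ignore_eos": False,     # or simply leave the field out entirely
}

body = json.dumps(payload)
print(body)
```

Dropping the `ignore_eos` key altogether is equivalent here, since backends that honor it treat the absent field as `false`.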
