Skip to content

Commit c252428

Browse files
authored
[DOCS] Documents that models with one allocation might have downtime (#2567)
* [DOCS] Documents that models with one allocations might have downtime. * [DOCS] Fine-tunes wording.
1 parent f9c8a20 commit c252428

File tree

1 file changed

+5
-0
lines changed

1 file changed

+5
-0
lines changed

docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -198,6 +198,11 @@ nodes. Model allocations are independent units of work for NLP tasks. To
198198
influence model performance, you can configure the number of allocations and the
199199
number of threads used by each allocation of your deployment.
200200

201+
IMPORTANT: If your deployed trained model has only one allocation, it's likely
202+
that you will experience downtime in the service your trained model performs.
203+
You can reduce or eliminate downtime by adding more allocations to your trained
204+
models.
205+
201206
Throughput can be scaled by adding more allocations to the deployment; it
202207
increases the number of {infer} requests that can be performed in parallel. All
203208
allocations assigned to a node share the same copy of the model in memory. The

0 commit comments

Comments
 (0)