1 file changed: +5 -0 lines changed
@@ -198,6 +198,11 @@ nodes. Model allocations are independent units of work for NLP tasks. To
 influence model performance, you can configure the number of allocations and the
 number of threads used by each allocation of your deployment.

+IMPORTANT: If your deployed trained model has only one allocation, it's likely
+that you will experience downtime in the service your trained model performs.
+You can reduce or eliminate downtime by adding more allocations to your trained
+models.
+
 Throughput can be scaled by adding more allocations to the deployment; it
 increases the number of {infer} requests that can be performed in parallel. All
 allocations assigned to a node share the same copy of the model in memory. The
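
The added note recommends running more than one allocation. As a rough illustration only, the sketch below builds the method and path for a start-deployment request with two allocations. It assumes the Elasticsearch start trained model deployment endpoint and its `number_of_allocations` and `threads_per_allocation` query parameters; the helper name `start_deployment_request` and the model ID `my-model` are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def start_deployment_request(model_id, number_of_allocations=2, threads_per_allocation=1):
    """Build (method, path) for a start-deployment call. Hypothetical helper; sends nothing."""
    if number_of_allocations < 1 or threads_per_allocation < 1:
        raise ValueError("allocations and threads must both be at least 1")
    # Query parameter names are assumed from the Elasticsearch
    # start trained model deployment API.
    query = urlencode(
        {
            "number_of_allocations": number_of_allocations,
            "threads_per_allocation": threads_per_allocation,
        }
    )
    # Percent-encode the model ID so it is safe to place in the URL path.
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start?{query}"
    return "POST", path


method, path = start_deployment_request("my-model", number_of_allocations=2)
print(method, path)
```

Keeping the request construction separate from the HTTP call makes it easy to check the allocation settings before anything reaches the cluster.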