Commit 5c97507

tanmayv25 and kthui authored

Document the thread count options (#126)

* Document the thread count options
* Format fix
* Apply suggestions from code review

Co-authored-by: Jacky <[email protected]>

1 parent c50d65b · commit 5c97507

File tree

2 files changed: +45 −4 lines changed

README.md

Lines changed: 41 additions & 0 deletions

@@ -176,6 +176,47 @@ key: "ENABLE_CACHE_CLEANING" (all changed lines are additions, inserted before the "Additional Optimizations" bullet):

* `INTER_OP_THREAD_COUNT`:

  PyTorch allows using multiple CPU threads during TorchScript model inference.
  One or more inference threads execute a model's forward pass on the given
  inputs. Each inference thread invokes a JIT interpreter that executes the ops
  of a model inline, one by one. This parameter sets the size of this thread
  pool. The default value of this setting is the number of CPU cores. Please refer
  to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
  document on how to set this parameter properly.

  The section of the model config file specifying this parameter will look like:

  ```
  parameters: {
      key: "INTER_OP_THREAD_COUNT"
      value: {
          string_value: "1"
      }
  }
  ```

* `INTRA_OP_THREAD_COUNT`:

  In addition to the inter-op parallelism, PyTorch can also utilize multiple threads
  within the ops (intra-op parallelism). This can be useful in many cases, including
  element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and
  others. The default value for this setting is the number of CPU cores. Please refer
  to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
  document on how to set this parameter properly.

  The section of the model config file specifying this parameter will look like:

  ```
  parameters: {
      key: "INTRA_OP_THREAD_COUNT"
      value: {
          string_value: "1"
      }
  }
  ```

* Additional Optimizations: Three additional boolean parameters are available to disable
  certain Torch optimizations that can sometimes cause latency regressions in models with
  complex execution modes and dynamic shapes. If not specified, all are enabled by default.
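The fallback behavior described above, where an unset thread-count parameter leaves PyTorch at its default of using all CPU cores, can be sketched with a small helper. This is an illustrative sketch only: `ResolveThreadCount` and the plain string map standing in for the model config parameters are hypothetical, not the backend's actual API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch (not the backend's real code): resolve a thread-count
// parameter supplied as a string_value in the model config. A missing or
// unparsable entry falls back to -1, the sentinel the backend uses for
// "leave PyTorch at its default" (the number of CPU cores).
int
ResolveThreadCount(
    const std::map<std::string, std::string>& params, const std::string& key)
{
  auto it = params.find(key);
  if (it == params.end()) {
    return -1;  // not specified: let PyTorch pick its default
  }
  try {
    return std::stoi(it->second);
  }
  catch (const std::exception&) {
    return -1;  // malformed value: fall back to the default
  }
}
```

For example, with `{{"INTER_OP_THREAD_COUNT", "1"}}` the helper returns 1, while asking for an absent `INTRA_OP_THREAD_COUNT` returns the -1 sentinel.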

src/libtorch.cc

Lines changed: 4 additions & 4 deletions

```diff
@@ -476,8 +476,8 @@ ModelState::ParseParameters()
   // is made to 'intra_op_thread_count', which by default will take all
   // threads
   int intra_op_thread_count = -1;
-  err = ParseParameter(
-      params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
+  err =
+      ParseParameter(params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
   if (err != nullptr) {
     if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
       return err;
@@ -500,8 +500,8 @@ ModelState::ParseParameters()
   // is made to 'inter_op_thread_count', which by default will take all
   // threads
   int inter_op_thread_count = -1;
-  err = ParseParameter(
-      params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
+  err =
+      ParseParameter(params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
   if (err != nullptr) {
     if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
       return err;
```
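The error handling around these `ParseParameter` calls treats the parameter as optional: a NOT_FOUND error is swallowed so the default is kept, while any other error is propagated. The idiom can be sketched as follows; `ErrCode`, `ParseIntParam`, and `ApplyOptionalThreadCount` are simplified stand-ins for illustration, not the real TRITONSERVER API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for TRITONSERVER error codes.
enum class ErrCode { kNone, kNotFound, kInvalidArg };

// Stand-in for ParseParameter: look up 'key' and parse it as an int.
ErrCode
ParseIntParam(
    const std::map<std::string, std::string>& params, const std::string& key,
    int* value)
{
  auto it = params.find(key);
  if (it == params.end()) {
    return ErrCode::kNotFound;
  }
  try {
    *value = std::stoi(it->second);
  }
  catch (const std::exception&) {
    return ErrCode::kInvalidArg;
  }
  return ErrCode::kNone;
}

// Mirrors the idiom in ModelState::ParseParameters: a missing parameter is
// not an error (the default in *value is kept), but any other failure,
// such as an unparsable value, is surfaced to the caller.
ErrCode
ApplyOptionalThreadCount(
    const std::map<std::string, std::string>& params, const std::string& key,
    int* value)
{
  ErrCode err = ParseIntParam(params, key, value);
  if (err != ErrCode::kNone && err != ErrCode::kNotFound) {
    return err;  // real error: propagate
  }
  return ErrCode::kNone;  // parsed successfully, or simply not specified
}
```

With this shape, a model config that omits both thread-count parameters loads cleanly with the defaults, while a typo like `"abc"` in a value still fails the load with a visible error.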
