@@ -176,6 +176,47 @@ key: "ENABLE_CACHE_CLEANING"
 }
 ```

+* `INTER_OP_THREAD_COUNT`:
+
+  PyTorch allows using multiple CPU threads during TorchScript model inference.
+  One or more inference threads execute a model's forward pass on the given
+  inputs. Each inference thread invokes a JIT interpreter that executes the ops
+  of a model inline, one by one. This parameter sets the size of this thread
+  pool. The default value of this setting is the number of CPU cores. Please refer
+  to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
+  document on how to set this parameter properly.
+
+  The section of the model config file specifying this parameter will look like:
+
+  ```
+  parameters: {
+    key: "INTER_OP_THREAD_COUNT"
+    value: {
+      string_value: "1"
+    }
+  }
+  ```
+
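+  This corresponds to the inter-op thread pool that standalone PyTorch
+  configures via `torch.set_num_interop_threads()`, as described in the
+  document linked above. A minimal standalone sketch, shown only to illustrate
+  the underlying knob and not part of the Triton config or backend code:
+
+  ```
+  import torch
+
+  # The inter-op pool size must be set before any inter-op parallel work
+  # (for example, the first forward pass) has started.
+  torch.set_num_interop_threads(1)
+  print(torch.get_num_interop_threads())  # 1
+  ```
+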
+* `INTRA_OP_THREAD_COUNT`:
+
+  In addition to the inter-op parallelism, PyTorch can also utilize multiple threads
+  within the ops (intra-op parallelism). This can be useful in many cases, including
+  element-wise ops on large tensors, convolutions, GEMMs, embedding lookups, and
+  others. The default value for this setting is the number of CPU cores. Please refer
+  to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
+  document on how to set this parameter properly.
+
+  The section of the model config file specifying this parameter will look like:
+
+  ```
+  parameters: {
+    key: "INTRA_OP_THREAD_COUNT"
+    value: {
+      string_value: "1"
+    }
+  }
+  ```
+
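+  Likewise, this corresponds to the intra-op thread pool that standalone
+  PyTorch configures via `torch.set_num_threads()`. A minimal standalone
+  sketch, again illustrative only rather than Triton backend code:
+
+  ```
+  import torch
+
+  # Intra-op threads parallelize work inside individual ops, e.g. large
+  # element-wise kernels, convolutions, and GEMMs.
+  torch.set_num_threads(1)
+  print(torch.get_num_threads())  # 1
+  ```
+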
 * Additional Optimizations: Three additional boolean parameters are available to disable
   certain Torch optimizations that can sometimes cause latency regressions in models with
   complex execution modes and dynamic shapes. If not specified, all are enabled by default.