add thread control for pytorch backend #125
Conversation
Thank you for this; we are experiencing the same problem and are being forced to convert our models to ONNX, which in turn is causing other issues. I'm not a maintainer, so I can only validate that this is a real issue for us too.
@yongbinfeng Can you submit the Triton CLA?
I think I've already done that, through my affiliation (Fermilab) and my affiliation email. (The other PR, #120, is already merged, so hopefully it should be fine.)
Thanks for your contribution!
As noted in the issue triton-inference-server/server#6896, we have found that the number of threads can significantly affect PyTorch inference performance. In some cases we have seen PyTorch inference run extremely slowly on multi-core CPU machines, and setting the instance count alone is not enough to handle the problem. We have tested using `at::set_num_threads(1)` and confirmed that this fixes the slow inference issue.
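For illustration, a minimal libtorch sketch of the workaround we tested; the model path and input shape below are hypothetical placeholders:

```cpp
#include <ATen/Parallel.h>  // at::set_num_threads
#include <torch/script.h>   // torch::jit::load

#include <vector>

int main() {
  // Cap the intra-op thread pool at one thread *before* running inference.
  // Without this, libtorch defaults to roughly one thread per CPU core,
  // which is what caused the slowdown described above.
  at::set_num_threads(1);

  // Hypothetical TorchScript model and input shape, for illustration only.
  torch::jit::script::Module module = torch::jit::load("model.pt");
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));

  at::Tensor output = module.forward(inputs).toTensor();
  return 0;
}
```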
This PR allows `intra_op_thread_count` and `inter_op_thread_count` to be configured for PyTorch models, similar to other backends such as TF and ONNX, with syntax such as the sketch below.
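For example, the model's `config.pbtxt` could contain something like the following. This is a sketch only: the exact parameter key names are an assumption modeled on the equivalent thread-count options in the TF and ONNX backends, not quoted verbatim from this PR.

```proto
# Hypothetical config.pbtxt snippet; the parameter key spellings are assumed.
parameters: {
  key: "INTRA_OP_THREAD_COUNT"
  value: { string_value: "1" }
}
parameters: {
  key: "INTER_OP_THREAD_COUNT"
  value: { string_value: "1" }
}
```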