add thread control for pytorch backend #125
Conversation
Thank you for this; we are experiencing the same problem and are being forced to convert our models to ONNX, which in turn is causing other issues. I'm not a maintainer, so I can only validate that this is a real issue for us too.
@yongbinfeng Can you submit the Triton CLA?
I think I've already done that, through my affiliation (Fermilab) and my affiliation email. (The other PR, #120, is already merged, so hopefully it should be fine.)
Thanks for your contribution!
As noted in the issue triton-inference-server/server#6896, we have found that the number of threads can significantly affect PyTorch inference performance. In some cases we have seen PyTorch inference run extremely slowly on multi-core CPU machines, and setting the instance count alone is not enough to handle the problem. We have tested using `at::set_num_threads(1)` and confirmed that this fixes the slow inference issue.
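For illustration, a minimal libtorch sketch of the workaround we tested; the model path and input shape below are hypothetical placeholders:

```cpp
#include <ATen/Parallel.h>  // at::set_num_threads
#include <torch/script.h>   // torch::jit::load

#include <vector>

int main() {
  // Cap the intra-op thread pool at one thread *before* running inference.
  // Without this, libtorch defaults to roughly one thread per CPU core,
  // which is what caused the slowdown described above.
  at::set_num_threads(1);

  // Hypothetical TorchScript model and input shape, for illustration only.
  torch::jit::script::Module module = torch::jit::load("model.pt");
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));

  at::Tensor output = module.forward(inputs).toTensor();
  return 0;
}
```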
This PR allows `intra_op_thread_count` and `inter_op_thread_count` to be configured for PyTorch models, similar to other backends such as TF and ONNX, with syntax such as the sketch below.
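For example, the model's `config.pbtxt` could contain something like the following. This is a sketch only: the exact parameter key names are an assumption modeled on the equivalent thread-count options in the TF and ONNX backends, not quoted verbatim from this PR.

```proto
# Hypothetical config.pbtxt snippet; the parameter key spellings are assumed.
parameters: {
  key: "INTRA_OP_THREAD_COUNT"
  value: { string_value: "1" }
}
parameters: {
  key: "INTER_OP_THREAD_COUNT"
  value: { string_value: "1" }
}
```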