Dynamic batching for PyTorch inference (not batch transform!) #2462
Unanswered
johann-petrak asked this question in Help
Replies: 2 comments
-
Does anybody know if there is a way to get dynamic batching to work with PyTorch inference, and what the conventions are, so that AWS automatically groups multiple requests arriving within a certain time window into one request with a list of request data?
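For reference, the convention TorchServe documents for batched inference is that the handler receives a list of requests (one entry per client call collected within the delay window) and must return a list of results in the same order. A minimal sketch of a batch-aware handler, assuming a simple JSON request schema (the `inputs` field is illustrative, not part of any official contract):

```python
# Sketch of a TorchServe custom handler that is aware of dynamic batching.
# When a model is registered with batch_size > 1 and max_batch_delay, TorchServe
# passes the handler a *list* of requests collected within that delay window.
import json

import torch
from ts.torch_handler.base_handler import BaseHandler


class BatchAwareHandler(BaseHandler):
    def preprocess(self, data):
        # `data` has one entry per client request in the accumulated batch.
        inputs = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = json.loads(payload)
            inputs.append(payload["inputs"])  # assumed request schema
        return torch.tensor(inputs, dtype=torch.float32)

    def inference(self, batch, *args, **kwargs):
        # One forward pass over the whole accumulated batch.
        with torch.no_grad():
            return self.model(batch)

    def postprocess(self, outputs):
        # Must return one result per request, in the same order as `data`.
        return outputs.tolist()
```

Note that the number of results returned from postprocess has to match the number of incoming requests, otherwise TorchServe rejects the whole batch.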
-
For PyTorch inference the AWS container basically relies entirely on TorchServe, and so far it has not been possible to configure this through a config file.
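For context, and not specific to the SageMaker container: TorchServe itself does support dynamic batching when a model is registered with `batch_size` and `max_batch_delay`, either in `config.properties` or through its management API. A sketch of the management-API route with illustrative names; whether the SageMaker PyTorch container lets you pass these settings through depends on the container/toolkit version:

```python
# Sketch: enabling TorchServe's own dynamic batching by registering the model
# with batch_size / max_batch_delay on the management API (port 8081 by default).
import requests

MANAGEMENT_URL = "http://localhost:8081/models"  # default TorchServe management endpoint

params = {
    "url": "my_model.mar",      # model archive in the model store (illustrative name)
    "initial_workers": 1,
    "batch_size": 8,            # max number of requests fused into one handler call
    "max_batch_delay": 50,      # ms to wait for the batch to fill before dispatching
}

resp = requests.post(MANAGEMENT_URL, params=params)
resp.raise_for_status()
print(resp.json())
```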
-
Does this work? If yes, how does it work and how can I configure it?
I think deep down in the nested design of the inference image there is a component where the inferencer takes an array of inputs, and where it is possible to configure a batch size and a maximum timeout. When the inference endpoint receives up to batch-size requests within that timeout, all of those requests get grouped into a single call, so that the whole batch can be sent through the inferencer at once.
However, I cannot see how this works with the PyTorch inferencer. Could you shed some light on this?
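To make the behaviour described above concrete, here is a rough sketch of what such a batching frontend does internally. This is not the actual TorchServe or SageMaker code, just an illustration of the batch-size / max-delay grouping logic:

```python
# Minimal sketch of the mechanism described above: requests are collected until
# either `batch_size` is reached or `max_batch_delay` seconds elapse, then the
# whole group is dispatched to the model in a single call.
import queue
import time


def collect_batch(request_queue: "queue.Queue", batch_size: int, max_batch_delay: float):
    """Block until up to `batch_size` requests arrive or `max_batch_delay` seconds pass."""
    batch = [request_queue.get()]                  # wait for at least one request
    deadline = time.monotonic() + max_batch_delay
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


# Usage sketch: a serving loop would repeatedly call collect_batch(...) and then
# run the model once on the whole list, returning one result per queued request.
```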