Add support for batch_input #98


Merged

Conversation

HennerM
Contributor

@HennerM HennerM commented Apr 8, 2023

The backend was missing support for batch_input inputs; specifically, no batch_input was passed to the model.

I added the ability to pass these inputs through to the model, the same way normal inputs are passed through. I followed the implementation in the onnxruntime_backend and kept passing the batch_input data as a tensor on the CPU.
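
As a rough sketch of the idea (not the actual patch; the buffer and shape names are illustrative, and the Triton collector calls that produce them are elided), the batch input is wrapped in a CPU tensor and handed to the model together with the regular inputs:

```cpp
// Sketch only: pass a batch input to the model alongside the regular inputs.
// `batch_input_buffer` and `batch_input_shape` are illustrative; the real code
// obtains them from the Triton backend input collector.
#include <torch/script.h>

#include <vector>

void AppendBatchInput(
    std::vector<torch::jit::IValue>& inputs, float* batch_input_buffer,
    const std::vector<int64_t>& batch_input_shape)
{
  // Wrap the backend-provided CPU buffer in a tensor without copying,
  // mirroring the onnxruntime_backend, which also keeps batch inputs on CPU.
  auto options =
      torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCPU);
  torch::Tensor batch_input = torch::from_blob(
      batch_input_buffer, torch::IntArrayRef(batch_input_shape), options);

  // Batch inputs are appended after the model's regular inputs.
  inputs.emplace_back(batch_input);
}
```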

@dyastremsky
Contributor

Thanks for making this contribution, Markus! This looks like a great change. Before we can start review, can you please submit your Contributor License Agreement (directions here)?

Also, we'll need to add tests to validate this change. If you're able to update the L0_batch_input test to ensure this passes, that would help with getting this in. The models used in those tests are generated here and here, I believe.

Contributor

@GuanLuo GuanLuo left a comment


Left a minor comment; see also @dyastremsky's points on the signed CLA and testing. Otherwise looks good to me.

src/libtorch.cc Outdated
batch_input_count_ = config_batch_inputs.ArraySize();
expected_input_cnt += batch_input_count_;
} else {
batch_input_count_ = 0;
Contributor


You can initialize to 0 in the constructor member init list.

Contributor Author


I added an initialisation at the declaration now, let me know if that's okay.
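
For illustration, the two variants being discussed look roughly like this (the surrounding class is a sketch; only batch_input_count_ comes from the snippet above):

```cpp
// Reviewer's suggestion: initialize in the constructor's member init list.
//   ModelState::ModelState(...) : batch_input_count_(0) { ... }

// What the updated PR does: a default member initializer at the declaration,
// so the member has a defined value even if a constructor doesn't set it.
class ModelState {
  // ...
  size_t batch_input_count_ = 0;
};
```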

@dyastremsky
Contributor

@HennerM Are you able to fill out and send in the CLA?

@HennerM
Contributor Author

HennerM commented May 10, 2023

> @HennerM Are you able to fill out and send in the CLA?

Sorry for the delay, we had the CLA sent through for Speechmatics to [email protected] on 4th of May

@dyastremsky
Contributor

>> @HennerM Are you able to fill out and send in the CLA?
>
> Sorry for the delay, we had the CLA sent through for Speechmatics to [email protected] on 4th of May

Thanks Markus, received! Will start working on the tests to get this merged.

@HennerM HennerM force-pushed the batch-input-support branch from 4a7753d to 5506203 on May 19, 2023 at 16:56
@HennerM
Contributor Author

HennerM commented May 19, 2023

>>> @HennerM Are you able to fill out and send in the CLA?
>>
>> Sorry for the delay, we had the CLA sent through for Speechmatics to [email protected] on 4th of May
>
> Thanks Markus, received! Will start working on the tests to get this merged.

Appreciated. I had a look at the test repo but couldn't quite figure out how it all fits together.

@dyastremsky
Contributor

Hi Henner. Keeping you updated. You can see the updated test here: triton-inference-server/server#5855

You can generate the models by going into server/qa/common and running `CUDA_DEVICE=0 ./gen_qa_model_repository`.

The batch_input support is failing, so I will try to debug it to see what needs changing. When the batch input is created on the CPU, PyTorch complains because it is on a different device than the other inputs. When it is created on the GPU instead, it segfaults.

@dyastremsky
Contributor

dyastremsky commented May 31, 2023

Thank you for this PR, @HennerM! This works and passes testing if you modify the PR to create the batch inputs on the current device rather than the CPU. As an example, please see these changes.

That PR also has some extra differences due to running the auto-formatter on libtorch.cc. The main differences are here, here, and here.

If you're able to make these changes, we can get this pull request merged. Let me know if you have any questions or comments.
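
As a rough sketch of the requested change (not the PR's actual code; `device_` and the buffer/shape arguments are illustrative, and the real code derives the device from the model instance's configuration), the batch input tensor ends up on the instance's assigned device instead of staying on the CPU:

```cpp
// Sketch only: build the batch input tensor, then move it to the device Triton
// assigned to this model instance so it matches the other input tensors.
#include <torch/script.h>

#include <vector>

torch::Tensor MakeBatchInputTensor(
    void* cpu_buffer, const std::vector<int64_t>& shape,
    const torch::Device& device_)
{
  auto cpu_tensor = torch::from_blob(
      cpu_buffer, torch::IntArrayRef(shape),
      torch::TensorOptions().dtype(torch::kFloat32));
  // .to() is a no-op when the instance runs on the CPU and copies to the GPU
  // otherwise, avoiding the "tensors on different devices" error.
  return cpu_tensor.to(device_);
}
```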

@HennerM HennerM force-pushed the batch-input-support branch 2 times, most recently from bc38192 to 625dee4 on June 5, 2023 at 17:04
@HennerM HennerM force-pushed the batch-input-support branch from 625dee4 to 8c2bcc8 on June 5, 2023 at 17:06
@HennerM
Contributor Author

HennerM commented Jun 5, 2023

I changed the batch input to create the tensor on the Triton-assigned device and ran clang-format over the .cc files. I got some formatting changes that are unrelated to my changes, though; I hope that's okay?

@dyastremsky
Contributor

> I changed the batch input to create the tensor on the Triton-assigned device and ran clang-format over the .cc files. I got some formatting changes that are unrelated to my changes, though; I hope that's okay?

Yep, that's perfect! Thank you for making the changes. Let me re-run CI quickly, then should be able to approve and merge once it passes.

@dyastremsky dyastremsky merged commit f405488 into triton-inference-server:main Jun 5, 2023
@dyastremsky
Contributor

Great work! This pull request is now merged.
