Add support for instance group of type 'MODEL' #107

Merged: 15 commits merged into main from krish-pytorch on Jun 13, 2023

Conversation

krishung5 (Contributor) commented May 17, 2023:

For the MODEL instance group type:

  • Don't specify the device when loading the model, so that the backend does not select a device for the model (see the loading sketch below).
  • For the compute input duration, only one stream is used for input collection, so CUDA events can be recorded and used to calculate the elapsed time. For the compute infer duration, however, multiple streams are involved, which makes CUDA events hard to use as timestamps; instead, cudaLaunchHostFunc launches a host callback to record the timestamp.

Added testing: triton-inference-server/server#5810
Closes: triton-inference-server/server#5694
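To make the first bullet concrete, here is a minimal, hypothetical sketch (not the PR's exact code) of loading a TorchScript model with and without a device override. torch::jit::load takes an optional device argument; omitting it for KIND_MODEL leaves the placements to the model itself. LoadModule and is_kind_model are illustrative names, not the backend's actual helpers.

#include <torch/script.h>

torch::jit::script::Module LoadModule(
    const std::string& path, bool is_kind_model, const torch::Device& device)
{
  if (is_kind_model) {
    // KIND_MODEL: no device override, the model keeps/chooses its own placements.
    return torch::jit::load(path);
  }
  // KIND_GPU / KIND_CPU: pin the model to the instance's device.
  return torch::jit::load(path, device);
}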

@krishung5 krishung5 changed the title from "Add support for instance group of type 'MODEL" to "Add support for instance group of type 'MODEL'" May 25, 2023
@krishung5 krishung5 marked this pull request as ready for review May 25, 2023 21:26
src/libtorch.cc Outdated
auto parameters = (*torch_model)->parameters();
auto buffers = (*torch_model)->buffers();

for (const auto& parameter : parameters) {
Contributor commented:

Obtaining the device_id sets can be simplified into a templated utility function that can consume both the parameters and buffers data types. That should reduce the code duplication. (A minimal sketch follows below.)
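A minimal sketch of that suggestion, assuming a helper name like CollectCudaDeviceIds (hypothetical). It works for both parameters() and buffers() because both iterate over at::Tensor.

#include <set>
#include <torch/script.h>

template <typename TensorRange>
void CollectCudaDeviceIds(const TensorRange& tensors, std::set<int>* device_ids)
{
  for (const auto& tensor : tensors) {
    if (tensor.device().is_cuda()) {
      device_ids->insert(tensor.device().index());
    }
  }
}

// Illustrative usage: both calls share the same helper.
// std::set<int> ids;
// CollectCudaDeviceIds((*torch_model)->parameters(), &ids);
// CollectCudaDeviceIds((*torch_model)->buffers(), &ids);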

src/libtorch.cc Outdated
at::cuda::getStreamFromExternal(stream_, DeviceId());
if ((Kind() == TRITONSERVER_INSTANCEGROUPKIND_GPU) ||
(Kind() == TRITONSERVER_INSTANCEGROUPKIND_MODEL && device_.is_cuda())) {
at::cuda::CUDAStream torch_stream = at::cuda::getStreamFromExternal(
Contributor commented:

Is calling setCurrentCUDAStream with the CUDA stream corresponding to the first device really the right thing to do for KIND_MODEL? Is setting the current CUDA stream even required for KIND_MODEL?

krishung5 (author) replied:

Discussed offline: we should call setCurrentCUDAStream for the CUDA stream of every device, replacing the default stream on each (a sketch follows below).
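A sketch of that approach, under the assumption that stream_vec_ holds one cudaStream_t per visible device indexed by device id (the member name, indexing, and header choices are assumptions, not the PR's exact code).

#include <vector>
#include <c10/cuda/CUDAStream.h>

void SetCurrentStreamOnEveryDevice(const std::vector<cudaStream_t>& streams)
{
  for (size_t device_id = 0; device_id < streams.size(); ++device_id) {
    at::cuda::CUDAStream torch_stream = at::cuda::getStreamFromExternal(
        streams[device_id], static_cast<c10::DeviceIndex>(device_id));
    // setCurrentCUDAStream only affects the device the stream belongs to,
    // so it has to be called once per device to replace each default stream.
    at::cuda::setCurrentCUDAStream(torch_stream);
  }
}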

Tabrizian (Member) left a comment:

Could you please explain how this PR would solve the cuda synchronization issue?

Just creating the streams doesn't mean that PyTorch would use them. Looks like you are only creating streams for other devices but they are not actually being used by PyTorch for the execution of the model.

@krishung5 krishung5 force-pushed the krish-pytorch branch 2 times, most recently from 12e8e56 to b3d6ec6 on June 5, 2023 16:43
krishung5 (author):

Updated the description to explain how CUDA synchronization is done for KIND_MODEL. Let me know if anything is unclear.

krishung5 (author):

Documentation added in a separate PR: #110

Tabrizian (Member) left a comment:

Do we need to update this part too?

if (device_.is_cpu()) {
alloc_perference = {{TRITONSERVER_MEMORY_CPU_PINNED, 0},
{TRITONSERVER_MEMORY_CPU, 0}};
} else {
alloc_perference = {{TRITONSERVER_MEMORY_GPU, device_.index()}};
}

I guess for KIND_MODEL the inputs will always be on CPU, since we don't have a way to query the input types required by the model beforehand?

krishung5 (author), replying to the comment above:

Do we need to update this part too?

if (device_.is_cpu()) {
alloc_perference = {{TRITONSERVER_MEMORY_CPU_PINNED, 0},
{TRITONSERVER_MEMORY_CPU, 0}};
} else {
alloc_perference = {{TRITONSERVER_MEMORY_GPU, device_.index()}};
}

I guess for KIND_MODEL the inputs will always be on CPU, since we don't have a way to query the input types required by the model beforehand?

That is correct. Here device_.is_cpu() will be true for KIND_MODEL because device_ is initialized with torch::kCPU and is never updated to a GPU device for KIND_MODEL. However, relying on that is indirect, so I have updated the if condition and the comment to make it clearer (a hypothetical sketch of a more explicit condition follows below).
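A hypothetical sketch of what a more explicit condition could look like; the actual change made in the PR may differ.

// KIND_MODEL cannot know the model's device placements ahead of time, so
// inputs are always collected on CPU; spelling the kind out avoids relying on
// device_ happening to stay at torch::kCPU.
if ((Kind() == TRITONSERVER_INSTANCEGROUPKIND_MODEL) || device_.is_cpu()) {
  alloc_perference = {
      {TRITONSERVER_MEMORY_CPU_PINNED, 0}, {TRITONSERVER_MEMORY_CPU, 0}};
} else {
  alloc_perference = {{TRITONSERVER_MEMORY_GPU, device_.index()}};
}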

krishung5 (author) commented Jun 7, 2023:

  • For KIND_MODEL, used the first stream from stream_vec_ as the stream for the input/output collector/responder.
  • Added a CUDA stream synchronization before reading the output tensors.
  • Removed the CUDA synchronization across all streams from places where only one stream is used (input/output collector/responder).
  • Used two variants of CUDA callback functions to capture the first and the last timestamp (see the sketch after this list).
  • Since we are adding a CUDA stream synchronization before reading the output tensors, we don't need first_compute_infer_start; we only need last_compute_infer_start (compute_output_start). Attaching the graph from yesterday's discussion:
    [Attached image: Capture]
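A sketch of the two callback variants. CaptureFirstTimestampCallback matches a name used in this PR's diff; the "last" variant, the atomics, and the surrounding wiring are assumptions about the shape of the change, not the PR's exact code.

#include <atomic>
#include <chrono>
#include <cstdint>
#include <cuda_runtime_api.h>

static uint64_t NowNs()
{
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             std::chrono::steady_clock::now().time_since_epoch())
      .count();
}

// Keeps the earliest timestamp seen across all streams (0 means "unset").
void CUDART_CB CaptureFirstTimestampCallback(void* data)
{
  auto* ts = reinterpret_cast<std::atomic<uint64_t>*>(data);
  uint64_t now = NowNs(), prev = ts->load();
  while ((prev == 0 || now < prev) && !ts->compare_exchange_weak(prev, now)) {
  }
}

// Keeps the latest timestamp seen across all streams.
void CUDART_CB CaptureLastTimestampCallback(void* data)
{
  auto* ts = reinterpret_cast<std::atomic<uint64_t>*>(data);
  uint64_t now = NowNs(), prev = ts->load();
  while (now > prev && !ts->compare_exchange_weak(prev, now)) {
  }
}

// Illustrative usage: enqueue the "last" callback on every stream the model
// ran on; after synchronizing the streams, the atomic holds the end of the
// compute-infer span.
// for (cudaStream_t stream : stream_vec_) {
//   cudaLaunchHostFunc(stream, CaptureLastTimestampCallback, &compute_infer_end);
// }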

@krishung5 krishung5 requested review from tanmayv25 and Tabrizian June 7, 2023 16:48
src/libtorch.cc Outdated
Comment on lines 1241 to 1243
std::mutex timestamp_mu;

uint64_t compute_input_start = 0;
Member commented:

Instead of the mutex/variable combination, you can use an atomic variable: std::atomic<uint64_t>.

krishung5 (author) replied:

Updated to use std::atomic<uint64_t> and removed the mutex.

src/libtorch.cc Outdated
Comment on lines 1256 to 1261
#ifdef TRITON_ENABLE_GPU
for (const auto& stream : stream_vec_) {
cudaLaunchHostFunc(
stream, CaptureFirstTimestampCallback,
reinterpret_cast<void*>(&compute_input_cb_data));
}
Member commented:

As discussed, I don't think we need to make any changes for the inputs. We only need the multi-stream logic for calculating the compute infer time.

krishung5 (author) replied:

Removed this part and used CUDA events for the compute input duration (sketched below).
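A sketch of the CUDA-event timing for the compute input duration, assuming a single input-collection stream; the function and variable names are illustrative.

#include <cuda_runtime_api.h>

// Returns the elapsed milliseconds between two events recorded around the
// input-collection work on the single input stream.
float ComputeInputDurationMs(cudaStream_t input_stream)
{
  cudaEvent_t start, end;
  cudaEventCreate(&start);
  cudaEventCreate(&end);

  cudaEventRecord(start, input_stream);
  // ... enqueue input-collection work on input_stream here ...
  cudaEventRecord(end, input_stream);

  cudaEventSynchronize(end);
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, end);

  cudaEventDestroy(start);
  cudaEventDestroy(end);
  return ms;
}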

src/libtorch.cc Outdated
Comment on lines 1287 to 1290
uint64_t compute_infer_start = 0;

std::tuple<uint64_t*, std::mutex*> compute_infer_cb_data(
&compute_infer_start, &timestamp_mu);
Member commented:

I don't think these changes are required.

krishung5 (author) replied on Jun 8, 2023:

We are using CUDA events for the compute input duration, but we would still need the compute_infer_start timestamp for the compute infer duration.

Edit: for compute_infer_start, there is no need for a callback function; simply use SET_TIMESTAMP (illustrated below).
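For reference, a minimal sketch of taking that timestamp on the host with Triton's SET_TIMESTAMP macro (from the backend common utilities) instead of a CUDA callback; the variable name is illustrative.

#include "triton/backend/backend_common.h"  // provides SET_TIMESTAMP

uint64_t compute_infer_start = 0;
// Captures a steady-clock timestamp in nanoseconds on the CPU, right before
// the model execution is launched.
SET_TIMESTAMP(compute_infer_start);
// ... run inference ...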

krishung5 (author):

@Tabrizian Addressed all the comments from the offline discussion. Please review, thank you!

Tabrizian (Member) left a comment:

@tanmayv25 it would be great if you could take a look as well.

@krishung5 krishung5 requested a review from Tabrizian June 9, 2023 15:56
Tabrizian previously approved these changes Jun 9, 2023
krishung5 (author):

Fixed up the initialization of std::atomic, which was failing on the Jetson build (a guess at the fix is sketched below).
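A guess at the likely failure mode and fix; this is an assumption, since the actual Jetson error is not shown in the thread. Older toolchains reject copy-initialization of std::atomic because its copy constructor is deleted, while direct (brace) initialization compiles everywhere.

#include <atomic>
#include <cstdint>

// std::atomic<uint64_t> compute_infer_end = 0;  // copy-init: may fail to build
std::atomic<uint64_t> compute_infer_end{0};      // direct-init: portable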

@krishung5 krishung5 requested a review from Tabrizian June 12, 2023 16:36
Tabrizian previously approved these changes Jun 12, 2023
@krishung5 krishung5 merged commit 83d2ada into main Jun 13, 2023
@krishung5 krishung5 deleted the krish-pytorch branch June 13, 2023 05:43
krishung5 added a commit that referenced this pull request Jun 13, 2023
* Add support for instance group of type 'MODEL'

* Format

* Handle multi GPU cases when recording timestamps

* Address comment

* Use callback function to record timestamp for 'MODEL' kind

* Add missing #ifdef

* Update comment and if condition for input tensor memory alloc_perference

* Fix for cuda stream. Use separate cuda callback to capture timestamp

* Add comment to mention the possible timestamp issue

* For 'KIND_MODEL', use cuda events for compute_input_duration and use callback for compute_infer_duration

* Move the cudaLaunchHostFunc from RecordBackendTimestamp function

* Fix up naming

* Fix up

* Fix up atomic initialization

* Capture the timestamp after synchronization
@krishung5 krishung5 restored the krish-pytorch branch June 13, 2023 05:54
mc-nv pushed a commit that referenced this pull request Jun 13, 2023 (same commit messages as listed above).

Successfully merging this pull request may close these issues: Support instance group of type 'MODEL' in pytorch backend.
4 participants