WIP: (//core): Added device meta data serialization/deserialization implic… #175

andi4191 · 2020-08-24T22:44:39Z

…itly

Signed-off-by: Anurag Dixit [email protected]

Description

This change lets the end user skip setting the CUDA device ID at runtime. The CUDA device is set automatically from the device metadata serialized with the engine.
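As a rough illustration of the round trip described above, the sketch below encodes a device record as a string and decodes it again. The `CudaDevice` fields, the `%` delimiter, and the name `serialize_device` are assumptions made for this example (only `deserialize_device` and `CudaDevice` appear as names in the diff), so this is not the PR's actual format.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical device record; the field names are assumptions for this sketch.
struct CudaDevice {
  int id;        // CUDA device ordinal
  int cc_major;  // compute capability, major part
  int cc_minor;  // compute capability, minor part
};

// Encode as "id%cc_major%cc_minor". The delimiter is an arbitrary choice.
std::string serialize_device(const CudaDevice& d) {
  std::ostringstream os;
  os << d.id << '%' << d.cc_major << '%' << d.cc_minor;
  return os.str();
}

// Decode the string produced above; throws on anything malformed, including "".
CudaDevice deserialize_device(const std::string& s) {
  std::istringstream is(s);
  CudaDevice d{};
  char sep1 = 0, sep2 = 0;
  if (!(is >> d.id >> sep1 >> d.cc_major >> sep2 >> d.cc_minor) || sep1 != '%' || sep2 != '%') {
    throw std::invalid_argument("malformed device info: " + s);
  }
  return d;
}

// Usage: a round trip should preserve every field.
void check_round_trip() {
  CudaDevice d{1, 8, 6};
  CudaDevice r = deserialize_device(serialize_device(d));
  assert(r.id == d.id && r.cc_major == d.cc_major && r.cc_minor == d.cc_minor);
}
```

With a record like this stored next to the engine, the runtime could select the device ordinal at load time instead of relying on the caller.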

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)
Added abstraction of Device Info configuration at runtime

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes

core/util/core_util.cpp

core/util/core_util.h

tests/modules/test_serialization.cpp

core/util/core_util.h

core/execution/execution.h

narendasan · 2020-09-01T20:44:18Z

@andi4191 Any updates on the comments? Also, there were some changes in the execution phase to support the non-deterministic binding order. You will probably need to rebase on master.

third_party/cuda/BUILD

third_party/tensorrt/local/BUILD

cpp/api/include/trtorch/trtorch.h

core/execution/TRTEngine.cpp

narendasan · 2020-09-16T22:59:31Z

All existing tests pass on x86 and Jetson for me

narendasan · 2020-09-30T16:55:56Z

Are the Python API changes for device supposed to be in this PR?

github-actions · 2020-11-11T00:03:52Z

This PR has not seen activity for 30 days, Remove stale label or comment or this will be closed in 5 days

core/compiler.h

core/execution/TRTEngine.cpp

core/execution/execution.h

core/runtime/TRTEngine.cpp

narendasan · 2020-11-20T23:32:16Z

core/runtime/TRTEngine.cpp

+  {
+      cuda_device = deserialize_device(serialized_device_info);
+      // Set CUDA device as configured in serialized meta data
+      set_cuda_device(cuda_device);


Do we store the deserialized device somewhere?

We don't save the deserialized device information anywhere. Do you want to perform a file write at runtime?

docs/_notebooks/Resnet50-example.html

tests/modules/test_serialization.cpp

tests/util/run_graph_engine.cpp

narendasan · 2020-12-10T22:46:44Z

core/compiler.cpp

@@ -174,7 +183,7 @@ torch::jit::script::Module CompileGraph(const torch::jit::script::Module& mod, C
 }

 void set_device(const int gpu_id) {
-  TRTORCH_ASSERT(cudaSetDevice(gpu_id) == cudaSuccess, "Unable to set CUDA device: " << gpu_id);
+  TRTORCH_CHECK((cudaSetDevice(gpu_id) == cudaSuccess), "Unable to set CUDA device: " << gpu_id);


Can we call runtime::set_cuda_device here, just to centralize responsibility for device management in the runtime section?

narendasan · 2020-12-10T22:49:06Z

core/runtime/TRTEngine.cpp

+  CudaDevice cuda_device;
+  // Deserialize device meta data if device_info is non-empty
+  if (!serialized_device_info.empty()) {
+    cuda_device = deserialize_device(serialized_device_info);


I think the device should be maintained as a field in TRTEngine instead of querying set device before serialization

narendasan · 2020-12-10T22:55:31Z

core/runtime/runtime.h

@@ -19,14 +40,16 @@ struct TRTEngine : torch::CustomClassHolder {
  std::pair<uint64_t, uint64_t> num_io;
  EngineID id;
  std::string name;
+  CudaDevice device_info;


I don't think we use this, but we should.

narendasan · 2020-12-10T22:58:57Z

core/runtime/TRTEngine.cpp

    : logger(
          std::string("[") + mod_name + std::string("_engine] - "),
          util::logging::get_logger().get_reportable_severity(),
          util::logging::get_logger().get_is_colored_output_on()) {
+  CudaDevice cuda_device;
+  // Deserialize device meta data if device_info is non-empty
+  if (!serialized_device_info.empty()) {


This seems potentially unsafe. What happens if we provide a DLA engine but no serialized_device_info?
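One possible way to make the empty case explicit is sketched below: fall back to the current device only when that is known to be safe, and fail loudly otherwise. The function name, the `engine_targets_dla` flag, and the assumption that the serialized info is just the device ordinal as decimal text are all hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: decide which device ordinal an engine should run on.
// Assumes, for this sketch only, that the metadata is the ordinal as decimal text.
int resolve_device_id(const std::string& serialized_device_info,
                      bool engine_targets_dla,
                      int current_device_id) {
  assert(current_device_id >= 0);  // precondition: caller passes a valid ordinal
  if (serialized_device_info.empty()) {
    if (engine_targets_dla) {
      // No metadata and a DLA engine: refuse rather than guess.
      throw std::runtime_error("DLA engine provided without serialized device info");
    }
    return current_device_id;  // plain GPU engine: keep whatever device is already set
  }
  return std::stoi(serialized_device_info);  // throws std::invalid_argument on bad text
}
```

Whether a silent fallback is acceptable for GPU engines is a design choice. Throwing in both cases would be the stricter alternative.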

narendasan · 2020-12-10T22:59:20Z

core/runtime/TRTEngine.cpp

+TRTEngine::TRTEngine(
+    std::string mod_name,
+    std::string serialized_engine,
+    std::string serialized_device_info = std::string())


Should we take a CudaDevice struct as the arg here instead?

narendasan · 2020-12-10T23:00:52Z

core/runtime/TRTEngine.cpp

-            [](const c10::intrusive_ptr<TRTEngine>& self) -> std::string {
-              auto serialized_engine = self->cuda_engine->serialize();
-              return std::string((const char*)serialized_engine->data(), serialized_engine->size());
+            [](const c10::intrusive_ptr<TRTEngine>& self) -> std::vector<std::string> {


How hard would it be to use a map instead of a vector? I think a map will be a more scalable solution long term for metadata, but if it's too hard it's not worth it.
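A minimal sketch of the map idea, assuming string keys and string values. The key names and helper functions are made up for illustration, and whether the custom-class pickling used in this PR accepts this container type is not checked here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Keyed metadata: new entries can be added later without shifting positions.
using EngineMetadata = std::map<std::string, std::string>;

// Hypothetical key names for this sketch.
const char* const kEngineKey = "engine";
const char* const kDeviceKey = "device";

EngineMetadata pack_engine(const std::string& serialized_engine,
                           const std::string& serialized_device) {
  assert(!serialized_engine.empty());  // an engine blob is always required
  EngineMetadata m;
  m[kEngineKey] = serialized_engine;
  m[kDeviceKey] = serialized_device;
  return m;
}

// Look up a key, returning `fallback` when it is absent, so archives written
// before a key existed can still be read.
std::string get_or(const EngineMetadata& m, const std::string& key,
                   const std::string& fallback) {
  auto it = m.find(key);
  return it == m.end() ? fallback : it->second;
}
```

Compared with a positional vector, a reader that does not know a key simply ignores it, and a missing key is distinguishable from an empty value.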

narendasan · 2020-12-10T23:01:36Z

core/runtime/TRTEngine.cpp

+          util::logging::get_logger().get_reportable_severity(),
+          util::logging::get_logger().get_is_colored_output_on()) {
+  std::string _name = "deserialized_trt";
+  std::string device_info = serialized_info[0];


I think we need size or key checking here.
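A size check before indexing could look roughly like the following. Index 0 holding the device info matches the diff above. The engine living at index 1, the `UnpackedEngine` struct, and the `unpack` name are assumptions for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Positional layout of the serialized vector (index 1 is an assumption).
constexpr std::size_t kDeviceIdx = 0;
constexpr std::size_t kEngineIdx = 1;
constexpr std::size_t kNumFields = 2;

struct UnpackedEngine {
  std::string device_info;
  std::string engine;
};

// Validate the vector length before touching any element.
UnpackedEngine unpack(const std::vector<std::string>& serialized_info) {
  if (serialized_info.size() != kNumFields) {
    throw std::invalid_argument(
        "expected " + std::to_string(kNumFields) + " serialized fields, got " +
        std::to_string(serialized_info.size()));
  }
  return UnpackedEngine{serialized_info[kDeviceIdx], serialized_info[kEngineIdx]};
}
```

An exact-size check rejects both truncated and over-long inputs. A `>=` check would instead tolerate extra trailing fields from newer writers.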

narendasan · 2020-12-10T23:03:42Z

Do you think we need to add a section to the main execution function that sets the device on the fly, or does it need to be done at deserialization time only? What happens if there is one engine on GPU 1 and one engine on GPU 2?
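If the device were switched on the fly, one common shape for it is a scoped guard that selects the engine's device for the duration of a call and restores the previous one afterwards. The sketch below injects the get/set functions so it compiles without CUDA. In real code they would presumably wrap `cudaGetDevice` and `cudaSetDevice`. The `DeviceBackend` and `DeviceGuard` names are hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Injected device accessors, so the sketch does not depend on the CUDA runtime.
struct DeviceBackend {
  std::function<int()> get;      // returns the currently selected device ordinal
  std::function<void(int)> set;  // selects a device ordinal
};

// RAII guard: switch to `target` for the lifetime of the guard, then restore.
class DeviceGuard {
 public:
  DeviceGuard(DeviceBackend backend, int target)
      : backend_(std::move(backend)), previous_(backend_.get()) {
    assert(target >= 0);
    if (previous_ != target) {
      backend_.set(target);
    }
  }
  ~DeviceGuard() {
    if (backend_.get() != previous_) {
      backend_.set(previous_);
    }
  }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

 private:
  DeviceBackend backend_;  // declared first so previous_ can be read from it
  int previous_;
};
```

With two engines on different GPUs, each execution call would construct a guard with that engine's stored device ID, so the caller's device selection is left unchanged after the call returns.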

narendasan · 2020-12-10T23:05:32Z

Do we need to do something in the runtime that moves input tensors around to the right GPUs for execution?

github-actions · 2021-01-10T00:21:23Z

This PR has not seen activity for 30 days, Remove stale label or comment or this will be closed in 5 days

andi4191 · 2021-05-25T17:10:50Z

After design discussion #311, the PR is being tracked here: #484

Closing this PR.

andi4191 requested a review from narendasan August 24, 2020 22:45

andi4191 self-assigned this Aug 24, 2020

andi4191 added the component: core and component: execution labels Aug 24, 2020

narendasan reviewed Aug 24, 2020

andi4191 force-pushed the anuragd/device_info_serialize branch from 819f6d6 to 9c70a39 Compare September 2, 2020 06:06

narendasan requested changes Sep 16, 2020

andi4191 force-pushed the anuragd/device_info_serialize branch from afb0de7 to 1d3529a Compare September 17, 2020 00:43

andi4191 requested a review from narendasan September 18, 2020 19:10

github-actions bot added the No Activity label Nov 11, 2020

narendasan removed the No Activity label Nov 11, 2020

andi4191 force-pushed the anuragd/device_info_serialize branch 2 times, most recently from 1ea0b45 to 3df301e Compare November 20, 2020 19:32

andi4191 added the component: runtime and component: tests labels Nov 20, 2020

andi4191 added this to the v0.2.0 milestone Nov 20, 2020

narendasan reviewed Nov 20, 2020

narendasan reviewed Nov 20, 2020

narendasan reviewed Dec 8, 2020

narendasan reviewed Dec 8, 2020

andi4191 force-pushed the anuragd/device_info_serialize branch from 755a55e to 5069368 Compare December 9, 2020 23:22

narendasan reviewed Dec 10, 2020

github-actions bot added the No Activity label Jan 10, 2021

narendasan removed the No Activity label Jan 11, 2021

narendasan modified the milestones: v0.2.0, v0.3.0 Jan 11, 2021

andi4191 force-pushed the anuragd/device_info_serialize branch from 5069368 to fa942e7 Compare March 3, 2021 18:06

andi4191 changed the title ~~(//core): Added device meta data serialization/deserialization implic…~~ WIP: (//core): Added device meta data serialization/deserialization implic… Mar 3, 2021

andi4191 force-pushed the anuragd/device_info_serialize branch 2 times, most recently from 1c2cf90 to 00c1dd3 Compare March 5, 2021 22:20

TRTorch Github Bot and others added 6 commits March 5, 2021 14:23

docs: [Automated] Regenerating documenation from

021ce12

Signed-off-by: TRTorch Github Bot <[email protected]>

add clamp_min/clamp_max converter

7cf5e38

Signed-off-by: inocsin <[email protected]>

update clamp/clamp_min/clamp_max with util function

cea2658

Signed-off-by: inocsin <[email protected]>

use clip to implement clamp

b210866

Signed-off-by: inocsin <[email protected]>

update clamp test case

aa54c1e

Signed-off-by: inocsin <[email protected]>

docs: [Automated] Regenerating documenation from

e3dd820

Signed-off-by: TRTorch Github Bot <[email protected]>

andi4191 force-pushed the anuragd/device_info_serialize branch from 00c1dd3 to 574d77e Compare March 5, 2021 22:23

narendasan force-pushed the master branch from e705c4f to 8cea634 Compare May 13, 2021 23:42

andi4191 force-pushed the anuragd/device_info_serialize branch from 574d77e to 968cb82 Compare May 20, 2021 21:53

Device metadata serialization deserialization

968cb82

Signed-off-by: Anurag Dixit <[email protected]>

andi4191 closed this May 25, 2021

andi4191 deleted the anuragd/device_info_serialize branch June 1, 2021 18:51
