Skip to content

feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, serde cleanup #520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 12, 2021

Conversation

narendasan
Copy link
Collaborator

Description

BREAKING CHANGE: This commit cleans up the WIP CudaDevice class,
simplifying implementation and formalizing the seralized format for CUDA
devices.

It also implements ABI Versioning. The first entry in the serialized
format of a TRTEngine now records the ABI that the engine was compiled
with, defining expected compatibility with the TRTorch runtime. If the
ABI version does not match, the runtime will error out asking to
recompile the program.

ABI version is a monotonically increasing integer and should be
incremented everytime the serialization format changes in some way.

This commit cleans up the CudaDevice class, implementing a number of
constructors to replace the various utility functions that populate the
struct. Descriptive utility functions remain but solely call the
relevant constructor.
Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

serde cleanup

BREAKING CHANGE: This commit cleans up the WIP CudaDevice class,
simplifying implementation and formalizing the seralized format for CUDA
devices.

It also implements ABI Versioning. The first entry in the serialized
format of a TRTEngine now records the ABI that the engine was compiled
with, defining expected compatibility with the TRTorch runtime. If the
ABI version does not match, the runtime will error out asking to
recompile the program.

ABI version is a monotonically increasing integer and should be
incremented everytime the serialization format changes in some way.

This commit cleans up the CudaDevice class, implementing a number of
constructors to replace the various utility functions that populate the
struct. Descriptive utility functions remain but solely call the
relevant constructor.

Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>
@narendasan narendasan requested a review from andi4191 July 1, 2021 05:56
@github-actions github-actions bot added component: api [Python] Issues re: Python API component: api [C++] Issues re: C++ API component: core Issues re: The core compiler component: runtime component: tests Issues re: Tests labels Jul 1, 2021
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

<< ". Switching context");
return true;
}
// REVIEW: This shouldnt be a reason to switch GPU
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasnt sure why we have this check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the device_id configured are not the same, then the memory allocations need to move between the devices. Hence, confirmation for switching the context is needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make the warning more explanatory then

// Set Device Type
this->device_type = device_type;
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to a delimted string vs. a byte indexed string, should be easier to parse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still requires parsing based on the positional encoding of the parameters during deserialization.

<< ". Switching context");
return true;
}
// REVIEW: This shouldnt be a reason to switch GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the device_id configured are not the same, then the memory allocations need to move between the devices. Hence, confirmation for switching the context is needed.

// Set Device Type
this->device_type = device_type;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still requires parsing based on the positional encoding of the parameters during deserialization.

}

CudaDevice get_current_device() {
int device = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be -1

<< ". Switching context");
return true;
}
// REVIEW: This shouldnt be a reason to switch GPU
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make the warning more explanatory then

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some changes that do not conform to C++ style guidelines:

diff --git a/workspace/core/runtime/register_trt_op.cpp b/tmp/changes.txt
index 0241ca3..ce92c42 100644
--- a/workspace/core/runtime/register_trt_op.cpp
+++ b/tmp/changes.txt
@@ -35,7 +35,7 @@ bool is_switch_required(const CudaDevice& curr_device, const CudaDevice& conf_de
  if (curr_device.id != conf_device.id) {
    LOG_WARNING(
        "Configured Device ID: " << conf_device.id << " is different that current device ID: " << curr_device.id
-                                  << ". Moving input tensors to device: " << conf_device.id);
+                                 << ". Moving input tensors to device: " << conf_device.id);
    return true;
  }

diff --git a/workspace/core/runtime/CudaDevice.cpp b/tmp/changes.txt
index 95ed54d..496c217 100644
--- a/workspace/core/runtime/CudaDevice.cpp
+++ b/tmp/changes.txt
@@ -73,11 +73,11 @@ std::string CudaDevice::serialize() {
  content[ID_IDX] = std::to_string(id);
  content[SM_MAJOR_IDX] = std::to_string(major);
  content[SM_MINOR_IDX] = std::to_string(minor);
-  content[DEVICE_TYPE_IDX] = std::to_string((int64_t) device_type);
+  content[DEVICE_TYPE_IDX] = std::to_string((int64_t)device_type);
  content[DEVICE_NAME_IDX] = device_name;

  std::stringstream ss;
-  for(size_t i = 0; i < content.size() - 1; i++) {
+  for (size_t i = 0; i < content.size() - 1; i++) {
    ss << content[i] << DEVICE_INFO_DELIM;
  }
  ss << content[DEVICE_NAME_IDX];
ERROR: Some files do not conform to style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>
@narendasan narendasan force-pushed the narendasan/device_serde branch from 771e67b to 8d3258e Compare July 2, 2021 01:22
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

@narendasan narendasan merged commit c9af908 into anuragd/dev_serdes Jul 12, 2021
@narendasan narendasan deleted the narendasan/device_serde branch July 12, 2021 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: api [C++] Issues re: C++ API component: api [Python] Issues re: Python API component: core Issues re: The core compiler component: runtime component: tests Issues re: Tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants