
Commit 90dc976

fix: Address review comments
1 parent 60c73d3 commit 90dc976

6 files changed (+41, -37 lines)

core/runtime/TRTEngine.cpp

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ TRTEngine::TRTEngine(
   auto most_compatible_device = get_most_compatible_device(cuda_device);
   TORCHTRT_CHECK(most_compatible_device, "No compatible device was found for instantiating TensorRT engine");
   device_info = most_compatible_device.value();
-  multi_gpu_device_check(device_info);
+  multi_gpu_device_check();
   set_rt_device(device_info);

   rt = make_trt(nvinfer1::createInferRuntime(util::logging::get_logger()));

core/runtime/runtime.cpp

Lines changed: 3 additions & 5 deletions
@@ -105,16 +105,14 @@ RTDevice get_current_device() {
   return RTDevice(device_id, nvinfer1::DeviceType::kGPU);
 }

-void multi_gpu_device_check(const RTDevice& most_compatible_device) {
+void multi_gpu_device_check() {
   // If multi-device safe mode is disabled and more than 1 device is registered on the machine, warn user
   if (!(MULTI_DEVICE_SAFE_MODE) && get_available_device_list().get_devices().size() > 1) {
     LOG_WARNING(
         "Detected this engine is being instantiated in a multi-GPU system with "
         << "multi-device safe mode disabled. For more on the implications of this "
-        << "as well as workarounds, see MULTI_DEVICE_SAFE_MODE.md "
-        << "(https://github.com/pytorch/TensorRT/blob/main/py/torch_tensorrt/dynamo/runtime/MULTI_DEVICE_SAFE_MODE.md). "
-        << "The engine is set to be instantiated on the cuda device, " << most_compatible_device << ". "
-        << "If this is incorrect, please set the desired cuda device as default and retry.");
+        << "as well as workarounds, see the linked documentation "
+        << "(https://pytorch.org/TensorRT/user_guide/runtime.html#multi-device-safe-mode)");
   }
 }

core/runtime/runtime.h

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ std::vector<RTDevice> find_compatible_devices(const RTDevice& target_device);

 std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intrusive_ptr<TRTEngine> compiled_engine);

-void multi_gpu_device_check(const RTDevice& most_compatible_device);
+void multi_gpu_device_check();

 class DeviceList {
   using DeviceMap = std::unordered_map<int, RTDevice>;

docsrc/user_guide/runtime.rst

Lines changed: 34 additions & 0 deletions
@@ -34,3 +34,37 @@ Plugin Library
 In the case you use Torch-TensorRT as a converter to a TensorRT engine and your engine uses plugins provided by Torch-TensorRT, Torch-TensorRT
 ships the library ``libtorchtrt_plugins.so`` which contains the implementation of the TensorRT plugins used by Torch-TensorRT during
 compilation. This library can be ``DL_OPEN`` or ``LD_PRELOAD`` similar to other TensorRT plugin libraries.
+
+Multi Device Safe Mode
+----------------------
+
+Multi-device safe mode is a setting in Torch-TensorRT which allows the user to determine whether
+the runtime checks for device consistency prior to every inference call.
+
+There is a non-negligible, fixed cost per inference call when multi-device safe mode is enabled, which is why
+it is now disabled by default. It can be controlled via the following convenience function, which
+doubles as a context manager.
+
+.. code-block:: python
+
+    # Enables Multi Device Safe Mode
+    torch_tensorrt.runtime.set_multi_device_safe_mode(True)
+
+    # Disables Multi Device Safe Mode [Default Behavior]
+    torch_tensorrt.runtime.set_multi_device_safe_mode(False)
+
+    # Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
+    with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
+        ...
+
+TensorRT requires that each engine be associated with the CUDA context in the active thread from which it is invoked.
+Therefore, if the device were to change in the active thread, which may be the case when invoking
+engines on multiple GPUs from the same Python process, safe mode will cause Torch-TensorRT to display
+an alert and switch GPUs accordingly. If safe mode is not enabled, there could be a mismatch between the engine
+device and the CUDA context device, which could cause the program to crash.
+
+One technique for managing multiple TRT engines on different GPUs while not sacrificing performance for
+multi-device safe mode is to use Python threads. Each thread is responsible for all of the TRT engines
+on a single GPU, and the default CUDA device on each thread corresponds to the GPU for which it is
+responsible (it can be set via ``torch.cuda.set_device(...)``). In this way, multiple threads can be used in the same
+Python script without needing to switch CUDA contexts and incur performance overhead.
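
To make the threading pattern in the newly added documentation concrete, the following is a minimal sketch, not part of this commit. The name work is a hypothetical placeholder for a mapping from GPU id to the (compiled Torch-TensorRT module, input tensor) pairs owned by that GPU; only the standard threading and torch APIs are assumed.

# Minimal sketch of the per-GPU threading pattern described above (not part of this commit).
# `work` is a hypothetical dict: GPU id -> list of (compiled Torch-TensorRT module, input) pairs.
import threading

import torch


def run_engines_on_gpu(gpu_id, modules_and_inputs):
    # Pin this thread's default CUDA device once, so every engine owned by the
    # thread runs against the same CUDA context without multi-device safe mode.
    torch.cuda.set_device(gpu_id)
    for module, inp in modules_and_inputs:
        module(inp.to(f"cuda:{gpu_id}"))


# One thread per GPU; each thread only ever touches engines on its own device.
threads = [
    threading.Thread(target=run_engines_on_gpu, args=(gpu_id, work[gpu_id]))
    for gpu_id in range(torch.cuda.device_count())
]
for t in threads:
    t.start()
for t in threads:
    t.join()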

py/torch_tensorrt/dynamo/runtime/MULTI_DEVICE_SAFE_MODE.md

Lines changed: 0 additions & 28 deletions
This file was deleted.

py/torch_tensorrt/dynamo/runtime/tools.py

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@ def multi_gpu_device_check() -> None:
         logger.warning(
             "Detected this engine is being instantiated in a multi-GPU system with "
             "multi-device safe mode disabled. For more on the implications of this "
-            "as well as workarounds, see MULTI_DEVICE_SAFE_MODE.md "
-            "(https://github.com/pytorch/TensorRT/blob/main/py/torch_tensorrt/dynamo/runtime/MULTI_DEVICE_SAFE_MODE.md). "
+            "as well as workarounds, see the linked documentation "
+            "(https://pytorch.org/TensorRT/user_guide/runtime.html#multi-device-safe-mode). "
             f"The engine is set to be instantiated on the current default cuda device, cuda:{torch.cuda.current_device()}. "
             "If this is incorrect, please set the desired cuda device via torch.cuda.set_device(...) and retry."
         )
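
As a reference for the workaround this warning points to, the sketch below is not part of the commit: it shows the desired device being selected before the engine is instantiated. load_compiled_trt_module is a hypothetical stand-in for however the compiled Torch-TensorRT module is produced or deserialized.

# Sketch of the workaround referenced by the warning above (not part of this commit).
# `load_compiled_trt_module` is a hypothetical callable standing in for however the
# compiled Torch-TensorRT module is obtained.
import torch


def run_on_device(device_id, load_compiled_trt_module, example_input):
    # Select the desired GPU *before* the engine is instantiated, so the
    # "current default cuda device" named in the warning is the intended one.
    torch.cuda.set_device(device_id)
    trt_module = load_compiled_trt_module()
    return trt_module(example_input.to(f"cuda:{device_id}"))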
