
Commit 6ecda12: Add doc (parent 1d94c6a)

File tree

1 file changed: +16 -3 lines


README.md

Lines changed: 16 additions & 3 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2020-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -155,7 +155,20 @@ optimization { execution_accelerators {
 ```
 
 ## ONNX Runtime with CUDA Execution Provider optimization
-When GPU is enabled for ORT, CUDA execution provider is enabled. If TensorRT is also enabled then CUDA EP is treated as a fallback option (only comes into picture for nodes which TensorRT cannot execute). If TensorRT is not enabled then CUDA EP is the primary EP which executes the models. ORT enabled configuring options for CUDA EP to further optimize based on the specific model and user scenarios. To enable CUDA EP optimization you must set the model configuration appropriately. There are several optimizations available, like selection of max mem, cudnn conv algorithm etc... The optimization parameters and their description are as follows.
+When the GPU is enabled for ORT, the CUDA execution provider is enabled. If TensorRT is also enabled, the CUDA EP is treated as a fallback option (it only comes into the picture for nodes that TensorRT cannot execute). If TensorRT is not enabled, the CUDA EP is the primary EP that executes the models. ORT allows configuring options for the CUDA EP to further optimize for the specific model and user scenario. There are several optimizations available; please refer to the [ONNX Runtime doc](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#cuda-execution-provider) for more details. To enable CUDA EP optimization, set the model configuration appropriately:
+
+```
+optimization { execution_accelerators {
+  gpu_execution_accelerator : [ {
+    name : "cuda"
+    parameters { key: "cudnn_conv_use_max_workspace" value: "0" }
+    parameters { key: "use_ep_level_unified_stream" value: "1" }}
+  ]
+}}
+```
+
+### Deprecated Parameters
+Specifying these parameters individually, as shown below, is deprecated. They are still supported for backward compatibility, but please use the method above instead.
 
 * `cudnn_conv_algo_search`: CUDA convolution algorithm search configuration. Available options are: 0 = EXHAUSTIVE (expensive exhaustive benchmarking using cudnnFindConvolutionForwardAlgorithmEx; this is the default), 1 = HEURISTIC (lightweight heuristic-based search using cudnnGetConvolutionForwardAlgorithm_v7), 2 = DEFAULT (use CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM).
 
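For reference, the option described in the bullet above can be supplied through the same accelerator block the added lines introduce. A minimal sketch, assuming the accelerator syntax shown in this hunk (the value `1`, i.e. HEURISTIC, is only illustrative):

```
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "cuda"
    # 1 = HEURISTIC: lightweight search via cudnnGetConvolutionForwardAlgorithm_v7 (illustrative value)
    parameters { key: "cudnn_conv_algo_search" value: "1" }}
  ]
}}
```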

@@ -165,7 +178,7 @@ When GPU is enabled for ORT, CUDA execution provider is enabled. If TensorRT is
 
 * `do_copy_in_default_stream`: Flag indicating whether copying takes place on the same stream as the compute stream in the CUDA EP. Available options are: 0 = use separate streams for copying and compute, 1 = use the same stream for copying and compute. Defaults to 1.
 
-The section of model config file specifying these parameters will look like:
+In the model config file, specifying these parameters will look like:
 
 ```
 .
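The deprecated form, per the sentence changed in this hunk, sets each option as a top-level model-config parameter. A minimal sketch, assuming Triton's `config.pbtxt` `parameters` syntax with `string_value` entries (keys taken from the bullets above; values are illustrative):

```
parameters { key: "cudnn_conv_algo_search" value: { string_value: "0" } }
parameters { key: "do_copy_in_default_stream" value: { string_value: "1" } }
```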
