Replies: 2 comments
-
Hi @KumoLiu, could you please share some comments on this question? I think you have also been using Nsight Systems recently. Thanks in advance.
-
I ran some comparative experiments, and the results are below. The height of the blue bars represents the utilization rate.
From the table, we could see that
Do you have any other insights on this? @Nic-Ma @wyli @yiheng-wang-nv
-
Hi!


The profiling section in the fast_training_tutorial is very useful!
I'm trying to visualize the GPU transforms and the ThreadDataLoader from the fast training tutorial in the Nsight Systems UI. Clicking the range for the GPU transform "RandCrop" doesn't correlate it with any kernels on the GPU row; please see the two screenshots below:
I'm wondering whether the CUDA API call "CatArrayBatchedCopy" right after the "RandCrop" range is the one responsible for launching the GPU transform?
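One way to check whether a transform such as "RandCrop" launches any GPU work at all is to wrap the call in your own NVTX range and, only while profiling, synchronize before closing the range so that the GPU work queued so far finishes inside it. The sketch below assumes PyTorch; `nvtx_range` is a hypothetical helper name, not a MONAI or PyTorch API, and it silently becomes a no-op when CUDA is not available.

```python
import contextlib

try:
    import torch
except ImportError:  # PyTorch not installed: the helper degrades to a no-op
    torch = None


def _cuda_ready():
    """True only if PyTorch is importable and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


@contextlib.contextmanager
def nvtx_range(name, sync=False):
    """Hypothetical helper: mark a named NVTX range around a block of code.

    With sync=True, torch.cuda.synchronize() is called before the range is
    popped, so the CPU-side range lasts until all CUDA work queued on the
    current device (including work queued before the range) has finished.
    Use sync=True only while profiling: it stalls the CPU and changes timings.
    """
    active = _cuda_ready()
    if active:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        # Always pop the range, even if the wrapped code raised.
        if active:
            if sync:
                torch.cuda.synchronize()
            torch.cuda.nvtx.range_pop()
```

Usage would look like `with nvtx_range("RandCrop", sync=True): out = crop(data)`, where `crop` and `data` stand for whatever transform and input you are profiling. One possible explanation for the missing kernels, not verified here: if the crop is implemented with basic tensor slicing, it returns a view and launches no kernel, so there is nothing to correlate, and the first kernel you see may come from a later copy such as the `torch.stack`/`torch.cat` used when collating a batch.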

Another quick question: in fast training, the "dataload" range doesn't project onto the GPU row (please see the highlighted area in the screenshot below). Is that because nothing happens on the GPU during that range, i.e., there is no host-to-device data movement as in regular PyTorch training?
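A quick way to test the "no host-to-device movement" hypothesis is to count how many tensors in a batch actually need a copy. `Tensor.to()` returns the original tensor when it already has the requested device and dtype, so batches that are already resident on the GPU (for example, when a dataset caches them on the device) enqueue no transfer. The sketch assumes dict-style batches as produced by MONAI dictionary transforms; `move_batch` is a hypothetical helper, not part of MONAI.

```python
import torch


def move_batch(batch, device):
    """Hypothetical helper: move every tensor in a dict batch to `device`.

    Returns (moved_batch, copies), where `copies` counts the tensors that were
    not already on `device` with a matching dtype. Tensor.to() returns the same
    tensor object when no conversion is needed, so no copy is queued for it.
    """
    device = torch.device(device)
    moved, copies = {}, 0
    for key, value in batch.items():
        if isinstance(value, torch.Tensor):
            new_value = value.to(device, non_blocking=True)
            copies += int(new_value is not value)
            moved[key] = new_value
        else:
            moved[key] = value  # non-tensor metadata is passed through unchanged
    return moved, copies
```

If `copies` stays at 0 inside the "dataload" range, you would not expect any "Memcpy HtoD" activity on the GPU row for that range; with a regular CPU-side DataLoader you would instead expect one copy per tensor.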
