Summary:
Pull Request resolved: #4298
This very simple metric runs a kernel across an increasing number of workgroups, until there is a noticeable increase in latency, as seen in the following graph:
{F1762497995}
The shader uses integer division as its workload, because it is a multi-cycle operation that keeps the ALU busy and stops the SM from context switching.
As with the other metrics, we start by obtaining the minimum number of iterations, NITER, needed for a run to last 1000us, so as to have a baseline for comparison and to reduce timing noise. With this number of iterations, we run the kernel with an increasing number of threads. We also use a multidimensional global workgroup with a Y size of 1024, in the hope of saturating the ALUs and having a better point of reference for the latency caused by adding warps.
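The NITER calibration step can be sketched as follows. This is only an illustration, not the code in this change: `calibrate_niter`, `run_kernel_us` and `fake_kernel_us` are hypothetical names, the doubling search is an assumed strategy (so the result is the smallest power of two that reaches the target, not the exact minimum), and the timing function is simulated rather than a real GPU dispatch.

```python
from typing import Callable


def calibrate_niter(run_kernel_us: Callable[[int], float],
                    target_us: float = 1000.0,
                    max_niter: int = 1 << 30) -> int:
    """Return the smallest power-of-two iteration count whose run lasts
    at least target_us (assumed doubling search, hypothetical helper)."""
    niter = 1
    # Keep doubling until one run is long enough to dominate timing noise.
    while niter < max_niter and run_kernel_us(niter) < target_us:
        niter *= 2
    return niter


def fake_kernel_us(niter: int) -> float:
    """Simulated timing: 3 us of fixed overhead plus 0.5 us per iteration.
    A real benchmark would dispatch the shader and read a GPU timestamp."""
    return 3.0 + 0.5 * niter


NITER = calibrate_niter(fake_kernel_us)
```

With the simulated timings above, 1024 iterations take 515us and 2048 take 1027us, so the search stops at 2048.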
Once we detect a jump in latency, we can assume that the point at which it occurs corresponds to the warp size.
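A minimal sketch of the jump detection, under stated assumptions: `find_latency_jump`, `measure_us` and `fake_latency_us` are hypothetical names, the 20% threshold is an arbitrary choice for illustration, and the latency model is simulated (flat while the threads fit in one 32-wide warp, then stepping up). The sketch reports the last thread count before the jump; whether the actual implementation reports that value or the first thread count after it is not stated here.

```python
import math
from typing import Callable, Optional


def find_latency_jump(measure_us: Callable[[int], float],
                      max_threads: int = 256,
                      threshold: float = 1.2) -> Optional[int]:
    """Return the largest thread count whose latency stays within
    `threshold` times the single-thread baseline, or None if no jump
    is seen up to max_threads (hypothetical helper)."""
    baseline = measure_us(1)
    for nthreads in range(2, max_threads + 1):
        if measure_us(nthreads) > baseline * threshold:
            # Latency jumped at nthreads, so the previous count still fit.
            return nthreads - 1
    return None


def fake_latency_us(nthreads: int) -> float:
    """Simulated device: each group of 32 threads costs another 100 us."""
    return 100.0 * math.ceil(nthreads / 32)


estimated_warp_size = find_latency_jump(fake_latency_us)
```

On the simulated device, latency stays at 100us up to 32 threads and doubles at 33, so the sketch returns 32. Real measurements are noisier, so a practical version would likely average several runs per thread count before comparing against the baseline.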
More information can be found [here](https://www.microsoft.com/en-us/research/uploads/prod/2022/02/mobigpu_mobicom22_camera.pdf) on page 5.
Reviewed By: jorgep31415
Differential Revision: D59920169
fbshipit-source-id: 4ac9324e10f0ab1a72433fd7ce98ad5f5ab839e9