Summary:
Pull Request resolved: #4159
This adds an internal implementation of https://github.com/microsoft/ArchProbe.
This stack introduces a kernel that estimates the number of available registers on a mobile GPU by gradually increasing the number of accessed elements and detecting sharp drops in performance. See [this paper](https://www.microsoft.com/en-us/research/uploads/prod/2022/02/mobigpu_mobicom22_camera.pdf), page 4, for more information.
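As a rough illustration of the detection step, here is a minimal host-side sketch in Python. It assumes the kernel has already been timed at several register counts; the function name `find_register_limit` and the 1.5x threshold are hypothetical and not taken from this diff or from ArchProbe.

```python
def find_register_limit(timings, drop_ratio=1.5):
    """Return the last register count before a sharp slowdown.

    timings: list of (num_registers, elapsed_us) pairs, sorted by
    num_registers. drop_ratio is an assumed threshold: a step counts
    as a "dramatic drop" when the time grows by more than this factor.
    Returns None if no such step is found.
    """
    for (prev_k, prev_t), (_, t) in zip(timings, timings[1:]):
        if t > prev_t * drop_ratio:
            # The previous point is the last one that still fit.
            return prev_k
    return None
```

With made-up samples such as `[(8, 100.0), (16, 101.0), (32, 103.0), (64, 250.0)]`, the sketch reports 32, since the time more than doubles on the step from 32 to 64.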
This first diff gets the number of iterations (NITER) that can run in 1000 us, for use in the subsequent tests.
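One way such a calibration could work is sketched below in Python. This is an assumption, not the logic of this diff: it doubles the iteration count until a timed run reaches the target. `run_kernel` is a hypothetical callback that dispatches the kernel with the given NITER and returns the elapsed time in microseconds.

```python
def calibrate_niter(run_kernel, target_us=1000.0, max_rounds=32):
    """Find an iteration count whose run takes at least target_us.

    run_kernel: hypothetical callable, niter -> elapsed microseconds.
    Doubling is an assumed search strategy; it returns the smallest
    power of two that reaches the target.
    """
    niter = 1
    for _ in range(max_rounds):
        if run_kernel(niter) >= target_us:
            return niter
        niter *= 2
    raise RuntimeError("kernel never reached the target duration")
```

Doubling keeps the number of timed dispatches logarithmic in the final NITER, at the cost of overshooting the target by up to 2x.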
The kernel looks like the following for any K number of registers:
float reg_data0 = float(niter) + 0;
float reg_data1 = float(niter) + 1;
...
float reg_dataK = float(niter) + K;
int i = 0;
for (; i < niter; ++i) {
  reg_data0 *= reg_dataK;
  reg_data1 *= reg_data0;
  reg_data2 *= reg_data1;
  ...
  reg_dataK *= reg_data(K-1);
}
i = i >> 31;
buffer_out.data[0 * i] = reg_data0;
buffer_out.data[1 * i] = reg_data1;
...
buffer_out.data[K * i] = reg_dataK;
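Since the kernel body depends on K, it has to be generated per register count. The Python sketch below expands the template above into text for a given K. The helper name `make_kernel_body` is hypothetical and is not how this diff generates its shaders. Note that after the loop `i` equals `niter`, which is non-negative, so `i >> 31` evaluates to 0 and every write lands on `buffer_out.data[0]`. Presumably the point is that the compiler cannot prove this and so has to keep every `reg_data` value live.

```python
def make_kernel_body(k):
    """Expand the kernel template for reg_data0 .. reg_data<k>."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lines = [f"float reg_data{j} = float(niter) + {j};" for j in range(k + 1)]
    lines.append("int i = 0;")
    lines.append("for (; i < niter; ++i) {")
    # Each variable depends on its predecessor, and the first one on
    # the last, forming the dependency chain shown in the template.
    lines.append(f"  reg_data0 *= reg_data{k};")
    lines += [f"  reg_data{j} *= reg_data{j - 1};" for j in range(1, k + 1)]
    lines.append("}")
    lines.append("i = i >> 31;")
    lines += [f"buffer_out.data[{j} * i] = reg_data{j};" for j in range(k + 1)]
    return "\n".join(lines)
```

Calling `make_kernel_body(2)` yields the template with three variables, `reg_data0` through `reg_data2`.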
Differential Revision: D59405012
Reviewed By: SS-JIA