-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL] Fix computation of the submit time based on host timestamps #12104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
getCurrentDeviceTime uses the following formula to calculate device time: [base device timestamp] + ([current host timestamp] - [base host timestamp]) Host time stamps are queried using std::chrono. Base host timestamp is updated in two cases: 1. The first call to getCurrentDeviceTime. 2. When refresh is needed. Problem is that currently we remember base host timestamp at the wrong moment: there is a large gap between the point when we query and remember base host timestamp and the point when we get device timestamp from plugin, so ([current host timestamp] - [base host timestamp]) includes execution time of things like getPlugin() which may be significant, especially on the first call when plugins initialization happen. As a result we add incorrect difference to the base device time and calculated submission time is incorrect, it is sometimes greater than start time. This patch fixes that problem by querying base host time properly after piGetDeviceAndHostTimer call.
For some reasons it seems that problem is still not fixed on gen12 (even though on pvc fix is working fine), turned this into a draft to investigate. |
I verified that this fix is correct. So problem is fixed for opencl backend and for L0 backend on pvc. But there is a bug in L0 runtime and it reports incorrect device datestamps on some devices (gen12/dg*), created a tracker for this. That's why disabled test on all devices except pvc for L0 for now. |
@steffenlarsen Could you please take a look? Thank you in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this looks good. Just a couple of nits.
sycl/source/detail/device_impl.cpp
Outdated
// going to be used for calculation of the device timestamp at the next | ||
// getCurrentDeviceTime() call. We need to do it here because getPlugin() | ||
// and piGetDeviceAndHostTimer calls may take significant amount of time, | ||
// for example on the first call to getPlugin plugins may beed to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// for example on the first call to getPlugin plugins may beed to be | |
// for example on the first call to getPlugin plugins may need to be |
// RUN: %{run} %t.out | ||
// There is an issue with reported device time for the L0 backend, works only on | ||
// pvc for now. No such problems for other backends. | ||
// RUN: %if (!ext_oneapi_level_zero || gpu-intel-pvc) %{ %{run} %t.out %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make it a REQUIRES
instead? Seems like it would be better to just skip it. I don't think compile-only testing benefits us in E2E.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried with REQUIRES at first, but for some reasons test is still executed if I do this. That's why changed to this approach.
auto start = | ||
event.get_profiling_info<sycl::info::event_profiling::command_start>(); | ||
auto end = | ||
event.get_profiling_info<sycl::info::event_profiling::command_end>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; We try to discourage the use of using namespace sycl;
so maybe we should remove that instead of removing the qualifications... Just to be good role models. 😉
[=](sycl::id<1> idx) {}); | ||
}); | ||
queue q({property::queue::enable_profiling{}}); | ||
int *data = malloc_host<int>(1024, q); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leaks.
sycl/test-e2e/Basic/submit_time.cpp
Outdated
|
||
if (!(submit_time <= start_time) || !(start_time <= end_time)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!(submit_time <= start_time) || !(start_time <= end_time)) | |
if ((submit_time > start_time) || (start_time > end_time)) |
getCurrentDeviceTime uses the following formula to calculate device time:
[base device timestamp] + ([current host timestamp] - [base host timestamp]).
Host time stamps are queried using std::chrono. And base host timestamp is updated in two cases:
Problem is that currently we remember base host timestamp at the wrong moment: there is a large gap between the point when we query and remember base host timestamp and the point when we get device timestamp from plugin, so ([current host timestamp] - [base host timestamp]) includes execution time of things like getPlugin() which may be significant, especially on the first call when plugins initialization happen.
As a result we add incorrect difference to the base device time and calculated submission time is incorrect, it is sometimes greater than command_start time.
This patch fixes that problem by querying base host time properly after piGetDeviceAndHostTimer call.