Skip to content

[SYCL] Improve sycl-post-link performance with -split=kernel #6689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 7, 2022
Merged

[SYCL] Improve sycl-post-link performance with -split=kernel #6689

merged 1 commit into from
Sep 7, 2022

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented Sep 1, 2022

Right now we are computing a new callgraph in every call to extractCallGraph. extractCallGraph is called every time we do a module split, so for -split=kernel, that would be once per kernel. For modules with many kernels, this can take a very long time. We only need to compute this once because the input IR doesn't seem to change between splits.

This improves performance of sycl-post-link from ~45min to ~7min for an example with 13k kernels

Signed-off-by: Sarnie, Nick [email protected]

@sarnex
Copy link
Contributor Author

sarnex commented Sep 1, 2022

/summary:run

@sarnex sarnex marked this pull request as ready for review September 1, 2022 20:48
@sarnex sarnex requested a review from a team as a code owner September 1, 2022 20:48
@asudarsa asudarsa requested a review from kbobrovs September 1, 2022 20:51
@asudarsa
Copy link
Contributor

asudarsa commented Sep 1, 2022

Hi @kbobrovs, can you please take a look? Thanks.

@asudarsa
Copy link
Contributor

asudarsa commented Sep 1, 2022

It will be great if we can add a test here. Not sure if we have any existing test that track 'compile time'.

@AlexeySachkov
Copy link
Contributor

It will be great if we can add a test here. Not sure if we have any existing test that track 'compile time'.

I suppose that this is a non-functional change, which shouldn't require a test. I also doubt that we will be able to write proper 'compile time' test, or at least it should go into a separate suite of performance tests. The problem is that such tests can't be launched in a regular environment or they will be prone to fluctuations caused by many factors, like: having more than one test launched in parallel, having a misconfigured machine which uses tubro boost or operates on a non-fixed CPU frequency, etc.

@sarnex
Copy link
Contributor Author

sarnex commented Sep 2, 2022

I agree with Alexey, it would be very difficult to write a unit test that isn't super flaky. I'll reach out @asudarsa offline to see if I can connect with some quality engineers to see if we have a testing environment for this

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch!

@AlexeySachkov
Copy link
Contributor

CUDA testing seems to fail due to infra issues, but since this patch is effectively a non-functional change, I will go ahead and merge it

@AlexeySachkov AlexeySachkov merged commit 84de9d6 into intel:sycl Sep 7, 2022
@sarnex
Copy link
Contributor Author

sarnex commented Sep 7, 2022

Thanks! @AlexeySachkov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants