Skip to content

[SYCL] persistent cache fix - directory creation and reporting improvements #13019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cperkinsintel
Copy link
Contributor

@cperkinsintel cperkinsintel commented Mar 14, 2024

We have a report of persistent cache failures. Traced to the directory creation so I switched it to use C++17 std::filesystem routines for OSUtil::makeDir. Also improved trace reporting.

AlexeySachkov
AlexeySachkov previously approved these changes Mar 14, 2024
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I'm ok to merge this patch, because it improves error reporting, which is very helpful for debugging and it does not make a situation worse, whilst being a very small fix.

I don't think that it fully resolves all the problems we have:

  • using files for thread-safety doesn't seem to work, at least not in our implementation. There should either be an in-memory atomic lock, or we should have a wait loop to further iterate on i on a failed attempt to acquire a lock
  • we ignore any "extra" device images, i.e. our hash collision mechanism doesn't work

@cperkinsintel cperkinsintel marked this pull request as ready for review March 14, 2024 16:00
@cperkinsintel cperkinsintel requested a review from a team as a code owner March 14, 2024 16:00
@cperkinsintel cperkinsintel marked this pull request as draft March 14, 2024 16:22
@cperkinsintel cperkinsintel changed the title [SYCL] blind fix for persistent cache lock and minor err report improvement [SYCL] persistent cache fix - directory creation and reporting improvements Mar 16, 2024
@cperkinsintel cperkinsintel dismissed AlexeySachkov’s stale review March 16, 2024 01:42

different direction

@cperkinsintel cperkinsintel marked this pull request as ready for review March 27, 2024 04:39
Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

int OSUtil::makeDir(const char *Dir) {
assert((Dir != nullptr) && "Passed null-pointer as directory name.");
if (isPathPresent(Dir))
return 0;

// older GCC doesn't have full C++ 17 support.
#if __GNUC__ && __GNUC__ < 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw in this case as well, to align the two branches?

Copy link
Contributor Author

@cperkinsintel cperkinsintel Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing the signature of this function leads to an ABI break. Since we need to keep this old stuff anyway, I was thinking to keep it as is.
My preference would be to change the signature, throw always, and drop GCC < 8 support, but that decision isn't mine to make.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the signature as is and still throw when we would have returned a non-zero code. I don't think we're handling non-zero codes returned from this anyway, are we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::fileystem::create_directories throws a std::filesystem::filesystem_error but the "old" code doesn't have std::filesystem so can't throw that. I'm modifying it to throw std::runtime_error

@againull againull merged commit f6e73e8 into intel:sycl Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants