Gracefully handle IOException on concurrent cache writes #920
Conversation
}
catch (IOException)
{
    Logger.WriteInfo("I/O exception during cache write");
For consistency IMHO this should write a message at warning level, followed by the detailed exception message at info level, as e.g. done here.
I suspect that a warning would result in an msbuild warning being raised? If that is the case, I would be careful about using it for this behavior, since some CI build systems fail the build on warnings, and this is not something that should impact the build in my opinion. But I could be wrong about the msbuild warning.
Good point. @asbjornu @JakeGinnivan Do you know the impact of this on CI build systems?
@pascalberger I've suggested handling this with locking instead, so we avoid the exceptions altogether. I have not experienced CI failing on warnings, but that's possible to configure, I guess.
@simonejsing Thanks for this! Do you think it's possible to handle this via some sort of locking instead? Also, do you think it's possible to write a test that verifies the intended behavior?
Yes, implementing locking makes sense. But even in that case, exceptions that are still thrown while trying to write to the cache, for whatever reason, should be handled and logged rather than lead to a failure IMHO.
@pascalberger I agree that exceptions should be handled, but since in a locking scenario we're not really expecting anything to fail, I think the exceptions that do occur should be written out as a warning.
@asbjornu @pascalberger - I initially thought about locking as well, but for multi-process synchronization I would need to use a named mutex. Besides requiring a fairly tricky implementation to get right, it also adds a performance overhead on the happy path, and it leads to all sorts of considerations to take into account, e.g. how do we scope the locking to the repo level so we avoid locking GitVersion across all repos. After discussing with a coworker, he brought up the point that a widely used practice for concurrent file system operations is exponential backoff. This is, for example, used by the C# compiler with a 30 second max wait. But at this point we agreed that for a caching scenario, where the consequence of a write failure is simply that another process will handle the retry, we should just ignore a failed cache write. That being said, I would highly suggest exponential backoff over named mutex locking if you disagree with our train of thought. Let me know what you think.
@simonejsing Excellent points. I agree that a named mutex is too much overhead in execution time, code complexity, and implementation effort. I like the idea of an exponential backoff, though, so if you can implement something like that, it would be neat. Even neater if you could adjust the code wrt. logging as @pascalberger suggests and add at least one test that provokes the race condition.
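To make the approach being agreed on here concrete, below is a minimal sketch of a cache write retried with exponential backoff. It is not the actual GitVersion implementation: the method name, retry count, and initial delay are assumptions for illustration; Logger.WriteInfo and ThreadSleep.Sleep are the project helpers visible in the diff fragments in this thread.

using System;
using System.IO;

// Sketch only: retry an arbitrary cache-write action, doubling the delay
// between attempts, and rethrow once the retries are exhausted.
static void WriteCacheWithBackoff(Action cacheWrite, int maxRetries = 6)
{
    var sleepMSec = 500;
    for (var retry = 0; ; retry++)
    {
        try
        {
            cacheWrite();
            return;
        }
        catch (IOException ex)
        {
            if (retry >= maxRetries)
                throw; // give up; the caller decides whether this is fatal

            // Logged at info level so CI systems that fail on warnings are unaffected.
            Logger.WriteInfo(string.Format(
                "Cache write failed ({0}), retrying in {1} ms.", ex.Message, sleepMSec));

            ThreadSleep.Sleep(sleepMSec);
            sleepMSec *= 2; // exponential backoff
        }
    }
}

An initial delay of 500 ms doubled across six retries adds up to 31.5 seconds in total, which lines up with the figures mentioned in the commit message further down the thread.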
@asbjornu @pascalberger Implemented exponential backoff with unit tests. Please review.
I get the following build error. That error message is not very helpful, so any insights would be nice...

Error : /home/travis/build/GitTools/GitVersion/src/GitVersionTask.Tests/bin/Debug/GitVersionTask.Tests.dll
The object with ID 2 implements the IObjectReference interface for which all dependencies cannot be resolved. The likely cause is two instances of IObjectReference that have a mutual dependency on each other.
  at (wrapper managed-to-native) System.Object:__icall_wrapper_mono_remoting_wrapper (intptr,intptr)
  at (wrapper remoting-invoke) NUnit.Engine.Agents.RemoteTestAgent:Run (NUnit.Engine.ITestEventListener,NUnit.Engine.TestFilter)
  at NUnit.Engine.Runners.ProcessRunner.RunTests (ITestEventListener listener, NUnit.Engine.TestFilter filter) <0x413be5d0 + 0x00042> in :0
@simonejsing I restarted the build. Everything is green now. 👍
Thanks!
}
}

ThreadSleep.Sleep(sleepMSec);
It would probably be good to log here how long it is going to sleep for; that way, if it is failing multiple times, the user can see why it's taking so long.
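A sketch of how that suggestion could look in the loop body, assuming sleepMSec holds the delay about to be waited; the exact message in the merged code may differ:

// Surface the backoff delay so repeated failures are visible in the log.
Logger.WriteInfo(string.Format(
    "Cache write failed, retrying in {0} ms.", sleepMSec));
ThreadSleep.Sleep(sleepMSec);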
…h 6 retries and 31.5 seconds total timeout. Switched exception handling to catch AggregateException now since that is what is thrown in cache write timeout.
@JakeGinnivan Addressed your feedback and rebased onto latest master to avoid the merge commits.
Awesome, thanks for the PR and making that change so quickly |
Gracefully handles concurrent writes to the disk cache by catching I/O exceptions, since missing a cache write is no big deal and the most likely reason the write was missed is that another process is already writing it. In any case, if the cache write failed completely for some reason, the next process would retry the operation. Worst case, the cache write fails consistently, which would result in a performance hit.
Fixes #918.
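For completeness, a rough sketch of the caller-side shape this description implies. WriteCacheWithRetries, cacheFileName, and cacheContents are hypothetical names, not the merged code; per the commit message above, an AggregateException is what surfaces when the retried write finally times out, and the choice of log level follows the earlier discussion rather than a confirmed implementation detail.

try
{
    // Hypothetical retry helper; only the shape of the error handling matters here.
    WriteCacheWithRetries(() => File.WriteAllText(cacheFileName, cacheContents));
}
catch (AggregateException ex)
{
    // A missed cache write is not fatal: another process is probably writing
    // the same entry, and the next run can retry. Worst case is a performance hit.
    Logger.WriteInfo("Unable to write version cache file " + cacheFileName + ": " + ex.GetBaseException().Message);
}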