Fix for short runs inside (batch)triggerAndWait #1263

Merged on Aug 20, 2024 (18 commits)

matt-aitken
Member

If short runs were triggered inside triggerAndWait or batchTriggerAndWait, there was a race condition where a checkpoint wouldn't be created in time. In some cases the run was no longer running in the cluster, so it stayed frozen forever.

Changes

  • Don't attempt to continue these runs in the cluster if there's no checkpoint.
  • When we create the checkpoint, try to continue these runs (they won't continue if the sub-runs aren't finished).
  • Remove some code that failed attempts in order to prevent infinite recursion. It caused errors in certain conditions where runs would otherwise have succeeded.
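
The first two changes can be illustrated with a minimal sketch. This is not the code from this PR: the ParentRun type and the tryResume, onSubRunCompleted, and onCheckpointCreated functions are hypothetical names, used only to show why attempting to continue from both events closes the race.

```typescript
// Hypothetical model of a run waiting on sub-runs. Not the PR's actual code.
type ParentRun = {
  id: string;
  hasCheckpoint: boolean;      // true once the checkpoint has been created
  pendingSubRuns: Set<string>; // sub-runs that have not finished yet
  resumed: boolean;
};

// Continue the run only when a checkpoint exists and no sub-runs are pending.
function tryResume(run: ParentRun): boolean {
  if (run.resumed) return false;                 // already continued
  if (!run.hasCheckpoint) return false;          // no checkpoint yet: don't continue
  if (run.pendingSubRuns.size > 0) return false; // sub-runs aren't finished
  run.resumed = true;
  return true;
}

// Called when a sub-run finishes.
function onSubRunCompleted(run: ParentRun, subRunId: string): boolean {
  run.pendingSubRuns.delete(subRunId);
  return tryResume(run);
}

// Called when the checkpoint is created: attempt to continue again here,
// so a sub-run that finished earlier is not missed.
function onCheckpointCreated(run: ParentRun): boolean {
  run.hasCheckpoint = true;
  return tryResume(run);
}

// The racy ordering: the short sub-run finishes before the checkpoint exists.
const run: ParentRun = {
  id: "parent",
  hasCheckpoint: false,
  pendingSubRuns: new Set(["child"]),
  resumed: false,
};
const resumedOnCompletion = onSubRunCompleted(run, "child");
const resumedOnCheckpoint = onCheckpointCreated(run);
console.log(resumedOnCompletion, resumedOnCheckpoint); // prints: false true
```

In this sketch, whichever event arrives second is the one that continues the run, so the order in which the checkpoint and the sub-run completion arrive no longer matters.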

changeset-bot bot commented Aug 20, 2024

🦋 Changeset detected

Latest commit: 17994f0

The changes in this PR will be included in the next version bump.


@matt-aitken matt-aitken merged commit 0591db5 into main Aug 20, 2024
2 checks passed
@matt-aitken matt-aitken deleted the fix-triggerandwait-races branch August 20, 2024 12:57