-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Attempt to fix libc++ actions runner restarter. #120627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It appears that introducing docker containers has broken the restarter job since additional failure messages appear with the preemption messages. This should get jobs restarting on preemption again, but may do so for jobs that also contain unrelated failures
@llvm/pr-subscribers-github-workflow @llvm/pr-subscribers-libcxx Author: Eric (EricWF) ChangesIt appears that introducing docker containers has broken the restarter This should get jobs restarting on preemption again, but may do so Full diff: https://github.com/llvm/llvm-project/pull/120627.diff 1 Files Affected:
diff --git a/.github/workflows/libcxx-restart-preempted-jobs.yaml b/.github/workflows/libcxx-restart-preempted-jobs.yaml
index 82d84c01c92af2..b27debd0e6fe71 100644
--- a/.github/workflows/libcxx-restart-preempted-jobs.yaml
+++ b/.github/workflows/libcxx-restart-preempted-jobs.yaml
@@ -92,6 +92,12 @@ jobs:
check_run_id: check_run_id
})
+ // For temporary debugging purposes to see the structure of the annotations.
+ console.print(annotations);
+
+ has_failed_job = false;
+ saved_failure_message = null;
+
for (annotation of annotations.data) {
if (annotation.annotation_level != 'failure') {
continue;
@@ -106,15 +112,32 @@ jobs:
const failure_match = annotation.message.match(failure_regex);
if (failure_match != null) {
- // We only want to restart the workflow if all of the failures were due to preemption.
- // We don't want to restart the workflow if there were other failures.
- core.notice('Choosing not to rerun workflow because we found a non-preemption failure' +
- 'Failure message: "' + annotation.message + '"');
- await create_check_run('skipped', 'Choosing not to rerun workflow because we found a non-preemption failure\n'
- + 'Failure message: ' + annotation.message)
- return;
+ has_failed_job = true;
+ saved_failure_message = annotation.message;
}
}
+ if (has_failed_job and not has_preempted_job) {
+ // We only want to restart the workflow if all of the failures were due to preemption.
+ // We don't want to restart the workflow if there were other failures.
+ //
+ // However, libcxx runners running inside docker containers produce both a preemption message and failure message.
+ //
+ // The desired approach is to ignore failure messages which appear on the same job as a preemption message
+ // (An job is a single run with a specific configuration, ex generic-gcc, gcc-14).
+ //
+ // However, it's unclear that this code achieves the desired approach, and it may ignore all failures
+ // if a preemption message is found at all on any run.
+ //
+ // For now, it's more important to restart preempted workflows than to avoid restarting workflows with
+ // non-preemption failures.
+ //
+ // TODO Figure this out.
+ core.notice('Choosing not to rerun workflow because we found a non-preemption failure' +
+ 'Failure message: "' + saved_failure_message + '"');
+ await create_check_run('skipped', 'Choosing not to rerun workflow because we found a non-preemption failure\n'
+ + 'Failure message: ' + saved_failure_message)
+ return;
+ }
}
if (!has_preempted_job) {
|
Note: Due to the difficulty of testing, this hasn't been meaningfully tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we can't test that without pushing to main
is horrible, but I don't see how we can best make progress on this issue otherwise.
If you push this, please carefully monitor the CI state for a bit cause this can easily cause unintended instability and we'd rather have something broken-but-in-a-state-we-understand over the holidays than something broken-in-new-fun-ways.
It appears that introducing docker containers has broken the restarter
job since additional failure messages appear with the preemption
messages.
This should get jobs restarting on preemption again, but may do so
for jobs that also contain unrelated failures