Attempt to fix libc++ actions runner restarter. #120627

EricWF · 2024-12-19T19:41:58Z

It appears that introducing docker containers has broken the restarter
job since additional failure messages appear with the preemption
messages.

This should get jobs restarting on preemption again, but it may also
restart jobs that contain unrelated failures.

llvmbot · 2024-12-19T19:42:31Z

@llvm/pr-subscribers-github-workflow

@llvm/pr-subscribers-libcxx

Author: Eric (EricWF)

Changes

It appears that introducing docker containers has broken the restarter
job since additional failure messages appear with the preemption
messages.

This should get jobs restarting on preemption again, but it may also
restart jobs that contain unrelated failures.

Full diff: https://github.com/llvm/llvm-project/pull/120627.diff

1 Files Affected:

(modified) .github/workflows/libcxx-restart-preempted-jobs.yaml (+30-7)

diff --git a/.github/workflows/libcxx-restart-preempted-jobs.yaml b/.github/workflows/libcxx-restart-preempted-jobs.yaml
index 82d84c01c92af2..b27debd0e6fe71 100644
--- a/.github/workflows/libcxx-restart-preempted-jobs.yaml
+++ b/.github/workflows/libcxx-restart-preempted-jobs.yaml
@@ -92,6 +92,12 @@ jobs:
                 check_run_id: check_run_id
               })
 
+              // For temporary debugging purposes to see the structure of the annotations.
+              console.log(annotations);
+
+              has_failed_job = false;
+              saved_failure_message = null;
+
               for (annotation of annotations.data) {
                 if (annotation.annotation_level != 'failure') {
                   continue;
@@ -106,15 +112,32 @@ jobs:
 
                 const failure_match = annotation.message.match(failure_regex);
                 if (failure_match != null) {
-                  // We only want to restart the workflow if all of the failures were due to preemption.
-                  // We don't want to restart the workflow if there were other failures.
-                  core.notice('Choosing not to rerun workflow because we found a non-preemption failure' +
-                    'Failure message: "' + annotation.message + '"');
-                  await create_check_run('skipped', 'Choosing not to rerun workflow because we found a non-preemption failure\n'
-                    + 'Failure message: ' + annotation.message)
-                  return;
+                  has_failed_job = true;
+                  saved_failure_message = annotation.message;
                 }
               }
+              if (has_failed_job && !has_preempted_job) {
+                // We only want to restart the workflow if all of the failures were due to preemption.
+                // We don't want to restart the workflow if there were other failures.
+                //
+                // However, libcxx runners running inside docker containers produce both a preemption message and failure message.
+                //
+                // The desired approach is to ignore failure messages which appear on the same job as a preemption message
+                // (A job is a single run with a specific configuration, e.g. generic-gcc or gcc-14).
+                //
+                // However, it's unclear that this code achieves the desired approach, and it may ignore all failures
+                // if a preemption message is found at all on any run.
+                //
+                // For now, it's more important to restart preempted workflows than to avoid restarting workflows with
+                // non-preemption failures.
+                //
+                // TODO Figure this out.
+                core.notice('Choosing not to rerun workflow because we found a non-preemption failure. ' +
+                  'Failure message: "' + saved_failure_message + '"');
+                await create_check_run('skipped', 'Choosing not to rerun workflow because we found a non-preemption failure\n'
+                    + 'Failure message: ' + saved_failure_message)
+                return;
+              }
             }
 
             if (!has_preempted_job) {
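
The control flow in this diff can be sketched as a standalone function, which makes the concern raised in the code comments easier to see. This is only an illustration under stated assumptions: `decideAsInDiff`, the two regexes, the sample messages, and the returned `action` values are hypothetical names invented here, the lines elided between the two hunks are not reproduced, and `has_preempted_job` is assumed to persist across check runs because it is read after the loop.

```javascript
// Hypothetical placeholder patterns; the real workflow defines its own regexes.
const PREEMPTION_REGEX = /shutdown signal/;
const FAILURE_REGEX = /exit code/;

// Mirrors the flag handling in the diff: the failure flag is reset for every
// check run, while the preemption flag is shared by all check runs (assumption).
function decideAsInDiff(checkRuns) {
  let hasPreemptedJob = false;
  for (const run of checkRuns) {
    let hasFailedJob = false;
    let savedFailureMessage = null;
    for (const annotation of run.annotations) {
      if (annotation.annotation_level !== 'failure') continue;
      if (PREEMPTION_REGEX.test(annotation.message)) hasPreemptedJob = true;
      if (FAILURE_REGEX.test(annotation.message)) {
        hasFailedJob = true;
        savedFailureMessage = annotation.message;
      }
    }
    // Skip only if this run failed and no preemption has been seen so far.
    if (hasFailedJob && !hasPreemptedJob) {
      return { action: 'skip', reason: savedFailureMessage };
    }
  }
  return hasPreemptedJob ? { action: 'rerun' } : { action: 'none' };
}

// Illustrative data: a preempted docker job reports both kinds of message.
const dockerPreempted = { annotations: [
  { annotation_level: 'failure', message: 'The runner has received a shutdown signal' },
  { annotation_level: 'failure', message: 'Process completed with exit code 1.' },
] };
const genuineFailure = { annotations: [
  { annotation_level: 'failure', message: 'Process completed with exit code 1.' },
] };

console.log(decideAsInDiff([dockerPreempted]).action);                 // rerun
console.log(decideAsInDiff([genuineFailure, dockerPreempted]).action); // skip
console.log(decideAsInDiff([dockerPreempted, genuineFailure]).action); // rerun
```

Under these assumptions the outcome depends on the order in which check runs are visited: a genuine failure is ignored once any earlier run has reported a preemption, which matches the caveat in the TODO comment.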

EricWF · 2024-12-19T19:43:32Z

Note: Due to the difficulty of testing, this hasn't been meaningfully tested.
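
One possible way to exercise the decision logic without pushing to main is to keep it in a pure function and feed it hand-written annotation data under Node. The sketch below is not the PR's code: it implements the per-job behavior that the code comments describe as desired (ignore failure messages that appear on the same job as a preemption message). `decidePerJob`, the regexes, the sample messages, and the `action` values are all hypothetical, and it assumes each check run corresponds to exactly one job.

```javascript
// Hypothetical placeholder patterns; the real workflow defines its own regexes.
const PREEMPTION_RE = /shutdown signal/;
const FAILURE_RE = /exit code/;

// Decide per check run: failures on a preempted run are ignored, failures on
// any other run block the rerun regardless of the order of the runs.
function decidePerJob(checkRuns) {
  let preemptedRuns = 0;
  for (const run of checkRuns) {
    const messages = run.annotations
      .filter((a) => a.annotation_level === 'failure')
      .map((a) => a.message);
    if (messages.some((m) => PREEMPTION_RE.test(m))) {
      preemptedRuns += 1; // failures on this run are attributed to preemption
      continue;
    }
    const failure = messages.find((m) => FAILURE_RE.test(m));
    if (failure !== undefined) {
      return { action: 'skip', reason: failure };
    }
  }
  return preemptedRuns > 0 ? { action: 'rerun' } : { action: 'none' };
}

// Illustrative fixtures; the message texts are examples, not captured logs.
const preemptedRun = { annotations: [
  { annotation_level: 'failure', message: 'The runner has received a shutdown signal' },
  { annotation_level: 'failure', message: 'Process completed with exit code 1.' },
] };
const failedRun = { annotations: [
  { annotation_level: 'failure', message: 'Process completed with exit code 1.' },
] };

console.log(decidePerJob([preemptedRun]).action);            // rerun
console.log(decidePerJob([preemptedRun, failedRun]).action); // skip
console.log(decidePerJob([failedRun, preemptedRun]).action); // skip
```

Fixtures like these do not replace a run against real annotations, since the actual structure returned by the API is exactly what the temporary debug output in the diff is meant to reveal.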

ldionne

The fact that we can't test that without pushing to main is horrible, but I don't see how else we can make progress on this issue.

If you push this, please carefully monitor the CI state for a bit, because this can easily cause unintended instability, and we'd rather have something broken-but-in-a-state-we-understand over the holidays than something broken-in-new-fun-ways.

llvmbot added the libc++ and github:workflow labels Dec 19, 2024

EricWF requested a review from ldionne December 19, 2024 19:42

You're not writing python Harry

f50c755

ldionne approved these changes Dec 19, 2024

EricWF merged commit 59850c2 into llvm:main Jan 21, 2025
6 checks passed
