
[utils][tests] Adjust timeout-hang.py tolerances #142089


Merged

hubert-reinterpretcast
Collaborator

The subject test sporadically fails on the AIX builder: https://lab.llvm.org/buildbot/#/builders/64/builds/3921/steps/6/logs/FAIL__lit___timeout-hang_py

This appears to be an environment issue, possibly related to high load, because the problem is not observed on other AIX machines.

This patch separates the "hard" timeout value from the value used to signal a hang. This allows for a more generous "hard" timeout value, which in turn makes it possible to observe runs that take longer to finish despite not hanging.
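To make the effect of the split concrete, here is a minimal sketch that models the test's pass/fail decision before and after the patch. The threshold values come from the `DEFINE` lines in the diff; the function names and the `min()` approximation of the time lit reports are hypothetical simplifications, not lit's actual behavior.

```python
# Hypothetical model of the pass/fail decision in timeout-hang.py.
# Threshold values (seconds) are taken from the DEFINE lines in the diff.

OLD_TIMEOUT = 1.0    # old %{timeout}: used both as lit --timeout and as the hang threshold
GRACE_PERIOD = 5.0   # new %{grace_period}: hang threshold checked by the script
HARD_TIMEOUT = 15.0  # new %{hard_timeout}: value passed to lit --timeout


def old_check(testing_time: float) -> bool:
    """Old rule: pass only if lit finished strictly before the single timeout."""
    return testing_time < OLD_TIMEOUT


def new_check(testing_time: float) -> bool:
    """New rule: pass if lit finished within the grace period."""
    return testing_time <= GRACE_PERIOD


def approx_reported_time(actual_runtime: float, lit_timeout: float) -> float:
    """Rough, hypothetical approximation: a run longer than lit's --timeout is
    cut off, so the reported testing time is about the timeout itself."""
    return min(actual_runtime, lit_timeout)
```

For a hypothetical run that does not hang but finishes in 1.3 s under load, `old_check(approx_reported_time(1.3, OLD_TIMEOUT))` is `False`, so the old setup reports a failure, while `new_check(approx_reported_time(1.3, HARD_TIMEOUT))` is `True`. A genuine hang (`float("inf")`) is still cut off at roughly 15 s and fails `new_check`.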

@llvmbot
Member

llvmbot commented May 30, 2025

@llvm/pr-subscribers-testing-tools

Author: Hubert Tong (hubert-reinterpretcast)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/142089.diff

1 Files Affected:

  • (modified) llvm/utils/lit/tests/timeout-hang.py (+7-6)
diff --git a/llvm/utils/lit/tests/timeout-hang.py b/llvm/utils/lit/tests/timeout-hang.py
index 486f07983708f..4c4bccd670f73 100644
--- a/llvm/utils/lit/tests/timeout-hang.py
+++ b/llvm/utils/lit/tests/timeout-hang.py
@@ -8,20 +8,21 @@
 # throwing an exception. We expect this to fail immediately, rather than
 # timeout.
 
-# DEFINE: %{timeout}=1
+# DEFINE: %{grace_period}=5
+# DEFINE: %{hard_timeout}=15
 
 # RUN: not %{lit} %{inputs}/timeout-hang/run-nonexistent.txt \
-# RUN: --timeout=%{timeout} --param external=0 | %{python} %s %{timeout}
+# RUN: --timeout=%{hard_timeout} --param external=0 | %{python} %s %{grace_period}
 
 import sys
 import re
 
-timeout_time = float(sys.argv[1])
+grace_time = float(sys.argv[1])
 testing_time = float(re.search(r"Testing Time: (.*)s", sys.stdin.read()).group(1))
 
-if testing_time < timeout_time:
-    print("Testing took less than timeout")
+if testing_time <= grace_time:
+    print("Testing finished within the grace period")
     sys.exit(0)
 else:
-    print("Testing took as long or longer than timeout")
+    print("Testing took {}s, which is beyond the grace period of {}s".format(testing_time, grace_time))
     sys.exit(1)
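The script in the diff calls `re.search(...).group(1)` directly, which raises `AttributeError` if lit's output has no "Testing Time" line. A minimal sketch of the same check written as reusable functions, assuming the summary line has the form the diff's regex expects; the names `parse_testing_time` and `check` are hypothetical, and the capture group is narrowed to digits and dots instead of `(.*)`.

```python
import re
from typing import Optional

# Matches lit's summary line, assuming the "Testing Time: <seconds>s" form
# that the regex in the diff relies on.
TESTING_TIME_RE = re.compile(r"Testing Time: ([0-9.]+)s")


def parse_testing_time(lit_output: str) -> Optional[float]:
    """Return the reported testing time in seconds, or None if absent."""
    match = TESTING_TIME_RE.search(lit_output)
    return float(match.group(1)) if match else None


def check(lit_output: str, grace_time: float) -> int:
    """Return a process exit code: 0 if lit finished within the grace period."""
    testing_time = parse_testing_time(lit_output)
    if testing_time is None:
        # Fail with a readable message instead of an AttributeError.
        print("Could not find 'Testing Time' in lit output")
        return 1
    if testing_time <= grace_time:
        print("Testing finished within the grace period")
        return 0
    print(
        "Testing took {}s, which is beyond the grace period of {}s".format(
            testing_time, grace_time
        )
    )
    return 1
```

Used from the RUN line, the script body would reduce to `sys.exit(check(sys.stdin.read(), float(sys.argv[1])))` after importing `sys`.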

Collaborator

@DavidSpickett DavidSpickett left a comment

I know the pain of dealing with these, and trying to explain them to folks that sometimes real, actual time is important to computers :)

I had never thought to handle it this way though.

If lit were to hang trying to execute the non-existent program, the run would take the full 15 seconds and this test would fail.
If lit returns immediately, the run takes significantly less than 15 seconds and this test passes (exactly how long depends on when the parent lit process gets CPU time).

@DavidSpickett
Collaborator

> This appears to be an environment issue potentially connected to high load because the problem is not observed on other AIX machines.

My experience has been that this is the case. We had a lot of problems with the GoogleTest timeout test, though they have calmed down recently. lit would be told to apply a timeout of N seconds but would not get CPU time until N+1 seconds had passed, so it could not cancel the test until it was too late.
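A rough illustration of the effect described here: a timeout is only enforced once the supervising process gets to run, so the observed duration overshoots the requested value. This sketch uses Python's `subprocess.run` timeout, not lit's own timeout machinery, and the helper names are hypothetical. The overshoot it measures includes process startup and the kill/reap after the timeout; on a heavily loaded machine that delay can be much larger than on an idle one.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout):
    """Run cmd with a timeout and return (timed_out, elapsed_seconds).

    Elapsed time is measured with a monotonic clock around the whole call,
    so it includes process startup and the kill/reap after a timeout.
    """
    start = time.monotonic()
    try:
        # On timeout, subprocess.run kills the child and waits for it
        # before re-raising TimeoutExpired, so no process is left behind.
        subprocess.run(cmd, timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return timed_out, time.monotonic() - start


def measure_overshoot(timeout):
    """Run a child that sleeps far longer than `timeout` and return how many
    seconds past `timeout` the call took to return."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    timed_out, elapsed = run_with_timeout(sleeper, timeout)
    if not timed_out:
        raise RuntimeError("child was expected to be killed by the timeout")
    return elapsed - timeout
```

Calling `measure_overshoot(0.5)` repeatedly on an idle machine and again under heavy load is one way to see how much slack a fixed hang threshold needs.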

Collaborator

@DavidSpickett DavidSpickett left a comment

LGTM, thanks for working on this.

@hubert-reinterpretcast hubert-reinterpretcast merged commit 24f432d into main Jun 3, 2025
11 checks passed
@hubert-reinterpretcast hubert-reinterpretcast deleted the users/hubert-reinterpretcast/timeout-hang branch June 3, 2025 00:20
sallto pushed a commit to sallto/llvm-project that referenced this pull request Jun 3, 2025
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025

4 participants