[CI] Extend metrics container to log BuildKite metrics #130996

Merged: 5 commits, Mar 14, 2025

Keenuts
Contributor

@Keenuts Keenuts commented Mar 12, 2025

The current container focuses on GitHub metrics. Before deprecating Buildkite, we want to make sure the quality of the new infrastructure is better, or at least the same.

Being able to compare Buildkite metrics with GitHub metrics on Grafana will make it easy to present the comparison.

The Buildkite API allows filtering, but doesn't allow changing the result ordering, so we are left with builds ordered by ID. This means a completed job can appear before a running job in the list. There are two solutions from there:

  • keep the cursor on the oldest running workflow
  • keep a list of running workflows to compare.

Because there are no guarantees on workflow ordering, waiting for the oldest build to complete before reporting any newer build could delay reporting the completion of more recent builds by a few hours. And because Grafana cannot ingest metrics older than 2 hours, this is not an option.

Thus we are left with the second solution: remember which jobs were running during the last iteration, and record them as soon as they complete. Buildkite has at most ~100 pending jobs, so keeping all those IDs should be OK.
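
For illustration, the second solution can be sketched roughly as below. This is a minimal sketch, not the code from this PR: the `Build` type, the `split_builds` helper, and the state names in `FINISHED_STATES` are all hypothetical, and the real container may track more information per build.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Assumption: these are the terminal states; the real API may use other names.
FINISHED_STATES = {"passed", "failed", "canceled"}


@dataclass(frozen=True)
class Build:
    """Hypothetical minimal view of a build returned by the API."""
    id: str
    state: str


def split_builds(
    builds: Iterable[Build], previously_running: Set[str]
) -> Tuple[List[Build], Set[str]]:
    """Return (builds to report now, IDs still running for the next iteration).

    A build is reported once it is finished and was seen running during the
    previous iteration, regardless of where it sits in the ID-ordered list.
    """
    to_report: List[Build] = []
    still_running: Set[str] = set()
    for build in builds:
        if build.state in FINISHED_STATES:
            if build.id in previously_running:
                to_report.append(build)
        else:
            # Not finished yet: remember the ID for the next iteration.
            still_running.add(build.id)
    return to_report, still_running
```

With this shape, a newer build that completes is reported on the next iteration even if an older build is still running, at the cost of keeping one set of IDs between iterations.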

Contributor

@boomanaiden154 boomanaiden154 left a comment

Mostly LGTM, just a couple nits.

Keenuts added 2 commits March 13, 2025 10:33
@Keenuts
Contributor Author

Keenuts commented Mar 13, 2025

Thanks, applied all comments.
Also, from letting it run tonight I learned that Buildkite can return None dates, for example if a job is cancelled before it started. Added conditions to handle this.
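
As an illustration of the kind of condition involved (a sketch only, not the code added in this PR), a duration computation can skip jobs whose timestamps are missing. The helper names and the ISO 8601 timestamp format with a trailing `Z` are assumptions here.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, passing None through unchanged."""
    if value is None:
        return None
    # Assumption: timestamps are UTC with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_duration_seconds(
    started_at: Optional[str], finished_at: Optional[str]
) -> Optional[float]:
    """Return the run time in seconds, or None if either date is missing."""
    start = parse_timestamp(started_at)
    end = parse_timestamp(finished_at)
    if start is None or end is None:
        # For example, a job cancelled before it started has no start date.
        return None
    return (end - start).total_seconds()
```

A caller can then skip the metric entirely when the helper returns None instead of failing on the missing date.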

@boomanaiden154
Contributor

Also, from letting it run tonight I learned that Buildkite can return None dates, for example if a job is cancelled before it started. Added conditions to handle this.

Seems like we'll just have to run it to see what all the edge cases are. Metrics on the dashboard look pretty good though!

@Keenuts Keenuts merged commit 44f4e43 into llvm:main Mar 14, 2025
11 checks passed
@Keenuts Keenuts deleted the buildkite branch March 14, 2025 10:44
frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025