[CI] Extend metrics container to log BuildKite metrics #130996
Mostly LGTM, just a couple nits.
The current container focuses on GitHub metrics. Before deprecating Buildkite, we want to make sure the quality of the new infrastructure is better, or at least the same.

Being able to compare Buildkite metrics with GitHub metrics on Grafana will let us present that comparison easily.

The Buildkite API allows filtering, but doesn't allow changing the result ordering, so we are left with builds ordered by ID. This means a completed job can appear before a running job in the list. There are two possible solutions:

- keep the cursor on the oldest running workflow
- keep a list of running workflows to compare

Because there is no guarantee on workflow ordering, waiting for the oldest build to complete before reporting any newer build could delay reporting the completion of more recent builds by a few hours. And because Grafana cannot ingest metrics older than 2 hours, the first solution is not an option.

That leaves the second solution: remember which jobs were running during the last iteration, and record them as soon as they complete. Buildkite has at most ~100 pending jobs, so keeping all those IDs should be OK.
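To illustrate the second solution, here is a minimal sketch of one polling iteration. It is not the code from this PR: the function name `split_builds`, the dictionary keys `id` and `state`, and the state names in `UNFINISHED_STATES` are assumptions made for the example. Builds that start and finish between two polls are not handled here.

```python
# Hypothetical state names; the values returned by the real API may differ.
UNFINISHED_STATES = {"scheduled", "running"}


def split_builds(previously_running, builds):
    """Process one polling iteration.

    previously_running: set of build IDs that were unfinished last time.
    builds: list of dicts with "id" and "state" keys, in API order
            (ordered by ID, so finished and running builds are interleaved).

    Returns (newly_completed, still_running):
      newly_completed: builds that were unfinished last time and are done now.
      still_running: set of IDs to remember for the next iteration.
    """
    newly_completed = []
    still_running = set()
    for build in builds:
        build_id = build["id"]
        if build["state"] in UNFINISHED_STATES:
            # Not done yet: keep tracking it.
            still_running.add(build_id)
        elif build_id in previously_running:
            # Was running during the last iteration, finished since then.
            newly_completed.append(build)
    return newly_completed, still_running


# Usage: a completed build ("a") is listed before running ones ("b", "c").
first_poll = [
    {"id": "a", "state": "passed"},
    {"id": "b", "state": "running"},
    {"id": "c", "state": "running"},
]
second_poll = [
    {"id": "a", "state": "passed"},
    {"id": "b", "state": "running"},
    {"id": "c", "state": "passed"},
]

done_1, running_1 = split_builds(set(), first_poll)
done_2, running_2 = split_builds(running_1, second_poll)
```

In this sketch the only state carried between iterations is the set of unfinished IDs, so build "c" is reported as soon as it completes even though the older build "b" is still running.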
Thanks, applied all comments.
Seems like we'll just have to run it to see what all the edge cases are. Metrics on the dashboard look pretty good though!