
[CI] Add queue size, running count metrics #122714


Merged: 4 commits, Jan 16, 2025


Keenuts
Contributor

@Keenuts Keenuts commented Jan 13, 2025

This commit allows the container to report 3 additional metrics at
every sampling event:

  • a heartbeat
  • the size of the workflow queue (filtered)
  • the number of running workflows (filtered)

The heartbeat is a simple metric that lets us monitor the health of
the metrics pipeline. Before this commit, a new metric was pushed only
when a workflow completed. This meant we had to wait a few hours
before noticing that the metrics container was unable to push metrics.

In addition, this commit samples the workflow queue size and running
count. This should help us better understand the load and improve the
autoscaling values we pick for the cluster.
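
For illustration only, here is a minimal sketch of what one sampling event could compute. It is not the code from this PR: the `GaugeMetric` type, the `sample_metrics` function, the metric names, and the shape of the run records (dicts with `name` and `status` keys) are all assumptions. The only thing taken from GitHub is the `queued` and `in_progress` workflow run statuses.

```python
import time
from dataclasses import dataclass


@dataclass
class GaugeMetric:
    # Hypothetical container for one sampled value.
    name: str
    value: int
    time_ns: int


def sample_metrics(runs, watched_workflows, now_ns=None):
    """Return heartbeat, queue size, and running count for one sampling event.

    `runs` is assumed to be an iterable of dicts with "name" and "status"
    keys; `watched_workflows` is the set of workflow names to keep (the
    "filtered" part of the description).
    """
    if now_ns is None:
        now_ns = time.time_ns()

    queued = 0
    running = 0
    for run in runs:
        if run["name"] not in watched_workflows:
            continue  # Ignore workflows we do not monitor.
        if run["status"] == "queued":
            queued += 1
        elif run["status"] == "in_progress":
            running += 1

    return [
        # The heartbeat is emitted on every sample, even when nothing runs,
        # so a gap in this series means the container stopped pushing.
        GaugeMetric("metrics_container_heartbeat", 1, now_ns),
        GaugeMetric("workflow_queue_size", queued, now_ns),
        GaugeMetric("running_workflow_count", running, now_ns),
    ]
```

Because the heartbeat does not depend on any workflow finishing, a monitoring rule can alert as soon as a sampling interval passes without it, rather than hours later.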

@Keenuts
Contributor Author

Keenuts commented Jan 13, 2025

Pending on #122708


github-actions bot commented Jan 13, 2025

✅ With the latest revision this PR passed the Python code formatter.

@Keenuts Keenuts force-pushed the metrics-queue-size branch 6 times, most recently from 0632089 to 9b8c3c4 Compare January 13, 2025 16:38
@Keenuts
Contributor Author

Keenuts commented Jan 13, 2025

Tested locally by setting the 3 env variables; the new metrics are showing up in Grafana:

$ export GRAFANA_METRICS_USERID=XXX
$ export GRAFANA_API_KEY=XXX
$ export GITHUB_TOKEN=XXX
$ python3 metrics.py
Uploaded 7 metrics
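
When reproducing this locally, a small pre-flight check can catch a missing variable before any request is made. The sketch below is an assumption, not part of `metrics.py`: the `missing_env_vars` helper is hypothetical, and only the three variable names come from the commands above.

```python
import os

# The three variables set in the local test above.
REQUIRED_ENV_VARS = ("GRAFANA_METRICS_USERID", "GRAFANA_API_KEY", "GITHUB_TOKEN")


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty, in order."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```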

@Keenuts Keenuts marked this pull request as ready for review January 13, 2025 16:39
Contributor

@boomanaiden154 boomanaiden154 left a comment

Some initial comments.

Signed-off-by: Nathan Gauër <[email protected]>
@Keenuts Keenuts force-pushed the metrics-queue-size branch from 9b8c3c4 to 07fb21b Compare January 14, 2025 10:20
@Keenuts Keenuts merged commit 13b4428 into llvm:main Jan 16, 2025
8 checks passed
@Keenuts Keenuts deleted the metrics-queue-size branch January 16, 2025 10:41
2 participants