Update NGF documentation on prometheus metrics #249

Merged

Conversation

bjee19
Contributor

@bjee19 bjee19 commented Mar 3, 2025

Update documentation on Prometheus metrics.

Problem: Because NGF now uses NGINX Agent to export NGINX metrics, we need to update our documentation on the available metrics and the example Grafana dashboard.

Solution: Update the metrics.

Testing: Ran make watch and it looks good.

Checklist

Before merging a pull request, run through this checklist and mark each as complete.

  • [x] I have read the contributing guidelines
  • I have signed the F5 Contributor License Agreement (CLA)
  • I have rebased my branch onto main
  • I have ensured my PR is targeting the main branch and pulling from my branch from my own fork
  • I have ensured that the commit messages adhere to Conventional Commits
  • I have ensured that documentation content adheres to the style guide
  • If the change involves potentially sensitive changes [1], I have assessed the possible impact
  • If applicable, I have added tests that prove my fix is effective or that my feature works
  • I have ensured that existing tests pass after adding my changes
  • If applicable, I have updated README.md and CHANGELOG.md

Footnotes

  1. Potentially sensitive changes include anything involving code, personally identifiable information (PII), live URLs, or significant amounts of new or revised documentation. Please refer to our style guide for guidance about placeholder content.

@bjee19 bjee19 requested a review from a team as a code owner March 3, 2025 22:55
@bjee19 bjee19 requested a review from a team March 3, 2025 22:55
@bjee19
Contributor Author

bjee19 commented Mar 3, 2025

[Image: Screenshot 2025-03-03 at 2 10 18 PM]

Here's what the Grafana dashboard looks like.

@JTorreG JTorreG added the product/ngf Issues related to NGINX Gateway Fabric label Mar 4, 2025
Contributor

@sjberman sjberman left a comment

Few things:

  • The graph seems to have a lot of duplicate entries for the same IP address. Any idea why?
  • Seems like there are way fewer metrics than we had before. Do you know if they have been squashed together? For example, I don't see any of the server_zone metrics here, and I'm wondering if they're now part of a different metric.

@bjee19
Contributor Author

bjee19 commented Mar 4, 2025

Here's a snapshot of what the dashboard looks like now. I'm not sure why it looks different from before, but this is what it should look like, with active connections and processed connections broken out into their subcategories.

[Image: updated Grafana dashboard snapshot]

@bjee19
Contributor Author

bjee19 commented Mar 4, 2025

@sjberman

The graph seems to have a lot of duplicate entries for the same IP address. Any idea why?

Not sure why. I had scaled up the NGINX instances, but the total requests graph shouldn't show different lines for the same IP address. The new dashboard doesn't have those duplicates. Perhaps, as I was scaling NGINX replicas up and down, there was some odd interaction with how Prometheus scrapes data, and duplicate entries appeared when a new NGINX instance reused an IP address from before, or something like that.
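
One way to check that guess would be to look for `instance` values that show up under more than one `pod` label. Below is a minimal Python sketch, assuming the label sets of the series have already been pulled out of Prometheus as dictionaries. The function name and the pod names in the usage lines are hypothetical, and this only tests the hypothesis rather than confirming the cause.

```python
from collections import defaultdict


def instances_with_multiple_pods(series_labels):
    """Return {instance: sorted pod names} for every instance address
    that appears under more than one pod label."""
    pods_by_instance = defaultdict(set)
    for labels in series_labels:
        # Missing labels fall back to "" so sorting never mixes types.
        pods_by_instance[labels.get("instance", "")].add(labels.get("pod", ""))
    return {
        instance: sorted(pods)
        for instance, pods in pods_by_instance.items()
        if len(pods) > 1
    }


# Hypothetical label sets: two pods that reported the same IP:port over time.
series = [
    {"instance": "10.244.0.25:9113", "pod": "gateway-nginx-old"},
    {"instance": "10.244.0.25:9113", "pod": "gateway-nginx-new"},
    {"instance": "10.244.0.26:9113", "pod": "gateway-nginx-other"},
]
reused = instances_with_multiple_pods(series)
```

If `reused` comes back non-empty for the time window of the first screenshot, that would be consistent with a replaced pod picking up a previously used IP address.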

Seems like there are way fewer metrics than we had before. Do you know if they have been squashed together? For example, I don't see any of the server_zone metrics here, and I'm wondering if they're now part of a different metric.

Is this in the Grafana dashboard or the Prometheus docs?

@sjberman
Contributor

sjberman commented Mar 4, 2025

@bjee19

Is this in the grafana dashboard or the prometheus docs?

This is in the Prometheus docs; you can see the metrics in the prometheus-exporter docs as well. These metrics exist in the latest NGF release. I just wasn't sure if Agent collapses metrics into each other.

@bjee19
Contributor Author

bjee19 commented Mar 4, 2025

@sjberman

Ah, you're talking about the NGINX OSS/Plus metrics that we used to have through the Prometheus exporter. I'll double-check whether the metrics have been collapsed into other ones, but from my understanding, that is why I added this line:

NGINX Gateway Fabric currently supports a subset of all metrics available through NGINX OSS and Plus.

Since we don't write access logs, perhaps we can't get through Agent all the metrics we used to be able to get.

@sjberman
Contributor

sjberman commented Mar 4, 2025

The server_zone metrics I was referring to were Plus API metrics, so nothing to do with access logs. I think those were just the HTTP status codes that you could see with OSS.

But I would hope that all Plus metrics are available as before.

@bjee19
Contributor Author

bjee19 commented Mar 4, 2025

@sjberman

I'm not gonna go fully in depth but I think that you are right that at least some of the server_zone metrics are available, but are condensed down into other metrics.

For example:

nginx_http_requests_total outputs this:

nginx_http_requests_total{app_kubernetes_io_instance="my-release", app_kubernetes_io_managed_by="my-release-nginx", app_kubernetes_io_name="gateway-nginx", gateway_networking_k8s_io_gateway_name="gateway", instance="10.244.0.25:9113", instance_id="e8d1bda6-397e-3b98-a179-e500ff99fbc7", instance_type="nginxplus", job="kubernetes-pods", namespace="default", nginx_zone_name="cafe.example.com", nginx_zone_type="SERVER", node="kind-control-plane", pod="gateway-nginx-56d765d857-vwtjv", pod_template_hash="56d765d857", resource_id="27112175-b9d2-3670-886c-6f0eb01ee3a5"} -> 0

which includes the zone name and zone type, and I would assume the value is the same as nginxplus_server_zone_requests, the NGINX Plus metric from the Prometheus exporter.

Another example is:

nginx_http_response_status_responses_total outputs this:

nginx_http_response_status_responses_total{app_kubernetes_io_instance="my-release", app_kubernetes_io_managed_by="my-release-nginx", app_kubernetes_io_name="gateway-nginx", gateway_networking_k8s_io_gateway_name="gateway", instance="10.244.0.25:9113", instance_id="e8d1bda6-397e-3b98-a179-e500ff99fbc7", instance_type="nginxplus", job="kubernetes-pods", namespace="default", nginx_status_range="2xx", nginx_zone_name="cafe.example.com", nginx_zone_type="SERVER", node="kind-control-plane", pod="gateway-nginx-56d765d857-vwtjv", pod_template_hash="56d765d857", resource_id="27112175-b9d2-3670-886c-6f0eb01ee3a5"}

with the status range going from 1xx -> 5xx

which matches up with nginxplus_server_zone_responses, the NGINX Plus metric from the Prometheus exporter.
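
To make the comparison above easier to repeat, here is a rough Python sketch that pulls the zone labels out of series lines written the way they are pasted in this comment. It is only an illustration: `parse_series` and `zone_key` are hypothetical helper names, the sample line is shortened from the output above, and the parser handles this pasted format rather than the full Prometheus exposition format.

```python
import re

# Matches key="value" pairs inside the braces of a pasted series line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_series(line):
    """Split 'metric{k="v", ...}' (optionally followed by ' -> value')
    into (metric_name, labels_dict)."""
    name, _, rest = line.partition("{")
    body = rest.rsplit("}", 1)[0]  # drop the closing brace and anything after it
    return name.strip(), dict(LABEL_RE.findall(body))


def zone_key(labels):
    """Return (zone type, zone name), the labels that appear to carry the
    information of the old per-server-zone metrics."""
    return labels.get("nginx_zone_type"), labels.get("nginx_zone_name")


# Shortened from the nginx_http_response_status_responses_total sample above.
sample = (
    'nginx_http_response_status_responses_total{instance="10.244.0.25:9113", '
    'nginx_status_range="2xx", nginx_zone_name="cafe.example.com", '
    'nginx_zone_type="SERVER"}'
)
metric, labels = parse_series(sample)
zone = zone_key(labels)
```

Grouping series by `zone_key` and, where present, `nginx_status_range` should give per-zone totals that can be lined up against the old exporter metrics, assuming the values really are equivalent.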

@sjberman
Contributor

sjberman commented Mar 4, 2025

@bjee19 I did just find these two documents from agent:

Does this jibe with what you see? I would say we could just link to these docs, but right now they're on the feature branch in the internal directory, so we'll see.

@bjee19
Contributor Author

bjee19 commented Mar 4, 2025

@sjberman

Yeah, they are pretty much the same. There are a couple we don't seem to have, such as nginx.cache.*, but we get everything else. I'd also be in support of linking to it, perhaps in a future revision.

notably nothing specific to server_zone

Contributor

@ADubhlaoich ADubhlaoich left a comment

LGTM. Left an edit suggestion for the one sentence in question.

Contributor

@ADubhlaoich ADubhlaoich left a comment

Still LGTM, just had some additional non-blocking suggestions.

@bjee19 bjee19 changed the title Update documentation on prometheus metrics Update NGF documentation on prometheus metrics Mar 5, 2025
@bjee19 bjee19 merged commit 2f554a2 into nginx:ngf-feature-cp-dp-split Mar 5, 2025
5 checks passed
bjee19 added a commit that referenced this pull request Apr 28, 2025
Update documentation on prometheus metrics.

Problem: Because NGF now uses NGINX Agent to export NGINX metrics, we need to update our documentation on metrics available and the example grafana dashboard.

Solution: Update the metrics.


* Add feedback
bjee19 added a commit to bjee19/documentation that referenced this pull request Apr 28, 2025
Labels
product/ngf Issues related to NGINX Gateway Fabric
5 participants