Update NGF documentation on prometheus metrics #249

bjee19 · 2025-03-03T22:55:47Z

Update documentation on prometheus metrics.

Problem: Because NGF now uses NGINX Agent to export NGINX metrics, we need to update our documentation on metrics available and the example grafana dashboard.

Solution: Update the metrics.

Testing: Ran make watch and it looks good.

Checklist

Before merging a pull request, run through this checklist and mark each as complete.

[x] I have read the contributing guidelines
I have signed the F5 Contributor License Agreement (CLA)
I have rebased my branch onto main
I have ensured my PR is targeting the main branch and pulling from my branch from my own fork
I have ensured that the commit messages adhere to Conventional Commits
I have ensured that documentation content adheres to the style guide
If the change involves potentially sensitive changes¹, I have assessed the possible impact
If applicable, I have added tests that prove my fix is effective or that my feature works
I have ensured that existing tests pass after adding my changes
If applicable, I have updated README.md and CHANGELOG.md

¹ Potentially sensitive changes include anything involving code, personally identifiable information (PII), live URLs, or significant amounts of new or revised documentation. Please refer to our style guide for guidance about placeholder content.

bjee19 · 2025-03-03T22:56:28Z

Here's what the grafana dashboard looks like.

content/ngf/how-to/monitoring/prometheus.md

sjberman

Few things:

The graph seems to have a lot of duplicate entries for the same IP address. Any idea why?
Seems like there are way fewer metrics than we had before. Do you know if they have been squashed together? For example, I don't see any of the server_zone metrics here, and I'm wondering if they're now part of a different metric.

content/ngf/how-to/monitoring/prometheus.md

static/ngf/ngf-grafana-dashboard.json

bjee19 · 2025-03-04T19:32:27Z

Here's a snapshot of what the dashboard looks like now. I'm not sure why it looks different, but this is what it should look like, with the active connections and processed connections panels having those subcategories.

bjee19 · 2025-03-04T19:35:07Z

@sjberman

The graph seems to have a lot of duplicate entries for the same IP address. Any idea why?

Not sure why. I had scaled up the NGINX instances, but the total requests graph shouldn't show different lines for the same IP address. The new dashboard doesn't have those duplicates. Perhaps, as I was scaling NGINX replicas up and down, there was some odd interaction with how Prometheus scrapes data, so duplicate entries appeared when a new NGINX instance reused an earlier IP address, or something like that.
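One way to check that guess is to group the series by their `instance` label and see which other labels differ between them. A minimal Python sketch, assuming the label sets have already been fetched from Prometheus as plain dictionaries; the second and third pod names below are invented for illustration, and the helper name is hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


def differing_labels_by_instance(series: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """For each 'instance' with more than one series, list the label names whose values differ."""
    groups = defaultdict(list)
    for labels in series:
        groups[labels.get("instance", "")].append(labels)

    result = {}
    for instance, members in groups.items():
        if len(members) < 2:
            continue  # a single series per instance is the expected case
        keys = set().union(*(m.keys() for m in members))
        result[instance] = sorted(
            k for k in keys if len({m.get(k) for m in members}) > 1
        )
    return result


# Hypothetical label sets: only the first pod name comes from this thread.
series = [
    {"instance": "10.244.0.25:9113", "pod": "gateway-nginx-56d765d857-vwtjv"},
    {"instance": "10.244.0.25:9113", "pod": "gateway-nginx-56d765d857-aaaaa"},
    {"instance": "10.244.0.26:9113", "pod": "gateway-nginx-56d765d857-bbbbb"},
]
print(differing_labels_by_instance(series))
```

If the duplicates came from a recreated pod reusing an IP address, the differing label would likely be `pod`, as in this made-up data.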

Seems like there are way fewer metrics than we had before. Do you know if they have been squashed together? For example, I don't see any of the server_zone metrics here, and I'm wondering if they're now part of a different metric.

Is this in the grafana dashboard or the prometheus docs?

sjberman · 2025-03-04T19:37:01Z

@bjee19

Is this in the grafana dashboard or the prometheus docs?

This is in the prometheus docs. You can see the metrics in the prometheus-exporter docs as well. These metrics exist in the latest NGF release. I just wasn't sure if agent collapses metrics into each other.

bjee19 · 2025-03-04T19:41:44Z

@sjberman

Ah, you're talking about the NGINX OSS/Plus metrics that we used to have through the prometheus exporter. I'll double check to see if the metrics have been collapsed into other ones, but from my understanding, that is why I added this line:

NGINX Gateway Fabric currently supports a subset of all metrics available through NGINX OSS and Plus.

Perhaps, because we don't write access logs, agent can't collect all the metrics we used to be able to get.

sjberman · 2025-03-04T19:52:04Z

The server_zone metrics I was referring to were Plus API metrics, so nothing to do with access logs. I think those were just the HTTP status codes that you could see with OSS.

But I would hope that all Plus metrics are available as before.

bjee19 · 2025-03-04T21:39:03Z

@sjberman

I'm not going to go fully in depth, but I think you are right that at least some of the server_zone metrics are still available, just condensed into other metrics.

For example:

nginx_http_requests_total outputs this:

nginx_http_requests_total{app_kubernetes_io_instance="my-release", app_kubernetes_io_managed_by="my-release-nginx", app_kubernetes_io_name="gateway-nginx", gateway_networking_k8s_io_gateway_name="gateway", instance="10.244.0.25:9113", instance_id="e8d1bda6-397e-3b98-a179-e500ff99fbc7", instance_type="nginxplus", job="kubernetes-pods", namespace="default", nginx_zone_name="cafe.example.com", nginx_zone_type="SERVER", node="kind-control-plane", pod="gateway-nginx-56d765d857-vwtjv", pod_template_hash="56d765d857", resource_id="27112175-b9d2-3670-886c-6f0eb01ee3a5"} -> 0

which includes the zone name and zone type, and I would assume the output is the same as nginxplus_server_zone_requests from the NGINX Plus metrics in the prometheus exporter.

Another example is:

nginx_http_response_status_responses_total outputs this:

nginx_http_response_status_responses_total{app_kubernetes_io_instance="my-release", app_kubernetes_io_managed_by="my-release-nginx", app_kubernetes_io_name="gateway-nginx", gateway_networking_k8s_io_gateway_name="gateway", instance="10.244.0.25:9113", instance_id="e8d1bda6-397e-3b98-a179-e500ff99fbc7", instance_type="nginxplus", job="kubernetes-pods", namespace="default", nginx_status_range="2xx", nginx_zone_name="cafe.example.com", nginx_zone_type="SERVER", node="kind-control-plane", pod="gateway-nginx-56d765d857-vwtjv", pod_template_hash="56d765d857", resource_id="27112175-b9d2-3670-886c-6f0eb01ee3a5"}

with the status range going from 1xx -> 5xx

which matches up with nginxplus_server_zone_responses from the NGINX Plus metric from the prometheus exporter.
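To make the comparison concrete, here is a small Python sketch that pulls the zone labels out of a series string like the ones above and looks up the exporter metric it appears to correspond to. The mapping table is only the tentative one from this comment, not something confirmed by agent documentation, and the helper names are hypothetical. The sample is a shortened copy of the series above.

```python
import re
from typing import Dict, Optional, Tuple

# Matches one key="value" label pair; the samples in this thread contain no escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Assumed correspondence taken from this discussion only (unverified).
ASSUMED_LEGACY_NAMES = {
    ("nginx_http_requests_total", "SERVER"): "nginxplus_server_zone_requests",
    ("nginx_http_response_status_responses_total", "SERVER"): "nginxplus_server_zone_responses",
}


def parse_series(series: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name{k="v", ...}' into the metric name and a dict of labels."""
    name, _, rest = series.partition("{")
    return name.strip(), dict(LABEL_RE.findall(rest))


def assumed_legacy_name(series: str) -> Optional[str]:
    """Return the exporter metric this series is assumed to replace, if any."""
    name, labels = parse_series(series)
    return ASSUMED_LEGACY_NAMES.get((name, labels.get("nginx_zone_type")))


sample = (
    'nginx_http_response_status_responses_total{instance="10.244.0.25:9113", '
    'instance_type="nginxplus", nginx_status_range="2xx", '
    'nginx_zone_name="cafe.example.com", nginx_zone_type="SERVER"}'
)
name, labels = parse_series(sample)
print(name, labels["nginx_zone_name"], labels["nginx_status_range"])
print(assumed_legacy_name(sample))
```

The point is that the per-zone detail now lives in the `nginx_zone_name`, `nginx_zone_type`, and `nginx_status_range` labels rather than in the metric name.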

sjberman · 2025-03-04T21:52:33Z

@bjee19 I did just find these two documents from agent:

Does this jibe with what you see? I would say we could just link to these docs, but right now they're on the feature branch in the internal directory, so we'll see.

bjee19 · 2025-03-04T21:55:59Z

@sjberman

Yeah, they are pretty much the same. There are a couple that we don't seem to have, such as nginx.cache.*, but we get everything else. I'd also support linking to them, perhaps in a future revision.

notably nothing specific to server_zone

content/ngf/how-to/monitoring/prometheus.md

ADubhlaoich

LGTM. Left an edit suggestion for the one sentence in question.

ADubhlaoich

Still LGTM, just had some additional non-blocking suggestions.

content/ngf/how-to/monitoring/prometheus.md

Update documentation on prometheus metrics. Problem: Because NGF now uses NGINX Agent to export NGINX metrics, we need to update our documentation on metrics available and the example grafana dashboard. Solution: Update the metrics. * Add feedback

Update documentation on prometheus metrics

0433bd5

bjee19 requested a review from a team as a code owner March 3, 2025 22:55

bjee19 requested a review from a team March 3, 2025 22:55

bjee19 commented Mar 3, 2025

salonichf5 reviewed Mar 3, 2025

JTorreG added the product/ngf Issues related to NGINX Gateway Fabric label Mar 4, 2025

sjberman reviewed Mar 4, 2025

bjee19 added 2 commits March 4, 2025 11:13

Update dashboard and fix small feedback

7ad87fc

Another update to dashboard

e816275

sjberman reviewed Mar 4, 2025

Correct documentation on namespace of metrics

5ddb904

sjberman approved these changes Mar 4, 2025

ADubhlaoich reviewed Mar 5, 2025

ADubhlaoich approved these changes Mar 5, 2025

salonichf5 approved these changes Mar 5, 2025

bjee19 changed the title ~~Update documentation on prometheus metrics~~ Update NGF documentation on prometheus metrics Mar 5, 2025

Add feedback

e42310a

bjee19 merged commit 2f554a2 into nginx:ngf-feature-cp-dp-split Mar 5, 2025
5 checks passed
