@@ -7,27 +7,81 @@ This directory contains configuration for visualizing metrics from the metrics a
- **Prometheus**: Collects and stores metrics from the service
- **Grafana**: Provides visualization dashboards for the metrics

+ ## Topology
+
+ Default Service Relationship Diagram:
+ ```
+ ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+ │ nats-server │     │ etcd-server │     │dcgm-exporter│
+ │    :4222    │     │    :2379    │     │    :9400    │
+ │    :6222    │     │    :2380    │     │             │
+ │    :8222    │     │             │     │             │
+ └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
+        │                   │                   │
+        │ :8222/varz        │ :2379/metrics     │ :9400/metrics
+        │                   │                   │
+        ▼                   │                   │
+ ┌─────────────┐            │                   │
+ │nats-prom-exp│            │                   │
+ │    :7777    │            │                   │
+ │             │            │                   │
+ │  /metrics   │            │                   │
+ └──────┬──────┘            │                   │
+        │                   │                   │
+        │ :7777/metrics     │                   │
+        │                   │                   │
+        ▼                   ▼                   ▼
+ ┌─────────────────────────────────────────────────┐
+ │                   prometheus                    │
+ │                      :9090                      │
+ │                                                 │
+ │  scrapes: nats-prom-exp:7777/metrics            │
+ │           etcd-server:2379/metrics              │
+ │           dcgm-exporter:9400/metrics            │
+ └──────────────────┬──────────────────────────────┘
+                    │
+                    │ :9090/query API
+                    │
+                    ▼
+             ┌─────────────┐
+             │   grafana   │
+             │    :3001    │
+             │             │
+             └─────────────┘
+ ```
+
+ Networks:
+ - monitoring: nats-prom-exp, etcd-server, dcgm-exporter, prometheus, grafana
+ - default: nats-server (accessible via host network)
+
## Getting Started

1. Make sure Docker and Docker Compose are installed on your system

- 2. Start the `components/metrics` application to begin monitoring for metric events from dynamo workers
-    and aggregating them on a prometheus metrics endpoint: `http://localhost:9091/metrics`.
+ 2. Start the visualization stack:

- 3. Start worker(s) that publishes KV Cache metrics.
-    - For quick testing, `examples/rust/service_metrics/bin/server.rs` can populate dummy KV Cache metrics.
-    - For a real workflow with real data, see the KV Routing example in `examples/python_rs/llm/vllm`.
+    ```bash
+    docker compose --profile metrics up -d
+    ```

- 4. Start the visualization stack:
+ 3. Web servers started. The ones that end in /metrics are in Prometheus format:
+    - Grafana: `http://localhost:3001` (default login: dynamo/dynamo)
+    - Prometheus Server: `http://localhost:9090`
+    - NATS Server: `http://localhost:8222` (monitoring endpoints: /varz, /healthz, etc.)
+    - NATS Prometheus Exporter: `http://localhost:7777/metrics`
+    - etcd Server: `http://localhost:2379/metrics`
+    - DCGM Exporter: `http://localhost:9401/metrics`

-    ```bash
-    docker compose --profile metrics up -d
-    ```
+ 4. Optionally, if you want to experiment further:
+    Start the `components/metrics` application to begin monitoring for metric events from dynamo workers
+    and aggregating them on a prometheus metrics endpoint: `http://localhost:9091/metrics`.
+
+    Then, uncomment the appropriate lines in prometheus.yml.
+
+ 5. Optionally, start worker(s) that publishes KV Cache metrics:
+    - For quick testing, `examples/rust/service_metrics/bin/server.rs` can populate dummy KV Cache metrics.
+    - For a real workflow with real data, see the KV Routing example in `examples/python_rs/llm/vllm`.

- 5. Web servers started:
-    - Grafana: `http://localhost:3001` (default login: admin/admin) (started by docker compose)
-    - Prometheus Server: `http://localhost:9090` (started by docker compose)
-    - Prometheus Metrics Endpoint: `http://localhost:9091/metrics` (started by `components/metrics` application)
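
Reviewer note: once the stack from the new step 2 is up, the endpoints listed in the new step 3 can be probed from the host. The following is only a minimal sketch and is not part of this change. `check_endpoints` and `probe_url` are hypothetical helper names, and it assumes `curl` is installed and that the host port mappings are the ones listed in step 3.

```bash
#!/usr/bin/env bash
# Endpoints copied from step 3 of the README; adjust them if the
# port mappings in your compose file differ (assumption: defaults).
METRICS_ENDPOINTS=(
  "grafana=http://localhost:3001"
  "prometheus=http://localhost:9090"
  "nats-server=http://localhost:8222/varz"
  "nats-prom-exp=http://localhost:7777/metrics"
  "etcd-server=http://localhost:2379/metrics"
  "dcgm-exporter=http://localhost:9401/metrics"
)

# Hypothetical helper: succeeds when the URL answers with a non-error
# HTTP status (redirects are followed), fails otherwise.
probe_url() {
  curl --silent --fail --location --output /dev/null --max-time 5 "$1"
}

# Probe every endpoint, print one status line per endpoint, and
# return 1 if at least one endpoint did not respond.
check_endpoints() {
  local entry name url failed=0
  for entry in "${METRICS_ENDPOINTS[@]}"; do
    name="${entry%%=*}"   # text before the first '='
    url="${entry#*=}"     # text after the first '='
    if probe_url "$url"; then
      echo "ok   $name $url"
    else
      echo "FAIL $name $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Source the file and call `check_endpoints`. A `FAIL` line for `dcgm-exporter` is plausible on hosts without an NVIDIA GPU, so a non-zero exit status does not necessarily mean the stack itself is broken.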
## Configuration
@@ -42,6 +96,7 @@ Note: You may need to adjust the target based on your host configuration and net
Grafana is pre-configured with:
- Prometheus datasource
- Sample dashboard for visualizing service metrics
+ ![grafana image](./grafana1.png)

## Required Files