**Problem:** In elastic/docs#2752, we updated the URL prefix (`welcome-to-elastic`) and name for the "Welcome to Elastic Docs" docs. However, we still have some stray links that use the old `/welcome-to-elastic` URL prefix.
**Solution:** Updates several outdated links to use an attribute (`{estc-welcome-current}` in this file) instead of the hard-coded URL prefix.
(cherry picked from commit 01bbc42)
`docs/en/ingest-management/elastic-agent/scaling-on-kubernetes.asciidoc` (23 additions, 23 deletions)
@@ -9,12 +9,12 @@ For more information on how to deploy {agent} on {k8s}, please review these page
[discrete]
== Observability at scale

-This document summarizes some key factors and best practices for using https://www.elastic.co/guide/en/welcome-to-elastic/current/getting-started-kubernetes.html[Elastic {observability}] to monitor {k8s} infrastructure at scale. Users need to consider different parameters and adjust {stack} accordingly. These elements are affected as the size of {k8s} cluster increases:
+This document summarizes some key factors and best practices for using {estc-welcome-current}/getting-started-kubernetes.html[Elastic {observability}] to monitor {k8s} infrastructure at scale. Users need to consider different parameters and adjust {stack} accordingly. These elements are affected as the size of {k8s} cluster increases:

- The amount of metrics being collected from several {k8s} endpoints
- The {agent}'s resources to cope with the high CPU and Memory needs for the internal processing
- The {es} resources needed due to the higher rate of metric ingestion
-- The Dashboard's visualizations response times as more data are requested on a given time window
+- The Dashboard's visualizations response times as more data are requested on a given time window

The document is divided in two main sections:
@@ -41,7 +41,7 @@ The {k8s} {observability} is based on https://docs.elastic.co/en/integrations/ku
Controller manager and Scheduler datastreams are being enabled only on the specific node that actually runs based on autodiscovery rules

-The default manifest provided deploys {agent} as DaemonSet which results in an {agent} being deployed on every node of the {k8s} cluster.
+The default manifest provided deploys {agent} as DaemonSet which results in an {agent} being deployed on every node of the {k8s} cluster.

Additionally, by default one agent is elected as **leader** (for more information visit <<kubernetes_leaderelection-provider>>). The {agent} Pod which holds the leadership lock is responsible for collecting the cluster-wide metrics in addition to its node's metrics.
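To make the leader-election behaviour above more concrete, here is a minimal, hypothetical excerpt of a standalone {agent} policy (not part of this diff). The input id, dataset, hosts, and period are illustrative placeholders; the point of the sketch is the condition on the last line, which gates cluster-wide collection on the leader lease:

[source,yaml]
----
# Hypothetical excerpt from a standalone elastic-agent policy.
# Cluster-wide (state_*) metrics are collected only by the Pod currently
# holding the leader lease; node-level streams carry no such condition.
inputs:
  - id: kubernetes/metrics-kube-state-metrics
    type: kubernetes/metrics
    use_output: default
    streams:
      - data_stream:
          dataset: kubernetes.state_pod
        metricsets:
          - state_pod
        hosts:
          - 'kube-state-metrics:8080'
        period: 10s
        # Evaluates to true only on the Agent Pod that holds the leadership lock
        condition: ${kubernetes_leaderelection.leader} == true
----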
@@ -58,7 +58,7 @@ The DaemonSet deployment approach with leader election simplifies the installati
[discrete]
=== Specifying resources and limits in Agent manifests

-Resourcing of your Pods and the Scheduling priority (check section <<agent-scheduling,Scheduling priority>>) of them are two topics that might be affected as the {k8s} cluster size increases.
+Resourcing of your Pods and the Scheduling priority (check section <<agent-scheduling,Scheduling priority>>) of them are two topics that might be affected as the {k8s} cluster size increases.
The increasing demand of resources might result to under-resource the Elastic Agents of your cluster.

Based on our tests we advise to configure only the `limit` section of the `resources` section in the manifest. In this way the `request`'s settings of the `resources` will fall back to the `limits` specified. The `limits` is the upper bound limit of your microservice process, meaning that can operate in less resources and protect {k8s} to assign bigger usage and protect from possible resource exhaustion.
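As an illustration of the limits-only advice above (not part of this diff), a minimal sketch of the `resources` section in an {agent} DaemonSet container spec; the image tag and the values are placeholders to be tuned per cluster size:

[source,yaml]
----
# Hypothetical container spec excerpt from an elastic-agent DaemonSet manifest.
# Only `limits` is set; with no `requests` block, Kubernetes defaults the
# requests to the same values, which is the fallback behaviour described above.
containers:
  - name: elastic-agent
    image: docker.elastic.co/beats/elastic-agent:8.8.0   # placeholder tag
    resources:
      limits:
        cpu: 500m       # placeholder, tune to cluster size
        memory: 800Mi   # placeholder, tune to cluster size
----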
@@ -76,11 +76,11 @@ Based on our https://github.com/elastic/elastic-agent/blob/main/docs/elastic-age
Sample Elastic Agent Configurations:
|===
-| No of Pods in K8s Cluster | Leader Agent Resources | Rest of Agents
> The above tests were performed with {agent} version 8.7 and scraping period of `10sec` (period setting for the {k8s} integration). Those numbers are just indicators and should be validated for each different {k8s} environment and amount of workloads.
@@ -94,19 +94,19 @@ Although daemonset installation is simple, it can not accommodate the varying ag
- A dedicated {agent} deployment of a single Agent for collecting cluster wide metrics from the apiserver

-- Node level {agent}s(no leader Agent) in a Daemonset
+- Node level {agent}s(no leader Agent) in a Daemonset

- kube-state-metrics shards and {agent}s in the StatefulSet defined in the kube-state-metrics autosharding manifest
-
+
Each of these groups of {agent}s will have its own policy specific to its function and can be resourced independently in the appropriate manifest to accommodate its specific resource requirements.

-Resource assignment led us to alternatives installation methods.
+Resource assignment led us to alternatives installation methods.

IMPORTANT: The main suggestion for big scale clusters *is to install {agent} as side container along with `kube-state-metrics` Shard*. The installation is explained in details https://github.com/elastic/elastic-agent/tree/main/docs/manifests/kustomize-autosharding[{agent} with Kustomize in Autosharding]
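To make the side-container layout easier to picture (the linked Kustomize example remains the reference, and this sketch is not part of this diff), a heavily simplified, hypothetical kube-state-metrics autosharding StatefulSet with an {agent} sidecar; names, replica count, image tags, and resource values are placeholders:

[source,yaml]
----
# Hypothetical sketch only: each StatefulSet Pod runs one kube-state-metrics
# shard plus an elastic-agent side container that scrapes just that shard.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  serviceName: kube-state-metrics
  replicas: 2                      # number of KSM shards (placeholder)
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2  # placeholder tag
          args:
            # Autosharding: each Pod derives its shard from its ordinal
            - --pod=$(POD_NAME)
            - --pod-namespace=$(POD_NAMESPACE)
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
        - name: elastic-agent      # side container collecting from its local shard
          image: docker.elastic.co/beats/elastic-agent:8.8.0   # placeholder tag
          resources:
            limits:
              cpu: 500m            # placeholder
              memory: 700Mi        # placeholder
----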
The following **alternative configuration methods** have been verified:

-1. With `hostNetwork:false`
+1. With `hostNetwork:false`
- {agent} as Side Container within KSM Shard pod
- For non-leader {agent} deployments that collect per KSM shards
2. With `taint/tolerations` to isolate the {agent} daemonset pods from rest of deployments
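As a sketch of option 2 above (not part of this diff, and only one possible reading of the isolation setup), a node taint paired with a matching toleration on the {agent} DaemonSet; the taint key/value and all names are placeholders:

[source,yaml]
----
# Hypothetical excerpt: nodes are tainted (for example with
#   kubectl taint nodes <node> dedicated=elastic-agent:NoSchedule
# ) so that ordinary deployments avoid them, while this DaemonSet
# tolerates the taint and still runs on every node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      tolerations:
        - key: dedicated          # placeholder taint key
          operator: Equal
          value: elastic-agent    # placeholder taint value
          effect: NoSchedule
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.8.0   # placeholder tag
----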
@@ -116,10 +116,10 @@ You can find more information in the document called https://github.com/elastic/
Based on our https://github.com/elastic/elastic-agent/blob/ksmsharding/docs/elastic-agent-scaling-tests.md[{agent} scaling tests], the following table aims to assist users on how to configure their KSM Sharding as {k8s} cluster scales:
|===
| No of Pods in K8s Cluster | No of KSM Shards | Agent Resources
-| 1000 | No Sharding can be handled with default KSM config | limits: memory: 700Mi , cpu:500m
> The tests above were performed with {agent} version 8.8 + TSDB Enabled and scraping period of `10sec` (for the {k8s} integration). Those numbers are just indicators and should be validated per different {k8s} policy configuration, along with applications that the {k8s} cluster might include
-The configuration of Elastic Stack needs to be taken under consideration in large scale deployments. In case of Elastic Cloud deployments the choice of the deployment https://www.elastic.co/guide/en/cloud/current/ec-getting-started-profiles.html[{ecloud} hardware profile] is important.
+The configuration of Elastic Stack needs to be taken under consideration in large scale deployments. In case of Elastic Cloud deployments the choice of the deployment https://www.elastic.co/guide/en/cloud/current/ec-getting-started-profiles.html[{ecloud} hardware profile] is important.

For heavy processing and big ingestion rate needs, the `CPU-optimised` profile is proposed.
@@ -161,7 +161,7 @@ For heavy processing and big ingestion rate needs, the `CPU-optimised` profile i
== Validation and Troubleshooting practices

[discrete]
-=== Define if Agents are collecting as expected
+=== Define if Agents are collecting as expected

After {agent} deployment, we need to verify that Agent services are healthy, not restarting (stability) and that collection of metrics continues with expected rate (latency).
@@ -217,7 +217,7 @@ Components:
Healthy: communicating with pid '42462'
------------------------------------------------

-It is a common problem of lack of CPU/memory resources that agent process restart as {k8s} size grows. In the logs of agent you
+It is a common problem of lack of CPU/memory resources that agent process restart as {k8s} size grows. In the logs of agent you
-You can verify the instant resource consumption by running `top pod` command and identify if agents are close to the limits you have specified in your manifest.
+You can verify the instant resource consumption by running `top pod` command and identify if agents are close to the limits you have specified in your manifest.

[source,bash]
------------------------------------------------
@@ -261,7 +261,7 @@ Identify how many events have been sent to {es}: