
[ETCM-49] Replace statsd metrics with prometheus #668

Merged · 8 commits · Sep 18, 2020
11 changes: 11 additions & 0 deletions README.md
@@ -43,6 +43,17 @@ in the root of the project.

This updates all submodules and creates a distribution zip in `~/target/universal/`.

### Monitoring

#### Build & run the monitoring client locally

```
# Build the monitoring client Docker image
projectRoot $ docker build -f ./docker/monitoring-client.Dockerfile -t mantis-monitoring-client ./docker/
# Run the monitoring client on http://localhost:9090
projectRoot $ docker run --network=host mantis-monitoring-client
```
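
Note that the monitoring client only visualises what the node exposes: as the `application.conf` change below shows, metrics collection is disabled by default (`mantis.metrics.enabled = false`), and when enabled the node serves Prometheus metrics on localhost port `13798`, which is the target scraped by the bundled `prometheus.yml`.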

### Feedback

Feedback gratefully received through the Ethereum Classic Forum (http://forum.ethereumclassic.org/)
23 changes: 13 additions & 10 deletions build.sbt
@@ -6,7 +6,7 @@ import scala.sys.process.Process
val nixBuild = sys.props.isDefinedAt("nix")

val commonSettings = Seq(
name := "mantis",
name := "mantis-core",
version := "3.0",
scalaVersion := "2.12.12",
testOptions in Test += Tests
@@ -51,8 +51,6 @@ val dep = {
"net.java.dev.jna" % "jna" % "4.5.1",
"org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.0",
"com.github.scopt" %% "scopt" % "3.7.0",
// Metrics (https://github.com/DataDog/java-dogstatsd-client)
"com.datadoghq" % "java-dogstatsd-client" % "2.5",
"org.xerial.snappy" % "snappy-java" % "1.1.7.2",
// Logging
"ch.qos.logback" % "logback-classic" % "1.2.3",
@@ -87,14 +85,19 @@ val root = {
.configs(Integration, Benchmark, Evm, Ets, Snappy, Rpc)
.settings(commonSettings: _*)
.settings(
libraryDependencies ++= dep
libraryDependencies ++= Seq(
dep,
Dependencies.micrometer,
Dependencies.prometheus
).flatten
)
.settings(inConfig(Integration)(Defaults.testSettings): _*)
.settings(inConfig(Benchmark)(Defaults.testSettings): _*)
.settings(inConfig(Evm)(Defaults.testSettings): _*)
.settings(inConfig(Ets)(Defaults.testSettings): _*)
.settings(inConfig(Snappy)(Defaults.testSettings): _*)
.settings(inConfig(Rpc)(Defaults.testSettings): _*)
.settings(executableScriptName := name.value)
.settings(inConfig(Integration)(Defaults.testSettings) : _*)
.settings(inConfig(Benchmark)(Defaults.testSettings) : _*)
.settings(inConfig(Evm)(Defaults.testSettings) : _*)
.settings(inConfig(Ets)(Defaults.testSettings) : _*)
.settings(inConfig(Snappy)(Defaults.testSettings) : _*)
.settings(inConfig(Rpc)(Defaults.testSettings) : _*)

if (!nixBuild)
root
2 changes: 2 additions & 0 deletions docker/monitoring-client.Dockerfile
@@ -0,0 +1,2 @@
FROM prom/prometheus
ADD ./prometheus.yml /etc/prometheus/
29 changes: 29 additions & 0 deletions docker/prometheus.yml
@@ -0,0 +1,29 @@
# Please don't use any of the default port allocations.
# https://github.com/prometheus/prometheus/wiki/Default-port-allocations
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: prometheus
honor_timestamps: true
scrape_interval: 5s
scrape_timeout: 5s
metrics_path: /metrics
scheme: http
static_configs:
- targets:
- localhost:9090
labels:
alias: prometheus
- job_name: node
honor_timestamps: true
scrape_interval: 10s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
static_configs:
- targets:
- localhost:13798
labels:
alias: mantis-node
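
The `node` job's target port (`13798`) mirrors the default `mantis.metrics.port` from `application.conf`; if that port is changed on the node, this scrape config has to be updated to match.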
27 changes: 27 additions & 0 deletions project/Dependencies.scala
@@ -0,0 +1,27 @@
import sbt._

object Dependencies {

val prometheus: Seq[ModuleID] = {
val provider = "io.prometheus"
val version = "0.8.0"
Seq(
provider % "simpleclient" % version,
provider % "simpleclient_logback" % version,
provider % "simpleclient_hotspot" % version,
provider % "simpleclient_httpserver" % version
)
}

val micrometer: Seq[ModuleID] = {
val provider = "io.micrometer"
val version = "1.0.4"
Seq(
// Required to compile metrics library https://github.com/micrometer-metrics/micrometer/issues/1133#issuecomment-452434205
"com.google.code.findbugs" % "jsr305" % "3.0.2" % Optional,
provider % "micrometer-core" % version,
provider % "micrometer-registry-jmx" % version,
provider % "micrometer-registry-prometheus" % version
)
}
}
328 changes: 312 additions & 16 deletions repo.nix

Large diffs are not rendered by default.

29 changes: 3 additions & 26 deletions src/main/resources/application.conf
@@ -504,35 +504,12 @@ mantis {

metrics {
# Set to `true` iff your deployment supports metrics collection.
# We push metrics to a StatsD-compatible agent and we use Datadog for collecting them in one place.
# We expose metrics using a Prometheus server.
# We default to `false` here because we do not expect all deployments to support metrics collection.
enabled = false

# The StatsD-compatible agent host.
host = "localhost"

# The StatsD-compatible agent port.
port = 8125

# All metrics sent will be tagged with the environment.
# Sample values are: `public`, `private`, `production`, `dev`, depending on the use-case.
# If metrics are `enabled`, this is mandatory and must be set explicitly to a non-null value.
#
environment = ""

# All metrics sent will be tagged with the deployment.
# Sample values are: `kevm-testnet`, `iele-testnet`, `raft-kevm`, depending on the use-case.
# If metrics are `enabled`, this is mandatory and must be set explicitly to a non-null value.
#
deployment = ""

# Size of the metrics requests queue.
# If the queue contains that many outstanding requests to the metrics agent, then
# subsequent requests are blocked until the queue has room again.
queue-size = 1024

# Iff true, any errors during metrics client operations will be logged.
log-errors = true
# The port for setting up a Prometheus server over localhost.
port = 13798
}

logging {
3 changes: 3 additions & 0 deletions src/main/resources/logback.xml
@@ -41,6 +41,8 @@
</encoder>
</appender>

<appender name="METRICS" class="io.prometheus.client.logback.InstrumentedAppender" />

<root level="DEBUG">
<if condition='p("ASJSON").contains("true")'>
<then>
@@ -51,6 +53,7 @@
</else>
</if>
<appender-ref ref="FILE" />
<appender-ref ref="METRICS" />
</root>

<logger name="io.iohk.ethereum.network.rlpx.RLPxConnectionHandler" level="INFO" />
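
The `InstrumentedAppender` comes from the `simpleclient_logback` dependency added in `Dependencies.scala`; attaching it to the root logger lets Prometheus track the number of log events per level alongside the other node metrics.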
4 changes: 4 additions & 0 deletions src/main/scala/io/iohk/ethereum/domain/BlockBody.scala
@@ -10,6 +10,10 @@ case class BlockBody(transactionList: Seq[SignedTransaction], uncleNodesList: Se
|uncleNodesList: $uncleNodesList
|}
""".stripMargin

lazy val numberOfTxs: Int = transactionList.size

lazy val numberOfUncles: Int = uncleNodesList.size
}

object BlockBody {
6 changes: 3 additions & 3 deletions src/main/scala/io/iohk/ethereum/ledger/BlockImport.scala
@@ -5,7 +5,6 @@ import io.iohk.ethereum.consensus.validators.BlockHeaderError.HeaderParentNotFou
import io.iohk.ethereum.domain._
import io.iohk.ethereum.ledger.BlockExecutionError.{ UnKnownExecutionError, ValidationBeforeExecError }
import io.iohk.ethereum.ledger.BlockQueue.Leaf
import io.iohk.ethereum.metrics.{ Metrics, MetricsClient }
import io.iohk.ethereum.utils.{ BlockchainConfig, Logger }
import org.bouncycastle.util.encoders.Hex

@@ -68,8 +67,9 @@ class BlockImport(
})

if (importedBlocks.nonEmpty) {
val maxNumber = importedBlocks.map(_.block.header.number).max.toLong
MetricsClient.get().gauge(Metrics.LedgerImportBlockNumber, maxNumber)
importedBlocks.map(
blockData => BlockMetrics.measure(blockData.block, blockchain.getBlockByHash)
)
}

result
42 changes: 42 additions & 0 deletions src/main/scala/io/iohk/ethereum/ledger/BlockMetrics.scala
@@ -0,0 +1,42 @@
package io.iohk.ethereum.ledger

import akka.util.ByteString
import com.google.common.util.concurrent.AtomicDouble
import io.iohk.ethereum.domain.Block
import io.iohk.ethereum.metrics.MetricsContainer

case object BlockMetrics extends MetricsContainer {

private[this] final val BlockNumberGauge =
metrics.registry.gauge("sync.block.number.gauge", new AtomicDouble(0d))
private[this] final val BlockGasLimitGauge =
metrics.registry.gauge("sync.block.gasLimit.gauge", new AtomicDouble(0d))
private[this] final val BlockGasUsedGauge =
metrics.registry.gauge("sync.block.gasUsed.gauge", new AtomicDouble(0d))
private[this] final val BlockDifficultyGauge =
metrics.registry.gauge("sync.block.difficulty.gauge", new AtomicDouble(0d))
private[this] final val BlockTransactionsGauge =
metrics.registry.gauge("sync.block.transactions.gauge", new AtomicDouble(0d))
private[this] final val BlockUnclesGauge =
metrics.registry.gauge("sync.block.uncles.gauge", new AtomicDouble(0d))
private[this] final val TimeBetweenParentGauge =
metrics.registry.gauge("sync.block.timeBetweenParent.seconds.gauge", new AtomicDouble(0d))

def measure(block: Block, getBlockByHashFn: ByteString => Option[Block]): Unit = {
BlockNumberGauge.set(block.number.toDouble)
BlockGasLimitGauge.set(block.header.gasLimit.toDouble)
BlockGasUsedGauge.set(block.header.gasUsed.toDouble)
BlockDifficultyGauge.set(block.header.difficulty.toDouble)
BlockTransactionsGauge.set(block.body.numberOfTxs)
BlockUnclesGauge.set(block.body.numberOfUncles)

getBlockByHashFn(block.header.parentHash) match {
case Some(parentBlock) =>
val timeBetweenBlocksInSeconds: Long =
block.header.unixTimestamp - parentBlock.header.unixTimestamp
TimeBetweenParentGauge.set(timeBetweenBlocksInSeconds)
case None => ()
}
}

}
11 changes: 11 additions & 0 deletions src/main/scala/io/iohk/ethereum/metrics/AppJmxConfig.scala
@@ -0,0 +1,11 @@
package io.iohk.ethereum.metrics

import io.micrometer.jmx.JmxConfig

class AppJmxConfig extends JmxConfig {
override def get(key: String): String = null

override def prefix(): String = Metrics.MetricsPrefix

override def domain(): String = prefix()
}
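
Returning `null` from `get` follows the Micrometer config convention for "no explicit value configured", so this registry falls back to the library defaults for everything except the prefix/domain.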
33 changes: 33 additions & 0 deletions src/main/scala/io/iohk/ethereum/metrics/DeltaSpikeGauge.scala
@@ -0,0 +1,33 @@
package io.iohk.ethereum.metrics

import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

/**
* A gauge that starts at `0` and can be triggered to go to `1`.
* Next time it is sampled, it goes back to `0`.
* This is normally used for either one-off signals (e.g. when an application starts)
* or slowly re-appearing signals. Specifically, the sampling rate must be greater
* than the rate at which the signal is triggered.
*/
class DeltaSpikeGauge(name: String, metrics: Metrics) {
private[this] final val isTriggeredRef = new AtomicBoolean(false)
private[this] final val valueRef = new AtomicInteger(0)

private[this] def getValue(): Double = {
if (isTriggeredRef.compareAndSet(true, false)) {
valueRef.getAndSet(0)
} else {
valueRef.get()
}
}

private[this] final val gauge = metrics.gauge(name, () ⇒ getValue)

def trigger(): Unit = {
if (isTriggeredRef.compareAndSet(false, true)) {
valueRef.set(1)
// Let one of the exporting metric registries pick up the `1`.
// As soon as that happens, `getValue` will make sure that we go back to `0`.
}
}
}
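
As a minimal usage sketch (not part of this PR; the `Metrics` instance and the gauge name are assumed), a component triggers the spike once, the next scrape reports `1`, and the gauge then resets itself to `0`:

```scala
// Hypothetical usage sketch, assuming a `Metrics` instance is already available in scope.
class StartupSignal(metrics: Metrics) {
  // Reports 1 on the first sample after trigger() and 0 on every later sample.
  private val startedGauge = new DeltaSpikeGauge("app.start.gauge", metrics)

  def appStarted(): Unit = startedGauge.trigger()
}
```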
57 changes: 57 additions & 0 deletions src/main/scala/io/iohk/ethereum/metrics/MeterRegistryBuilder.scala
@@ -0,0 +1,57 @@
package io.iohk.ethereum.metrics

import io.iohk.ethereum.utils.Logger
import io.iohk.ethereum.utils.LoggingUtils.getClassName
import io.micrometer.core.instrument.composite.CompositeMeterRegistry
import io.micrometer.core.instrument.config.MeterFilter
import io.micrometer.core.instrument._
import io.micrometer.prometheus.{PrometheusMeterRegistry, PrometheusConfig}
import io.prometheus.client.CollectorRegistry
import io.micrometer.jmx.JmxMeterRegistry

object MeterRegistryBuilder extends Logger {

private[this] final val StdMetricsClock = Clock.SYSTEM

private[this] def onMeterAdded(m: Meter): Unit =
log.debug(s"New ${getClassName(m)} metric: " + m.getId.getName)

/**
* Builds our composite meter registry by:
* 1. Creating each individual meter registry
* 2. Configuring the resulting composite
*/
def build(metricsPrefix: String): MeterRegistry = {

val jmxMeterRegistry = new JmxMeterRegistry(new AppJmxConfig, StdMetricsClock)

log.info(s"Build JMX Meter Registry: ${jmxMeterRegistry}")

val prometheusMeterRegistry =
new PrometheusMeterRegistry(
PrometheusConfig.DEFAULT,
CollectorRegistry.defaultRegistry,
StdMetricsClock
);

log.info(s"Build Prometheus Meter Registry: ${prometheusMeterRegistry}")

val registry = new CompositeMeterRegistry(
StdMetricsClock,
java.util.Arrays.asList(jmxMeterRegistry, prometheusMeterRegistry)
)
// Ensure that all metrics have the `Prefix`.
// We are of course mainly interested in those that we do not control,
// e.g. those coming from `JvmMemoryMetrics`.
registry
.config()
.meterFilter(new MeterFilter {
override def map(id: Meter.Id): Meter.Id = {
id.withName(MetricsUtils.mkNameWithPrefix(metricsPrefix)(id.getName))
}
})
.onMeterAdded(onMeterAdded)

registry
}
}
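
As a sketch of how the composite registry might be consumed (the prefix and metric name below are made up, not from this PR), any meter registered through Micrometer's fluent builders ends up in both the JMX and the Prometheus registries, with the prefix filter applied:

```scala
import io.micrometer.core.instrument.Counter

// Assumed prefix; in the PR the prefix comes from Metrics.MetricsPrefix.
val registry = MeterRegistryBuilder.build("app")

// The MeterFilter above rewrites the id, so this meter is exported with the prefix,
// e.g. as `app.sync.blocks.imported.counter`.
val importedBlocks: Counter = Counter
  .builder("sync.blocks.imported.counter")
  .register(registry)

importedBlocks.increment()
```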