Update design docs based on meetings with team #1802

Closed
wants to merge 3 commits into from
@@ -0,0 +1,103 @@
package software.amazon.awssdk.annotations;

/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
import java.util.concurrent.CompletableFuture;

/**
* Interface to report and publish the collected SDK metric events to external
* sources.
* <p>
* Conceptually, a publisher receives a stream of {@link MetricEvents} objects
* over its lifetime through its {@link #consume(MetricEvents)} method.
* Implementations are then free to further aggregate these events into sets
* of metrics that are then published to some external system for further use.
* As long as a publisher is not closed, it can receive {@code MetricEvents}
* objects at any time. In addition, because the SDK makes use of
* multithreading, a publisher may be shared concurrently by multiple threads,
* so all implementations must be threadsafe.
* <p>
* <b>Example:</b>
* <p>
* At {@code t0}:
* <pre>
* {@code
* metricEventsBuilder.putMetricEvent(Events.MARSHALLING_START, Instant.now());
* }
* </pre>
* <p>
* At {@code t1}, after marshalling is complete:
* <pre>
* {@code
* metricEventsBuilder.putMetricEvent(Events.MARSHALLING_END, Instant.now());
* }
* </pre>
* <p>
* At {@code t2} after the SDK operation is complete:
* <pre>
* {@code
* MetricEvents metricEvents = metricEventsBuilder.build();
* metricPublisher.consume(metricEvents)
*     .whenComplete((r, t) -> {
*         if (t == null) {
*             log.debug("Publishing completed successfully.");
*         } else {
*             log.error("Publishing of " + metricEvents + " was unsuccessful", t);
*         }
*     });
* }
* </pre>
* At some later {@code tN}, the publisher can then choose to aggregate all of
* the {@code metricEvents} it has received and publish them.
* <p>
* The SDK may invoke methods on the interface from multiple threads
* concurrently so implementations must be threadsafe.
*/
@SdkPublicApi
@ThreadSafe
public interface MetricPublisher extends AutoCloseable {
/**
* Notify the publisher of new metric data. After this call returns, the
* caller can safely discard the given {@code metricEvents} instance if it
* no longer needs it. Implementations are strongly encouraged to complete
* the aggregation and publishing of metrics in an asynchronous manner to
* avoid blocking the calling thread.
* <p>
* With the exception of a {@code null} {@code metricEvents}, all
* invocations of this method must return normally. The only legal way to
* report an error is by completing the returned future exceptionally. This
* ensures that callers of the publisher can safely assume that an error
* during publishing will not interrupt the calling thread.
* <p>
* The future is completed when the metrics calculated or otherwise derived
* from the given {@code metricEvents} have been published.
*
* @param metricEvents The metric events.
* @return A future representing the publishing of the given metric events.
* @throws IllegalArgumentException If {@code metricEvents} is {@code null}.
*/
CompletableFuture<Void> consume(MetricEvents metricEvents);

/**
* Close this publisher, allowing it to free any resources it holds and
* preventing further use.
* <p>
* Implementations <b>must</b> block until all pending metrics are
* published and all held resources are freed.
*/
@Override
void close();

class MetricEvents {
}
}
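
The error-reporting contract described for `consume` (throw only for a `null` argument, otherwise report failures through the returned future) could be enforced with a small wrapper along these lines. This is only a sketch under stated assumptions: `ConsumeContract`, `consumeSafely`, and the `publish` function parameter are hypothetical names that do not appear in the prototype.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, not part of the prototype: wraps arbitrary publishing
// logic so that it follows the contract documented on consume().
final class ConsumeContract {
    private ConsumeContract() {
    }

    static <E> CompletableFuture<Void> consumeSafely(
            E metricEvents, Function<E, CompletableFuture<Void>> publish) {
        // The only case in which the caller sees an exception directly.
        if (metricEvents == null) {
            throw new IllegalArgumentException("metricEvents must not be null");
        }
        try {
            CompletableFuture<Void> result = publish.apply(metricEvents);
            return result != null
                    ? result
                    : failed(new IllegalStateException("publish returned null"));
        } catch (RuntimeException e) {
            // Any synchronous failure is reported through the future instead.
            return failed(e);
        }
    }

    private static CompletableFuture<Void> failed(Throwable cause) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }
}
```

A publisher written this way lets callers attach `whenComplete` handlers without wrapping every `consume` call in a `try` block.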
63 changes: 40 additions & 23 deletions docs/design/core/metrics/Design.md
@@ -1,33 +1,50 @@
# SDK Metrics System
## Concepts
### Metric
* A representation of data collected
* Metric can be one of the following types: Counter, Gauge, Timer
* Metric can be associated to a category. Some of the metric categories are Default, HttpClient, Streaming etc

### MetricRegistry

* A MetricRegistry represent an interface to store the collected metric data. It can hold different types of Metrics
described above
* MetricRegistry is generic and not tied to specific category (ApiCall, HttpClient etc) of metrics.
* Each API call has it own instance of a MetricRegistry. All metrics collected in the ApiCall lifecycle are stored in
that instance.
* A MetricRegistry can store other instances of same type. This can be used to store metrics for each Attempt in an Api
Call.
* A measure of some aspect of the SDK. Examples include request latency, number
of pooled connections and retries executed.

* A metric is associated with a category. Some of the metric categories are
`Default`, `HttpClient` and `Streaming`. This lets customers enable metrics
only for the categories they are interested in.

Refer to the [Metrics List](./MetricsList.md) document for a complete list of
standard metrics collected by the SDK.

### Metric Events

* `MetricEvents` is a typesafe collection of raw data from which metrics are
drawn. Depending on the metric, its value is either taken directly from a
collected data point or derived from several of them.

* `MetricEvents` objects allow for nesting. This enables events to be
collected in the context of other metric events. For example, a single API
call may involve multiple request attempts if there are retries. Each
attempt's associated metric events can be stored in their own
`MetricEvents`.

* Each unique event may be added to a `MetricEvents` only once. Once an event
has been recorded, it cannot be modified, e.g. by overwriting the stored
event data with new data.

* [Interface prototype](prototype/MetricEvents.java)
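
The write-once and nesting rules above can be sketched as follows. This is a minimal illustration rather than the SDK's implementation: `SketchEvent`, `SketchEvents`, and `SketchMetrics` are hypothetical names, and for brevity the sketch folds the builder and the immutable collection into a single mutable class.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical typed key: the type parameter ties an event to its data type.
final class SketchEvent<T> {
    static final SketchEvent<Instant> MARSHALLING_START = new SketchEvent<>("MarshallingStart");
    static final SketchEvent<Instant> MARSHALLING_END = new SketchEvent<>("MarshallingEnd");

    private final String name;

    private SketchEvent(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// Hypothetical write-once collection that can hold nested collections.
final class SketchEvents {
    private final Map<SketchEvent<?>, Object> data = new LinkedHashMap<>();
    private final List<SketchEvents> children = new ArrayList<>();

    <T> void put(SketchEvent<T> event, T eventData) {
        if (eventData == null) {
            throw new IllegalArgumentException("eventData must not be null");
        }
        // putIfAbsent leaves an existing entry untouched, so a recorded
        // event can never be overwritten.
        if (data.putIfAbsent(event, eventData) != null) {
            throw new IllegalArgumentException(event + " has already been recorded");
        }
    }

    @SuppressWarnings("unchecked")
    <T> T get(SketchEvent<T> event) {
        // Safe because put() only stores data of the event's own type.
        return (T) data.get(event);
    }

    // Creates a nested collection, e.g. one per request attempt.
    SketchEvents newChild() {
        SketchEvents child = new SketchEvents();
        children.add(child);
        return child;
    }

    List<SketchEvents> children() {
        return Collections.unmodifiableList(children);
    }
}

// A metric derived from two raw data points rather than read directly.
final class SketchMetrics {
    private SketchMetrics() {
    }

    static Duration marshallingTime(SketchEvents events) {
        return Duration.between(
                events.get(SketchEvent.MARSHALLING_START),
                events.get(SketchEvent.MARSHALLING_END));
    }
}
```

In this sketch an API call would own one `SketchEvents` and call `newChild()` once per attempt, so retry data stays grouped under the call that produced it.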

### MetricPublisher

* A MetricPublisher represent an interface to publish the collected metrics to a external source.
* SDK provides implementations to publish metrics to services like [Amazon
CloudWatch](https://aws.amazon.com/cloudwatch/), [Client Side
Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK
Metrics for Enterprise Support)
* Customers can implement the interface and register the custom implementation to publish metrics to a platform not
supported in the SDK.
* MetricPublishers can have different behaviors in terms of list of metrics to publish, publishing frequency,
* A `MetricPublisher` publishes collected metrics to one or more systems
outside of the SDK. It takes a `MetricEvents` object, potentially transforms
the data into richer metrics, and converts it into the format the receiver
expects.
* By default, the SDK will provide implementations to publish metrics to [Amazon
CloudWatch](https://aws.amazon.com/cloudwatch/) and [Client Side
Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html)
(also known as AWS SDK Metrics for Enterprise Support).
* Metric publishers are pluggable within the SDK, allowing customers to
provide their own custom implementations.
* Metric publishers can have different behaviors in terms of list of metrics to
publish, publishing frequency,
configuration needed to publish etc.
* Metrics can be explicitly published to the platform by calling publish() method. This can be useful in scenarios when
the application fails and customer wants to flush metrics before exiting the application.

* [Interface prototype](prototype/MetricPublisher.java)
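
As a rough illustration of a custom publisher, the sketch below hands each batch of events to a background thread and makes `close()` wait for outstanding work. The assumptions are deliberate: `InMemorySketchPublisher` is a hypothetical class, it accepts plain `String` values in place of `MetricEvents`, and "publishing" only appends to an in-memory list.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical publisher; a String stands in for a MetricEvents object.
final class InMemorySketchPublisher implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> published = new CopyOnWriteArrayList<>();

    CompletableFuture<Void> consume(String metricEvents) {
        if (metricEvents == null) {
            throw new IllegalArgumentException("metricEvents must not be null");
        }
        try {
            // Publish off the calling thread so the caller is never blocked.
            return CompletableFuture.runAsync(() -> published.add(metricEvents), executor);
        } catch (RuntimeException e) {
            // For example, the executor rejects work once the publisher is
            // closed. Report it through the future instead of throwing.
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

    @Override
    public void close() {
        // Stop accepting new work, then block until queued work has run.
        executor.shutdown();
        try {
            while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                // Keep waiting for pending publishes to finish.
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    List<String> published() {
        return published;
    }
}
```

A real publisher would likely batch or aggregate before sending anything over the network. The single-threaded executor here is simply the easiest way to keep ordering and thread safety obvious.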

### Reporting
31 changes: 31 additions & 0 deletions docs/design/core/metrics/prototype/MetricEventRecord.java
@@ -0,0 +1,31 @@
/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package software.amazon.awssdk.metrics;

/**
* A container associating an event with its data.
*/
interface MetricEventRecord<T> {
/**
* @return The metric event.
*/
MetricEvent<T> getEvent();

/**
* @return The data associated with this event.
*/
T getData();
}
64 changes: 64 additions & 0 deletions docs/design/core/metrics/prototype/MetricEvents.java
@@ -0,0 +1,64 @@
/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package software.amazon.awssdk.metrics;

import java.util.Iterator;
import software.amazon.awssdk.annotations.SdkPublicApi;
import software.amazon.awssdk.metrics.meter.Counter;
import software.amazon.awssdk.metrics.meter.Gauge;
import software.amazon.awssdk.metrics.meter.Metric;
import software.amazon.awssdk.metrics.meter.Timer;

/**
* An immutable object used to store metric events collected by the SDK.
*/
@SdkPublicApi
public interface MetricEvents extends Iterable<MetricEventRecord<?>> {
/**
* Return the metric data associated with the given event. Returns {@code
* null} if no event is found.
*/
<T> T getMetricEventData(MetricEvent<T> event);

/**
* Return an iterator of the contained metric events and their data.
*/
@Override
Iterator<MetricEventRecord<?>> iterator();

/**
* Builder for a {@code MetricEvents}.
* <p>
* Implementations are not guaranteed to be threadsafe, so external
* synchronization must be used if a builder is shared by multiple threads.
*/
interface Builder {
/**
* Add the given metric event with its associated data.
*
* @throws IllegalArgumentException If the given event is already
* present, or if {@code eventData} is {@code null}.
*/
<T> void putMetricEvent(MetricEvent<T> event, T eventData);

/**
* Build this {@code MetricEvents} object.
*/
MetricEvents build();
}
}
88 changes: 53 additions & 35 deletions docs/design/core/metrics/prototype/MetricPublisher.java
@@ -13,56 +13,74 @@
* permissions and limitations under the License.
*/

package software.amazon.awssdk.metrics.publisher;
package software.amazon.awssdk.metrics;

import java.util.concurrent.CompletableFuture;
import software.amazon.awssdk.annotations.SdkPublicApi;
import software.amazon.awssdk.annotations.ThreadSafe;
import software.amazon.awssdk.metrics.registry.MetricRegistry;

/**
* Interface to report and publish the collected SDK metrics to external sources.
*
* Publisher implementations create and maintain resources (like clients, thread pool etc) that are used for publishing.
* They should be closed in the close() method to avoid resource leakage.
*
* Interface to report and publish the collected SDK metric events to external
* sources.
* <p>
* As metrics are not part of the business logic, failures caused by metrics features should not fail the application.
* So SDK publisher implementations suppress all errors during the metrics publishing and log them.
* </p>
*
* Conceptually, a publisher receives a stream of {@link MetricEvents} objects
* over its lifetime through its {@link #consume(MetricEvents)} method.
* Implementations are then free to further aggregate these events into sets of
* metrics that are then published to some external system for further use.
* As long as a publisher is not closed, then it can receive {@code
* MetricEvents} objects at any time. In addition, as the SDK makes use of
* multithreading, it's possible that the publisher is shared concurrently by
* multiple threads, and necessitates that all implementations are threadsafe.
* <p>
* <b>Example:</b>
* At {@code t0}:
* {@code
* metricEventsBuilder.putMetricEvent(Events.MARSHALLING_START, Instant.now());
* }
* <p>
* At {@code t1}, after marshalling is complete:
* {@code
* metricEventsBuilder.putMetricEvent(Events.MARSHALLING_END, Instant.now());
* }
* <p>
* At {@code t2} after the SDK operation is complete:
* <p>
* In certain situations (high throttling errors, metrics are reported faster than publishing etc), storing all the metrics
* might take up lot of memory and can crash the application. In these cases, it is recommended to have a max limit on
* number of metrics stored or memory used for metrics and drop the metrics when the limit is breached.
* </p>
* {@code
* metricPublisher.consume(metricEventsBuilder.build());
* }
* At some later {@code tN}, the publisher can then choose to aggregate all of
* the {@code metricEvents} it has received and publish them.
*
*
* Implementations must be threadsafe.
*/
@SdkPublicApi
@ThreadSafe
public interface MetricPublisher extends AutoCloseable {

/**
* Registers the metric information supplied in MetricsRegistry. The reported metrics can be transformed and
* stored in a format the publisher uses to publish the metrics.
* Notify the publisher of new metric data. After this call returns, the
* caller can safely discard the given {@code metricEvents} instance if it
* no longer needs it. Implementations are strongly encouraged to complete
* the aggregation and publishing of metrics in an asynchronous manner to
* avoid blocking the calling thread.
* <p>
* With the exception of a {@code null} {@code metricEvents}, all
* invocations of this method must return normally. This ensures that
* callers of the publisher can safely assume that an error during
* publishing will not interrupt the calling thread.
*
* This method is called at the end of each request execution to report all the metrics collected
* for that request (including retry attempt metrics)
* @throws IllegalArgumentException If {@code metricEvents} is {@code null}.
*/
void registerMetrics(MetricRegistry metricsRegistry);
void consume(MetricEvents metricEvents);

/**
* Publish all metrics stored in the publisher. If all available metrics cannot be published in a single call,
* multiple calls will be made to publish the metrics.
*
* It is recommended to publish the metrics in a non-blocking way. As it is common to publish metrics to an external
* source which involves network calls, the method is intended to be implemented in a non-blocking way and thus
* returns a {@link CompletableFuture}.
*
* Depending on the implementation, the metrics are published to the external source periodically like:
* a) after a certain time period
* b) after n metrics are registered
* c) after the buffer is full
*
* Implementations can also call publish method for every reported metric. But this can be expensive and
* is not recommended.
* Close this publisher, allowing it to free any resources it holds and
* preventing further use.
* <p>
* Implementations <b>must</b> block until all pending metrics are
* published and all held resources are freed.
*/
CompletableFuture<Void> publish();
@Override
void close();
}