
Logging stacktrace for the exception if MetricValidation failed for each retry #135


Merged
merged 3 commits into from
Jul 30, 2024


jerry-shao
Contributor

@jerry-shao jerry-shao commented Jul 25, 2024

Issue description:

Today, when the E2E metrics test fails to find the expected metrics, it retries up to 80 times. However, it does not tell us which metric is missing until it has exhausted all retries at the very end. Example: https://github.com/aws-observability/aws-application-signals-test-framework/actions/runs/10082523244/job/27876967293#step:27:100

Description of changes:

This change logs the stack trace of each failed attempt at warning level, so the operator can see the missing metric after just one retry.

Example stacktrace in workflow log:

17:46:15.330 [main] WARN  com.amazon.aoc.helpers.RetryHelper - com.amazon.aoc.exception.BaseException: metric in 
toBeCheckedMetricList: {Namespace: ApplicationSignals,MetricName: Latency,Dimensions: [{Name: Environment,Value: eks:e2e-canary-test/ns-10167100950-965}, {Name: Hello,Value: world!}, {Name: Operation,Value: GET /mysql}, {Name: Service,Value: sample-application-java-eks-10167100950-965-1}]} is not found in 
baseMetricList: [{Namespace: ApplicationSignals,MetricName: Error,Dimensions: [{Name: Environment,Value: eks:e2e-canary-test/ns-10167100950-965}, {Name: Operation,Value: GET /**}, {Name: Service,Value: sample-r-app-deployment-java-eks-10167100950-965-1}]}, {Namespace: ApplicationSignals,MetricName: Error,Dimensions: [{Name: Environment,Value: eks:e2e-canary-test/ns-10167100950-965}, {Name: Operation,Value: GET /aws-sdk-call}, {Name: RemoteOperation,Value: GetBucketLocation}, {Name: RemoteResourceIdentifier,Value: e2e-test-bucket-name-java-eks-10167100950-965-1}, {Name: RemoteResourceType,Value: AWS::S3::Bucket}, {Name: RemoteService,Value: AWS::S3}, {Name: Service,Value: sample-application-java-eks-10167100950-965-1}]}, {Namespace: ApplicationSignals,MetricName: Error,Dimensions: [{Name: Environment,Value: eks:e2e-canary-test/ns-10167100950-965}, {Name: Operation,Value: GET /aws-sdk-call}, {Name: RemoteOperation,Value: GetBucketLocation}, {Name: RemoteService,Value: AWS::S3}, {Name: Service,Value: sample-appl
17:46:15.330 [main] INFO  com.amazon.aoc.helpers.RetryHelper - retrying after 10 seconds
17:46:25.330 [main] INFO  com.amazon.aoc.helpers.RetryHelper - retry attempt left : 78 
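
For readers who have not opened the diff, the behaviour shown in the log above can be sketched roughly as follows. This is a minimal illustration, not the actual `RetryHelper` code: the class name `RetrySketch`, the `Retryable` interface, and the `retry` signature are all hypothetical, and `java.util.logging` is used only to keep the example dependency-free (the project's own logging setup may differ).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the project's retry helper; names and signature are assumptions. */
final class RetrySketch {
  private static final Logger LOG = Logger.getLogger(RetrySketch.class.getName());

  /** A check that throws while the expected metrics are not yet available. */
  @FunctionalInterface
  interface Retryable {
    void execute() throws Exception;
  }

  /**
   * Runs the check up to maxAttempts times. Every failed attempt is logged at
   * WARNING level together with its exception, so the missing metric is visible
   * immediately instead of only after the final attempt.
   *
   * @return the number of attempts used when the check finally passes
   * @throws Exception the last failure if every attempt fails
   */
  static int retry(int maxAttempts, long sleepMillis, Retryable check) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        check.execute();
        return attempt;
      } catch (Exception e) {
        last = e;
        // Passing the exception as the third argument logs its stack trace.
        LOG.log(Level.WARNING, "attempt " + attempt + " failed", e);
        int left = maxAttempts - attempt;
        if (left > 0) {
          LOG.info("retrying after " + sleepMillis + " ms, retry attempts left: " + left);
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated validation: fails twice, then passes on the third call.
    int used = retry(80, 10, () -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("metric Latency is not found in baseMetricList");
      }
    });
    System.out.println("succeeded after " + used + " attempts");
  }
}
```

With this shape, the first failed validation already produces a warning that names the missing metric, while the final failure is still rethrown to the caller once all attempts are used up.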

Test run: https://github.com/jerry-shao/aws-application-signals-test-framework/actions/runs/10167100950/job/28118707771

Ensure you've run the following tests on your changes and include the link below:
To do so, create a test.yml file with name: Test and workflow description to test your changes, then remove the file for your PR. Link your test run in your PR description. This process is a short term solution while we work on creating a staging environment for testing.

NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL. See the needs keyword to run tests in succession.

  • Run Java EKS on e2e-playground in us-east-1 and eu-central-2
  • Run Python EKS on e2e-playground in us-east-1 and eu-central-2
  • Run metric limiter on EKS cluster e2e-playground in us-east-1 and eu-central-2
  • Run EC2 tests in all regions
  • Run K8s on a separate K8s cluster (check IAD test account for master node endpoints; these will change as we create and destroy clusters for OS patching)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@majanjua-amzn majanjua-amzn merged commit 62c471c into aws-observability:main Jul 30, 2024
1 check passed