[Native K8s]: E2E test implementation in IAD #31

Merged: 3 commits, Apr 17, 2024

1 change: 0 additions & 1 deletion .github/workflows/appsignals-e2e-ec2-test.yml
@@ -29,7 +29,6 @@ env:
  APP_SIGNALS_ADOT_JAR: "https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar"
  METRIC_NAMESPACE: AppSignals
  LOG_GROUP_NAME: /aws/appsignals/generic
  TEST: ${{ inputs.test }}
  GET_ADOT_JAR_COMMAND: "wget -O adot.jar https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar"
  GET_CW_AGENT_RPM_COMMAND: "wget -O cw-agent.rpm https://amazoncloudwatch-agent-${{ inputs.aws-region }}.s3.${{ inputs.aws-region }}.amazonaws.com/amazon_linux/amd64/1.300031.0b313/amazon-cloudwatch-agent.rpm"
  TEST_RESOURCES_FOLDER: /home/runner/work/aws-application-signals-test-framework/aws-application-signals-test-framework
26 changes: 26 additions & 0 deletions .github/workflows/appsignals-e2e-k8s-canary-test.yml
@@ -0,0 +1,26 @@
## Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
## SPDX-License-Identifier: Apache-2.0

## This workflow aims to run the Application Signals end-to-end tests as a canary to
## test the artifacts for App Signals enablement. It will deploy the CloudWatch Agent
## Operator and our sample app and remote service onto a native K8s cluster, call the
## APIs, and validate the generated telemetry, including logs, metrics, and traces.
## It will then clean up the cluster and EC2 instance it runs on for the next test run.
name: App Signals Enablement - E2E K8s Canary Testing
on:
  schedule:
    - cron: '*/15 * * * *' # run the workflow every 15 minutes
  workflow_dispatch: # be able to run the workflow on demand

permissions:
  id-token: write
  contents: read

jobs:
  e2e-k8s-test:
    uses: ./.github/workflows/appsignals-e2e-k8s-test.yml
    secrets: inherit
    with:
      # To run in more regions, a cluster must be provisioned manually on EC2 instances in that region
      aws-region: 'us-east-1'
      caller-workflow-name: 'appsignals-e2e-k8s-canary-test'
210 changes: 210 additions & 0 deletions .github/workflows/appsignals-e2e-k8s-test.yml
@@ -0,0 +1,210 @@
## Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
## SPDX-License-Identifier: Apache-2.0

# This is a reusable workflow for running the E2E test for App Signals.
# It is meant to be called from another workflow.
# Read more about reusable workflows: https://docs.github.com/en/actions/using-workflows/reusing-workflows#overview
name: App Signals Enablement E2E Testing - K8s on EC2 Use Case
on:
  workflow_call:
    inputs:
      aws-region:
        required: true
        type: string
      caller-workflow-name:
        required: true
        type: string

concurrency:
  group: '${{ github.workflow }} @ ${{ inputs.aws-region }}'
  cancel-in-progress: false

permissions:
  id-token: write
  contents: read

env:
  # The presence of this env var is required for use by terraform and AWS CLI commands
  # It is not redundant
  AWS_DEFAULT_REGION: ${{ inputs.aws-region }}
  TEST_ACCOUNT: ${{ secrets.APP_SIGNALS_E2E_TEST_ACC }}
  METRIC_NAMESPACE: AppSignals
  LOG_GROUP_NAME: /aws/appsignals/k8s
  MASTER_NODE_SSH_KEY: ${{ secrets.APP_SIGNALS_E2E_K8S_SSH_KEY_IAD }}
  MAIN_SERVICE_ENDPOINT: ${{ secrets.APP_SIGNALS_E2E_K8S_MASTER_NODE_ENDPOINT }}
  SAMPLE_APP_NAMESPACE: sample-app-namespace
  TEST_RESOURCES_FOLDER: /__w/aws-application-signals-test-framework/aws-application-signals-test-framework

jobs:
  e2e-k8s-test:
    runs-on: ubuntu-latest
    container:
      image: public.ecr.aws/h6o3z5z9/aws-application-signals-test-framework-workflow-container:latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Generate testing id
        run: echo TESTING_ID="${{ env.AWS_DEFAULT_REGION }}-${{ github.run_id }}-${{ github.run_number }}" >> $GITHUB_ENV

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.E2E_SECRET_TEST_ROLE_ARN }}
          aws-region: us-east-1

      - name: Retrieve account
        uses: aws-actions/aws-secretsmanager-get-secrets@v1
        with:
          secret-ids:
            ACCOUNT_ID, region-account/${{ env.AWS_DEFAULT_REGION }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ env.ACCOUNT_ID }}:role/${{ secrets.E2E_TEST_ROLE_ARN }}
          aws-region: ${{ env.AWS_DEFAULT_REGION }}

      - name: Prepare and upload sample app deployment files
        working-directory: terraform/k8s/deploy/resources
        run: |
          sed -i 's#\${TESTING_ID}#${{ env.TESTING_ID }}#' frontend-service-depl.yaml
          sed -i 's#\${IMAGE}#${{ env.ACCOUNT_ID }}.dkr.ecr.${{ env.AWS_DEFAULT_REGION }}.amazonaws.com/${{ secrets.APP_SIGNALS_E2E_FE_SA_IMG }}#' frontend-service-depl.yaml
          sed -i 's#\${TESTING_ID}#${{ env.TESTING_ID }}#' remote-service-depl.yaml
          sed -i 's#\${IMAGE}#${{ env.ACCOUNT_ID }}.dkr.ecr.${{ env.AWS_DEFAULT_REGION }}.amazonaws.com/${{ secrets.APP_SIGNALS_E2E_RE_SA_IMG }}#' remote-service-depl.yaml
          aws s3api put-object --bucket ${{ secrets.APP_SIGNALS_E2E_EC2_JAR }}-prod-${{ env.AWS_DEFAULT_REGION }} --key frontend-service-depl.yaml --body frontend-service-depl.yaml
          aws s3api put-object --bucket ${{ secrets.APP_SIGNALS_E2E_EC2_JAR }}-prod-${{ env.AWS_DEFAULT_REGION }} --key remote-service-depl.yaml --body remote-service-depl.yaml

      - name: Initiate Terraform
        uses: ./.github/workflows/actions/execute_and_retry
        with:
          command: "cd ${{ env.TEST_RESOURCES_FOLDER }}/terraform/k8s/deploy && terraform init && terraform validate"
          cleanup: "rm -rf .terraform && rm -rf .terraform.lock.hcl"

      - name: Deploy Operator and Sample App using Terraform
        working-directory: terraform/k8s/deploy
        run: |
          terraform apply -auto-approve \
            -var="aws_region=${{ env.AWS_DEFAULT_REGION }}" \
            -var="test_id=${{ env.TESTING_ID }}" \
            -var="ssh_key=${{ env.MASTER_NODE_SSH_KEY }}" \
            -var="host=${{ env.MAIN_SERVICE_ENDPOINT }}"

      - name: Get Remote Service IP
        run: |
          echo REMOTE_SERVICE_IP="$(aws ssm get-parameter --region ${{ env.AWS_DEFAULT_REGION }} --name remote-service-ip | jq -r '.Parameter.Value')" >> $GITHUB_ENV

      # This step speeds up validation by generating the telemetry data in advance.
      # It runs after the Terraform deployment to give the app time to initialize after the pods become ready.
      - name: Call all test APIs
        continue-on-error: true
        run: |
          curl -S -s http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/outgoing-http-call/; echo
          curl -S -s http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/aws-sdk-call/; echo
          curl -S -s http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/remote-service?ip=${{ env.REMOTE_SERVICE_IP }}/; echo
          curl -S -s http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/client-call/; echo

      # Validation for pulse telemetry data
      - name: Validate generated EMF logs
        id: log-validation
        run: ./gradlew validator:run --args='-c k8s/log-validation.yml
          --testing-id ${{ env.TESTING_ID }}
          --endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
          --region ${{ env.AWS_DEFAULT_REGION }}
          --account-id ${{ env.ACCOUNT_ID }}
          --metric-namespace ${{ env.METRIC_NAMESPACE }}
          --log-group ${{ env.LOG_GROUP_NAME }}
          --platform-info k8s-cluster-${{ env.TESTING_ID }}
          --app-namespace ${{ env.SAMPLE_APP_NAMESPACE }}
          --service-name sample-application-${{ env.TESTING_ID }}
          --remote-service-name sample-r-app-deployment-${{ env.TESTING_ID }}
          --request-body ip=${{ env.REMOTE_SERVICE_IP }}
          --rollup'

      - name: Validate generated metrics
        id: metric-validation
        if: (success() || steps.log-validation.outcome == 'failure') && !cancelled()
        run: ./gradlew validator:run --args='-c k8s/metric-validation.yml
          --testing-id ${{ env.TESTING_ID }}
          --endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
          --region ${{ env.AWS_DEFAULT_REGION }}
          --account-id ${{ env.ACCOUNT_ID }}
          --metric-namespace ${{ env.METRIC_NAMESPACE }}
          --log-group ${{ env.LOG_GROUP_NAME }}
          --platform-info k8s-cluster-${{ env.TESTING_ID }}
          --app-namespace ${{ env.SAMPLE_APP_NAMESPACE }}
          --service-name sample-application-${{ env.TESTING_ID }}
          --remote-service-name sample-r-app-deployment-${{ env.TESTING_ID }}
          --remote-service-deployment-name sample-r-app-deployment-${{ env.TESTING_ID }}
          --request-body ip=${{ env.REMOTE_SERVICE_IP }}
          --rollup'

      - name: Validate generated traces
        id: trace-validation
        if: (success() || steps.log-validation.outcome == 'failure' || steps.metric-validation.outcome == 'failure') && !cancelled()
        run: ./gradlew validator:run --args='-c k8s/trace-validation.yml
          --testing-id ${{ env.TESTING_ID }}
          --endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
          --region ${{ env.AWS_DEFAULT_REGION }}
          --account-id ${{ env.ACCOUNT_ID }}
          --metric-namespace ${{ env.METRIC_NAMESPACE }}
          --log-group ${{ env.LOG_GROUP_NAME }}
          --platform-info k8s-cluster-${{ env.TESTING_ID }}
          --app-namespace ${{ env.SAMPLE_APP_NAMESPACE }}
          --service-name sample-application-${{ env.TESTING_ID }}
          --remote-service-name sample-r-app-deployment-${{ env.TESTING_ID }}
          --remote-service-deployment-name sample-r-app-deployment-${{ env.TESTING_ID }}
          --request-body ip=${{ env.REMOTE_SERVICE_IP }}
          --rollup'

      - name: Publish metric on test result
        if: always()
        run: |
          if [ "${{ steps.log-validation.outcome }}" = "success" ] && [ "${{ steps.metric-validation.outcome }}" = "success" ] && [ "${{ steps.trace-validation.outcome }}" = "success" ]; then
            aws cloudwatch put-metric-data --namespace 'ADOT/GitHubActions' \
              --metric-name Failure \
              --dimensions repository=${{ github.repository }},branch=${{ github.ref_name }},workflow=${{ inputs.caller-workflow-name }} \
              --value 0.0 \
              --region ${{ env.AWS_DEFAULT_REGION }}
          else
            aws cloudwatch put-metric-data --namespace 'ADOT/GitHubActions' \
              --metric-name Failure \
              --dimensions repository=${{ github.repository }},branch=${{ github.ref_name }},workflow=${{ inputs.caller-workflow-name }} \
              --value 1.0 \
              --region ${{ env.AWS_DEFAULT_REGION }}
          fi

      # Clean up Procedures
      - name: Initiate Terraform for Cleanup
        if: always()
        uses: ./.github/workflows/actions/execute_and_retry
        with:
          command: "cd ${{ env.TEST_RESOURCES_FOLDER }}/terraform/k8s/cleanup && terraform init && terraform validate"
          cleanup: "rm -rf .terraform && rm -rf .terraform.lock.hcl"

      - name: Clean Up Operator and Sample App using Terraform
        if: always()
        working-directory: terraform/k8s/cleanup
        run: |
          terraform apply -auto-approve \
            -var="aws_region=${{ env.AWS_DEFAULT_REGION }}" \
            -var="test_id=${{ env.TESTING_ID }}" \
            -var="ssh_key=${{ env.MASTER_NODE_SSH_KEY }}" \
            -var="host=${{ env.MAIN_SERVICE_ENDPOINT }}"

      - name: Terraform destroy - deployment
        if: always()
        continue-on-error: true
        working-directory: terraform/k8s/deploy
        run: |
          terraform destroy -auto-approve \
            -var="test_id=${{ env.TESTING_ID }}"

      - name: Terraform destroy - cleanup
        if: always()
        continue-on-error: true
        working-directory: terraform/k8s/cleanup
        run: |
          terraform destroy -auto-approve \
            -var="test_id=${{ env.TESTING_ID }}"
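Two pieces of shell logic in the workflow above can be pulled out into standalone functions for local testing. The sketch below is illustrative only: the function names `render_manifest` and `failure_value` are hypothetical and do not exist in the repository, and it assumes GNU `sed` (for `sed -i` without a backup suffix). `render_manifest` performs the same `${TESTING_ID}`/`${IMAGE}` placeholder substitution as the "Prepare and upload sample app deployment files" step, and `failure_value` reproduces the pass/fail decision of the "Publish metric on test result" step.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring shell logic in appsignals-e2e-k8s-test.yml.

# Replace the ${TESTING_ID} and ${IMAGE} placeholders in a manifest in place (GNU sed).
# The '#' delimiter avoids clashing with the '/' characters in an ECR image URI;
# values containing '#' or '&' would need escaping before being passed in.
render_manifest() {
  local file="$1" testing_id="$2" image="$3"
  # Inside double quotes, \$ yields a literal '$', so sed matches the text ${TESTING_ID}
  sed -i "s#\${TESTING_ID}#${testing_id}#" "$file"
  sed -i "s#\${IMAGE}#${image}#" "$file"
}

# Map the three validation step outcomes to the value of the Failure metric:
# 0.0 only when logs, metrics, and traces all validated successfully, otherwise 1.0
# (covers 'failure', 'cancelled', and 'skipped' outcomes alike).
failure_value() {
  local log="$1" metric="$2" trace="$3"
  if [ "$log" = "success" ] && [ "$metric" = "success" ] && [ "$trace" = "success" ]; then
    echo "0.0"
  else
    echo "1.0"
  fi
}

# Hypothetical usage inside a run step (not executed here):
#   aws cloudwatch put-metric-data --namespace 'ADOT/GitHubActions' \
#     --metric-name Failure \
#     --dimensions repository=...,branch=...,workflow=... \
#     --value "$(failure_value "$LOG_OUTCOME" "$METRIC_OUTCOME" "$TRACE_OUTCOME")" \
#     --region "$AWS_DEFAULT_REGION"
```

With `failure_value`, the two `put-metric-data` branches could collapse into a single call; this is a possible refactor, not what the workflow in this PR does.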
37 changes: 37 additions & 0 deletions terraform/k8s/cleanup/main.tf
@@ -0,0 +1,37 @@


resource "null_resource" "cleanup" {
  connection {
    type        = "ssh"
    user        = var.user
    private_key = var.ssh_key
    host        = var.host
  }

  provisioner "remote-exec" {
    inline = [
      <<-EOF
      # Let any of the following steps fail without aborting the rest of the cleanup
      set +e

      # Uninstall the operator and remove the repo from the EC2 instance
      echo "LOG: Uninstalling CloudWatch Agent Operator"
      helm uninstall --debug --namespace amazon-cloudwatch amazon-cloudwatch-operator --ignore-not-found
      echo "LOG: Deleting CloudWatch Agent Operator repo from environment"
      [ ! -e amazon-cloudwatch-agent-operator ] || sudo rm -r amazon-cloudwatch-agent-operator

      # Delete sample app resources
      echo "LOG: Deleting sample app namespace"
      kubectl delete namespace sample-app-namespace
      echo "LOG: Deleting sample app deployment files"
      [ ! -e frontend-service-depl.yaml ] || rm frontend-service-depl.yaml
      [ ! -e remote-service-depl.yaml ] || rm remote-service-depl.yaml
      sleep 10

      # Print the cluster state once the cleanup procedures are done
      echo "LOG: Printing cluster state after cleanup"
      kubectl get pods -A
      EOF
    ]
  }
}
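Each guarded delete in the cleanup script, such as `[ ! -e frontend-service-depl.yaml ] || rm frontend-service-depl.yaml`, exits with status 0 when the path is already gone, so re-running cleanup after a partial run does not report errors for missing files. A minimal sketch of that idiom as a reusable shell function follows; the name `remove_if_present` is hypothetical, and unlike the script it uses `rm -r` without `sudo` for every path so one helper covers both files and directories such as `amazon-cloudwatch-agent-operator`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper for the guarded-delete idiom in terraform/k8s/cleanup/main.tf.

# Delete a file or directory only if it exists.
# If the path is absent, the [ ! -e ] test succeeds and the line returns 0;
# otherwise the line returns rm's exit status.
remove_if_present() {
  local path="$1"
  [ ! -e "$path" ] || rm -r -- "$path"
}
```

Compared with `rm -rf`, this form still surfaces genuine `rm` failures (for example, permission errors) through its exit status, while treating an already-deleted path as success.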
36 changes: 36 additions & 0 deletions terraform/k8s/cleanup/variables.tf
@@ -0,0 +1,36 @@
# ------------------------------------------------------------------------
# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
# -------------------------------------------------------------------------

variable "test_id" {
  default = "dummy-123"
}

variable "aws_region" {
  default = "<aws-region>"
}

variable "user" {
  default = "ec2-user"
}

variable "ssh_key" {
  default     = "<MASTER_NODE_SSH_KEY>"
  description = "SSH private key of the K8s master node, which Terraform uses to connect to and manage the cluster"
}

variable "host" {
  default     = "<HOST_IP_OR_DNS>"
  description = "Host (EC2 instance IP or DNS name) that Terraform connects to for the K8s-on-EC2 test"
}
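The workflow passes these variables with `-var` flags, which puts the SSH private key on the `terraform` command line, where it can appear in process listings on the runner. One alternative, not used in this PR, is Terraform's `TF_VAR_<name>` environment variables, which Terraform reads as values for the matching input variables. A sketch under that assumption; the `export_tf_vars` helper and the key-file path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: supply this module's input variables as TF_VAR_<name> environment
# variables instead of -var flags. export_tf_vars is a hypothetical helper.
export_tf_vars() {
  local region="$1" test_id="$2" host="$3" ssh_key_file="$4"
  export TF_VAR_aws_region="$region"
  export TF_VAR_test_id="$test_id"
  export TF_VAR_host="$host"
  # Read the multi-line private key from a file; command substitution strips trailing newlines
  TF_VAR_ssh_key="$(cat -- "$ssh_key_file")"
  export TF_VAR_ssh_key
}

# Hypothetical usage (not executed here):
#   export_tf_vars "$AWS_DEFAULT_REGION" "$TESTING_ID" "$MAIN_SERVICE_ENDPOINT" ./master-node.pem
#   terraform apply -auto-approve
```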