
Commit d6728ac

Remove Public Endpoint from Canaries (#141)
*Issue description:*

This is part of a two-PR series that removes the public endpoints used for E2E testing, in order to comply with security best practices:

- [Deploy Traffic Generator](#140)
- [Remove Public Endpoints from Canaries](#141)

Since the public endpoints will be removed, we can no longer call the sample app APIs directly from the workflow. Instead, we use a traffic generator, installed alongside the sample applications, to call the APIs.

*K8s*

The K8s cluster has been updated to stop exposing port 30100. Additionally, once this PR has been merged to main, we will need to update the security groups of the EC2 instances hosting the K8s cluster to block all inbound traffic except SSH. SSH traffic is still needed so that the E2E test can update the K8s cluster inside the EC2 instances.

*EC2*

The security group for the EC2 instances will be updated to allow inbound traffic only from SSH and from other EC2 instances in the same security group. Merge this PR first, then update the security group configuration.

*Description of changes:*

- Deploy the traffic generator in the sample app namespace.
- Stop exposing port 30100 for K8s.
- Stop creating the ingress port for EKS.
- For EC2, install the traffic generator on the main EC2 instance and run it in the background.
- Remove HttpCaller from the validator and update the trace validator to search for traces using filters rather than traceId.

The change was tested by running the workflow in a playground K8s cluster whose security group was configured to allow only SSH traffic.

- Full workflow run: https://github.com/aws-observability/aws-application-signals-test-framework/actions/runs/10167440327
- Test run with the new security group for EC2: https://github.com/aws-observability/aws-application-signals-test-framework/actions/runs/10135108011

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
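A minimal sketch of the EC2 security-group rules described above, written with the AWS CLI. The function name `restrict_sg_to_ssh_and_self`, the security-group ID, and the SSH source CIDR are placeholders; the PR does not say which source is allowed for SSH. The sketch only adds the two permitted rules, so any existing broader ingress rules (for example ones that opened port 8080 or 30100) would still have to be removed separately, e.g. with `aws ec2 revoke-security-group-ingress`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): allow inbound SSH from one CIDR
# plus all traffic from instances that share the same security group.
# Assumes the AWS CLI is installed and configured for the target account.
restrict_sg_to_ssh_and_self() {
  local sg_id="$1"     # placeholder, e.g. sg-0123456789abcdef0
  local ssh_cidr="$2"  # placeholder, the source that still needs SSH access
  local region="$3"

  # Inbound SSH (TCP 22) from the given CIDR only.
  aws ec2 authorize-security-group-ingress \
    --region "$region" \
    --group-id "$sg_id" \
    --protocol tcp --port 22 \
    --cidr "$ssh_cidr"

  # All protocols and ports (-1), but only from members of the same group.
  aws ec2 authorize-security-group-ingress \
    --region "$region" \
    --group-id "$sg_id" \
    --protocol -1 \
    --source-group "$sg_id"
}
```

Referencing the group itself as `--source-group` is what lets the sample app instances keep talking to each other (for example the main service calling the remote service on its private IP) without any port being open to the internet.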
1 parent 5788e57 commit d6728ac


55 files changed (+393, -867 lines)

.github/workflows/java-ec2-asg-e2e-test.yml

Lines changed: 10 additions & 56 deletions
```diff
@@ -128,54 +128,6 @@ jobs:
 echo "Terraform deployment was unsuccessful. Will attempt to retry deployment."
 fi
 
-# If the deployment_failed is still 0, then the terraform deployment succeeded and now try to connect to the endpoint.
-# Attempts to connect will be made for up to 10 minutes
-if [ $deployment_failed -eq 0 ]; then
-echo "Attempting to connect to the endpoint"
-main_service_instance_id=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names ec2-single-asg-${{ env.TESTING_ID }} --region ${{ env.E2E_TEST_AWS_REGION }} --query "AutoScalingGroups[].Instances[0].InstanceId" --output text)
-main_service_public_ip=$(aws ec2 describe-instances --instance-ids $main_service_instance_id --region ${{ env.E2E_TEST_AWS_REGION }} --query "Reservations[].Instances[].PublicIpAddress" --output text)
-main_service_private_dns_name=$(aws ec2 describe-instances --instance-ids $main_service_instance_id --region ${{ env.E2E_TEST_AWS_REGION }} --query "Reservations[].Instances[].PrivateDnsName" --output text)
-
-echo "INSTANCE_ID=$main_service_instance_id" >> $GITHUB_ENV
-echo "MAIN_SERVICE_ENDPOINT=$main_service_public_ip:8080" >> $GITHUB_ENV
-echo "PRIVATE_DNS_NAME=$main_service_private_dns_name" >> $GITHUB_ENV
-echo "EC2_INSTANCE_AMI=$(terraform output ec2_instance_ami)" >> $GITHUB_ENV
-echo "REMOTE_SERVICE_IP=$(terraform output sample_app_remote_service_public_ip)" >> $GITHUB_ENV
-
-main_service_sample_app_endpoint=http://$main_service_public_ip:8080
-echo "The main service endpoint is $main_service_sample_app_endpoint"
-
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail $(echo "$main_service_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint. Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-
-echo "Attempting to connect to the remote sample app endpoint"
-remote_sample_app_endpoint=http://$(terraform output sample_app_remote_service_public_ip):8080/healthcheck
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail $(echo "$remote_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint. Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-fi
-
 # If the success is 1 then either the terraform deployment or the endpoint connection failed, so first destroy the
 # resources created from terraform and try again.
 if [ $deployment_failed -eq 1 ]; then
@@ -195,14 +147,16 @@ jobs:
 fi
 done
 
-# This steps increases the speed of the validation by creating the telemetry data in advance
-- name: Call all test APIs
-continue-on-error: true
-run: |
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/outgoing-http-call"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/aws-sdk-call?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/remote-service?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/client-call"
+- name: Get the sample app and EC2 instance information
+working-directory: terraform/java/ec2/asg
+run: |
+main_service_instance_id=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names ec2-single-asg-${{ env.TESTING_ID }} --region ${{ env.E2E_TEST_AWS_REGION }} --query "AutoScalingGroups[].Instances[0].InstanceId" --output text)
+main_service_private_dns_name=$(aws ec2 describe-instances --instance-ids $main_service_instance_id --region ${{ env.E2E_TEST_AWS_REGION }} --query "Reservations[].Instances[].PrivateDnsName" --output text)
+echo "INSTANCE_ID=$main_service_instance_id" >> $GITHUB_ENV
+echo "MAIN_SERVICE_ENDPOINT=localhost:8080" >> $GITHUB_ENV
+echo "PRIVATE_DNS_NAME=$main_service_private_dns_name" >> $GITHUB_ENV
+echo "EC2_INSTANCE_AMI=$(terraform output ec2_instance_ami)" >> $GITHUB_ENV
+echo "REMOTE_SERVICE_IP=$(terraform output sample_app_remote_service_private_ip)" >> $GITHUB_ENV
 
 - name: Initiate Gradlew Daemon
 if: steps.initiate-gradlew == 'failure'
```
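The ASG workflow still resolves the main instance through the Auto Scaling group and `describe-instances`, but now reads only private attributes. If the instance's private IP is needed rather than its private DNS name, the same two-step lookup can be wrapped in a helper. This is a minimal sketch: `asg_first_instance_private_ip` is a hypothetical name, and it queries the `PrivateIpAddress` field instead of the `PrivateDnsName` field the workflow uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the private IP of the first instance in an
# Auto Scaling group. Requires a configured AWS CLI.
asg_first_instance_private_ip() {
  local asg_name="$1"
  local region="$2"
  local instance_id

  # Step 1: ASG name -> instance ID (same query as the workflow step).
  instance_id=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --region "$region" \
    --query "AutoScalingGroups[].Instances[0].InstanceId" \
    --output text) || return 1

  # Step 2: instance ID -> private IP (never the public one).
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --region "$region" \
    --query "Reservations[].Instances[].PrivateIpAddress" \
    --output text
}
```

For example, `asg_first_instance_private_ip "ec2-single-asg-$TESTING_ID" "$E2E_TEST_AWS_REGION"` would print the main service instance's private IP, which is only reachable from inside the VPC once the public endpoints are gone.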

.github/workflows/java-ec2-default-e2e-test.yml

Lines changed: 2 additions & 48 deletions
```diff
@@ -30,7 +30,6 @@ env:
 LOG_GROUP_NAME: /aws/application-signals/data
 TEST_RESOURCES_FOLDER: ${GITHUB_WORKSPACE}
 
-
 jobs:
 java-ec2-default:
 runs-on: ubuntu-latest
@@ -129,42 +128,6 @@ jobs:
 echo "Terraform deployment was unsuccessful. Will attempt to retry deployment."
 fi
 
-# If the deployment_failed is still 0, then the terraform deployment succeeded and now try to connect to the endpoint.
-# Attempts to connect will be made for up to 10 minutes
-if [ $deployment_failed -eq 0 ]; then
-echo "Attempting to connect to the endpoint"
-main_sample_app_endpoint=http://$(terraform output sample_app_main_service_public_dns):8080
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail $(echo "$main_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint. Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-
-echo "Attempting to connect to the remote sample app endpoint"
-remote_sample_app_endpoint=http://$(terraform output sample_app_remote_service_public_ip):8080/healthcheck
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail $(echo "$remote_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint. Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-fi
-
 # If the success is 1 then either the terraform deployment or the endpoint connection failed, so first destroy the
 # resources created from terraform and try again.
 if [ $deployment_failed -eq 1 ]; then
@@ -192,19 +155,10 @@ jobs:
 - name: Get the sample app and EC2 instance information
 working-directory: terraform/java/ec2/default
 run: |
-echo "MAIN_SERVICE_ENDPOINT=$(terraform output sample_app_main_service_public_dns):8080" >> $GITHUB_ENV
-echo "REMOTE_SERVICE_IP=$(terraform output sample_app_remote_service_public_ip)" >> $GITHUB_ENV
+echo "MAIN_SERVICE_ENDPOINT=localhost:8080" >> $GITHUB_ENV
+echo "REMOTE_SERVICE_IP=$(terraform output sample_app_remote_service_private_ip)" >> $GITHUB_ENV
 echo "MAIN_SERVICE_INSTANCE_ID=$(terraform output main_service_instance_id)" >> $GITHUB_ENV
 
-# This steps increases the speed of the validation by creating the telemetry data in advance
-- name: Call all test APIs
-continue-on-error: true
-run: |
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/outgoing-http-call"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/aws-sdk-call?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/remote-service?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}/client-call"
-
 - name: Initiate Gradlew Daemon
 if: steps.initiate-gradlew == 'failure'
 uses: ./.github/workflows/actions/execute_and_retry
```

.github/workflows/java-eks-e2e-test.yml

Lines changed: 8 additions & 45 deletions
```diff
@@ -208,8 +208,9 @@ jobs:
 -var="rds_mysql_cluster_endpoint=${{env.RDS_MYSQL_CLUSTER_ENDPOINT}}" \
 -var="rds_mysql_cluster_username=${{env.RDS_MYSQL_CLUSTER_SECRETS_USERNAME}}" \
 -var='rds_mysql_cluster_password=${{env.RDS_MYSQL_CLUSTER_SECRETS_PASSWORD}}' \
-|| deployment_failed=$?
-
+-var='account_id=${{ env.ACCOUNT_ID }}' \
+|| deployment_failed=$?
+
 if [ $deployment_failed -ne 0 ]; then
 echo "Terraform deployment was unsuccessful. Will attempt to retry deployment."
 fi
@@ -232,39 +233,6 @@ jobs:
 
 execute_and_retry 2 "kubectl delete pods --all -n ${{ env.SAMPLE_APP_NAMESPACE }}" "" 60
 execute_and_retry 2 "kubectl wait --for=condition=Ready --request-timeout '5m' pod --all -n ${{ env.SAMPLE_APP_NAMESPACE }}" "" 10
-
-echo "Attempting to connect to the main sample app endpoint"
-main_sample_app_endpoint=http://$(terraform output sample_app_endpoint)
-attempt_counter=0
-max_attempts=60
-until $(curl --output /dev/null --silent --head --fail $(echo "$main_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint ($main_sample_app_endpoint). Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-
-echo "Attempting to connect to the remote sample app endpoint"
-remote_sample_app_endpoint=http://$(terraform output sample_remote_app_endpoint)/healthcheck
-echo $remote_sample_app_endpoint
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail $(echo "$remote_sample_app_endpoint" | tr -d '"')); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Failed to connect to endpoint. Will attempt to redeploy sample app."
-deployment_failed=1
-break
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
 fi
 
 # If the deployment_failed is 1 then either the terraform deployment or the endpoint connection failed, so first destroy the
@@ -333,18 +301,13 @@ jobs:
 echo "REMOTE_SERVICE_POD_IP=$(kubectl get pods -n ${{ env.SAMPLE_APP_NAMESPACE }} --selector=app=remote-app -o jsonpath='{.items[0].status.podIP}')" >> $GITHUB_ENV
 
 - name: Get the sample app endpoint
-working-directory: terraform/java/eks
-run: echo "APP_ENDPOINT=$(terraform output sample_app_endpoint)" >> $GITHUB_ENV
+run: echo "APP_ENDPOINT=$(kubectl get pods -n ${{ env.SAMPLE_APP_NAMESPACE }} --selector=app=sample-app -o jsonpath='{.items[0].status.podIP}'):8080" >> $GITHUB_ENV
 
-# This steps increases the speed of the validation by creating the telemetry data in advance
-- name: Call all test APIs
-continue-on-error: true
+- name: Set endpoints for the traffic generator
 run: |
-curl -S -s "http://${{ env.APP_ENDPOINT }}/outgoing-http-call"
-curl -S -s "http://${{ env.APP_ENDPOINT }}/aws-sdk-call?ip=${{ env.REMOTE_SERVICE_POD_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.APP_ENDPOINT }}/remote-service?ip=${{ env.REMOTE_SERVICE_POD_IP }}&testingId=${{ env.TESTING_ID }}"
-curl -S -s "http://${{ env.APP_ENDPOINT }}/client-call"
-curl -S -s "http://${{ env.APP_ENDPOINT }}/mysql"
+# Add the appropriate environment variables to the traffic generator
+kubectl set env -n ${{ env.SAMPLE_APP_NAMESPACE }} deployment/traffic-generator MAIN_ENDPOINT=${{ env.APP_ENDPOINT }}
+kubectl set env -n ${{ env.SAMPLE_APP_NAMESPACE }} deployment/traffic-generator REMOTE_ENDPOINT=${{ env.REMOTE_SERVICE_POD_IP }}
 
 - name: Initiate Gradlew Daemon
 if: steps.initiate-gradlew == 'failure'
```
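In the EKS workflow, the only remaining job of the pipeline is to hand `MAIN_ENDPOINT` and `REMOTE_ENDPOINT` to the `traffic-generator` deployment via `kubectl set env`; the generator itself comes from the companion PR and is not shown here. Purely as an illustration of what a generator driven by those two variables might do, the sketch below replays the API calls the removed `curl` step used to make. The function names, the `TESTING_ID` and `SLEEP_SECONDS` variables, and the 60-second default are assumptions; the `/mysql` path was only called in the EKS workflow.

```shell
#!/usr/bin/env bash
# Hypothetical traffic-generator sketch, not the implementation from the PR.
# MAIN_ENDPOINT (host:port of the sample app) and REMOTE_ENDPOINT (IP of the
# remote service) are expected in the environment, as set by `kubectl set env`.

# Print the URLs to call, one per line; paths mirror the removed curl calls.
traffic_urls() {
  local main="$1" remote="$2" testing_id="$3"
  printf '%s\n' \
    "http://${main}/outgoing-http-call" \
    "http://${main}/aws-sdk-call?ip=${remote}&testingId=${testing_id}" \
    "http://${main}/remote-service?ip=${remote}&testingId=${testing_id}" \
    "http://${main}/client-call" \
    "http://${main}/mysql"
}

# Call every URL once; a failed request is logged but never stops the loop.
send_traffic_once() {
  traffic_urls "$MAIN_ENDPOINT" "$REMOTE_ENDPOINT" "${TESTING_ID:-}" |
    while IFS= read -r url; do
      curl -S -s -o /dev/null --max-time 10 "$url" || echo "request failed: $url" >&2
    done
}

# Generate traffic forever, pausing between rounds (assumed default: 60 s).
run_traffic_generator() {
  while true; do
    send_traffic_once
    sleep "${SLEEP_SECONDS:-60}"
  done
}
```

Because failures are only logged, a generator like this keeps running while the sample app pods are still starting, and `kubectl set env` triggering a rollout simply restarts it with the new endpoints.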

.github/workflows/java-k8s-e2e-test.yml

Lines changed: 5 additions & 29 deletions
```diff
@@ -125,35 +125,11 @@ jobs:
 -var="patch_image_arn=${{ env.PATCH_IMAGE_ARN }}" \
 -var="release_testing_ecr_account=${{ env.RELEASE_TESTING_ECR_ACCOUNT }}"
 
-- name: Get Remote Service IP
+- name: Get Main and Remote Service IP
 run: |
+echo MAIN_SERVICE_IP="$(aws ssm get-parameter --region ${{ env.E2E_TEST_AWS_REGION }} --name main-service-ip-${{ env.TESTING_ID }} | jq -r '.Parameter.Value')" >> $GITHUB_ENV
 echo REMOTE_SERVICE_IP="$(aws ssm get-parameter --region ${{ env.E2E_TEST_AWS_REGION }} --name remote-service-ip-${{ env.TESTING_ID }} | jq -r '.Parameter.Value')" >> $GITHUB_ENV
 
-- name: Wait for app endpoint to come online
-id: endpoint-check
-run: |
-attempt_counter=0
-max_attempts=30
-until $(curl --output /dev/null --silent --head --fail http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/); do
-if [ ${attempt_counter} -eq ${max_attempts} ];then
-echo "Max attempts reached"
-exit 1
-fi
-
-printf '.'
-attempt_counter=$(($attempt_counter+1))
-sleep 10
-done
-# This steps increases the speed of the validation by creating the telemetry data in advance
-# It is run after the gradle build to give the app time to initialize after the pods become ready
-- name: Call all test APIs
-continue-on-error: true
-run: |
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/outgoing-http-call"; echo
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/aws-sdk-call?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"; echo
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/remote-service?ip=${{ env.REMOTE_SERVICE_IP }}&testingId=${{ env.TESTING_ID }}"; echo
-curl -S -s "http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100/client-call"; echo
-
 - name: Initiate Gradlew Daemon
 if: steps.initiate-gradlew == 'failure'
 uses: ./.github/workflows/actions/execute_and_retry
```
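The new step reads both service IPs from SSM Parameter Store under `main-service-ip-<testing id>` and `remote-service-ip-<testing id>`, piping the JSON through `jq`. The same lookup can be done without `jq` by letting the AWS CLI extract the value with `--query` and `--output text`. A minimal sketch; the helper name `get_service_ip` and its `role` argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a service IP stored in SSM Parameter Store under
# "<role>-service-ip-<testing id>", e.g. main-service-ip-<id>.
# Requires a configured AWS CLI; no jq needed.
get_service_ip() {
  local role="$1" testing_id="$2" region="$3"
  aws ssm get-parameter \
    --region "$region" \
    --name "${role}-service-ip-${testing_id}" \
    --query "Parameter.Value" \
    --output text
}
```

Inside a workflow step this would be used as `echo "MAIN_SERVICE_IP=$(get_service_ip main "$TESTING_ID" "$E2E_TEST_AWS_REGION")" >> "$GITHUB_ENV"`, which is the value the validators then combine with port 8080.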
```diff
@@ -169,7 +145,7 @@ jobs:
 id: log-validation
 run: ./gradlew validator:run --args='-c java/k8s/log-validation.yml
 --testing-id ${{ env.TESTING_ID }}
---endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
+--endpoint http://${{ env.MAIN_SERVICE_IP }}:8080
 --region ${{ env.E2E_TEST_AWS_REGION }}
 --account-id ${{ env.ACCOUNT_ID }}
 --metric-namespace ${{ env.METRIC_NAMESPACE }}
@@ -186,7 +162,7 @@ jobs:
 if: (success() || steps.log-validation.outcome == 'failure') && !cancelled()
 run: ./gradlew validator:run --args='-c java/k8s/metric-validation.yml
 --testing-id ${{ env.TESTING_ID }}
---endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
+--endpoint http://${{ env.MAIN_SERVICE_IP }}:8080
 --region ${{ env.E2E_TEST_AWS_REGION }}
 --account-id ${{ env.ACCOUNT_ID }}
 --metric-namespace ${{ env.METRIC_NAMESPACE }}
@@ -204,7 +180,7 @@ jobs:
 if: (success() || steps.log-validation.outcome == 'failure' || steps.metric-validation.outcome == 'failure') && !cancelled()
 run: ./gradlew validator:run --args='-c java/k8s/trace-validation.yml
 --testing-id ${{ env.TESTING_ID }}
---endpoint http://${{ env.MAIN_SERVICE_ENDPOINT }}:30100
+--endpoint http://${{ env.MAIN_SERVICE_IP }}:8080
 --region ${{ env.E2E_TEST_AWS_REGION }}
 --account-id ${{ env.ACCOUNT_ID }}
 --metric-namespace ${{ env.METRIC_NAMESPACE }}
```
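Since the workflow no longer calls the APIs itself, the trace validator cannot know a traceId in advance; the PR instead has it search for traces using filters. The validator's Java code is not part of this view, so the sketch below only illustrates the general idea from the command line: ask AWS X-Ray for trace summaries in a time window that match a filter expression. The helper names, the service name, the URL fragment, and the 10-minute window are placeholders, not values taken from the validator.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: find trace IDs by X-Ray filter expression instead of
# looking up a traceId returned by an earlier HTTP call.

# Build the filter; service() and http.url are X-Ray filter-expression keywords.
xray_filter() {
  local service="$1" path="$2"
  printf 'service("%s") AND http.url CONTAINS "%s"' "$service" "$path"
}

# Print matching trace IDs from the last 10 minutes (requires a configured AWS CLI).
find_trace_ids() {
  local service="$1" path="$2" region="$3"
  local end start
  end=$(date +%s)       # epoch seconds, accepted by the CLI timestamp options
  start=$((end - 600))  # assumed 10-minute look-back window
  aws xray get-trace-summaries \
    --region "$region" \
    --start-time "$start" \
    --end-time "$end" \
    --filter-expression "$(xray_filter "$service" "$path")" \
    --query "TraceSummaries[].Id" \
    --output text
}
```

Searching by service and URL works no matter which process generated the request, which is what makes it compatible with a traffic generator running inside the cluster or on the instance.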

0 commit comments

Comments
 (0)