Skip to content

Add Terraform Init to Dockerfile and also create ECR per region #105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from

Conversation

harrryr
Copy link
Contributor

@harrryr harrryr commented Jul 3, 2024

Issue description:
E2E tests pull image from ECR when workflow starts. Due to the high number of E2E tests using the same ECR, the ECR often throttles and causes the test to fail relatively frequently. This issue will become worse as we add more language and platform support, and currently requires asking for a service quota increase manually. Additionally, another top issue for E2E test failure is due to Terraform failing to initialize.

Description of changes:
We decided to allocate an ECR for each region. Since there are currently 9 canaries running, it is not possible for it to exceed the 10 request/s API limit so this should resolve the throttling issue

The e2e-test-docker-image-build.yml has been seperated into two workflows: github-runner-image-build.yml and github-runner-image-push.yml. The reason for this change is due to the following circumstances:

  1. We build a docker image for each different canaries (The only difference between the images is the Terraform Init. This won't have an impact on the API throttling issue, but it was compartmentalize the images to each different use case).
  2. Each image is 3.5gb in size. Github runner disk size is 12gb so we can't build all the artifacts and push it to ECR. We have to build and delete the file one by one to manage disk space. Note: Out of the 3.5gb, 700mb is from the Terraform Init, so we considered building all the Terraform code in one image since the final size will be 2.8gb + 0.7gb * 8 = 8.4gb. However, this isn't scalable since if we add 2 more languages we will encounter the same issue again.
  3. To loop through each image, Github API was used. However, Github API doesn't support downloading artifacts from current runs, it only allows downloading artifacts from completed runs.
  4. Therefore, two workflow files were created. github-runner-image-build.yml will run whenever there is a change in Terraform or Dockerfile code and build the images, github-runner-image-push.yml will run whenever the previous workflow run is ran, then push the images to the ECR.

As mentioned above, Terraform Init is now done during docker build. However, since the build and push process takes around an hour, there is an edge case where the workflow files are updated to the latest PR but the docker image isn't updated yet with the latest Terraform. To resolve this 1 hour period, extra steps were added in the workflow file to check if there is a difference in the latest and docker's Terraform file, and if there is, then run the Terraform Init step inside the workflow.

Test
Several test runs:

  • Testing to check if github-runner-image-build.yml builds images properly: Link
  • Testing to check if github-runner-image-push.yml pushes images if provided with the correct run_id: Link
  • The workflow_run condition only works in the main branch. Therefore, this condition was tested on forked repo: (Build Link, Push Link with correct run_id
  • Testing to check if the Terraform Diff logic triggers if there is difference: Link
  • E2E test to check if it works with the new images: Link, Link for K8s

Note: Currently the images have been deployed to all the regions. However, we should run this workflow again after this PR is merged to double check that it works as expected

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch from ef08e25 to 7cc27cf Compare July 3, 2024 17:14
georgeboc pushed a commit to georgeboc/aws-application-signals-test-framework that referenced this pull request Jul 8, 2024
Fix Adot main_build.


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch 27 times, most recently from 9087a36 to d8c305d Compare July 16, 2024 21:30
@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch 10 times, most recently from 14a1f8b to ad10c1e Compare July 19, 2024 17:58
@harrryr harrryr changed the title Add Terraform Init to Dockerfile and also create ECR per canary Add Terraform Init to Dockerfile and also create ECR per region Jul 19, 2024
@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch 5 times, most recently from 53768d8 to f54725b Compare July 19, 2024 21:22
Copy link
Contributor

@majanjua-amzn majanjua-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this PR into two PRs? One for image build/push, then we can run a test with the officially created new images, then we can update all the canaries in a second PR

Comment on lines 42 to 46
- name: Upload docker image
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.terraform-dir.name }}.tar
path: ${{ matrix.terraform-dir.name }}.tar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To where? Not sure I understand what's happening here, where do the images built go if they're not pushed immediately?
Does this rely on GH's version of ECR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github can store artifacts created by workflows. The artifacts are attached to the workflow run that created them. Not ECR, it can store any files such as .txt, .tar, etc. As you can see, we converted the image to a .tar file and uploaded it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we enforce the push is run right after the build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Through the workflow run condition "Build Github Runner Image", type: "completed"

- name: Output Public ECR Url
id: get-public-ecr-url
run: |
echo "public_ecr_url=$(aws ecr-public describe-repositories --repository-names ${{ env.E2E_RUNNER_ECR_NAME }} --query "repositories[0].repositoryUri" --output text)" >> $GITHUB_OUTPUT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we can use [0] because we're specifically looking for E2E_RUNNER_ECR_NAME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines 8 to 9
workflows:
- "Build Github Runner Image"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this runs after the other workflow or does it trigger when the other triggers? ie. are they parallelized or serialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are serialized. It get triggered when the "Build Github Runner Image" workflow run is completed on the main branch

Copy link
Contributor

@majanjua-amzn majanjua-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this for one canary first, like Java EKS, to see the result?

@@ -31,10 +31,39 @@ env:
TEST_RESOURCES_FOLDER: ${GITHUB_WORKSPACE}

jobs:
get-public-ecr-url:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this as an action?

harrryr added a commit that referenced this pull request Jul 24, 2024
*Issue description:*
Splitting this PR into two to reduce scope:
#105

This PR will cover building and releasing the image into the public ECR.
Once merged and images released successfully, will make second PR for
the canaries.

- Testing to check if `github-runner-image-build.yml` builds images
properly:
[Link](https://github.com/aws-observability/aws-application-signals-test-framework/actions/runs/10012887630)
- Testing to check if `github-runner-image-push.yml` pushes images if
provided with the correct run_id:
[Link](https://github.com/aws-observability/aws-application-signals-test-framework/actions/runs/10012993832)

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@harrryr harrryr closed this Jul 24, 2024
@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch from 004b7c0 to af5bff3 Compare July 24, 2024 23:11
@harrryr harrryr reopened this Jul 24, 2024
@harrryr harrryr closed this Jul 24, 2024
@harrryr harrryr force-pushed the add-terraform-to-dockerfile branch from b62f363 to af5bff3 Compare July 24, 2024 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants