Allow TargetGroup endpoints outside the ELB VPC #1862

shoekstra · 2021-03-06T21:11:51Z

To support the scenario where the ELB is deployed in a separate VPC to the EKS cluster, we need to conditionally set the AvailabilityZone field to all when the target is outside the VPC as per the TargetDescription docs.

This PR addresses this by checking if the pod IP is found in the ELB VPC's CIDR or a secondary CIDR; to improve performance and reduce calls to the AWS API, VPC info is cached for a configurable amount of time, defaulting to 5 minutes.

Signed-off-by: Stephen Hoekstra [email protected]

k8s-ci-robot · 2021-03-06T21:11:53Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2021-03-06T21:11:58Z

Welcome @shoekstra!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2021-03-06T21:11:59Z

Hi @shoekstra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

codecov-io · 2021-03-06T21:15:48Z

Codecov Report

Merging #1862 (e39d354) into main (04c9bf5) will decrease coverage by 0.10%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #1862      +/-   ##
==========================================
- Coverage   47.98%   47.88%   -0.11%     
==========================================
  Files         110      110              
  Lines        6177     6190      +13     
==========================================
  Hits         2964     2964              
- Misses       2938     2951      +13     
  Partials      275      275

Impacted Files	Coverage Δ
pkg/targetgroupbinding/networking_manager.go	`24.78% <0.00%> (-0.11%)`	⬇️
pkg/targetgroupbinding/resource_manager.go	`18.25% <0.00%> (-0.96%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04c9bf5...e39d354. Read the comment docs.

acjohnson · 2021-03-08T22:16:54Z

👍

shoekstra · 2021-03-12T13:02:42Z

CLA has been signed 👍

shoekstra · 2021-03-12T13:03:08Z

/assign @M00nF1sh

M00nF1sh · 2021-03-17T21:29:19Z

@shoekstra
Thanks for making this change 👍
I have some concerns with current implementation:

the DescribeVPC is been called within the critical path of target registration(which is been called frequently). There should be another component that retrieve VPC CIDR with cache. (e.g. cache with 5mins, so that added secondary cidr can work).
we should handle VPC with multiple secondary CIDRs.

shoekstra · 2021-03-18T21:02:05Z

Hey @M00nF1sh, thanks for the feedback!

I understand both concerns...

I wasn't sure where to do the call; do I understand that it's OK where it is, but I should move it to it's own function that has a 5min cache? I wasn't sure where else to put it, if there is a better place pls let me know.
Good point, I'll update the PR to check all secondary CIDRs.

cmd-werner-diers · 2021-04-23T01:21:01Z

Hi @shoekstra any change to fix the conflicts so that your fix can be merged? Cheers

shoekstra · 2021-04-25T21:10:03Z

Hi @M00nF1sh,

I took inspiration from the AZ info interface and gave this another go by implementing the feature in a similar way. I think I've now satisfied the points in your previous feedback, please let me know.

I was thinking to also set target-type to ip if a pod is in a remote VPC, but didn't implement that yet. For now it's still required to set alb.ingress.kubernetes.io/target-type: ip in the ingress or connecting to pods in a remote VPC will fail...

Thanks!

M00nF1sh · 2021-11-18T01:17:52Z

/approve

k8s-ci-robot · 2021-11-18T01:18:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: M00nF1sh, shoekstra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [M00nF1sh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

M00nF1sh · 2021-11-18T02:21:27Z

/lgtm

shoekstra · 2021-11-21T16:26:25Z

Great to hear @M00nF1sh! All sounds good to me. Looking forward to switching back to the official image 👍

acjohnson · 2021-12-09T00:41:22Z

So happy this has shipped; I just tested using the official v2.3.1 image and it's working perfectly. Thank you @shoekstra @M00nF1sh and all 🎉

wuyudian1 · 2022-05-31T10:09:46Z

We use amazon/aws-alb-ingress-controller:v2.4.1 installed by helm(chart: aws-load-balancer-controller-1.4.1, app version: v2.4.1)

Still can not Register pod to a TargetGroup(in another VPC), the controoler log:

{"level":"error","ts":1653988309.5380044,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"tgb-cloud-appserverweb","namespace":"us-platform","error":"ValidationError: The Availability Zone for IP address '10.151.135.88' must be 'all' for Application Load Balancer target groups, when not within the VPC\n\tstatus code: 400, request id: 1b1b5648-be4e-45fb-89a0-e601bc232b0b"}

nuriel77 · 2022-05-31T10:34:12Z

@wuyudian1 you need to make sure that your targets are in zone "all" and not on a specific zone.

wuyudian1 · 2022-05-31T10:46:33Z

The targets are my PODs, do you mean that I need to setup some configurations in the Deployment to change their AZ to be "all"?

wuyudian1 · 2022-05-31T11:24:07Z

I dont know how to make it "all", hoping someone may help me

wuyudian1 · 2022-05-31T15:03:13Z

My TargetGroupBinding definition:

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: tgb-cloud-appserverweb
  namespace: us-platform
spec:
  serviceRef:
    name: svc-clusterip-cloud-appserverweb
    port: 8171
  targetGroupARN: arn:aws:elasticloadbalancing:us-east-1:94...0:targetgroup/xxxxxx/93e...2b9
  targetType: ip

Contoller log:

{"level":"info","ts":1654007320.3697112,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:us-east-1:94...0:targetgroup/xxxxxx/93e...2b9","targets":[{"AvailabilityZone":null,"Id":"10.151.135.88","Port":8171}]}
{"level":"error","ts":1654007320.6442428,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"tgb-cloud-appserverweb","namespace":"us-platform","error":"ValidationError: The Availability Zone for IP address '10.151.135.88' must be 'all' for Application Load Balancer target groups, when not within the VPC\n\tstatus code: 400, request id: 9ac57241-60d9-4875-9bcd-e86ed867e7a6"}

We can see that LB Controller got "null" at AvailabilityZone, so, I'm wondering how to fix it

nuriel77 · 2022-05-31T15:34:43Z

@wuyudian1

Is your ALB configured in all Availability Zones?
Did you set the ALB's vpcId in the Helm values?
Sorry if this is not accurate with regards to what may be the reason, it has been a while since I've touched this.

wuyudian1 · 2022-06-01T02:54:52Z

@nuriel77 set the vpcId of VPC where my TrgetGroup should be in, it works fine, thanks for your response.

wuyudian1 · 2022-06-15T02:23:43Z

If I understand correctly then yes, you'll need to deploy multiple instances of the controller:

Create load-balancers normally in the EKS VPC

To support this you can deploy an instance with no VPC ID specified (it'll default to the VPC where the EKS instance resides)

Use a TargetGroupBinding to manage an existing TargetGroup that exists on a LB on a different VPC

To support this you can deploy an additional instance/ingress class with the VPC ID of the VPC hosting the LB specified. I didn't test this myself but others indicated using TargetGroupBinding with an LB in a remote VPC is working for them.

@shoekstra If there's more than one LB controller instance(with vpcID set to different values), how dose a TargetGroupBinding object(which trying to bind POD IPs to a specific TargetGroup in a specific VPC) choose the correct/proper contoller instance ?

quocle-vizio · 2022-06-16T20:37:51Z

i am having this issue too, deployed the controller in another VPC (ingress vpc), the alb gets provision and the target groups gets created, but no registered target. Here is the log I get from the controller.
{"level":"error","ts":1655411581.6259413,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-echoserv-echoserv-43459c5438","namespace":"echoserver","error":"InvalidGroup.NotFound: You have specified two resources that belong to different networks.\n\tstatus code: 400, request id: 28737654-657c-40a0-9b72-ed242e9fd57f"}

any ideas?

* Allow TargetGroup endpoints outside the ELB VPC To support the scenario where the ELB is deployed in a separate VPC to the EKS cluster, we need to conditionally set the `AvailabilityZone` field to `all` when the target is outside of the ELB VPC. This commit addresses this by checking if the pod IP is found in the ELB VPC's CIDR or a secondary CIDR; to improve performance and reduce calls to the AWS API, VPC info is cached for a configurable amount of time, defaulting to 5 minutes. Signed-off-by: Stephen Hoekstra <[email protected]> * Make isELBV2TargetInELBVPC more efficient Don't need to check vpc.CidrBlock as it is part of vpc.CidrBlockAssociationSet. isIPinCIDR is now also a bit more robust in checking if an IP is part of a CIDR. Signed-off-by: Stephen Hoekstra <[email protected]> * Add vpcID to defaultResourceManager We're already passing this into NewDefaultResourceManager so no need to also add it to NetworkingManager. Signed-off-by: Stephen Hoekstra <[email protected]> * Change VPC cache duration to 10m Also switched cache value to be time.Duration from int. Signed-off-by: Stephen Hoekstra <[email protected]> * Whoops, we should return an error if found... * Use apimachinery/pkg/util/cache for caching Signed-off-by: Stephen Hoekstra <[email protected]> * remove the aws-vpc-cache-ttl flag fix unit tests Co-authored-by: M00nF1sh <[email protected]>

codecov-commenter · 2024-12-13T21:02:32Z

Codecov Report

Attention: Patch coverage is 50.00000% with 29 lines in your changes missing coverage. Please review.

Project coverage is 54.18%. Comparing base (db5fc12) to head (4b5d1a7).
Report is 534 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/targetgroupbinding/resource_manager.go	30.00%	12 Missing and 2 partials ⚠️
pkg/networking/vpc_info_provider_mocks.go	0.00%	11 Missing ⚠️
pkg/networking/vpc_info_provider.go	85.18%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1862      +/-   ##
==========================================
- Coverage   54.19%   54.18%   -0.02%     
==========================================
  Files         136      138       +2     
  Lines        7702     7757      +55     
==========================================
+ Hits         4174     4203      +29     
- Misses       3212     3234      +22     
- Partials      316      320       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Mar 6, 2021

k8s-ci-robot requested review from kishorj and M00nF1sh March 6, 2021 21:11

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 6, 2021

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 6, 2021

acjohnson mentioned this pull request Mar 8, 2021

added all to targetgroup update #1845

Closed

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 12, 2021

k8s-ci-robot assigned M00nF1sh Mar 12, 2021

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2021

shoekstra force-pushed the allow_ips_outside_elb_vpc branch from e39d354 to e0aed65 Compare April 25, 2021 20:45

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 25, 2021

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2021

shoekstra force-pushed the allow_ips_outside_elb_vpc branch from e0aed65 to 62e9d67 Compare April 26, 2021 09:19

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2021

shoekstra force-pushed the allow_ips_outside_elb_vpc branch from 62e9d67 to c5c46fd Compare May 10, 2021 14:26

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 10, 2021

M00nF1sh force-pushed the allow_ips_outside_elb_vpc branch from 365ca4e to 4b5d1a7 Compare November 17, 2021 23:02

M00nF1sh added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 18, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2021

k8s-ci-robot merged commit 1ca12db into kubernetes-sigs:main Nov 18, 2021

M00nF1sh mentioned this pull request Nov 19, 2021

support IPv6 Pods during AZ detection #2375

Merged

6 tasks

shoekstra deleted the allow_ips_outside_elb_vpc branch November 21, 2021 16:25

sgielen mentioned this pull request Nov 23, 2021

Failing to set up NLB when it is in another VPC than the EKS cluster #2381

Closed

oliviassss mentioned this pull request Dec 16, 2021

🥳 aws-load-balancer-controller v2.3.1 Automated Release! 🥑 oliviassss/eks-charts#7

Closed

M00nF1sh mentioned this pull request Jan 26, 2022

Question: Can we have ALB/NLB in one vpc and EKS cluster in other like Azure Application Gateway ingress controller ? #2465

Closed

oliviassss mentioned this pull request Feb 9, 2022

🥳 aws-load-balancer-controller v2.3.1 Automated Release! 🥑 oliviassss/eks-charts#18

Closed

KamikazeZirou mentioned this pull request May 13, 2022

[Feature] TargetGroupBinding for TargetGroup outside of ALB Controller's VPC #2650

Closed

rodrigorfk mentioned this pull request Mar 17, 2023

[Feature request] - Support for multiple VPCs on a single installation of the controller #3109

Closed

acjohnson mentioned this pull request Apr 25, 2023

[Feature Request] TargetGroupBinding can't bind pod ip to another vpc's targetgroup #1569

Closed

Allow TargetGroup endpoints outside the ELB VPC #1862

Allow TargetGroup endpoints outside the ELB VPC #1862

Uh oh!

Conversation

shoekstra commented Mar 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Mar 6, 2021

Uh oh!

k8s-ci-robot commented Mar 6, 2021

Uh oh!

k8s-ci-robot commented Mar 6, 2021

Uh oh!

codecov-io commented Mar 6, 2021

Codecov Report

Uh oh!

acjohnson commented Mar 8, 2021

Uh oh!

shoekstra commented Mar 12, 2021

Uh oh!

shoekstra commented Mar 12, 2021

Uh oh!

M00nF1sh commented Mar 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shoekstra commented Mar 18, 2021

Uh oh!

cmd-werner-diers commented Apr 23, 2021

Uh oh!

shoekstra commented Apr 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

M00nF1sh commented Nov 18, 2021

Uh oh!

k8s-ci-robot commented Nov 18, 2021

Uh oh!

M00nF1sh commented Nov 18, 2021

Uh oh!

shoekstra commented Nov 21, 2021

Uh oh!

acjohnson commented Dec 9, 2021

Uh oh!

wuyudian1 commented May 31, 2022

Uh oh!

nuriel77 commented May 31, 2022

Uh oh!

wuyudian1 commented May 31, 2022

Uh oh!

wuyudian1 commented May 31, 2022

Uh oh!

wuyudian1 commented May 31, 2022

Uh oh!

nuriel77 commented May 31, 2022

Uh oh!

wuyudian1 commented Jun 1, 2022

Uh oh!

wuyudian1 commented Jun 15, 2022

Uh oh!

quocle-vizio commented Jun 16, 2022

Uh oh!

codecov-commenter commented Dec 13, 2024

Codecov Report

Uh oh!

Uh oh!

shoekstra commented Mar 6, 2021 •

edited

Loading

M00nF1sh commented Mar 17, 2021 •

edited

Loading

shoekstra commented Apr 25, 2021 •

edited

Loading