Skip to content

Allow TargetGroup endpoints outside the ELB VPC #1862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

shoekstra
Copy link
Contributor

@shoekstra shoekstra commented Mar 6, 2021

Closes #1569

To support the scenario where the ELB is deployed in a separate VPC to the EKS cluster, we need to conditionally set the AvailabilityZone field to all when the target is outside the VPC as per the TargetDescription docs.

This PR addresses this by checking if the pod IP is found in the ELB VPC's CIDR or a secondary CIDR; to improve performance and reduce calls to the AWS API, VPC info is cached for a configurable amount of time, defaulting to 5 minutes.

Signed-off-by: Stephen Hoekstra [email protected]

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Mar 6, 2021
@k8s-ci-robot k8s-ci-robot requested review from kishorj and M00nF1sh March 6, 2021 21:11
@k8s-ci-robot
Copy link
Contributor

Welcome @shoekstra!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 6, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @shoekstra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 6, 2021
@codecov-io
Copy link

Codecov Report

Merging #1862 (e39d354) into main (04c9bf5) will decrease coverage by 0.10%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1862      +/-   ##
==========================================
- Coverage   47.98%   47.88%   -0.11%     
==========================================
  Files         110      110              
  Lines        6177     6190      +13     
==========================================
  Hits         2964     2964              
- Misses       2938     2951      +13     
  Partials      275      275              
Impacted Files Coverage Δ
pkg/targetgroupbinding/networking_manager.go 24.78% <0.00%> (-0.11%) ⬇️
pkg/targetgroupbinding/resource_manager.go 18.25% <0.00%> (-0.96%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04c9bf5...e39d354. Read the comment docs.

@acjohnson
Copy link

👍

@shoekstra
Copy link
Contributor Author

CLA has been signed 👍

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 12, 2021
@shoekstra
Copy link
Contributor Author

/assign @M00nF1sh

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Mar 17, 2021

@shoekstra
Thanks for making this change 👍
I have some concerns with current implementation:

  1. the DescribeVPC is been called within the critical path of target registration(which is been called frequently). There should be another component that retrieve VPC CIDR with cache. (e.g. cache with 5mins, so that added secondary cidr can work).
  2. we should handle VPC with multiple secondary CIDRs.

@shoekstra
Copy link
Contributor Author

Hey @M00nF1sh, thanks for the feedback!

I understand both concerns...

  1. I wasn't sure where to do the call; do I understand that it's OK where it is, but I should move it to it's own function that has a 5min cache? I wasn't sure where else to put it, if there is a better place pls let me know.
  2. Good point, I'll update the PR to check all secondary CIDRs.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2021
@cmd-werner-diers
Copy link

Hi @shoekstra any change to fix the conflicts so that your fix can be merged? Cheers

@shoekstra shoekstra force-pushed the allow_ips_outside_elb_vpc branch from e39d354 to e0aed65 Compare April 25, 2021 20:45
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 25, 2021
@shoekstra
Copy link
Contributor Author

shoekstra commented Apr 25, 2021

Hi @M00nF1sh,

I took inspiration from the AZ info interface and gave this another go by implementing the feature in a similar way. I think I've now satisfied the points in your previous feedback, please let me know.

I was thinking to also set target-type to ip if a pod is in a remote VPC, but didn't implement that yet. For now it's still required to set alb.ingress.kubernetes.io/target-type: ip in the ingress or connecting to pods in a remote VPC will fail...

Thanks!

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2021
@shoekstra shoekstra force-pushed the allow_ips_outside_elb_vpc branch from e0aed65 to 62e9d67 Compare April 26, 2021 09:19
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2021
@shoekstra shoekstra force-pushed the allow_ips_outside_elb_vpc branch from 62e9d67 to c5c46fd Compare May 10, 2021 14:26
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 10, 2021
@M00nF1sh M00nF1sh force-pushed the allow_ips_outside_elb_vpc branch from 365ca4e to 4b5d1a7 Compare November 17, 2021 23:02
@M00nF1sh M00nF1sh added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 18, 2021
@M00nF1sh
Copy link
Collaborator

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: M00nF1sh, shoekstra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2021
@M00nF1sh
Copy link
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2021
@k8s-ci-robot k8s-ci-robot merged commit 1ca12db into kubernetes-sigs:main Nov 18, 2021
@shoekstra shoekstra deleted the allow_ips_outside_elb_vpc branch November 21, 2021 16:25
@shoekstra
Copy link
Contributor Author

Great to hear @M00nF1sh! All sounds good to me. Looking forward to switching back to the official image 👍

@acjohnson
Copy link

So happy this has shipped; I just tested using the official v2.3.1 image and it's working perfectly. Thank you @shoekstra @M00nF1sh and all 🎉

@wuyudian1
Copy link

We use amazon/aws-alb-ingress-controller:v2.4.1 installed by helm(chart: aws-load-balancer-controller-1.4.1, app version: v2.4.1)

Still can not Register pod to a TargetGroup(in another VPC), the controoler log:

{"level":"error","ts":1653988309.5380044,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"tgb-cloud-appserverweb","namespace":"us-platform","error":"ValidationError: The Availability Zone for IP address '10.151.135.88' must be 'all' for Application Load Balancer target groups, when not within the VPC\n\tstatus code: 400, request id: 1b1b5648-be4e-45fb-89a0-e601bc232b0b"}

@nuriel77
Copy link

@wuyudian1 you need to make sure that your targets are in zone "all" and not on a specific zone.
image

@wuyudian1
Copy link

The targets are my PODs, do you mean that I need to setup some configurations in the Deployment to change their AZ to be "all"?

@wuyudian1
Copy link

I dont know how to make it "all", hoping someone may help me

@wuyudian1
Copy link

My TargetGroupBinding definition:

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: tgb-cloud-appserverweb
  namespace: us-platform
spec:
  serviceRef:
    name: svc-clusterip-cloud-appserverweb
    port: 8171
  targetGroupARN: arn:aws:elasticloadbalancing:us-east-1:94...0:targetgroup/xxxxxx/93e...2b9
  targetType: ip

Contoller log:

{"level":"info","ts":1654007320.3697112,"msg":"registering targets","arn":"arn:aws:elasticloadbalancing:us-east-1:94...0:targetgroup/xxxxxx/93e...2b9","targets":[{"AvailabilityZone":null,"Id":"10.151.135.88","Port":8171}]}
{"level":"error","ts":1654007320.6442428,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"tgb-cloud-appserverweb","namespace":"us-platform","error":"ValidationError: The Availability Zone for IP address '10.151.135.88' must be 'all' for Application Load Balancer target groups, when not within the VPC\n\tstatus code: 400, request id: 9ac57241-60d9-4875-9bcd-e86ed867e7a6"}

We can see that LB Controller got "null" at AvailabilityZone, so, I'm wondering how to fix it

@nuriel77
Copy link

@wuyudian1

  • Is your ALB configured in all Availability Zones?
  • Did you set the ALB's vpcId in the Helm values?
    Sorry if this is not accurate with regards to what may be the reason, it has been a while since I've touched this.

@wuyudian1
Copy link

@nuriel77 set the vpcId of VPC where my TrgetGroup should be in, it works fine, thanks for your response.

@wuyudian1
Copy link

If I understand correctly then yes, you'll need to deploy multiple instances of the controller:

  • Create load-balancers normally in the EKS VPC

To support this you can deploy an instance with no VPC ID specified (it'll default to the VPC where the EKS instance resides)

  • Use a TargetGroupBinding to manage an existing TargetGroup that exists on a LB on a different VPC

To support this you can deploy an additional instance/ingress class with the VPC ID of the VPC hosting the LB specified. I didn't test this myself but others indicated using TargetGroupBinding with an LB in a remote VPC is working for them.

@shoekstra If there's more than one LB controller instance(with vpcID set to different values), how dose a TargetGroupBinding object(which trying to bind POD IPs to a specific TargetGroup in a specific VPC) choose the correct/proper contoller instance ?

@quocle-vizio
Copy link

i am having this issue too, deployed the controller in another VPC (ingress vpc), the alb gets provision and the target groups gets created, but no registered target. Here is the log I get from the controller.
{"level":"error","ts":1655411581.6259413,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-echoserv-echoserv-43459c5438","namespace":"echoserver","error":"InvalidGroup.NotFound: You have specified two resources that belong to different networks.\n\tstatus code: 400, request id: 28737654-657c-40a0-9b72-ed242e9fd57f"}

any ideas?

Timothy-Dougherty pushed a commit to adammw/aws-load-balancer-controller that referenced this pull request Nov 9, 2023
* Allow TargetGroup endpoints outside the ELB VPC

To support the scenario where the ELB is deployed in a separate VPC to
the EKS cluster, we need to conditionally set the `AvailabilityZone`
field to `all` when the target is outside of the ELB VPC.

This commit addresses this by checking if the pod IP is found
in the ELB VPC's CIDR or a secondary CIDR; to improve performance and
reduce calls to the AWS API, VPC info is cached for a configurable
amount of time, defaulting to 5 minutes.

Signed-off-by: Stephen Hoekstra <[email protected]>

* Make isELBV2TargetInELBVPC more efficient

Don't need to check vpc.CidrBlock as it is part of vpc.CidrBlockAssociationSet. isIPinCIDR is now also a bit more robust in checking if an IP is part of a CIDR.

Signed-off-by: Stephen Hoekstra <[email protected]>

* Add vpcID to defaultResourceManager

We're already passing this into NewDefaultResourceManager so no need to also add it to NetworkingManager.

Signed-off-by: Stephen Hoekstra <[email protected]>

* Change VPC cache duration to 10m

Also switched cache value to be time.Duration from int.

Signed-off-by: Stephen Hoekstra <[email protected]>

* Whoops, we should return an error if found...

* Use apimachinery/pkg/util/cache for caching

Signed-off-by: Stephen Hoekstra <[email protected]>

* remove the aws-vpc-cache-ttl flag
fix unit tests

Co-authored-by: M00nF1sh <[email protected]>
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 50.00000% with 29 lines in your changes missing coverage. Please review.

Project coverage is 54.18%. Comparing base (db5fc12) to head (4b5d1a7).
Report is 534 commits behind head on main.

Files with missing lines Patch % Lines
pkg/targetgroupbinding/resource_manager.go 30.00% 12 Missing and 2 partials ⚠️
pkg/networking/vpc_info_provider_mocks.go 0.00% 11 Missing ⚠️
pkg/networking/vpc_info_provider.go 85.18% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1862      +/-   ##
==========================================
- Coverage   54.19%   54.18%   -0.02%     
==========================================
  Files         136      138       +2     
  Lines        7702     7757      +55     
==========================================
+ Hits         4174     4203      +29     
- Misses       3212     3234      +22     
- Partials      316      320       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] TargetGroupBinding can't bind pod ip to another vpc's targetgroup