Skip to content

Retrieve VPC ID from EC2 instance the controller is running on if possible #2824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 22, 2022

Conversation

olemarkus
Copy link
Contributor

@olemarkus olemarkus commented Oct 3, 2022

Issue

#2430

Description

Right now either metadata has to be available or one has to pass VPC ID as a flag.
This PR tries to fetch the VPC ID from the EC2 API if it can infer the instance ID from the node name.

It uses the "new" node name convention from cloud controller manager, which works well for my use case. A follow up would be to try to fetch the node object and extract the instance ID from the provider ID.

This approach was tested running LBC on instances with IMDS max-hop-limit 1 and IRSA enabled and it ran successfully without setting the vpc id flag.

The LBC Deployment requires the following:

spec:
  template:
    spec:
      containers:
      - env:
        - name: NODENAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

If this approach looks good, I can do any of the above deemed necessary.

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 3, 2022
pkg/aws/cloud.go Outdated
cfg.VpcID = *vpcID
}

} else {
Copy link
Collaborator

@M00nF1sh M00nF1sh Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's better to use metadata first, and then fallback to use nodeName given metadata is cheaper.
We might change our default manifest to have this NODEName env injected always, doing so can save an AWS api call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! Done.

pkg/aws/cloud.go Outdated
output, err := ec2Service.DescribeInstances(&ec2.DescribeInstancesInput{
InstanceIds: []*string{&nodeName},
})
if err != nil {
Copy link
Collaborator

@M00nF1sh M00nF1sh Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should change our logic here, if we don't have permission to describe the instance, we still should be able to fallback to metadata path.
similarly, if we check for metadata first then instance id(see my previous comment), we should change the logic to not fall when metadata is not available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also done.
Means less meaningful errors returned, but perhaps not very important anyway.

pkg/aws/cloud.go Outdated
ec2Service := services.NewEC2(sess)

if len(cfg.VpcID) == 0 {
nodeName := os.Getenv("NODENAME")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit comment, given this function is a bit long,
maybe created a dedicated file like pkg/aws/utils/vpc_id.go, and have functions like

inferVPCID(sess, metadata), which invokes inferVPCIDFromNodeName
inferVPCIDFromNodeName(sess, nodeName)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't spot this comment, but ended up doing pretty much the same you describe here as it makes the fallback logic easier. I put the function below NewCloud though.

Copy link
Collaborator

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good to me with some nit comments

@olemarkus olemarkus requested review from M00nF1sh and removed request for kishorj October 6, 2022 20:27
@codecov-commenter
Copy link

Codecov Report

Base: 54.05% // Head: 54.05% // No change to project coverage 👍

Coverage data is based on head (331f509) compared to base (18019d0).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2824   +/-   ##
=======================================
  Coverage   54.05%   54.05%           
=======================================
  Files         144      144           
  Lines        8278     8278           
=======================================
  Hits         4475     4475           
  Misses       3479     3479           
  Partials      324      324           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -114,9 +108,19 @@ func NewCloud(cfg CloudConfig, metricsRegisterer prometheus.Registerer) (Cloud,
metricsCollector.InjectHandlers(&sess.Handlers)
}

ec2Service := services.NewEC2(sess)

if len(cfg.VpcID) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i think we should use same check here, either len(cfg.VpcID) ==0 or vpcID == ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were referring to this one and the one two lines below? The latter was replaced by an err check now that the infer function returns errors.

pkg/aws/cloud.go Outdated
@@ -126,6 +130,37 @@ func NewCloud(cfg CloudConfig, metricsRegisterer prometheus.Registerer) (Cloud,
}, nil
}

func getVPC(metadata services.EC2Metadata, ec2Service services.EC2) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breaks the existing experience as it hide the error if metadata is not available since it hides the error.
I think change to below will be better:

inferVPCID(metadata, ec2Service) (vpcID, error) {
	var errors []err
	vpcID, err := metadata.vpcID()
	if err == nil {
		return vpcID
	}
	errors = append(errors, errors.Wrap(err, "cannot infer VPCID from metadata"))
	vpcID, err := inferVPCIDFromNodeName(...)
	if err == nil {
		return vpcID
	}
	errors = append(errors, err)
	return "", errors.NewAggregate(errors) # https://github.com/kubernetes/apimachinery/blob/478dd6ed1ec9b4cc3459dcd4f1d772013ac0103c/pkg/util/errors/errors.go#L46
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouh. Apimachinery has it's own error aggregate? Today I learnt. Will fix.

@olemarkus
Copy link
Contributor Author

Anything more needed here?

Copy link
Collaborator

@M00nF1sh M00nF1sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks :D now it looks good to me, we can include this in next release.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: M00nF1sh, olemarkus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2022
@k8s-ci-robot k8s-ci-robot merged commit fb1f93b into kubernetes-sigs:main Oct 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants