Skip to content

WAFv2 support #1211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 18, 2020
Merged

WAFv2 support #1211

merged 10 commits into from
Apr 18, 2020

Conversation

Vlaaaaaaad
Copy link

@Vlaaaaaaad Vlaaaaaaad commented Mar 30, 2020

This PR adds support for attaching WAFv2 WebACLs to ALBs managed by the ALB Ingress Controller.

To do this, a new annotation is added:

  • alb.ingress.kubernetes.io/wafv2-acl-arn

Closes #1089


Now, this is a draft PR because I want to at least know I am on the right path. Early feedback and communication is good!
The code is not fully ready to be merged yet.

I saw that the AWS Console did allow me to attach both WAFv1 and WAFv2 WebACLs to the same ALB. That directly led to me creating a new controller for WAFv2, instead of extending the pre-existing WAF controller.
The WAFv2 implementation is heavily based on the existing WAF code and patterns.

Still to do, after I get a hint I am on the good path:

  • cleanup
    • a couple of random comments
    • a cache to implement so we don't get rate-limited - code deleted, no more need
  • tests. Unsure exactly what I can test, but we shall see
  • docs. I have to update the documentation with the new annotations and how they're used

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @Vlaaaaaaad. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 30, 2020
@rifelpet
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 30, 2020
@Vlaaaaaaad
Copy link
Author

Failed due to a timeout last time, so let's try again!

/test pull-aws-alb-ingress-controller-e2e-test

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Apr 1, 2020

/test pull-aws-alb-ingress-controller-e2e-test :D

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Apr 1, 2020

i think the tests fails due to lack of IAM permissions. the tests uses IAM permission from https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/master/docs/examples/iam-policy.json.
BTW, would we change the UX to just "waf-acl-arn" instead of "waf-acl-id" & "waf-acl-name". It's ok to this "waf-acl-arn" for wafv2 only and "waf-acl-id" for old wafv1 since wafv2 is going to be deprecated.

Also, the UX is better to be like:

  1. If "waf-acl-arn" is not specified, do not make any wafv2 API calls.
  2. If "waf-acl-arn" is specified and not empty, make wafv2 API calls to configure waf(and delete waf when delete ingress)
  3. if "waf-acl-arn" is specified but empty, make wafv2 API calls to cleanup waf(and delete waf when delete ingress)
    This make sure it won't break ppl if they forget to add wafv2 iam permissions. and avoid unnecessary API calls.

@Vlaaaaaaad
Copy link
Author

🤦‍♂ Yup, that's a very likely reason why tests fail.
When reconciling, I am checking if the current ALB has any WAFv2 WebACL attached. That is done using the WAFv2 API. I'll change the flow to be backward compatible -- total fail on my part, sorry.

In regards to having waf-acl-arn instead of waf-v2-web-acl-id and waf-v2-web-acl-name, I have 2 concerns:

  • I would really put some v2 somewhere. I think wafv2-acl-arn would be clearer and better show the difference between the annotations
  • there is no way to get the ARN from the AWS Console. All the Console shows is the name and id. Not a huge concern for people that use automation, but adds an extra step for people that do not automate( say an AWS CLI call to get the ARN from the name and id`)

On the other hand, using the ARN directly would eliminate the need for me to call the AWS API to get the ARN from the name+id. And it would eliminate the need for a cache. I like that a lot, so I think the trade-off might be worth it. I can add docs for how the ARN can be discovered.

I'll try to fix the above issues by early next week!

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Apr 1, 2020

thanks :D wafv2-acl-arn make sence. And i think it could be an UI change for waf team to disclose the arn information in UI.

@rifelpet
Copy link
Contributor

rifelpet commented Apr 1, 2020

I just wanted to mention that the wafv2 ACL ARN is available in the management console if you click the gear icon in the top right corner of the table and enable the ARN column. Definitely not ideal nor intuitive but its at least possible.

@Vlaaaaaaad
Copy link
Author

Fixed it based on the AWS Shield implementation( #1126).

I was a bit concerned about braking expectations: if I delete an SSL annotation, the SSL cert will be detached. If I delete a WAFv2 annotation nothing will happen.
Seeing how the same thing happens for AWS Shield, I made peace with it and just added a warning in the docs. Do you all think this is enough?

Still left to do: tests. I have a couple of ideas about what I would like to test, but unclear when I'll have the time to write them. By next Monday for sure.
I did do a bunch of manual tests and they work, but automated tests need to be implemented too.

@rifelpet
Copy link
Contributor

rifelpet commented Apr 6, 2020

I just want to mention that in general I think requiring an annotation be set to an empty string in order for that feature to be disabled on the ALB is very unintuitive. One would expect removing the annotation altogether to do the same thing.

I would expect that if someone didn't add the WAFv2 permissions when upgrading to a version of the controller that includes this functionality that the rest of the controller would still function as expected, and only WAFv2 calls would fail, causing noisy logs but be otherwise functional.

I could also see an argument that if the controller's IAM policy isn't setup correctly then it should fail early (exit?) in order to be more visible to the cluster administrator. I understand that some administrators don't want to enable something like WAFv2 (which could currently be accomplished by not including it in the controller's IAM policy), so perhaps flags on the controller that selectively disable certain functionality would be appropriate.

Thoughts @M00nF1sh ?

@Vlaaaaaaad
Copy link
Author

I'll implement this however y'all want 😆 I'll just give my opinion, but since @M00nF1sh maintains this and has to carry that burden, they have the final say.

I would expect that if someone didn't add the WAFv2 permissions when upgrading to a version of the controller that includes this functionality that the rest of the controller would still function as expected, and only WAFv2 calls would fail, causing noisy logs but be otherwise functional.

Besides performance issues( for every reconcile loop and every ALB, run GetWebACLForResource and fail), this option just feels unfriendly to me.

perhaps flags on the controller that selectively disable certain functionality would be appropriate

It seems that Feature Gates can be sent as command-line arguments, but I never saw that used: https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/95ee2ac800f6b7c1904d5d7fd69acb8867131983/internal/ingress/controller/config/features.go#L66

We could set WAFv2 support as false by default, and write up some docs for how to enable it using Feature Gates. I think it would be the friendliest option, considering that the official helm chart has extraArgs as an already existing parameter.


Anyway, this was also brought up in #1126 (comment):

we need to add a feature Gate for this feature to allow Customer disable it. (since it need new IAM permissions and will fail the controller unless these permissions are added, some customers may don't want to add such permissions or even unable to do so).

@rifelpet
Copy link
Contributor

rifelpet commented Apr 6, 2020

Sure thing, this was mostly directed at @M00nF1sh :) I think this PR looks great so far

I would argue that noisy PermissionDenied errors in the logs would be best because it would most clearly alert the administrator that something isn't configured correctly. I think I would rather have these errors in my logs continuously rather than waiting until I need to add my first wafv2 annotation to an ingress, though I could see see a debate for either side on this.

I guess it comes down to "should the controller run out-of-the-box without any errors when using an incomplete/up-to-date IAM policy?" I think the answer should be no. The controller will still work to the best of its ability, but a stream of PermissionDenied errors should be expected. When upgrading the controller the administrator should be expected to update the IAM policy to match the documented policy for the controller version. If an administrator chooses to deviate from the policy, they can use the FeatureGates to disable the associated functionality in the controller.

Personally I like the idea of using a feature gate for this and having it enabled by default.

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Apr 9, 2020

this looks good to me.
This also to give user an ability to express "don't manage wafv2 for me". so that they can attach wafv2 to ALB from ALB console/3rd party controller without through ALB. (And deletion of ALB will fail unless they remove the WAFV2 themself. we'll add finalizer to keep Ingress there).
If we make every annotation on ALB to behave similar(i.e. unspecified = we don't manage it for you, then it's won't be counterintuitive).

So there are three semantic:

  1. semantic WIP: Refactor everything into packages #1: (shield)
    • unspecified = don't manage it
    • specified but empty, defaulting settings related to featureX.
    • specified but non-empty, apply settings related to featureX.
  2. semantic Refactor josh #2: (most of existing annotations)
    • unspecified = defaulting settings
    • specified but empty = defaulting settings
    • specified but non-empty = apply settings related to featureX.
  3. possible semantic vendor: vendor all of the dependencies #3: (none of annotations have it)
    • unspecified = defaulting settings
    • specified but empty(or a explicit "unmanaged" keyword), don't manage it
    • specified but non-empty = apply settings related to featureX.

I favor semantic #1 since it's more flexible. (i have see use cases like a company wide policy that automatically attach waf to all ALBs, which is out of k8s's control).
Also I do agree it's a bit un-convenient for users that are used to the semantic #2, and i'm personally ok to keep use semantic #2, and use feature flag instead.
@rifelpet @Vlaaaaaaad what do you guys think :D

@rifelpet
Copy link
Contributor

I think semantic #3 should be considered along with feature flags. If either the user (someone managing an ingress resource) or the cluster administrator (someone managing the ingress controller) desire the behavior "don't manage WAFv2 for me" either because they don't want to allow WAFv2 permissions or they have an external service managing that feature for the ALB, that is a conscious decision they need to make therefor it should be "opt in". The user could achieve that by setting the annotation value to external or something, and the cluster administrator could achieve that by setting the feature flag that disables WAFv2 api calls for all ingresses.

Regarding behavior for setting an annotation's value to an empty string, I feel like this would be a problem not unique to AWS ALB Ingress Controller. Do other ingress controllers have this problem? Do they solve it with empty annotation values?


If we make every annotation on ALB to behave similar ... then it won't be counterintuitive

I disagree. Consistency does not imply intuitiveness. Especially if this is the only ingress controller with this behavior. If someone with experience with other ingress controllers starts using this one, the behavior wont be intuitive to them.

I think as someone glancing at an ingress definition, seeing

alb.ingress.kubernetes.io/wafv2-acl-arn: external

more clearly helps me understand how the WAFv2 ACL attachment is being managed, rather than an empty string or no annotation at all.

@M00nF1sh
Copy link
Collaborator

alb.ingress.kubernetes.io/wafv2-acl-arn: external kind of convince me :D
I agree to use semantic #2 and a feature flag for now :D.
what do you think @Vlaaaaaaad

@Vlaaaaaaad
Copy link
Author

I am a tad confused. Let me try to clarify.

So, what you are proposing is this:

  • WAFv2 feature flag enabled by default
  • alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:...:a9613b = attach specified WAFv2
  • alb.ingress.kubernetes.io/wafv2-acl-arn: external = do nothing
  • alb.ingress.kubernetes.io/wafv2-acl-arn: '' = detach any WAFv2 present
  • no annotation = also detach any WAFv2 present
  • lack of IAM Permissions: fail Reconcile loop, log errors hinting at lack of IAM permissions

Did I get it correctly?

@M00nF1sh
Copy link
Collaborator

M00nF1sh commented Apr 14, 2020

I am a tad confused. Let me try to clarify.

So, what you are proposing is this:

  • WAFv2 feature flag enabled by default
  • alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:...:a9613b = attach specified WAFv2
  • alb.ingress.kubernetes.io/wafv2-acl-arn: external = do nothing
  • alb.ingress.kubernetes.io/wafv2-acl-arn: '' = detach any WAFv2 present
  • no annotation = also detach any WAFv2 present
  • lack of IAM Permissions: fail Reconcile loop, log errors hinting at lack of IAM permissions

Did I get it correctly?

Yes, but we don't need to support "alb.ingress.kubernetes.io/wafv2-acl-arn: external = do nothing" for now. :D sorry for the back and forth

@Vlaaaaaaad
Copy link
Author

No need to apologize @M00nF1sh. This is important 🙂

I'll code it up and test it manually this week. I hope that by Monday, I'll have tests added to the PR and have everything ready for review!

@Vlaaaaaaad
Copy link
Author

Ok, changed it to the above flow and manually tested all combinations.

Lack of WAFv2 permissions will result in the following error:

E0415 19:51:03.765344       1 controller.go:217] kubebuilder/controller 
"msg"="Reconciler error" 
"error"="failed get WAFv2 webACL for load balancer arn:...a00:
  AccessDeniedException: User: arn:aws:.....0 is not authorized to perform: 
  wafv2:GetWebACLForResource 
  on resource: arn:aws:wafv2:....:regional/webacl/*\n\t
  status code: 400, 
  request id: b28825ee-7cc3-4a6c-a0aa-5497f6fd0003"  
"controller"="alb-ingress-controller" 
"request"={"Namespace":"echoserver","Name":"echoserver"}

The error is quite clear and signals to any cluster admin that there is a lack of IAM permissions.


Still left to do: add tests. We're getting closer and closer!

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 16, 2020
@Vlaaaaaaad
Copy link
Author

Added tests for the WAFv2 🎉
I have absolutely no idea what I am doing, so please review the tests carefully. I based them on the existing WAF and Shield tests, but the tests I wrote seem brittle and not ideal. But that may be just my opinion.

Aaaaand with the tests added I am also marking this PR as ready for review! Don't be afraid to be merciless in your reviews, I can take it!

@Vlaaaaaaad Vlaaaaaaad marked this pull request as ready for review April 16, 2020 21:54
@Vlaaaaaaad Vlaaaaaaad changed the title [WIP][Feedback needed] WAFv2 support WAFv2 support Apr 16, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2020
@M00nF1sh
Copy link
Collaborator

Thanks for contributing this 🤣
Looks pretty good to me.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 18, 2020
@M00nF1sh M00nF1sh added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: Vlaaaaaaad

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b2a4dbf into kubernetes-sigs:master Apr 18, 2020
@Vlaaaaaaad Vlaaaaaaad deleted the waf-v2-clean branch April 25, 2020 20:03
@ahilmathew
Copy link

Has this been added to the documentation :(

@Vlaaaaaaad
Copy link
Author

@M00nF1sh do I have to do anything for the docs to be updated? The docs were added with this PR and they are present in master, but the live docs don't seem to include them.

@apolegoshko
Copy link

+1 for updating documentation :)

@Vlaaaaaaad
Copy link
Author

Missing docs are tracked as part of #1324

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't attach new AWS WAF v2 ACLs - WAFNonexistentItemException
6 participants