-
Notifications
You must be signed in to change notification settings - Fork 72
OCPBUGS-1684: Optimize certificate generation #486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@tmshort: This pull request references Jira Issue OCPBUGS-1684, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@tmshort: This pull request references Jira Issue OCPBUGS-1684, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
The first job took 17s, subsequent jobs are taking 3~4s:
Assuming all nodes are roughly equivalent; this is a significant time savings. |
/retest |
Reduce certificate generation to once-a-day. Optimize loading of the certificate. Rename operations to correctly identify what they do. Signed-off-by: Todd Short <[email protected]>
Signed-off-by: Todd Short <[email protected]>
/hold |
/retest |
1 similar comment
/retest |
/hold cancel |
@tmshort: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
InsecureSkipVerify: true, | ||
Certificates: []tls.Certificate{*tlsCert}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it was like that originally in getHttpClient
, but this thing makes me nervous.
Do you know why we have to use InsecureSkipVerify
? Can't we set RootCAs
to a x509.CertPool
containing correct cert so that our self singed cert can be verified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: I think it's a bit beyond the scope of this Bug/PR to fix the certificate verification.
I was just focusing on the certificate generation and not the underlying security policy.
The pod template in the CronJob does not mount the olm-operator-serving-cert
secret, which should be the certificate (and key) of the server. So it does not have access. I don't know what the CA is for that certificate (or if it's self-signed - as I have not figured out where it's generated - I'm guessing some openshift/cert-mgr magic. @awgreene do you know?).
(Note that there are two pods from which profiling information is collected, so both secrets, the other being catalog-operator-serving-cert
, would need to be referenced.)
The collect-profiles pod collects from two different pods. It would require changing the manifests to mount the secret(s), adding a CLI option to reference the secret(s), then code to reference. As the it might require multiple arguments, and that doesn't scale. If I could figure out where the certificate comes from, and if the two certificates use a common CA, that would ease the burden.
I will note that the olm and catalog pods do appear to validate the client, as they mount the pprof
secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up: the catalog-operator-serving-cert
and olm-operator-serving-cert
are both signed by CN=openshift-service-serving-signer@1683311309
(presumably the number at the end is unique per instance). The signing cert is included in the certificate chain.
This certificate (and key) is located in the signing-key
secret in the openshift-service-ca
namespace; so no access. The key being there is a bit dangerous.
Optimally, the CA cert (only) would be made available somewhere, and I'm not sure that's the case. There are no secrets in the default
namespace.
Service CA certificates describes the behavior of the OpenShift CA, and how to use it. Rotation would have to be handled (OLM does this already), and the collect-profiles pod really ought to use the same certificate generation mechanism, but it's designed for services (servers) not clients, AFAICT (a fake service could be defined, though).
Definitely beyond the scope of this bugfix/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m1kola ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this change for now as it addresses the bug defined in the ticket. We should probably consider if we should disable this job by default as I don't believe anyone has actually utilized the pprof data collected by the job since this feature was introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: I think it's a bit beyond the scope of this Bug/PR to fix the certificate verification.
Absolutely. I implied that when I said it was like that in getHttpClient
, but I'm sorry that I did not make it more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the variable name change improvements.
/approve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should create a ticket to address InsecureSkipVerify
. It has potential for man in the middle attack.
But I agree that it is not in the scope of this change.
/lgtm
@tmshort: Jira Issue OCPBUGS-1684: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-1684 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: awgreene, m1kola, tmshort The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick 4.13 |
@tmshort: cannot checkout In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.13 |
@tmshort: #486 failed to apply on top of branch "release-4.13":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira cherrypick OCPBUGS-1684 |
@tmshort: Jira Issue OCPBUGS-1684 has been cloned as Jira Issue OCPBUGS-13321. Will retitle bug to link to clone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@tmshort: Jira Issue OCPBUGS-13321: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-13321 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@tmshort: Jira Issue OCPBUGS-1684 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.