OCPBUGS-1684: Optimize certificate generation #486

Merged
merged 2 commits into from
May 9, 2023

Conversation

tmshort
Contributor

@tmshort tmshort commented Apr 27, 2023

Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.
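The diff itself is not part of this conversation, so the following is only a hypothetical sketch of what a "once-a-day" regeneration check could look like, not the PR's actual code. The names `maxCertAge`, `needsRegeneration`, `newSelfSignedCert`, and `exampleDecisions` are assumptions made for illustration; only standard library calls are used.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// maxCertAge is an assumed policy value derived from the "once-a-day" wording.
const maxCertAge = 24 * time.Hour

// needsRegeneration (hypothetical) reports whether a new certificate should be
// generated: when none exists, when the existing one has expired, or when it
// was issued at least maxAge ago.
func needsRegeneration(cert *x509.Certificate, now time.Time, maxAge time.Duration) bool {
	if cert == nil || now.After(cert.NotAfter) {
		return true
	}
	return now.Sub(cert.NotBefore) >= maxAge
}

// newSelfSignedCert (hypothetical) creates a throwaway self-signed client
// certificate that is valid from notBefore for the given duration.
func newSelfSignedCert(notBefore time.Time, validity time.Duration) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-client"},
		NotBefore:    notBefore,
		NotAfter:     notBefore.Add(validity),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// exampleDecisions evaluates the check for a two-hour-old certificate, once
// at the current time and once 25 hours in the future.
func exampleDecisions() (regenNow, regenLater bool, err error) {
	now := time.Now()
	cert, err := newSelfSignedCert(now.Add(-2*time.Hour), 48*time.Hour)
	if err != nil {
		return false, false, err
	}
	return needsRegeneration(cert, now, maxCertAge),
		needsRegeneration(cert, now.Add(25*time.Hour), maxCertAge), nil
}
```

With a check like this, a job that runs every 15 minutes would reuse the stored certificate on most runs and only pay the generation cost when the certificate is missing, expired, or a day old.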

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 27, 2023
@openshift-ci-robot

@tmshort: This pull request references Jira Issue OCPBUGS-1684, which is invalid:

  • expected the bug to target the "4.14.0" version, but it targets "4.12.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from grokspawn and perdasilva April 27, 2023 13:36
@tmshort
Contributor Author

tmshort commented Apr 27, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 27, 2023
@openshift-ci-robot

@tmshort: This pull request references Jira Issue OCPBUGS-1684, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

@tmshort
Contributor Author

tmshort commented Apr 27, 2023

/retest

@tmshort
Contributor Author

tmshort commented Apr 27, 2023

The first job took 17s; subsequent jobs are taking 3-4s:

NAMESPACE                              NAME                        COMPLETIONS   DURATION   AGE
openshift-operator-lifecycle-manager   collect-profiles-28043685   1/1           17s        38m
openshift-operator-lifecycle-manager   collect-profiles-28043700   1/1           4s         24m
openshift-operator-lifecycle-manager   collect-profiles-28043715   1/1           4s         9m37s
openshift-operator-lifecycle-manager   collect-profiles-28043730   1/1           3s         8m46s
openshift-operator-lifecycle-manager   collect-profiles-28043745   1/1           4s         114s

Assuming all nodes are roughly equivalent, this is a significant time savings.
EDIT: This is also the behavior for unmodified OLM.

@tmshort
Contributor Author

tmshort commented Apr 27, 2023

/retest

tmshort added 2 commits May 2, 2023 10:54
Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.

Signed-off-by: Todd Short <[email protected]>
@tmshort
Contributor Author

tmshort commented May 2, 2023

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 2, 2023
@tmshort
Contributor Author

tmshort commented May 2, 2023

/retest

@tmshort
Contributor Author

tmshort commented May 2, 2023

/retest

@tmshort
Contributor Author

tmshort commented May 2, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 2, 2023
@openshift-ci
Contributor

openshift-ci bot commented May 2, 2023

@tmshort: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment on lines +161 to +162
InsecureSkipVerify: true,
Certificates: []tls.Certificate{*tlsCert},
Contributor

I know it was like that originally in getHttpClient, but this thing makes me nervous.

Do you know why we have to use InsecureSkipVerify? Can't we set RootCAs to an x509.CertPool containing the correct cert so that our self-signed cert can be verified?
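A minimal sketch of what this suggestion could look like, assuming the CA certificate is available to the client as PEM bytes (the replies below explain that this is not currently the case for the collect-profiles pod). `newVerifyingTLSConfig` and `selfSignedCAPEM` are hypothetical helpers, not functions from the PR; the self-signed CA exists only to make the example self-contained.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// newVerifyingTLSConfig (hypothetical) builds a client TLS config that
// verifies the server against the given CA bundle instead of setting
// InsecureSkipVerify. clientCert may be nil if no client cert is presented.
func newVerifyingTLSConfig(caPEM []byte, clientCert *tls.Certificate) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM input")
	}
	cfg := &tls.Config{
		RootCAs:    pool, // server certificates must chain to this pool
		MinVersion: tls.VersionTLS12,
	}
	if clientCert != nil {
		cfg.Certificates = []tls.Certificate{*clientCert}
	}
	return cfg, nil
}

// selfSignedCAPEM (hypothetical) returns a throwaway self-signed CA
// certificate in PEM form, standing in for a real CA bundle.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}
```

The resulting config leaves InsecureSkipVerify at its default of false, so the handshake fails unless the server's certificate chains to a certificate in the pool and matches the host name being dialed.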

Contributor Author

TL;DR: I think it's a bit beyond the scope of this Bug/PR to fix the certificate verification.

I was just focusing on the certificate generation and not the underlying security policy.

The pod template in the CronJob does not mount the olm-operator-serving-cert secret, which should be the certificate (and key) of the server. So it does not have access. I don't know what the CA is for that certificate (or if it's self-signed - as I have not figured out where it's generated - I'm guessing some openshift/cert-mgr magic. @awgreene do you know?).

(Note that there are two pods from which profiling information is collected, so both secrets, the other being catalog-operator-serving-cert, would need to be referenced.)

Because the collect-profiles pod collects from two different pods, this would require changing the manifests to mount the secret(s), adding a CLI option to reference the secret(s), and then code to load them. It might require multiple arguments, which doesn't scale. If I could figure out where the certificate comes from, and if the two certificates use a common CA, that would ease the burden.

I will note that the olm and catalog pods do appear to validate the client, as they mount the pprof secret.

Contributor Author

Follow up: the catalog-operator-serving-cert and olm-operator-serving-cert are both signed by CN=openshift-service-serving-signer@1683311309 (presumably the number at the end is unique per instance). The signing cert is included in the certificate chain.
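For anyone wanting to repeat this kind of check, here is a rough sketch of listing the subject and issuer of each certificate in a PEM chain using only the Go standard library. It is not how the finding above was obtained; `describeChain`, `issue`, and `exampleChainPEM` are hypothetical names, and the generated chain uses made-up common names rather than the real serving certificates.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// chainEntry holds the common names of one certificate's subject and issuer.
type chainEntry struct{ Subject, Issuer string }

// describeChain (hypothetical) parses every CERTIFICATE block in pemBytes, in
// order, and returns the subject and issuer common name of each one.
func describeChain(pemBytes []byte) ([]chainEntry, error) {
	var entries []chainEntry
	for {
		block, rest := pem.Decode(pemBytes)
		if block == nil {
			break
		}
		pemBytes = rest
		if block.Type != "CERTIFICATE" {
			continue // skip keys or other PEM blocks
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, err
		}
		entries = append(entries, chainEntry{cert.Subject.CommonName, cert.Issuer.CommonName})
	}
	if len(entries) == 0 {
		return nil, errors.New("no certificates found")
	}
	return entries, nil
}

// issue (hypothetical) creates a certificate for cn. With a nil parent the
// result is a self-signed CA; otherwise it is signed by parent/parentKey.
func issue(cn string, serial int64, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	if parent == nil {
		tmpl.IsCA, tmpl.BasicConstraintsValid = true, true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// exampleChainPEM builds a leaf-then-signer chain with made-up names.
func exampleChainPEM() ([]byte, error) {
	ca, caKey, err := issue("example-signer", 1, nil, nil)
	if err != nil {
		return nil, err
	}
	leaf, _, err := issue("example-serving-cert", 2, ca, caKey)
	if err != nil {
		return nil, err
	}
	var out []byte
	for _, c := range []*x509.Certificate{leaf, ca} {
		out = append(out, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: c.Raw})...)
	}
	return out, nil
}
```

When two serving certificates report the same issuer and the chain includes that issuer's certificate, as described above, a single CA certificate in a client's pool would be enough to verify both servers.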

This certificate (and key) is located in the signing-key secret in the openshift-service-ca namespace; so no access. The key being there is a bit dangerous.

Optimally, the CA cert (only) would be made available somewhere, and I'm not sure that's the case. There are no secrets in the default namespace.

The Service CA certificates documentation describes the behavior of the OpenShift CA and how to use it. Rotation would have to be handled (OLM does this already), and the collect-profiles pod really ought to use the same certificate generation mechanism. However, AFAICT, that mechanism is designed for services (servers), not clients (a fake service could be defined, though).

Definitely beyond the scope of this bugfix/PR.

Contributor

I'm fine with this change for now as it addresses the bug defined in the ticket. We should probably consider if we should disable this job by default as I don't believe anyone has actually utilized the pprof data collected by the job since this feature was introduced.

Contributor

TL;DR: I think it's a bit beyond the scope of this Bug/PR to fix the certificate verification.

Absolutely. I implied that when I said it was like that in getHttpClient, but I'm sorry that I did not make it clearer.

Contributor

@awgreene awgreene left a comment

Thanks for all the variable name change improvements.
/approve

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 9, 2023
Contributor

@m1kola m1kola left a comment

I think we should create a ticket to address InsecureSkipVerify. It has the potential for a man-in-the-middle attack.

But I agree that it is not in the scope of this change.

/lgtm

@openshift-merge-robot openshift-merge-robot merged commit 969aa60 into openshift:master May 9, 2023
@openshift-ci-robot

@tmshort: Jira Issue OCPBUGS-1684: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-1684 has been moved to the MODIFIED state.

In response to this:

Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.

@openshift-ci
Contributor

openshift-ci bot commented May 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, m1kola, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tmshort
Contributor Author

tmshort commented May 9, 2023

/cherry-pick 4.13

@tmshort tmshort deleted the OCPBUGS-1684m branch May 9, 2023 15:50
@openshift-cherrypick-robot

@tmshort: cannot checkout 4.13: error checking out 4.13: exit status 1. output: error: pathspec '4.13' did not match any file(s) known to git

In response to this:

/cherry-pick 4.13

@tmshort
Contributor Author

tmshort commented May 9, 2023

/cherry-pick release-4.13

@openshift-cherrypick-robot

@tmshort: #486 failed to apply on top of branch "release-4.13":

Applying: OCPBUGS-1684: Optimize certificate generation
Applying: Make 'ci/prow/verify' happy
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
CONFLICT (content): Merge conflict in vendor/modules.txt
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Make 'ci/prow/verify' happy
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.13

@tmshort
Contributor Author

tmshort commented May 9, 2023

/jira cherrypick OCPBUGS-1684

@openshift-ci-robot

@tmshort: Jira Issue OCPBUGS-1684 has been cloned as Jira Issue OCPBUGS-13321. Will retitle bug to link to clone.
/retitle OCPBUGS-13321: OCPBUGS-1684: Optimize certificate generation

In response to this:

/jira cherrypick OCPBUGS-1684

@openshift-ci openshift-ci bot changed the title OCPBUGS-1684: Optimize certificate generation OCPBUGS-13321: OCPBUGS-1684: Optimize certificate generation May 9, 2023
@openshift-ci-robot

@tmshort: Jira Issue OCPBUGS-13321: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-13321 has been moved to the MODIFIED state.

In response to this:

Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.

@tmshort tmshort changed the title OCPBUGS-13321: OCPBUGS-1684: Optimize certificate generation OCPBUGS-1684: Optimize certificate generation May 9, 2023
@openshift-ci-robot

@tmshort: Jira Issue OCPBUGS-1684 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

In response to this:

Reduce certificate generation to once-a-day.
Optimize loading of the certificate.
Rename operations to correctly identify what they do.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
6 participants