Update install.sh to use kubectl create #2771


Merged: 1 commit merged into operator-framework:master on May 11, 2022

Conversation

@exdx (Member) commented May 10, 2022

Installing the OLM CRDs via install.sh with kubectl apply was failing because
the 'last-applied-configuration' annotation pushed the total size of the CRD's
annotations past the limit the server will accept. Creating the CRDs with
kubectl create does not append that annotation to the object, so the
installation goes through successfully. Installing via create means that the
install.sh script does not support updating an existing OLM installation, but
there are already checks in place to abort the install if an existing OLM
installation is detected.

Signed-off-by: Daniel Sover [email protected]

Closes #2767
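A minimal sketch (not part of the PR) of the failure mode being fixed: `kubectl apply` serializes the full object into the `kubectl.kubernetes.io/last-applied-configuration` annotation, and the API server caps total annotation size at 262144 bytes, so a sufficiently large CRD manifest fails with apply but succeeds with create. The generated file below is a stand-in for a large CRD manifest; a script could pre-check the size locally like this:

```shell
# Stand-in pre-check: would applying this manifest blow the annotation limit?
LIMIT=262144
MANIFEST=$(mktemp)
# Fake a 300000-byte manifest to play the role of a very large CRD.
head -c 300000 /dev/zero | tr '\0' 'x' > "$MANIFEST"
SIZE=$(wc -c < "$MANIFEST")
if [ "$SIZE" -gt "$LIMIT" ]; then
  echo "manifest is $SIZE bytes: apply would exceed the annotation limit, use create"
fi
rm -f "$MANIFEST"
```

The 262144-byte figure matches the error message reproduced in the review below.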

Description of the change:

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky
  • Tests that remove the [FLAKE] tag are no longer flaky

@openshift-ci openshift-ci bot requested review from anik120 and awgreene May 10, 2022 14:42
@timflannagan (Member) left a comment

Tested this out locally using a fresh kind cluster:

$ kind delete cluster ; kind create cluster
$ git checkout master
$ ./scripts/install.sh v0.21.1
customresourcedefinition.apiextensions.k8s.io/catalogsources.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/installplans.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/olmconfigs.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operatorconditions.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operatorgroups.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operators.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/subscriptions.operators.coreos.com created
The CustomResourceDefinition "clusterserviceversions.operators.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

And when pulling down these changes:

$ kind delete cluster ; kind create cluster
$ gh pr checkout 2771
$ ./scripts/install.sh v0.21.1
customresourcedefinition.apiextensions.k8s.io/catalogsources.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/clusterserviceversions.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/installplans.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/olmconfigs.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/operatorconditions.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/operatorgroups.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/operators.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/subscriptions.operators.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/catalogsources.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/clusterserviceversions.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/installplans.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/olmconfigs.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operatorconditions.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operatorgroups.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operators.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/subscriptions.operators.coreos.com condition met
namespace/olm serverside-applied
namespace/operators serverside-applied
serviceaccount/olm-operator-serviceaccount serverside-applied
clusterrole.rbac.authorization.k8s.io/system:controller:operator-lifecycle-manager serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/olm-operator-binding-olm serverside-applied
olmconfig.operators.coreos.com/cluster serverside-applied
deployment.apps/olm-operator serverside-applied
deployment.apps/catalog-operator serverside-applied
clusterrole.rbac.authorization.k8s.io/aggregate-olm-edit serverside-applied
clusterrole.rbac.authorization.k8s.io/aggregate-olm-view serverside-applied
operatorgroup.operators.coreos.com/global-operators serverside-applied
operatorgroup.operators.coreos.com/olm-operators serverside-applied
clusterserviceversion.operators.coreos.com/packageserver serverside-applied
catalogsource.operators.coreos.com/operatorhubio-catalog serverside-applied
Waiting for deployment "olm-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "olm-operator" successfully rolled out
Waiting for deployment "catalog-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "catalog-operator" successfully rolled out
Package server phase: Installing
Package server phase: Succeeded
deployment "packageserver" successfully rolled out

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2022
@timflannagan (Member)

FYI - it looks like SSA (server-side apply) only went GA in k8s 1.22, so this solution is potentially problematic for older clusters.

@exdx (Member Author) commented May 10, 2022

> FYI - it looks like SSA GA'd using k8s 1.22 so this solution is potentially problematic for older clusters.

Hmm -- should we instead create the resources to avoid any issues on older clusters?

@timflannagan (Member)

@exdx That seems like a valid alternative, but something I realize now is: what happens if I have an OLM installation that was stamped out by this install.sh using the create semantics, and I want to update OLM using kubectl apply ... down the line?

@exdx (Member Author) commented May 10, 2022

> @exdx That seems like a valid alternative, but something I realize now is: what happens if I have an OLM installation that was stamped out by this install.sh using the create semantics, and I want to update OLM using kubectl apply ... down the line?

Yes, that's the update path, and we don't have a supported way of doing it today, unfortunately (tracked in #2695). The update would fail without the --server-side=true flag set. Maybe it's worth documenting this in the release notes, so users on existing OLM installations have a path forward?
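A hypothetical sketch of the update path being discussed (the kubectl command is shown in a comment for illustration and not run here): a plain client-side `kubectl apply` would re-attach the oversized last-applied-configuration annotation, while server-side apply tracks ownership in managedFields instead of the annotation. A wrapper script could pick the command based on intent:

```shell
# For updating an existing install, the discussed flag would be:
#   kubectl apply --server-side=true -f crds/
# MODE is a stand-in variable for illustration ("install" or "update").
MODE="update"
if [ "$MODE" = "update" ]; then
  APPLY_CMD="kubectl apply --server-side=true -f crds/"
else
  APPLY_CMD="kubectl create -f crds/"
fi
echo "would run: $APPLY_CMD"
```

The `crds/` path is a placeholder, not the actual layout used by install.sh.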

@exdx exdx force-pushed the fix/server-side-apply branch from 6737f60 to 33e86c2 Compare May 10, 2022 17:37
@exdx (Member Author) commented May 10, 2022

Updated the script to use kubectl create instead.

@exdx exdx changed the title Update install.sh to use server-side apply Update install.sh to use kubectl create May 10, 2022
@awgreene (Member) left a comment

/approve
Nice work

@openshift-ci bot commented May 10, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, exdx, timflannagan


Needs approval from an approver in each of these files:
  • OWNERS [awgreene,timflannagan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@awgreene (Member)

@timflannagan @exdx this does bring into question what version of k8s we support; should we commit to a set number of the most recent releases?

@exdx (Member Author) commented May 10, 2022

> @timflannagan @exdx this does bring into question what version of k8s we support; should we commit to a set number of the most recent releases?

I think ideally we would support N-2 releases, but I don't think we necessarily have the ability to ensure that unless we build out the e2e suite to also test on past versions -- which is possible. Since our upstream support guarantee is best effort, I think always targeting the latest k8s release and suggesting that users on older clusters use an older OLM version is reasonable.
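Since server-side apply went GA in k8s 1.22 (per the discussion above), one option a script has is to gate on the server's minor version before relying on it. A small sketch under that assumption; the version string here is a hard-coded stand-in rather than real `kubectl version` output:

```shell
# Stand-in for the server version a script would discover from the cluster.
SERVER_VERSION="v1.21.3"
# Extract the minor version ("21" from "v1.21.3").
MINOR=$(echo "$SERVER_VERSION" | cut -d. -f2)
if [ "$MINOR" -ge 22 ]; then
  echo "server-side apply available"
else
  echo "older cluster: fall back to kubectl create"
fi
```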

@grokspawn (Contributor)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2022
@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@exdx (Member Author) commented May 10, 2022

I'm not sure why, but that test seems to be consistently failing on this PR, which suggests it may be related.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2022
@exdx (Member Author) commented May 10, 2022

Looks like it passed

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2022
@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

4 similar comments

@openshift-bot (Contributor)

/retest-required

Please review the full test history for this PR and help us cut down flakes.

4 similar comments

@exdx (Member Author) commented May 11, 2022

/override flaky-e2e-tests

@openshift-ci bot commented May 11, 2022

@exdx: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • flaky-e2e-tests

Only the following contexts were expected:

  • DCO
  • e2e-tests
  • image
  • lint
  • tide
  • unit
  • vendor
  • verify

In response to this:

> /override flaky-e2e-tests

@exdx (Member Author) commented May 11, 2022

Going to merge manually since the flaky-e2e job is not required, and the bot is hung up on it for some reason.

@exdx exdx merged commit 1672739 into operator-framework:master May 11, 2022
@joaomlneto commented May 11, 2022

@exdx I just gave it a go and it fails to install on microk8s.
Should I turn this into a separate issue, or reopen #2767?

Steps to reproduce:

  1. Get an instance in Hetzner Cloud (I got a 16-core, 32GB RAM)
  2. apt update && apt upgrade -y && apt install snapd -y && snap install microk8s --classic && snap install kubectl --classic
  3. mkdir ~/.kube && microk8s config > .kube/config
  4. curl -sL https://raw.githubusercontent.com/exdx/operator-lifecycle-manager/33e86c2850975a793e121b200476a95511179dc6/scripts/install.sh | bash -s v0.21.1

Outcome:

  • Script ends with CSV "packageserver" failed to reach phase succeeded.
  • CPU usage hangs at 100% of 1 core for at least 10 minutes.
  • kubectl get csv -n "olm" packageserver -o jsonpath='{.status.phase}' returns Installing

Output:

# curl -sL https://raw.githubusercontent.com/exdx/operator-lifecycle-manager/33e86c2850975a793e121b200476a95511179dc6/scripts/install.sh | bash -s v0.21.1
customresourcedefinition.apiextensions.k8s.io/catalogsources.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/clusterserviceversions.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/installplans.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/olmconfigs.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operatorconditions.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operatorgroups.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/operators.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/subscriptions.operators.coreos.com created
customresourcedefinition.apiextensions.k8s.io/catalogsources.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/clusterserviceversions.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/installplans.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/olmconfigs.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operatorconditions.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operatorgroups.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/operators.operators.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/subscriptions.operators.coreos.com condition met
namespace/olm created
namespace/operators created
serviceaccount/olm-operator-serviceaccount created
clusterrole.rbac.authorization.k8s.io/system:controller:operator-lifecycle-manager created
clusterrolebinding.rbac.authorization.k8s.io/olm-operator-binding-olm created
olmconfig.operators.coreos.com/cluster created
deployment.apps/olm-operator created
deployment.apps/catalog-operator created
clusterrole.rbac.authorization.k8s.io/aggregate-olm-edit created
clusterrole.rbac.authorization.k8s.io/aggregate-olm-view created
operatorgroup.operators.coreos.com/global-operators created
operatorgroup.operators.coreos.com/olm-operators created
clusterserviceversion.operators.coreos.com/packageserver created
catalogsource.operators.coreos.com/operatorhubio-catalog created
Waiting for deployment "olm-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "olm-operator" successfully rolled out
Waiting for deployment "catalog-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "catalog-operator" successfully rolled out
Package server phase: Installing
CSV "packageserver" failed to reach phase succeeded

@exdx (Member Author) commented May 11, 2022

Hi @joaomlneto, I'd recommend opening a separate issue, as this seems unrelated to the size of the last-applied-configuration annotation. I'm not sure how well OLM works on smaller k8s environments like microk8s or k3s; I know there were some issues in the past.

Successfully merging this pull request may close these issues:

  • Error installing v0.21.1 crds