Skip to content

Improve rabbitmq reset playbook matching #1103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Alex-Welsh
Copy link
Member

@Alex-Welsh Alex-Welsh commented Jun 19, 2024

Two issues were picked up during the Sussex deployment. The original command matched against the correct lines but also these:

ceph-42b27b44-271a-11ef-8338-58a2e1dff6f2@mds.manila-cephfs.artemis-cp-01.rygqwc.service
●
●
●

The first issue is that it was matching against ceph containers which happen to have manila in the name
The second was that it was matching against failed units, which show up in the list like this:

kolla-haproxy-container.service      loaded    active   running   docker kolla-haproxy-container.service
● kolla-heat_api-container.service      loaded    failed   failed    docker kolla-heat_api-container.service

So it's matching against the dot
Stopped containers don't need to be stopped and restarted so we should only match against running units

@Alex-Welsh Alex-Welsh requested a review from MoteHue June 19, 2024 09:16
@Alex-Welsh Alex-Welsh requested a review from a team as a code owner June 19, 2024 09:16
@MoteHue
Copy link
Contributor

MoteHue commented Jun 19, 2024

Stopped containers don't need to be stopped and restarted so we should only match against running units

Is this always true? I reckon a container could get into a failed state due to rabbit issues, it might be better to match against the description as this is the last field and always matches the kolla container name.

@Alex-Welsh
Copy link
Member Author

Stopped containers don't need to be stopped and restarted so we should only match against running units

Is this always true? I reckon a container could get into a failed state due to rabbit issues, it might be better to match against the description as this is the last field and always matches the kolla container name.

Feels a bit wrong to match against the description rather than the ID but fair point

MoteHue
MoteHue previously approved these changes Jun 20, 2024
@@ -68,4 +68,4 @@
# The following services use RabbitMQ.
- name: Restart OpenStack services
shell: >-
systemctl -a | egrep '(barbican|blazar|cinder|cloudkitty|designate|heat|ironic|keystone|magnum|manila|neutron|nova|octavia)' | awk '{ print $1 }' | xargs systemctl restart
systemctl list-units --type=service | egrep '(barbican|blazar|cinder|cloudkitty|designate|heat|ironic|keystone|magnum|manila|neutron|nova|octavia)' | egrep -v ceph | awk '{ print $NF }' | xargs systemctl restart
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about $NF in your awk here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is to match the description. A bit dangerous if this changes in kolla-ansible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New approach, splitting every word out to a new line and selecting unique entries with better regex matching (must start with kolla-, end with -container.service

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this now pick up each container twice, from the name and description? Could there be any issues from calling the restart twice in quick succession?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

systemctl list-units supports passing a pattern in, can we do that instead of the grep? https://www.freedesktop.org/software/systemd/man/latest/systemctl.html#list-units%20PATTERN%E2%80%A6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added | uniq to stop it duplicating lines.

I don't think the systemctl pattern really helps. Grep is more versatile here

@Alex-Welsh Alex-Welsh requested a review from priteau June 26, 2024 09:14
@darmach
Copy link
Member

darmach commented Jun 28, 2024

@Alex-Welsh It's always good to check if a PR attempting to fix your issue is not already opened :)

#1073

@darmach
Copy link
Member

darmach commented Jun 28, 2024

Oh, I see that you've seen that one (as you've upvoted Michal's suggestion) :)
Why open the new pull request then @Alex-Welsh ?

@Alex-Welsh
Copy link
Member Author

Oh, I see that you've seen that one (as you've upvoted Michal's suggestion) :) Why open the new pull request then @Alex-Welsh ?

I can barely remember what I had for breakfast this morning, let alone anything that happened in May 😆. I'm happy to close this one and continue with yours if you update it

@darmach
Copy link
Member

darmach commented Jul 11, 2024

@Alex-Welsh actually, it's my bad! I fixed matching aswell and never pushed to that branch I pasted earler! That's why I found this redundant. Great job and thanks!

@MoteHue
Copy link
Contributor

MoteHue commented Jul 12, 2024

https://github.com/stackhpc/stackhpc-kayobe-config/blob/stackhpc/2023.1/etc/kayobe/ansible/stop-openstack-services.yml Just realised we might want to change this too, as it will also be matching against the dot on failed services. Less impactful as stopping a failed service probably won't do much, but still seems worth doing.

@Alex-Welsh
Copy link
Member Author

Closing in favor of #1073

@Alex-Welsh Alex-Welsh closed this Sep 6, 2024
@Alex-Welsh Alex-Welsh deleted the fix-rabbit-reset branch November 22, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants