Skip to content

Deploy rebuild machinery #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 25, 2021
Merged

Deploy rebuild machinery #44

merged 8 commits into from
Mar 25, 2021

Conversation

sjpb
Copy link
Collaborator

@sjpb sjpb commented Mar 10, 2021

Seems to have got missed in move to multi-environments.

@sjpb sjpb requested a review from jovial March 10, 2021 14:28
@sjpb
Copy link
Collaborator Author

sjpb commented Mar 10, 2021

Need to:

  • fix validation so that if the role default for openhpc_rebuild_clouds is ok, that works
  • fix CI so that openstack token issue "works" as in returns rc=0.


- name: Validate rebuild configuration
hosts: openhpc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit openstack specific. Is there a way we could only run this only if we are using openstack?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rebuild functionality is openstack specific - maybe we need to add a group?

Copy link
Collaborator

@jovial jovial Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah or possibly just a variable that sets the cloud type? E.g

appliance_cloud_type: openstack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have gone with a group as its consistent with all other features and there could be reasons why you only want to enable rebuild on a subset of nodes I guess.

that: openhpc_clouds_path.stat.exists
fail_msg: "openrc file {{ openhpc_rebuild_clouds }} on localhost (specified by `openhpc_rebuild_clouds`) does not exist"
- name: Check OpenRC file (openhpc_rebuild_clouds) is usable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add no_log here so that we don't leak a token in stdout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I can find on the manpage is:

--log-file ¶
Specify a file to log output. Disabled by default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate tasks now defined in collection's role which does set this.

@sjpb
Copy link
Collaborator Author

sjpb commented Mar 10, 2021

@sjpb sjpb marked this pull request as draft March 11, 2021 09:22
@sjpb sjpb force-pushed the feature/rebuild branch from 24a130e to f9b6476 Compare March 11, 2021 09:32
@sjpb sjpb force-pushed the feature/rebuild branch from f9b6476 to 0836adc Compare March 11, 2021 09:34
@sjpb sjpb marked this pull request as ready for review March 11, 2021 10:47

[rebuild:children]
compute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only issue here is that the vagrant example is using this file and it will fall the validation as it is not an openstack cloud, see: https://github.com/stackhpc/openhpc-demo/blob/c9fc16510513f1a655049c7576f9ce672a850f18/environments/vagrant-example/ansible.cfg#L8.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got some incompatible pieces here:

  • Defining a group to turn on rebuild seems "right" as it's consistent with the way we do everything else
  • Testing the everything template using vagrant seems right

Seems like there's a few approaches which would keep both of the above:

  • Modify the inventory groups when running vagrant. Not easy to do directly
  • Special-case the rebuild validate & deploy when running vagrant. Messy, and probably the problems above are really a specific case of a more general "we can't do this in vagrant" problem.
  • Make vagrant pass the validation task - the "main" looks like it should be ok - might need openhpc_rebuild_reconfigure: false

Copy link
Collaborator

@jovial jovial Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got some incompatible pieces here:

  • Defining a group to turn on rebuild seems "right" as it's consistent with the way we do everything else
  • Testing the everything template using vagrant seems right

Seems like there's a few approaches which would keep both of the above:

  • Modify the inventory groups when running vagrant. Not easy to do directly
  • Special-case the rebuild validate & deploy when running vagrant. Messy, and probably the problems above are really a specific case of a more general "we can't do this in vagrant" problem.
  • Make vagrant pass the validation task - the "main" looks like it should be ok - might need openhpc_rebuild_reconfigure: false

To me it seems like the right place to declare an openstack cloud is in the inventory generated by the provisioning script as it has all the info about how the machines were provisioned. We should only install the openstack rebuild script on an openstack cloud. We could do this with a feature flag or via a constructed inventory (put the hosts in openstack_rebuild if the cloud is openstack). Unsupported clouds should ship with rebuild feature disabled. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have implemented @jovial's proposed approach in c20faac.

@jovial
Copy link
Collaborator

jovial commented Mar 11, 2021

Unsure why we hit: https://github.com/stackhpc/openhpc-demo/runs/2085005006#step:7:80 - ansible version related? Think I hit the same issue here and had to use the unqualified name:
https://github.com/stackhpc/openhpc-demo/blob/c9fc16510513f1a655049c7576f9ce672a850f18/ansible/monitoring.yml#L54
But we should consider bumping the version of ansible if this fixes that issue.

Edit: this looks like a red herring as on re-run it found the role, see: https://github.com/stackhpc/openhpc-demo/pull/44/checks?check_run_id=2085683879#step:7:106

echo "fake openstack command"
EOL
chmod u+x /tmp/vagrant-example/openstack
PATH=/tmp/vagrant-example:$PATH
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to export this for it to be visible in the ansible process.

@sjpb sjpb merged commit 1a23c33 into main Mar 25, 2021
@sjpb sjpb deleted the feature/rebuild branch March 25, 2021 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants