
Support deploying multinodes on Leafcloud #45

Merged (20 commits, Apr 11, 2024)

Commits:

- 8079864: Update Terraform provider hashes (markgoddard, Apr 8, 2024)
- 4662b4f: Remove unused configure-local-networking.sh hello.sh in scripts/ (markgoddard, Apr 8, 2024)
- d93e2a8: Grow Ansible control host root volume (markgoddard, Apr 8, 2024)
- 9bec8e8: Remove Ansible playbooks targeting seed & overcloud hosts (markgoddard, Apr 8, 2024)
- af972d9: Remove hosts from Ansible inventory except for Ansible control host (markgoddard, Apr 10, 2024)
- f922a98: Change default root_domain to multinode.stackhpc.com (markgoddard, Apr 8, 2024)
- 1d3f31a: Support attaching a floating IP to the Ansible control host (markgoddard, Apr 8, 2024)
- f9d7fc1: Use new name of Tempest container when following logs (markgoddard, Apr 8, 2024)
- 7d2b189: Move Tempest test results to ~/tempest-artifacts (markgoddard, Apr 8, 2024)
- a4cd243: Improve Tempest test result handling (markgoddard, Apr 8, 2024)
- 2876d42: Use SSH key defined by ssh_key_path when connecting to Ansible contro… (markgoddard, Apr 10, 2024)
- 7e90554: Skip os_capacity when deploying HAProxy for Vault (markgoddard, Apr 10, 2024)
- 2d38500: Add an example tfvars file for Leafcloud (markgoddard, Apr 10, 2024)
- 0e10965: README: Reorganise and various fixes (markgoddard, Apr 10, 2024)
- 5de4fae: Improve prechecks for ssh_key, vault_password, vxlan_vni (markgoddard, Apr 10, 2024)
- b30a6be: Comment out required changes in Leafcloud tfvars example (markgoddard, Apr 10, 2024)
- 74ec6ad: Apply suggestions from code review (markgoddard, Apr 11, 2024)
- 4394b95: Remove the default for the prefix variable (markgoddard, Apr 11, 2024)
- e78c068: README: Remove console type from SSH config code block (markgoddard, Apr 11, 2024)
- afd9866: Add descriptions for all Terraform variables (markgoddard, Apr 11, 2024)

2 changes: 2 additions & 0 deletions .terraform.lock.hcl


210 changes: 113 additions & 97 deletions README.rst
@@ -2,16 +2,43 @@
Terraform Kayobe Multinode
==========================

This Terraform configuration deploys a requested number of instances on an
OpenStack cloud, to be used as a Multinode Kayobe test environment. This
includes:

* 1x Ansible control host
* 1x seed host
* controller hosts
* compute hosts
* Ceph storage hosts
* Optional Wazuh manager host

The high-level workflow to deploy a cluster is as follows:

* Prerequisites
* Configure Terraform and Ansible
* Deploy infrastructure on OpenStack using Terraform
* Configure Ansible control host using Ansible
* Deploy multi-node OpenStack using Kayobe

This configuration is typically used with the `ci-multinode` environment in the
`StackHPC Kayobe Configuration
<https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-yoga/contributor/environments/ci-multinode.html>`__
repository.

Prerequisites
=============

These instructions show how to use this Terraform configuration manually. They
assume you are running an Ubuntu host that will be used to run Terraform. The
machine should have access to the API of the OpenStack cloud that will host the
infrastructure, and network access to the Ansible control host once it has been
deployed. This may be achieved by direct SSH access, a floating IP on the
Ansible control host, or using an SSH bastion.

The OpenStack cloud should have sufficient capacity to deploy the
infrastructure, and a suitable image registered in Glance. Ideally the image
should be one of the overcloud host images defined in StackHPC Kayobe
configuration and available in `Ark <https://ark.stackhpc.com>`__.
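
For example, if a suitable host image has been downloaded locally, it can be
registered in Glance with the OpenStack CLI. This is a minimal sketch; the file
and image names below are placeholders.

.. code-block:: console

openstack image create --disk-format qcow2 --container-format bare --file overcloud-rocky-9.qcow2 overcloud-rocky-9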

Install Terraform:

@@ -22,21 +49,24 @@ Install Terraform:
sudo apt update
sudo apt install git terraform

Clone and initialise this Terraform config repository:

.. code-block:: console

git clone https://github.com/stackhpc/terraform-kayobe-multinode
cd terraform-kayobe-multinode


Initialise Terraform:

.. code-block:: console

terraform init

Generate an SSH keypair. The public key will be registered in OpenStack as a
keypair and authorised by the instances deployed by Terraform. The private and
public keys will be transferred to the Ansible control host to allow it to
connect to the other hosts. Note that password-protected keys are not currently
supported.

.. code-block:: console

@@ -94,124 +124,74 @@ Or you can source the provided `init.sh` script which shall initialise terraform
OpenStack Cloud Name: sms-lab
Password:

You must ensure that you have `Ansible installed <https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html>`_ on your local machine.

.. code-block:: console

pip install --user ansible

Install the Ansible Galaxy requirements.

.. code-block:: console

ansible-galaxy install -r ansible/requirements.yml

If the deployed instances are behind an SSH bastion, you must ensure that your SSH config is set up appropriately with a proxy jump.

.. code-block::

Host lab-bastion
HostName BastionIPAddr
User username
IdentityFile ~/.ssh/key

Host 10.*
ProxyJump=lab-bastion
ForwardAgent no
IdentityFile ~/.ssh/key
UserKnownHostsFile /dev/null
StrictHostKeyChecking no

Configure Terraform variables
=============================

Populate Terraform variables in `terraform.tfvars`. Examples are provided in
files named `*.tfvars.example`. The available variables are defined in
`variables.tf` along with their type, description, and optional default.

You will need to set the `multinode_keypair`, `prefix`, and `ssh_public_key`.
By default, Rocky Linux 9 will be used, but Ubuntu Jammy is also supported by
changing `multinode_image` to `overcloud-ubuntu-jammy-<release>-<datetime>` and
`ssh_user` to `ubuntu`.
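
For example, a minimal `terraform.tfvars` might look like the following. The
flavor and network names are taken from one particular cloud and should be
treated as placeholders; the `*.tfvars.example` files provide more complete
examples.

.. code-block:: console

cat << EOF > terraform.tfvars
prefix            = "changeme"
multinode_keypair = "changeme"
ssh_public_key    = "~/.ssh/changeme.pub"
ssh_user          = "cloud-user"

multinode_flavor     = "general.v1.medium"
multinode_vm_network = "stackhpc-ipv4-geneve"
multinode_vm_subnet  = "stackhpc-ipv4-geneve-subnet"

ansible_control_vm_flavor = "general.v1.small"
seed_vm_flavor            = "general.v1.small"

controller_count = "3"
compute_count    = "2"
storage_count    = "3"
EOF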

The `multinode_flavor` determines the flavor used for controller and compute
nodes. Both virtual machines and baremetal are supported, but the `*_disk_size`
variables must be set to 0 when using baremetal hosts. This prevents a block
device from being allocated. When any baremetal hosts are deployed, the
`multinode_vm_network` and `multinode_vm_subnet` should also be changed to
a VLAN network and associated subnet.
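
As a sketch, a baremetal deployment might therefore override something like the
following (the flavor and network names are illustrative):

.. code-block::

multinode_flavor     = "changeme-baremetal"
compute_disk_size    = 0
controller_disk_size = 0
multinode_vm_network = "changeme-vlan-network"
multinode_vm_subnet  = "changeme-vlan-subnet"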

If `deploy_wazuh` is set to true, an infrastructure VM will be created that
hosts the Wazuh manager. The Wazuh deployment playbooks will also be triggered
automatically to deploy Wazuh agents to the overcloud hosts.

If `add_ansible_control_fip` is set to `true`, a floating IP will be created
and attached to the Ansible control host. In that case
`ansible_control_fip_pool` should be set to the name of the pool (network) from
which to allocate the floating IP, and the floating IP will be used for SSH
access to the control host.
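
A sketch of these optional settings in `terraform.tfvars` follows; the infra VM
flavor and disk size mirror the earlier example values, and the floating IP
pool name is a placeholder for whatever external network your cloud provides.

.. code-block::

deploy_wazuh       = true
infra_vm_flavor    = "general.v1.small"
infra_vm_disk_size = 100

add_ansible_control_fip  = true
ansible_control_fip_pool = "external"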

Configure Ansible variables
===========================

Review the variables defined within `ansible/vars/defaults.yml`. Here you can
customise the version of Kayobe, kayobe-config or openstack-config. Make sure
to define `ssh_key_path` to point to the location of the SSH key in use by the
nodes, and also `vxlan_vni`, which should be a unique value between 1 and
100,000. The VNI should be much smaller than the officially supported limit of
16,777,215, as we encounter errors when attempting to bring up interfaces that
use a high VNI. You must also set `vault_password_path`; this should be the
path to a file containing the Ansible Vault password.
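
For example, the relevant overrides in `ansible/vars/defaults.yml` might look
like this; the paths and the VNI value are illustrative.

.. code-block:: yaml

ssh_key_path: ~/.ssh/changeme
vault_password_path: ~/vault-password
vxlan_vni: 1234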

Deployment
==========

Generate a plan:

.. code-block:: console
@@ -159,91 +204,62 @@ Apply the changes:

terraform apply -auto-approve

You should have requested a number of resources to be spawned on OpenStack.

Configure Ansible control host
==============================

Run the `configure-hosts.yml` playbook to configure the Ansible control host.

.. code-block:: console

ansible-playbook -i ansible/inventory.yml ansible/configure-hosts.yml

This playbook sequentially executes two other playbooks:

#. ``grow-control-host.yml`` - Applies LVM configuration to the control host to ensure it has enough space to continue with the rest of the deployment. Tag: ``lvm``
#. ``deploy-openstack-config.yml`` - Prepares the Ansible control host as a Kayobe control host, cloning the Kayobe configuration and installing virtual environments. Tag: ``deploy``

These playbooks are tagged so that they can be invoked or skipped using `--tags` or `--skip-tags` as required.
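
For example, to re-run only the Kayobe configuration step, or to skip the LVM
changes:

.. code-block:: console

ansible-playbook -i ansible/inventory.yml ansible/configure-hosts.yml --tags deploy
ansible-playbook -i ansible/inventory.yml ansible/configure-hosts.yml --skip-tags lvm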

Deploy OpenStack
================

Once the Ansible control host has been configured with a Kayobe/OpenStack configuration you can then begin the process of deploying OpenStack.
This can be achieved either by manually running the various commands to configure the hosts and deploy the services, or automatically by using the generated `deploy-openstack.sh` script.
`deploy-openstack.sh` should be available within the home directory on your Ansible control host, provided you ran `deploy-openstack-config.yml` earlier.
This script performs the following tasks:

* kayobe control host bootstrap
* kayobe seed host configure
* kayobe overcloud host configure
* cephadm deployment
* kayobe overcloud service deploy
* openstack configuration
* tempest testing

Tempest test results will be written to `~/tempest-artifacts`.
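
Once a run has completed you can, for example, list the results on the Ansible
control host; the file names will depend on the Tempest run.

.. code-block:: console

ls ~/tempest-artifacts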

If you choose the automated method, you must first SSH into your Ansible control host.

.. code-block:: console

ssh $(terraform output -raw ssh_user)@$(terraform output -raw ansible_control_access_ip_v4)

Start a `tmux` session to avoid halting the deployment if you are disconnected.

.. code-block:: console

tmux

Run the `deploy-openstack.sh` script.

.. code-block:: console

~/deploy-openstack.sh

Accessing OpenStack
===================

After a successful deployment of OpenStack you may access the OpenStack API and
Horizon by proxying your connection via the seed node, as it has an interface
on the public network (192.168.39.X). Using software such as sshuttle will
allow for easy access.
@@ -260,15 +276,15 @@ Important to note this will proxy all DNS requests from your machine to the firs
sshuttle -r $(terraform output -raw ssh_user)@$(terraform output -raw seed_access_ip_v4) 192.168.39.0/24 --dns --to-ns 192.168.39.4

Tear Down
=========

After you are finished with the multinode environment, please destroy the nodes
to free up resources for others. This can be accomplished by using the provided
`scripts/tear-down.sh`, which will destroy your controllers, compute, seed and
storage nodes whilst leaving your Ansible control host and keypair intact.

If you would like to delete your Ansible control host as well, pass the `-a`
flag; if you would also like to remove your keypair, pass `-a -k`.
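
For example, run from the root of this repository:

.. code-block:: console

# Destroy everything except the Ansible control host and keypair.
./scripts/tear-down.sh

# Also destroy the Ansible control host.
./scripts/tear-down.sh -a

# Also destroy the Ansible control host and the keypair.
./scripts/tear-down.sh -a -k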

Issues & Fixes
==============

Sometimes a compute instance fails to be provisioned by Terraform or fails on boot for any reason.
If this happens, the solution is to mark the resource as tainted and run `terraform apply` again, which will destroy and rebuild the failed instance.
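
For example, assuming the first compute node is the failed instance; the
resource address shown is illustrative and should be taken from the output of
`terraform state list`.

.. code-block:: console

terraform state list | grep compute
terraform taint 'openstack_compute_instance_v2.compute[0]'
terraform apply -auto-approve
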
File renamed without changes.
16 changes: 0 additions & 16 deletions ansible/add-fqdn.yml

This file was deleted.

4 changes: 0 additions & 4 deletions ansible/configure-hosts.yml
@@ -1,8 +1,4 @@
 ---
-- import_playbook: fix-homedir-ownership.yml
-  tags: fix-homedir
-- import_playbook: add-fqdn.yml
-  tags: fqdn
 - import_playbook: grow-control-host.yml
   tags: lvm
 - import_playbook: deploy-openstack-config.yml
20 changes: 20 additions & 0 deletions ansible/deploy-openstack-config.yml
@@ -11,10 +11,30 @@
          - ssh_key_path != None
        fail_msg: "Please provide a path to the SSH key used within the multinode environment."

    - name: Verify ssh key exists
      ansible.builtin.assert:
        that:
          - ssh_key_path | expanduser is exists
        fail_msg: "Could not find SSH key at {{ ssh_key_path | expanduser }}"

    - name: Verify vault password path has been set
      ansible.builtin.assert:
        that:
          - vault_password_path != None
        fail_msg: "Please provide a path to the vault password used within the multinode environment."

    - name: Verify vault password exists
      ansible.builtin.assert:
        that:
          - vault_password_path | expanduser is exists
        fail_msg: "Could not find vault password at {{ vault_password_path | expanduser }}"

    - name: Verify VXLAN VNI has been set
      ansible.builtin.assert:
        that:
          - vxlan_vni != None
          - vxlan_vni | int > 0
          - vxlan_vni | int <= 100000
        fail_msg: "Please provide a VXLAN VNI. A unique value from 1 to 100,000."

    - name: Gather facts about the host