
Commit 6516031

update vagrant docs to new docs structure (#54)
1 parent a1cd63d commit 6516031

Lines changed: 27 additions & 161 deletions
@@ -1,176 +1,42 @@
# Vagrant-Example cluster

-Provisions an environment using vagrant
+Provisions an environment using Vagrant - this is used by GitLab CI too.

-# Directory structure
+This README is supplementary to the main README at `<repo_root>/README.md`, so only differences/additional information is noted here. Paths are relative to this environment unless otherwise noted.

-## terraform
+## Pre-requisites
+No additional comments.

-Contains terraform configuration to deploy infrastructure.
+## Installation on deployment host
+See the main README, then additionally install Vagrant and a provider. For CentOS 8, you can install Vagrant + VirtualBox using:

-## inventory
+    sudo dnf install https://releases.hashicorp.com/vagrant/2.2.6/vagrant_2.2.6_x86_64.rpm
+    sudo dnf config-manager --add-repo=https://download.virtualbox.org/virtualbox/rpm/el/virtualbox.repo
+    sudo yum install VirtualBox-6.0

-Ansible inventory for configuring the infrastructure.
+(Note that each Vagrant version only supports a subset of VirtualBox releases.)
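As a quick post-install check (not part of the original docs), you can confirm the installed versions form a supported pair; both commands below are standard Vagrant/VirtualBox CLIs:

```
vagrant --version      # e.g. Vagrant 2.2.6
VBoxManage --version   # e.g. 6.0.x - must be a release supported by the Vagrant version above
```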

-# Setup
+## Overview of directory structure
+See the main README, plus:
+- The vagrant configuration is contained in the `vagrant/` directory.
+- Scripts are provided in the `<repo_root>/dev/` directory to provision and configure the environment.

-In the repo root, run:
+## Creating a Slurm appliance

-    python3 -m venv venv  # TODO: do we need system-site-packages?
-    . venv/bin/activate
-    pip install -U pip
-    pip install -r requirements.txt
-    ansible-galaxy install -r requirements.yml -p ansible/roles
-    ansible-galaxy collection install -r requirements.yml -p ansible/collections  # don't worry about collections path warning
-
-# Activating the environment
-
-There is a small environment file that you must `source` which defines environment
-variables that reference the configuration path. This is so that we can locate
-resources relative to the environment directory.
-
-    . environments/vagrant-example/activate
-
-The pattern we use is that all resources referenced in the inventory
-are located in the environment directory containing the inventory that
-references them.
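For orientation, a minimal sketch of what an activate file like this might contain; the real file is `environments/vagrant-example/activate`, and only the variable name `APPLIANCES_ENVIRONMENT_ROOT` is taken from this repo (it is referenced later in these docs), so treat the body as illustrative:

```
# Illustrative sketch only - source this file, don't execute it.
# Defines variables so that playbooks can locate resources relative to this environment.
export APPLIANCES_ENVIRONMENT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "Activated environment at $APPLIANCES_ENVIRONMENT_ROOT"
```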
-
-# Common configuration
-
-Configuration is shared by specifying multiple inventories. We reference the `common`
-inventory from `ansible.cfg`, including it before the environment specific
-inventory, located at `./inventory`.
-
-Inventories specified later in the list can override values set in the inventories
-that appear earlier. This allows you to override values set by the `common` inventory.
-
-Any variables that would be identical for all environments should be defined in the `common` inventory.
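If it is unclear how the `common` and environment-specific inventories combine, the standard `ansible-inventory` command can show the merged result once the environment is activated; this is an optional check rather than a step from the original docs:

```
ansible-inventory --graph        # group/host tree from the combined inventories
ansible-inventory --list --yaml  # all variables after later inventories override earlier ones
```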
-
-# Passwords
-
-Prior to running any other playbooks, you need to define a set of passwords. You can
-use the `generate-passwords.yml` playbook to automate this process:
-
-```
-cd <repo root>
-ansible-playbook ansible/adhoc/generate-passwords.yml # can actually be run from anywhere once environment activated
-```
-
-This will output a set of passwords to `inventory/group_vars/all/secrets.yml`.
-Placing them in the inventory means that they will be defined for all playbooks.
-
-It is recommended to encrypt the contents of this file prior to committing to git:
-
-```
-ansible-vault encrypt inventory/group_vars/all/secrets.yml
-```
-
-You will then need to provide a password when running the playbooks, e.g.:
-
-```
-ansible-playbook ../ansible/site.yml --tags grafana --ask-vault-password
-```
-
-See the [Ansible vault documentation](https://docs.ansible.com/ansible/latest/user_guide/vault.html) for more details.
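As an alternative to typing the vault password on every run, Ansible's standard `--vault-password-file` option can be used; the file path below is only an example:

```
echo 'my-vault-password' > ~/.vault-pass.txt   # example path - keep this file out of git
chmod 600 ~/.vault-pass.txt
ansible-playbook ../ansible/site.yml --tags grafana --vault-password-file ~/.vault-pass.txt
```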
-
-
-# Deploy nodes with Terraform
-
-- Modify the keypair in `main.tf` and ensure the required CentOS images are available on OpenStack.
-- Activate the virtualenv and create the instances:
-
-    . venv/bin/activate
-    cd environments/vagrant-example/
-    terraform apply
-
-This creates an ansible inventory file `./inventory`.
-
-Note that this terraform deploys instances onto an existing network - for production use you probably want to create a network for the cluster.
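Not part of the original instructions, but if you want to preview what will be created on that existing network before anything is deployed, the usual Terraform workflow applies:

```
cd environments/vagrant-example/
terraform init   # first run only - downloads the required providers
terraform plan   # show what would be created without changing anything
terraform apply  # create the instances (prompts for confirmation)
```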
-
-# Create and configure cluster with Ansible
-
-Now run one or more playbooks using:
+To provision and configure the appliance in the same way as the CI, use:

    cd <repo root>
-    ansible-playbook ansible/site.yml
-
-This provides:
-- grafana at `http://<login_ip>:3000` - username `grafana`, password as set above
-- prometheus at `http://<login_ip>:9090`
+    dev/vagrant-provision-example.sh
+    dev/vagrant-example-configure.sh

-NB: if grafana's yum repos are down you will see `Errors during downloading metadata for repository 'grafana' ...`. You can work around this using:
+To debug failures, activate the venv and environment and switch to the vagrant project directory:

-    ssh centos@<login_ip>
-    sudo rm -rf /etc/yum.repos.d/grafana.repo
-    wget https://dl.grafana.com/oss/release/grafana-7.3.1-1.x86_64.rpm
-    sudo yum install grafana-7.3.1-1.x86_64.rpm
-    exit
-    ansible-playbook -i inventory monitoring.yml -e grafana_password=<password> --skip-tags grafana_install
-
-# rebuild.yml
-
-# FIXME: outdated
-
-Enable the compute nodes of a Slurm-based OpenHPC cluster on OpenStack to be reimaged from Slurm.
-
-For full details, including the Slurm commands to use, see the [role's README](https://github.com/stackhpc/ansible_collection_slurm_openstack_tools/blob/main/roles/rebuild/README.md).
-
-Ensure you have `~/.config/openstack/clouds.yaml` defining authentication for a single OpenStack cloud (see above README to change location).
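To confirm the credentials in `clouds.yaml` work before running the playbook, the standard OpenStack client can be used (install it with `pip install python-openstackclient` if it is not already present); `mycloud` is a placeholder for whatever name your `clouds.yaml` defines:

```
export OS_CLOUD=mycloud   # placeholder - must match a key under 'clouds:' in clouds.yaml
openstack token issue     # succeeds only if authentication works
```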
-
-Then run:
-
-    ansible-playbook -i inventory rebuild.yml
-
-Note this does not rebuild the nodes, only deploys the tools to do so.
-
-# test.yml
-
-This runs MPI-based tests on the cluster:
-- `pingpong`: Runs Intel MPI Benchmark's IMB-MPI1 pingpong between a pair of (scheduler-selected) nodes. Reports zero-size message latency and maximum bandwidth.
-- `pingmatrix`: Runs a similar pingpong test but between all pairs of nodes. Reports zero-size message latency & maximum bandwidth.
-- `hpl-solo`: Runs HPL **separately** on all nodes, using 80% of memory, reporting Gflops on each node.
-
-These names can be used as tags to run only a subset of tests. For full details see the [role's README](https://github.com/stackhpc/ansible_collection_slurm_openstack_tools/blob/main/roles/test/README.md).
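For example, to run only a subset of the tests, those tag names can be combined with the playbook invocation shown further below:

```
ansible-playbook ../ansible/adhoc/test.yml --tags pingpong,pingmatrix  # latency/bandwidth tests only
ansible-playbook ../ansible/adhoc/test.yml --tags hpl-solo             # HPL only
```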
-
-Note these are intended as post-deployment tests for a cluster to which you have root access - they are **not** intended for use on a system running production jobs:
-- Test directories are created within `openhpc_tests_rootdir` (here `/mnt/nfs/ohcp-tests`), which must be on a shared filesystem (read/write from login/control and compute nodes).
-- Generally, packages are only installed on the control/login node, and `/opt` is exported via NFS to the compute nodes.
-- The exception is the `slurm-libpmi-ohpc` package (required for `srun` with Intel MPI), which is installed on all nodes.
-
-To achieve best performance for HPL set `openhpc_tests_hpl_NB` in [test.yml](test.yml) to the appropriate HPL blocksize 'NB' for the compute node processor - for Intel CPUs see [here](https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/intel-math-kernel-library-benchmarks/intel-distribution-for-linpack-benchmark/configuring-parameters.html).
-
-Then run:
-
-    ansible-playbook ../ansible/adhoc/test.yml
-
-Results will be reported in the ansible stdout - the pingmatrix test also writes an HTML results file onto the ansible host.
-
-Note that you can still use the `test.yml` playbook even if the terraform/ansible in this repo wasn't used to deploy the cluster - as long as it's running OpenHPC v2. Simply create an appropriate `inventory` file, e.g.:
-
-    [all:vars]
-    ansible_user=centos
-
-    [cluster:children]
-    cluster_login
-    cluster_compute
-
-    [cluster_login]
-    slurm-control
-
-    [cluster_compute]
-    cpu-h21a5-u3-svn2
-    cpu-h21a5-u3-svn4
-    ...
-
-And run the `test.yml` playbook as described above. If you want to run tests only on a group from this inventory, rather than an entire partition, you can
-use `--limit`.
-
-Then run the tests, passing `--limit` with the group name:
-
-    ansible-playbook ../ansible/test.yml --limit group-in-inventory
-
-# Destroying the cluster
-
-When finished, run:
+    . venv/bin/activate
+    . environments/vagrant-example/activate
+    cd $APPLIANCES_ENVIRONMENT_ROOT/vagrant

-    terraform destroy --auto-approve
+(see the main README for an explanation of environment activation). Example vagrant commands are:
+
+    vagrant status             # list vms
+    vagrant ssh <hostname>     # login
+    vagrant destroy --parallel # destroy all VMs in parallel **without confirmation**
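A few further standard Vagrant commands that can help when debugging a partially-built environment (not from the original docs):

```
vagrant up <hostname>         # (re)create a single VM
vagrant provision <hostname>  # re-run the provisioners on an existing VM
vagrant reload <hostname>     # restart a VM, re-applying its Vagrantfile configuration
vagrant halt                  # stop all VMs without destroying them
```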
