
Enable build of control images #136


Closed · wants to merge 11 commits
6 changes: 3 additions & 3 deletions README.md
@@ -1,7 +1,7 @@
# StackHPC Slurm Appliance

This repository contains playbooks and configuration to define a Slurm-based HPC environment including:
-- A Centos 8 and OpenHPC v2-based Slurm cluster.
+- A Rocky Linux 8 and OpenHPC v2-based Slurm cluster.
- Shared filesystem(s) using NFS (with servers within or external to the cluster).
- Slurm accounting using a MySQL backend.
- A monitoring backend using Prometheus and ElasticSearch.
@@ -16,15 +16,15 @@ While it is tested on OpenStack it should work on any cloud, except for node reb
## Prerequisites
It is recommended to check the following before starting:
- You have root access on the "ansible deploy host" which will be used to deploy the appliance.
-- You can create instances using a CentOS 8 GenericCloud image (or an image based on that).
+- You can create instances using a Rocky 8 GenericCloud image (or an image based on that).
- SSH keys get correctly injected into instances.
- Instances have access to the internet (note proxies can be set up through the appliance if necessary).
- DNS works (if not this can be partially worked around but additional configuration will be required).
- Created instances have accurate/synchronised time (for VM instances this is usually provided by the hypervisor; if not or for bare metal instances it may be necessary to configure a time service via the appliance).
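The last three prerequisites can be spot-checked on a freshly booted instance; a sketch, assuming a systemd-based image such as the Rocky 8 GenericCloud (the mirror hostname is taken from the image URL later in this PR):

    # Sketch: spot-check time sync, DNS and connectivity on a new instance
    timedatectl status                      # clock synchronised?
    getent hosts download.rockylinux.org    # DNS resolving?
    curl -sI https://download.rockylinux.org | head -n 1   # internet (or proxy) reachable?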

## Installation on deployment host

-These instructions assume the deployment host is running Centos 8:
+These instructions assume the deployment host is running CentOS/Rocky 8:

sudo yum install -y git python3
git clone https://github.com/stackhpc/ansible-slurm-appliance
8 changes: 0 additions & 8 deletions ansible/slurm.yml
@@ -15,14 +15,6 @@
  tags:
    - openhpc
  tasks:
-    - name: Add CentOS 8.3 Vault repo for OpenHPC hwloc dependency
-      # NB: REMOVE THIS once OpenHPC works on CentOS 8.4
-      yum_repository:
-        name: vault
-        file: CentOS-Linux-Vault8.3
-        description: CentOS 8.3 packages from Vault
-        baseurl: https://vault.centos.org/8.3.2011/BaseOS/$basearch/os/
-        gpgkey: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
    - import_role:
        name: stackhpc.openhpc

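The deleted task above pinned hwloc from the CentOS 8.3 Vault because, per its NB comment, OpenHPC did not yet work on CentOS 8.4; with the move to Rocky 8 the workaround is dropped. A quick check (a sketch, run on a Rocky 8 host) that the dependency now resolves from the stock repositories:

    # Sketch: confirm hwloc packages are available without the Vault repo
    dnf list --available hwloc hwloc-libs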
@@ -1,6 +1,6 @@
---
# Miscellaneous
-ansible_user: centos
+ansible_user: rocky
appliances_repository_root: "{{ lookup('env', 'APPLIANCES_REPO_ROOT') }}"
appliances_environment_root: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}"

6 changes: 3 additions & 3 deletions environments/vagrant-example/vagrant/Vagrantfile
@@ -9,14 +9,14 @@ Vagrant.configure(2) do |config|

  config.vm.define "#{cluster_name}-login-0" do |login|
    login.vm.hostname = "#{cluster_name}-login-0"
-    login.vm.box = "centos/8"
+    login.vm.box = "rockylinux/8"
    login.vm.network "private_network", ip: "192.168.56.10"
    login.vm.provision :hosts, :sync_hosts => true
  end

  config.vm.define "#{cluster_name}-control-0" do |control|
    control.vm.hostname = "#{cluster_name}-control-0"
-    control.vm.box = "centos/8"
+    control.vm.box = "rockylinux/8"
    control.vm.network "private_network", ip: "192.168.56.11"
    control.vm.provision :hosts, :sync_hosts => true
    control.vm.provider "virtualbox" do |vb|
@@ -29,7 +29,7 @@ Vagrant.configure(2) do |config|
      ip = "192.168.56.#{12 + i}"
      config.vm.define "#{cluster_name}-compute-#{i}" do |node|
        node.vm.hostname = "#{cluster_name}-compute-#{i}"
-        node.vm.box = "centos/8"
+        node.vm.box = "rockylinux/8"
        node.vm.network "private_network", ip: ip
        node.vm.provision :hosts, :sync_hosts => true
      end
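A usage note on the updated Vagrantfile: `vagrant up` downloads rockylinux/8 automatically, but VMs already built from centos/8 need recreating. A sketch, assuming the VirtualBox provider configured above:

    # Sketch: fetch the new box and recreate the cluster (destroys VM state)
    vagrant box add rockylinux/8 --provider virtualbox   # optional; vagrant up also fetches it
    vagrant destroy -f && vagrant up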
3 changes: 0 additions & 3 deletions packer/README.md
@@ -31,10 +31,7 @@ Steps:

- Build an image (using that config drive):

-    mkfifo /tmp/qemu-serial.in /tmp/qemu-serial.out
    cd packer
    PACKER_LOG=1 packer build main.pkr.hcl
    # or during development:
    PACKER_LOG=1 packer build --on-error=ask main.pkr.hcl

The following variables may also be set:
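As an example of that variable mechanism, the two base-image variables (defined in packer/main.pkr.hcl below) can be overridden on the command line rather than by editing the file. The values shown are just the new Rocky 8.5 defaults from this PR, so passing them is redundant, but it demonstrates the syntax:

    # Sketch: override base image variables at build time (values shown are
    # the new defaults, so this invocation is illustrative only)
    PACKER_LOG=1 packer build \
      -var 'base_img_url=https://download.rockylinux.org/pub/rocky/8.5/images/Rocky-8-GenericCloud-8.5-20211114.2.x86_64.qcow2' \
      -var 'base_img_checksum=sha256:c23f58f26f73fb9ae92bfb4cf881993c23fdce1bbcfd2881a5831f90373ce0c8' \
      main.pkr.hcl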
3 changes: 3 additions & 0 deletions packer/extra_vars.json
@@ -0,0 +1,3 @@
+{
+  "openhpc_slurm_partitions" : []
+}
11 changes: 6 additions & 5 deletions packer/main.pkr.hcl
@@ -25,12 +25,12 @@ variable "groups" {

variable "base_img_url" {
type = string
default = "https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.4.2105-20210603.0.x86_64.qcow2"
default = "https://download.rockylinux.org/pub/rocky/8.5/images/Rocky-8-GenericCloud-8.5-20211114.2.x86_64.qcow2"
}

variable "base_img_checksum" {
type = string
default = "sha256:3510fc7deb3e1939dbf3fe6f65a02ab1efcc763480bc352e4c06eca2e4f7c2a2"
default = "sha256:c23f58f26f73fb9ae92bfb4cf881993c23fdce1bbcfd2881a5831f90373ce0c8"
}

source "qemu" "openhpc-vm" {
@@ -40,7 +40,7 @@ source "qemu" "openhpc-vm" {
  disk_size = var.disk_size
  disk_compression = true
  accelerator = "kvm" # default, if available
-  ssh_username = "centos"
+  ssh_username = "rocky"
  ssh_timeout = "20m"
  net_device = "virtio-net" # default
  disk_interface = "virtio" # default
@@ -50,7 +50,7 @@ source "qemu" "openhpc-vm" {
  ssh_private_key_file = "~/.ssh/id_rsa"
  qemuargs = [
    ["-monitor", "unix:qemu-monitor.sock,server,nowait"],
-    # NOTE: To uncomment the below, you need: mkfifo /tmp/qemu-serial.in /tmp/qemu-serial.outh
+    # To see the VM's console, run `mkfifo /tmp/qemu-serial.in /tmp/qemu-serial.out` then uncomment the below
    # ["-serial", "pipe:/tmp/qemu-serial"],
    ["-m", "896M"],
    ["-cdrom", "config-drive.iso"]
@@ -68,7 +68,8 @@ build {
    use_proxy = false # see https://www.packer.io/docs/provisioners/ansible#troubleshooting
    # TODO: use completely separate inventory, which just shares common? This will ensure
    # we don't accidentally run anything via delegate_to.
-    extra_arguments = ["--limit", "builder", "-i", "./ansible-inventory.sh"]
+    extra_arguments = concat(["--limit", "builder", "-i", "./ansible-inventory.sh", "-e", "@extra_vars.json"])
+    # extra_vars is used to remove partitions to allow building control images
    # TODO: Support vault password
    #ansible_env_vars = ["ANSIBLE_VAULT_PASSWORD_FILE=/home/stack/.kayobe-vault-pass"]
}
Expand Down