Releases: stackhpc/ansible-slurm-appliance

v1.134

10 Nov 14:13
6f31af4

What's Changed

  • Updates to OpenHPC role and source image by @sjpb in #324
  • Development quality-of-life improvements by @sjpb in #316
  • Add support for freeipa clients by @sjpb in #241

Full Changelog: v1.133...v1.134

Deployment notes

The stackhpc.openhpc role has changed. To update it, run:

dev/setup-env.sh

Image Details

New image openhpc-231027-0916-893570de

v1.133

24 Oct 09:24
f03c89f

What's Changed

CI changes

  • Use local container image registry for CI to avoid docker.io ratelimits by @sjpb in #318

Deployment notes

The osc.ood role has changed. To update it, run:

ansible-galaxy role install --force -r requirements.yml -p ansible/roles

Image Details

New image openhpc-231020-1357-b5d8b056, requiring 10GB root disk.

Full Changelog: v1.132...v1.133

v1.132

26 Sep 10:05
7eb855e

What's Changed

  • Fix ssh ControlPath in skeleton by @sjpb in #297
  • Fix issues when using GenericCloud image by @sjpb in #313

Full Changelog: v1.131...v1.132

CI changes

  • The "fat" image build can now be done either on Arcus (using volume-backed instances, giving a 10GB virtual disk) or on SMS-labs (using non-volume-backed instances, giving a 12GB virtual disk)

Deployment notes

No galaxy-installed roles/collections have changed.

Image Details

Built a new image openhpc-230922-0940-434e190f

Now only requires a 10GB root disk.

v1.131

06 Sep 14:20
b13b98d

What's Changed

New features

  • Support for CUDA by @sjpb in #253 and #283 - see #253 for full details and configuration

Fixes and Enhancements

  • Make etc_hosts role more flexible by @sjpb in #277
  • Update prometheus-slurm-exporter version by @m-bull in #280
  • Install out of tree openstack builder plugin by @m-bull in #285
  • Remove warn parameter for ansible>=2.14 by @mkjpryor in #286
  • Fix opensearch grafana plugin at last working version by @sjpb in #292
  • Fix query type in the Slurm jobs Grafana dashboard by @mkarpiarz in #293
  • Use Python3.9 for jupyter notebook server by @sjpb in #294
  • Pin Terraform in CI to MPL licenced version by @sjpb in #302
  • Update opensearch to 2.9.0 by @sjpb in #299

CI changes

  • Make CI cloud selectable between SMSlabs and Arcus by @sjpb in #288
  • Disable EESSI tests in CI and make them debuggable by @sjpb in #295
  • Fix SMS ssh by @sjpb in #296
  • Use portal-internal network (with normal-mode ports) for Arcus CI by @sjpb in #306

Deployment notes

Galaxy role and collection versions have changed, so run ansible-galaxy {role,collection} install -f ... after merging to force-update them.
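
The brace shorthand above stands for two separate commands, one for roles and one for collections. A minimal sketch of a helper that runs both, not taken from the repository: the role command matches the one given in the v1.133 notes, while the function name galaxy_force_update, the DRY_RUN switch and the ansible/collections install path are assumptions made for illustration.

```shell
# Hypothetical helper: force-reinstall galaxy roles and collections.
# Set DRY_RUN=1 to print the commands instead of running them.
galaxy_force_update() {
  req="${1:-requirements.yml}"   # requirements file, as in the v1.133 notes
  run="${DRY_RUN:+echo}"         # expands to "echo" only when DRY_RUN is set
  # Role install path ansible/roles is taken from the v1.133 notes.
  $run ansible-galaxy role install --force -r "$req" -p ansible/roles
  # Collection install path is an assumption; adjust to the local layout.
  $run ansible-galaxy collection install --force -r "$req" -p ansible/collections
}

# Usage (prints the two commands without running them):
#   DRY_RUN=1 galaxy_force_update
```

Running it with DRY_RUN=1 first is a cheap way to check the paths before anything is overwritten.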

Full Changelog: v1.130...v1.131

v1.130

12 May 14:43
999cfc8

What's Changed

New functionality/roles/groups

Changes to Packer build functionality

  • Allow Packer base images to be specified by either UUID or name by @m-bull in #266
  • Support attaching a floating IP to the fatimage builder instance by @m-bull in #267
  • Support using volume-backed instances for building and selecting the output image format by @m-bull in #269
  • Allow specifying the packer manifest output path by @m-bull in #268
  • Allow use of ephemeral SSH keys when building Packer images by @m-bull in #274

Other changes

  • Support changing the podman user's uid by @sjpb in #264
  • Fix to proxy role: now defaults to including localhost in no_proxy by @sjpb in #270
  • Add debug logging options for opensearch & filebeat by @sjpb in #271
  • The UCX device to use for hpctests can now be defined per partition by @sjpb in #275
  • Always delete resources on deploy failure in CI by @sjpb in #272

Full Changelog: v1.129...v1.130

Deployment notes

  • No galaxy reinstalls required since last release.

v1.129

18 Apr 11:23
0e6ef7e

What's Changed

  • Return pingpong sbatch output if job fails by @sjpb in #242
  • Support configuring nameservers and proxies by @sjpb in #247

Full Changelog: v1.128...v1.129

Deployment notes

  • No galaxy reinstalls required

Image details

v1.128

14 Apr 11:05
dcf2d1d

What's Changed

  • Environment hooks used when building "fat" image by @sjpb in #255 - allows building site-specific fat images.
  • Re-enable NetworkManager control of /etc/resolv.conf after image build by @sjpb in #258 - fixes #257.

Full Changelog: v1.127...v1.128

Deployment Notes

  • No galaxy reinstalls required

Image Info

NB: This build uses a RockyLinux 8.6 image plus updates instead of the Rocky-8-GenericCloud-Base-8.7-20221130.0.x86_64.qcow2 image, to avoid an issue with volume mounts during reboot in that image.

v1.127

14 Mar 13:11
416b440

What's Changed

  • Build fat image in appliance by @sjpb in #250:
    • The "fat" image containing binaries for all nodes is now built by this repo, not https://github.com/stackhpc/slurm_image_builder as previously. See section below for latest image details.
    • This removes environment-specific control and login node builds - use the fat image instead (these nodes already required the ansible site.yml playbook running after a reimage).
    • The compute node build is now intended to be performed from the latest fat image, so the yum update * step has been removed.
    • Various minor fixes, see above PR.

Full Changelog: v1.126...v1.127

Image Info

NB: This build uses a RockyLinux 8.6 image plus updates instead of the Rocky-8-GenericCloud-Base-8.7-20221130.0.x86_64.qcow2 image, to avoid an issue with volume mounts during reboot in that image.

v1.126

22 Feb 16:35
c130ed9

What's Changed

  • Provide container packages by @sjpb in #249: Clusters now have apptainer (for singularity), podman and podman-compose packages by default
  • Add script to retrieve CI inventory by @sjpb in #248

NB: This is now based on OpenHPC v2.6.1, removing the workarounds added in v1.123.

Full Changelog: v1.125...v1.126

Deployment Notes

Galaxy role and collection versions have changed, so run ansible-galaxy {role,collection} install -f ... after merging to force-update them.

v1.125

26 Jan 13:52
bdeda03

What's Changed

Replaces OpenDistro with OpenSearch 2.4.0 by @sjpb in #197, including:

  • Grafana updated from 8.5.9 to 9.0.3.
  • Filebeat-OSS updated from 7.9.3 to 7.12.1.
  • Host group and role opendistro replaced with group and role opensearch.
  • elasticsearch_* and opendistro_* variables used for opendistro role replaced by opensearch_* variables.
  • Changed to host networking for containers.
  • Updated CaaS/CI image to openhpc-230110-1629.qcow2.

Full Changelog: v1.124...v1.125

Notes

  1. By default opendistro uses self-signed certs with 10 years validity. Certs are automatically updated if necessary when running the appliance.

Deployment

  1. Merge this release:

    • Change any environment-specific elasticsearch_* and opendistro_* variables to appropriate opensearch_* equivalents. Note the defaults in the common environment are changed by this release so only non-defaults require manual action.
    • Remove podman_cidr variable as this is unused for container host networking.
  2. Update galaxy roles and collections.

  3. Run site.yml. OpenDistro data in {{ appliances_state_dir | default('/usr/share') }}/elasticsearch/data will be automatically migrated to OpenSearch. Note that the OpenDistro data is not deleted automatically.
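
Before merging, it may help to list where the old variable names are still set. A minimal sketch, assuming environment-specific variables live in files under a directory named environments (a hypothetical default; pass the real path as the first argument). The function name find_legacy_vars is illustrative and not part of the appliance.

```shell
# Hypothetical helper: list lines that still use elasticsearch_* or
# opendistro_* variables, or the now-unused podman_cidr variable.
find_legacy_vars() {
  dir="${1:-environments}"   # assumed location of environment files
  # -r: recurse, -n: show line numbers, -E: extended regular expressions.
  # "|| true" keeps the exit status zero when nothing is found.
  grep -rnE '(elasticsearch_|opendistro_)[A-Za-z0-9_]*|podman_cidr' "$dir" || true
}

# Usage:
#   find_legacy_vars environments
```

Each reported line is a candidate for renaming to an opensearch_* equivalent, or for removal in the case of podman_cidr. Values that only repeat the old defaults need no manual action, as noted in step 1.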