
Fix caas zenith/hpctests/basic_users #662


Merged · 12 commits · May 2, 2025



@sjpb (Collaborator) commented on May 1, 2025:

Fixes issues that prevented updating the appliance version used for Azimuth CaaS clusters:

  1. 502 error when trying to connect to monitoring/ondemand:

    Rocky Linux 9.5 upgraded podman to v5, which changed the default rootless network tool from slirp4netns to pasta.
    The latter does not allow containers to reach the host's IP (see https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/). This PR therefore reverts the network stack for the zenith pod to slirp4netns[1]. This has only been tested on RL9.5 in CaaS. However, the same option is supported by podman v4.9, which is used by Rocky Linux 8.10, so this seems safe.

    It also bumps the container images for the zenith clients and proxies, and removes some now-unneeded zenith configuration.

  2. The "post-configuration validation" (hpctests) fails:

     hpctests : Create test root directory: 
     There was an issue creating /home/rocky as requested: [Errno 13] Permission denied: b'/home/rocky'
    

    The issue is that since #599 ("Root-squash nfs exports by default"), become on the login node can no longer be used to create directories in /home, because root is squashed on the NFS share. Therefore hpctests_user is now set to azimuth. Note that #663 ("Default hpctests_group to hpctests_user") was also required (merged from main).

  3. #599 ("Root-squash nfs exports by default") changed the configuration for the basic_users role to cope with root-squashed NFS shares. The appliance defaults suit that case, so conditional modifications would be needed for the Manila case. To simplify this, CaaS Slurm now mounts /home on the control node when Manila is in use. This makes it consistent with the NFS case, and means the azimuth user can access the control node whichever home fileshare is in use.
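Items 2 and 3 both follow from root squashing: on a root-squashed export, requests from client UID 0 are handled on the server as an anonymous UID, so "become root" no longer grants write access to a root-owned /home. The following is a simplified Python model of that check, not code from the appliance. It assumes the usual Linux NFS default anonymous UID of 65534, considers only the owner and "other" permission bits (no groups or ACLs), and the UIDs in the usage example are hypothetical.

```python
import stat

# Assumption: the Linux NFS server's default anonuid for squashed access.
ANON_UID = 65534


def server_side_uid(client_uid: int, root_squash: bool = True, anonuid: int = ANON_UID) -> int:
    """Return the UID the NFS server acts as for a request from client_uid."""
    if root_squash and client_uid == 0:
        return anonuid  # root on the client is mapped to the anonymous user
    return client_uid


def can_create_entry(dir_owner_uid: int, dir_mode: int, client_uid: int, root_squash: bool = True) -> bool:
    """Simplified check: may this client create a new entry in the directory?

    Only the owner and 'other' bits are considered; group bits are ignored.
    """
    uid = server_side_uid(client_uid, root_squash)
    if uid == 0:
        return True  # unsquashed root bypasses the permission bits
    if uid == dir_owner_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & needed) == needed
```

For example, with /home owned by root and mode 0o755, can_create_entry(0, 0o755, 0, root_squash=False) is True but can_create_entry(0, 0o755, 0) is False, which is the "Permission denied" seen by hpctests. A regular user (hypothetical UID 1005) can still create entries in their own 0o700 home directory, which is why running the tests as that user works.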

No image build is required for any of these changes.

Footnotes

  1. An alternative may be to use pasta with the --map-gw option, but https://github.com/containers/podman/issues/22771 suggests this only works properly from podman v5.1 onwards, and without testing it is not clear that the gateway address equals the host's "main" IP address. Restoring the previous behaviour therefore seems preferable for now.
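The version-dependent behaviour described in item 1 and footnote 1 can be summarised in a small Python sketch. This is an illustration only, not the PR's implementation (the appliance configures the pod through its own Ansible roles). It assumes podman version strings of the form "major.minor.patch", encodes the footnote's reading that --map-gw is reliable only from v5.1, and the pod name passed to pod_create_args is hypothetical.

```python
def parse_podman_version(version: str) -> tuple[int, int]:
    """Parse the major and minor parts of a version string such as '5.2.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def default_rootless_network(version: str) -> str:
    """Default rootless network tool: pasta from podman 5.0, slirp4netns before."""
    major, _ = parse_podman_version(version)
    return "pasta" if major >= 5 else "slirp4netns"


def pasta_map_gw_reliable(version: str) -> bool:
    """Per footnote 1 (podman issue 22771), --map-gw seems reliable only from v5.1."""
    return parse_podman_version(version) >= (5, 1)


def pod_create_args(pod_name: str) -> list[str]:
    """Build 'podman pod create' arguments that pin the pod to slirp4netns.

    Setting --network explicitly gives the same behaviour on podman v4.9
    (Rocky Linux 8.10) and v5 (Rocky Linux 9.5) instead of relying on the default.
    """
    return ["podman", "pod", "create", "--name", pod_name, "--network", "slirp4netns"]
```

The resulting list could be passed to subprocess.run(..., check=True). Pinning the option explicitly, rather than branching on default_rootless_network, reflects the PR's reasoning that slirp4netns is supported on both podman versions in use.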

@sjpb sjpb marked this pull request as ready for review May 1, 2025 10:38
@sjpb sjpb requested a review from a team as a code owner May 1, 2025 10:38
@sjpb sjpb changed the title from "Fix zenith proxies in caas" to "Fix caas zenith/hpctests/basic_users" May 1, 2025

@JohnGarbutt (Member) left a comment:


Nice work 👍

@sjpb sjpb merged commit b5ff56a into main May 2, 2025
2 checks passed
@sjpb sjpb deleted the fix/zenith branch May 2, 2025 08:56