Commit 06ae09b

sjpb and sd109 authored
Root-squash nfs exports by default (#599)
* WIP: root squash nfs by default - fails b/c user homedir in wrong place
* WIP: run ALL userdir tasks on basic_users_homedir_host
* do ssh key handling on client node to simplify finding /home/rocky
* tidy basic_user defaults
* README/whitespace fixes
* fix ssh key location
* fix bash profile etc not existing
* make ssh key comment match docs
* make export location definition clear
* fix hpctests to work with root-squashed /home
* Non-functional suggestions from code review

  Co-authored-by: Scott Davidson <[email protected]>
* address wording comments from PR

---------

Co-authored-by: Scott Davidson <[email protected]>
1 parent d5e851b commit 06ae09b

17 files changed, with 226 additions and 112 deletions

ansible/roles/basic_users/README.md

Lines changed: 82 additions & 29 deletions
@@ -5,64 +5,117 @@ basic_users
 Setup users on cluster nodes using `/etc/passwd` and manipulating `$HOME`, i.e.
 without requiring LDAP etc. Features:
 - UID/GID is consistent across cluster (and explicitly defined).
-- SSH key generated and propagated to all nodes to allow login between cluster nodes.
+- SSH key generated and propagated to all nodes to allow login between cluster
+  nodes.
 - An "external" SSH key can be added to allow login from elsewhere.
-- Login to the control node is prevented (by default)
+- Login to the control node is prevented (by default).
 - When deleting users, systemd user sessions are terminated first.

-Requirements
-------------
-- `$HOME` (for normal users, i.e. not `rocky`) is assumed to be on a shared
-  filesystem. Actions affecting that shared filesystem are run on a single host,
-  see `basic_users_manage_homedir` below.
+> [!IMPORTANT] This role assumes that `$HOME` for users managed by this role
+(e.g. not `rocky` and other system users) is on a shared filesystem. The export
+of this shared filesystem may be root squashed if its server is in the
+`basic_users` group - see configuration examples below.

 Role Variables
 --------------

 - `basic_users_users`: Optional, default empty list. A list of mappings defining information for each user. In general, mapping keys/values are passed through as parameters to [ansible.builtin.user](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/user_module.html) and default values are as given there. However:
-  - `create_home`, `generate_ssh_key` and `ssh_key_comment` are set automatically; this assumes home directories are on a cluster-shared filesystem.
-  - `uid` should be set, so that the UID/GID is consistent across the cluster (which Slurm requires).
-  - `shell` if *not* set will be `/sbin/nologin` on the `control` node and the default shell on other users. Explicitly setting this defines the shell for all nodes.
+  - `create_home` and `generate_ssh_key`: Normally set automatically. Can be
+    set `false` if necessary to disable home directory creation/cluster ssh
+    key creation. Should not be set `true` to avoid trying to modify home
+    directories from multiple nodes simultaneously.
+  - `ssh_key_comment`: Default is user name.
+  - `home`: Set automatically based on the user name and
+    `basic_users_homedir_host_path`. Can be overridden if required for e.g.
+    users with non-standard home directory paths.
+  - `uid`: Should be set, so that the UID/GID is consistent across the cluster
+    (which Slurm requires).
+  - `shell`: If *not* set will be `/sbin/nologin` on the `control` node to
+    prevent users logging in to this node, and the default shell on other
+    nodes. Explicitly setting this defines the shell for all nodes and if the
+    shared home directories are mounted on the control node will allow the
+    user to log in to the control node.
   - An additional key `public_key` may optionally be specified to define a key to log into the cluster.
   - An additional key `sudo` may optionally be specified giving a string (possibly multiline) defining sudo rules to be templated.
   - `ssh_key_type` defaults to `ed25519` instead of the `ansible.builtin.user` default of `rsa`.
   - Any other keys may be present for other purposes (i.e. not used by this role).
 - `basic_users_groups`: Optional, default empty list. A list of mappings defining information for each group. Mapping keys/values are passed through as parameters to [ansible.builtin.group](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/group_module.html) and default values are as given there.
 - `basic_users_override_sssd`: Optional bool, default false. Whether to disable `sssd` when ensuring users/groups exist with this role. Permits creating local users/groups even if they clash with users provided via sssd (e.g. from LDAP). Ignored if host is not in group `sssd` as well. Note with this option active `sssd` will be stopped and restarted each time this role is run.
-- `basic_users_manage_homedir`: Optional bool, must be true on a single host to
-  determine which host runs tasks affecting the shared filesystem. The default
-  is to use the first play host which is not the control node, because the
-  default NFS configuration does not have the shared `/home` directory mounted
-  on the control node.
+- `basic_users_homedir_host`: Optional inventory hostname defining the host
+  to use to create home directories. If the home directory export is root
+  squashed, this host *must* be the home directory server. Default is the
+  `control` node which is appropriate for the default appliance configuration.
+  Not relevant if `create_home` is false for all users.
+- `basic_users_homedir_host_path`: Optional path prefix for home directories on
+  the `basic_users_homedir_host`, i.e. on the "server side". Default is
+  `/exports/home` which is appropriate for the default appliance configuration.

 Dependencies
 ------------

 None.

-Example Playbook
-----------------
+Example Configurations
+----------------------

-```yaml
-- hosts: basic_users
-  become: yes
-  gather_facts: yes
-  tasks:
-    - import_role:
-        name: basic_users
-```
-
-Example variables, to create user `alice` and delete user `bob`:
+With default appliance NFS configuration, create user `alice` with access
+to all nodes except the control node, and delete user `bob`:

 ```yaml
 basic_users_users:
   - comment: Alice Aardvark
     name: alice
     uid: 2005
-    public_key: ssh-rsa ...
+    public_key: ssh-ed25519 ...
   - comment: Bob Badger
     name: bob
     uid: 2006
-    public_key: ssh-rsa ...
+    public_key: ssh-ed25519 ...
     state: absent
 ```
+
+Using an external share which:
+  - does not root squash (so this role can create directories on it)
+  - is mounted to all nodes including the control node (so this role can set
+    authorized keys there)
+
+Create user `Carol`:
+
+```yaml
+basic_users_homedir_host: "{{ ansible_play_hosts | first }}" # doesn't matter which host is used
+basic_users_homedir_host_path: /home # homedir_host is client not server
+basic_users_users:
+  - comment: Carol Crane
+    name: carol
+    uid: 2007
+    public_key: ssh-ed25519 ...
+```
+
+Using an external share which *does* root squash, so home directories cannot be
+created by this role and must already exist, create user `Dan`:
+
+```yaml
+basic_users_homedir_host: "{{ ansible_play_hosts | first }}"
+basic_users_homedir_host_path: /home
+basic_users_users:
+  - comment: Dan Deer
+    create_home: false
+    name: dan
+    uid: 2008
+    public_key: ssh-ed25519 ...
+```
+
+Using NFS exported from the control node, but mounted to all nodes (so that
+authorized keys applies to all nodes), create user `Erin` with passwordless sudo:
+
+```yaml
+basic_users_users:
+  - comment: Erin Eagle
+    name: erin
+    uid: 2009
+    shell: /bin/bash # override default nologin on control
+    groups:
+      - adm # enables ssh to compute nodes even without a job running
+    sudo: erin ALL=(ALL) NOPASSWD:ALL
+    public_key: ssh-ed25519 ...
+```
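The `home` default described in the README changes above can be illustrated with a small Python sketch. This is not code from the role: `resolve_home` is a hypothetical helper that only mirrors the Jinja expression `item.home | default(basic_users_homedir_host_path + '/' + item.name)` used by the "Create home directories" task, and the `/scratch/dan` path is an invented example.

```python
# Hypothetical helper, not part of the role. It mirrors the Jinja expression
#   item.home | default(basic_users_homedir_host_path + '/' + item.name)


def resolve_home(user: dict, homedir_host_path: str = "/exports/home") -> str:
    """Return the home directory path as seen on basic_users_homedir_host."""
    # An explicit `home` key wins; otherwise join the path prefix and user name.
    return user.get("home", f"{homedir_host_path}/{user['name']}")


if __name__ == "__main__":
    # Default appliance configuration: path on the NFS server.
    print(resolve_home({"name": "alice", "uid": 2005}))
    # External share mounted at /home on a client node.
    print(resolve_home({"name": "carol"}, homedir_host_path="/home"))
    # Invented non-standard home directory.
    print(resolve_home({"name": "dan", "home": "/scratch/dan"}))
```

With the default prefix the path is the server-side export location, which is why the external-share examples set `basic_users_homedir_host_path: /home` when the chosen host is a client.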

ansible/roles/basic_users/defaults/main.yml

Lines changed: 5 additions & 4 deletions
@@ -1,8 +1,9 @@
-basic_users_manage_homedir: "{{ ansible_hostname == (ansible_play_hosts | difference(groups['control']) | first) }}"
+basic_users_homedir_host: "{{ groups['control'] | first }}" # no way, generally, to find the nfs_server
+basic_users_homedir_host_path: /exports/home
+# _basic_users_manage_homedir: "{{ ansible_hostname == basic_users_homedir_host }}"
 basic_users_userdefaults:
-  state: present
-  create_home: "{{ basic_users_manage_homedir }}"
-  generate_ssh_key: "{{ basic_users_manage_homedir }}"
+  state: present # need this here so don't have to add default() everywhere
+  generate_ssh_key: true
   ssh_key_comment: "{{ item.name }}"
   ssh_key_type: ed25519
   shell: "{{'/sbin/nologin' if 'control' in group_names else omit }}"
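As a rough illustration of the `shell` default in this file, the following Python sketch shows the intended behaviour. The function names are hypothetical, and `None` is used here as a stand-in for Ansible's `omit` placeholder, which makes the `user` module fall back to the system default shell.

```python
from typing import List, Optional

NOLOGIN = "/sbin/nologin"


def default_shell(group_names: List[str]) -> Optional[str]:
    """Role default: block logins on hosts in the `control` group only."""
    # None stands in for Ansible's `omit`, i.e. "do not pass the parameter".
    return NOLOGIN if "control" in group_names else None


def effective_shell(user: dict, group_names: List[str]) -> Optional[str]:
    """An explicit `shell` key in the user mapping overrides the default."""
    return user.get("shell", default_shell(group_names))


if __name__ == "__main__":
    print(effective_shell({"name": "alice"}, ["control"]))
    print(effective_shell({"name": "alice"}, ["compute"]))
    print(effective_shell({"name": "erin", "shell": "/bin/bash"}, ["control"]))
```

This matches the README's note that explicitly setting `shell` applies to every node, including the control node.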

ansible/roles/basic_users/tasks/main.yml

Lines changed: 98 additions & 35 deletions
@@ -21,53 +21,116 @@
   ansible.builtin.group: "{{ item }}"
   loop: "{{ basic_users_groups }}"

-- name: Create users and generate public keys
-  user: "{{ basic_users_userdefaults | combine(item) | filter_user_params() }}"
+- name: Create users
+  user: "{{ basic_users_userdefaults | combine(item) | filter_user_params() | combine(_disable_homedir) }}"
   loop: "{{ basic_users_users }}"
   loop_control:
-    label: "{{ item.name }} [{{ item.state | default('present') }}]"
-  register: basic_users_info
+    label: "{{ item.name }}"
+  vars:
+    _disable_homedir: # ensure this task doesn't touch $HOME
+      create_home: false
+      generate_ssh_key: false
+
+- name: Write sudo rules
+  blockinfile:
+    path: /etc/sudoers.d/80-{{ item.name }}-user
+    block: "{{ item.sudo }}"
+    create: true
+  loop: "{{ basic_users_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when:
+    - item.state | default('present') == 'present'
+    - "'sudo' in item"

 - name: Restart sssd if required
   systemd:
     name: sssd
     state: started
   when: _stop_sssd is changed

-- name: Write supplied public key as authorized for SSH access
-  authorized_key:
-    user: "{{ item.name }}"
-    state: present
-    key: "{{ item.public_key }}"
+# This task runs (only) on the home directory server, if in the group, so it can
+# handle root squashed exports
+- name: Create home directories
+  # doesn't delete with state=absent, same as ansible.builtin.user
+  ansible.builtin.copy:
+    remote_src: true
+    src: "{{ item.skeleton | default('/etc/skel/') }}"
+    dest: "{{ item.home | default( basic_users_homedir_host_path + '/' + item.name ) }}"
+    owner: "{{ item.name }}"
+    group: "{{ item.name }}"
+    mode: u=rwX,go=
+  delegate_to: "{{ basic_users_homedir_host }}"
+  run_once: true
   loop: "{{ basic_users_users }}"
   loop_control:
-    label: "{{ item.name }} [{{ item.state | default('present') }}]"
+    label: "{{ item.name }}"
   when:
     - item.state | default('present') == 'present'
-    - item.public_key is defined
-    - basic_users_manage_homedir
+    - item.create_home | default(true) | bool

-- name: Write generated public key as authorized for SSH access
-  # this only runs on the basic_users_manage_homedir so has registered var
-  # from that host too
-  authorized_key:
-    user: "{{ item.name }}"
-    state: present
-    manage_dir: no
-    key: "{{ item.ssh_public_key }}"
-  loop: "{{ basic_users_info.results }}"
-  loop_control:
-    label: "{{ item.name }}"
-  when:
-    - item.ssh_public_key is defined
-    - basic_users_manage_homedir
+# The following tasks deliberately run on a (single) *client* node, so that
+# home directory paths are easily constructed, becoming each user so that root
+# squash doesn't matter
+- delegate_to: "{{ groups['basic_users'] | difference([basic_users_homedir_host]) | first }}"
+  run_once: true
+  block:
+    - name: Create ~/.ssh directories
+      file:
+        state: directory
+        path: ~/.ssh/
+        owner: "{{ item.name }}"
+        group: "{{ item.name }}"
+        mode: u=rwX,go=
+      become_user: "{{ item.name }}"
+      loop: "{{ basic_users_users }}"
+      loop_control:
+        label: "{{ item.name }}"
+      when:
+        - item.state | default('present') == 'present'

-- name: Write sudo rules
-  blockinfile:
-    path: /etc/sudoers.d/80-{{ item.name}}-user
-    block: "{{ item.sudo }}"
-    create: true
-  loop: "{{ basic_users_users }}"
-  loop_control:
-    label: "{{ item.name }}"
-  when: "'sudo' in item"
+    - name: Generate cluster ssh key
+      community.crypto.openssh_keypair:
+        path: "{{ item.ssh_key_file | default('~/.ssh/id_' + _ssh_key_type )}}" # NB: ssh_key_file is from ansible.builtin.user
+        type: "{{ _ssh_key_type }}"
+        comment: "{{ item.ssh_key_comment | default(item.name) }}"
+      vars:
+        _ssh_key_type: "{{ item.ssh_key_type | default('ed25519') }}"
+      become_user: "{{ item.name }}"
+      loop: "{{ basic_users_users }}"
+      loop_control:
+        label: "{{ item.name }}"
+      when:
+        - item.state | default('present') == 'present'
+        - item.generate_ssh_key | default(true) | bool
+      register: _cluster_ssh_keypair
+
+    - name: Write generated cluster ssh key to authorized_keys
+      ansible.posix.authorized_key:
+        user: "{{ item.item.name }}"
+        state: present
+        manage_dir: false
+        key: "{{ item.public_key }}"
+        path: ~/.ssh/authorized_keys
+      become_user: "{{ item.item.name }}"
+      loop: "{{ _cluster_ssh_keypair.results }}"
+      loop_control:
+        label: "{{ item.item.name }}"
+      when:
+        - item.item.state | default('present') == 'present'
+        - "'public_key' in item"
+
+    - name: Write supplied public key to authorized_keys
+      ansible.posix.authorized_key:
+        user: "{{ item.name }}"
+        state: present
+        manage_dir: false
+        key: "{{ item.public_key }}"
+        path: ~/.ssh/authorized_keys
+      become_user: "{{ item.name }}"
+      loop: "{{ basic_users_users }}"
+      loop_control:
+        label: "{{ item.name }}"
+      when:
+        - item.state | default('present') == 'present'
+        - item.public_key is defined
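The "Create users" task above chains `combine` so that `_disable_homedir` is applied last. The Python sketch below illustrates that precedence under two assumptions: `combine` here is a stand-in for Ansible's filter in its default non-recursive mode, and the role-specific `filter_user_params()` filter is left out because its implementation is not shown in this diff.

```python
# Sketch only: `combine` imitates Ansible's `combine` filter (non-recursive),
# where keys from later mappings replace keys from earlier ones.
# The role's filter_user_params() step is deliberately omitted (not shown here).


def combine(*mappings: dict) -> dict:
    """Shallow-merge mappings; later arguments take precedence."""
    merged: dict = {}
    for mapping in mappings:
        merged.update(mapping)
    return merged


USERDEFAULTS = {"state": "present", "generate_ssh_key": True, "ssh_key_type": "ed25519"}
DISABLE_HOMEDIR = {"create_home": False, "generate_ssh_key": False}


def create_user_params(item: dict) -> dict:
    """Parameters for the user task: defaults, then the user, then the override."""
    return combine(USERDEFAULTS, item, DISABLE_HOMEDIR)


if __name__ == "__main__":
    # Even if a user mapping asks for create_home, the final override wins.
    print(create_user_params({"name": "alice", "uid": 2005, "create_home": True}))
```

Because the override comes last, the `user` module never touches `$HOME` in this task; home directories and keys are handled by the later delegated tasks.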

ansible/roles/hpctests/README.md

Lines changed: 4 additions & 2 deletions
@@ -22,8 +22,10 @@ Requirements

 Role Variables
 --------------
-
-- `hpctests_rootdir`: Required. Path to root of test directory tree, which must be on a r/w filesystem shared to all cluster nodes under test. The last directory component will be created.
+- `hpctests_user`: Optional. User to run jobs as. Default is `ansible_user`.
+- `hpctests_rootdir`: Optional. Path to root of test directory tree. This must
+  be a r/w filesystem shared to all cluster nodes under test. Default is
+  `/home/{{ hpctests_user }}/hpctests`. **NB:** Do not use `~` in this path.
 - `hpctests_partition`: Optional. Name of partition to use, otherwise default partition is used.
 - `hpctests_nodes`: Optional. A Slurm node expression, e.g. `'compute-[0-15,19]'` defining the nodes to use. If not set all nodes in the selected partition are used.
 - `hpctests_ucx_net_devices`: Optional. Control which network device/interface to use, e.g. `mlx5_1:0`. The default of `all` (as per UCX) may not be appropriate for multi-rail nodes with different bandwidths on each device. See [here](https://openucx.readthedocs.io/en/master/faq.html#what-is-the-default-behavior-in-a-multi-rail-environment) and [here](https://github.com/openucx/ucx/wiki/UCX-environment-parameters#setting-the-devices-to-use). Alternatively a mapping of partition name (as `hpctests_partition`) to device/interface can be used. For partitions not defined in the mapping the default of `all` is used.

ansible/roles/hpctests/defaults/main.yml

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 ---
-hpctests_rootdir:
+hpctests_user: "{{ ansible_user }}"
+hpctests_rootdir: "/home/{{ hpctests_user }}/hpctests"
 hpctests_pre_cmd: ''
 hpctests_pingmatrix_modules: [gnu12 openmpi4]
 hpctests_pingpong_modules: [gnu12 openmpi4 imb]

ansible/roles/hpctests/library/plot_nxnlatbw.py

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 # Apache 2 License

 from ansible.module_utils.basic import AnsibleModule
-import json
+import json, os

 ANSIBLE_METADATA = {
     "metadata_version": "0.1",
@@ -109,8 +109,8 @@ def run_module():
     module = AnsibleModule(argument_spec=module_args, supports_check_mode=True)
     result = {"changed": False}

-    src = module.params["src"]
-    dest = module.params["dest"]
+    src = os.path.expanduser(module.params["src"])
+    dest = os.path.expanduser(module.params["dest"])
     nodes = module.params["nodes"]
     if nodes is not None:
         nodes = nodes.split(',')
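For readers unfamiliar with `os.path.expanduser`, the short sketch below shows what the change to `src` and `dest` does. The `expand` wrapper and the example paths are illustrative only. On POSIX systems a leading `~` is replaced using the `HOME` environment variable of the process running the module, and paths without a leading `~` are returned unchanged.

```python
import os


def expand(path: str) -> str:
    """Expand a leading '~' to the current user's home directory."""
    return os.path.expanduser(path)


if __name__ == "__main__":
    # Illustrative paths, not taken from the role.
    print(expand("~/hpctests/pingmatrix/out.html"))  # depends on the running user
    print(expand("/home/rocky/hpctests/out.html"))   # already absolute: unchanged
```

Since the expansion depends on which user runs the module, it may not resolve to the same directory as other tasks would use for `~`.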

ansible/roles/hpctests/tasks/build-hpl.yml

Lines changed: 1 addition & 2 deletions
@@ -52,7 +52,6 @@

 - name: Build HPL executable
   shell:
-    cmd: "sbatch --wait hpl-build-{{ hpctests_hpl_arch }}.sh"
+    cmd: "bash -l -c 'sbatch --wait hpl-build-{{ hpctests_hpl_arch }}.sh'" # need login shell for module command
     chdir: "{{ hpctests_hpl_srcdir }}"
     creates: "bin/{{ hpctests_hpl_arch }}/xhpl"
-  become: no

ansible/roles/hpctests/tasks/hpl-solo.yml

Lines changed: 1 addition & 2 deletions
@@ -80,8 +80,7 @@
     cmd: "rm -f {{ hpctests_rootdir }}/hpl-solo/hpl-solo.sh.*.out"

 - name: Run hpl-solo
-  shell: sbatch --wait hpl-solo.sh
-  become: no
+  shell: bash -l -c 'sbatch --wait hpl-solo.sh' # need login shell for module command
   args:
     chdir: "{{ hpctests_rootdir }}/hpl-solo"
   async: "{{ 20 * 60 }}" # wait for up to 20 minutes
