
Use systemd for opendistro/kibana/filebeat #52


Merged: 9 commits merged into main, Mar 31, 2021

Conversation

@sjpb (Collaborator) commented Mar 24, 2021

NB: this PR is for main.

Fixes #45 - see discussion there.

Key aspects:

  • Use systemd to define and control the container services so that they start on boot.
  • We can't use systemd --user, because the systemd version in CentOS doesn't let us override resource limits for user services, so the containers run as root-managed system services with a User= directive etc.
  • In this configuration podman by default creates its tmpdir on /tmp (not a tmpfs) rather than /run (a tmpfs). Podman's tmpdir must be on a tmpfs so that podman can tell a reboot has happened, so the tmpfiles daemon is used to create suitable tmpdirs on /run with the right ownership and permissions; a rough sketch of both pieces follows below.
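As a rough illustration of the shape this takes (the unit name, container name, user and tmpdir path below are placeholders, not the actual templates added by this PR):

# /etc/systemd/system/opendistro.service -- illustrative sketch only
[Unit]
Description=opendistro container
Wants=network-online.target
After=network-online.target

[Service]
User=podman
Group=podman
# raising limits like these is exactly what a --user unit can't do on CentOS's systemd
LimitNOFILE=65536
LimitMEMLOCK=infinity
ExecStart=/usr/bin/podman start --attach opendistro
ExecStop=/usr/bin/podman stop -t 10 opendistro
Restart=always

[Install]
WantedBy=multi-user.target

# /etc/tmpfiles.d/podman.conf -- entry shown is illustrative; it recreates
# podman's tmpdir on the /run tmpfs at every boot
d /run/podman-tmp-podman 0770 podman podman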

@jovial (Collaborator) commented Mar 25, 2021

Kibana failing with:

Status code was 503 and not [200]: HTTP Error 503: Service Unavailable

https://github.com/stackhpc/openhpc-demo/pull/52/checks?check_run_id=2183262961#step:7:722

retrying...

@sjpb (Collaborator, author) commented Mar 26, 2021

> Kibana failing with:
>
> Status code was 503 and not [200]: HTTP Error 503: Service Unavailable
>
> https://github.com/stackhpc/openhpc-demo/pull/52/checks?check_run_id=2183262961#step:7:722
>
> retrying...

Having finally got vagrant working and squashed some other bugs, the issue is that elasticsearch is refusing connections:

[vagrant@testohpc-control-0 ~]$ curl 10.109.0.140:9200
curl: (7) Failed to connect to 10.109.0.140 port 9200: Connection refused

Is this something to do with networking inside of vagrant?

@jovial (Collaborator) commented Mar 26, 2021

>> Kibana failing with:
>>
>> Status code was 503 and not [200]: HTTP Error 503: Service Unavailable
>>
>> https://github.com/stackhpc/openhpc-demo/pull/52/checks?check_run_id=2183262961#step:7:722
>> retrying...
>
> Having finally got vagrant working and squashed some other bugs, the issue is that elasticsearch is refusing connections:
>
> [vagrant@testohpc-control-0 ~]$ curl 10.109.0.140:9200
> curl: (7) Failed to connect to 10.109.0.140 port 9200: Connection refused
>
> Is this something to do with networking inside of vagrant?

I've got a feeling elastic isn't starting up correctly. Do you see it listening on port 9200 on testohpc-control-0? Something like sudo ss -nlpt should do the trick for that. We'll need to check the elastic logs to see why it is unhappy.
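Concretely, something like this (run on testohpc-control-0; the container name "opendistro" is an assumption, adjust to whatever the container is actually called):

sudo ss -nlpt | grep 9200     # is anything bound to 9200, and on which address?
podman logs opendistro        # run as the user that owns the container; name assumed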

@sjpb (Collaborator, author) commented Mar 26, 2021

Sorry, I should have said: systemd shows opendistro running OK, no errors. ss shows:

[vagrant@testohpc-control-0 ~]$ sudo ss -nlpt
State    Recv-Q   Send-Q   Local Address:Port   Peer Address:Port
<snip>
LISTEN   0        128      *:9200               *:*                 users:(("exe",pid=39009,fd=14))
<snip>

@sjpb (Collaborator, author) commented Mar 26, 2021

I'm not sure the IP is right:

[podman@testohpc-control-0 ~]$ podman attach kibana
{"type":"log","@timestamp":"2021-03-26T12:13:28Z","tags":["error","elasticsearch","data"],"pid":1,"message":"[ConnectionError]: connect ECONNREFUSED 10.109.0.140:9200"}

but

[podman@testohpc-control-0 vagrant]$ curl -XGET --insecure https://localhost:9200 -u admin:"<password>"
{
  "name" : "opendistro",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "WUxldXjGRJSEGUZa5fyqtQ",
  "version" : {
    "number" : "7.10.0",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
    "build_date" : "2020-11-09T21:30:33.964949Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

and

[podman@testohpc-control-0 vagrant]$ curl -XGET --insecure https://10.109.0.140:9200 -u admin:"<password>"
curl: (7) Failed to connect to 10.109.0.140 port 9200: Connection refused
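Two checks that might narrow this down (these are suggestions with an assumed container name, not output from this run): whether 10.109.0.140 is actually configured on this node, and what podman thinks it has published:

ip -br addr              # is 10.109.0.140 an address on this host at all?
podman port opendistro   # which host address/port mappings did podman publish? (name assumed)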

sjpb requested a review from jovial March 26, 2021 14:55
@jovial (Collaborator) left a comment:
Looking pretty good, nice effort.

@@ -15,6 +15,55 @@
tasks_from: config.yml
tags: config

- name: Define tmp directories on tmpfs
@jovial (Collaborator):

Not in reference to this line in particular, but could this fit better in the podman role?

@sjpb (Collaborator, author):

The problem is that the user we're running podman as isn't defined in the role, only at the appliance level. I agree it feels like this (along with the validation and podman_tmp_dir_root) should all really be in the podman role, so if you can see a way of achieving that, let me know.

owner: "{{ item.name }}"
group: "{{ item.name }}"
become: yes
loop: "{{ appliances_local_users_podman }}"
@jovial (Collaborator):

The other possibility is to make this code into a utility role that the podman roles all use, passing in the relevant user, e.g. opendistro_user, kibana_user, filebeat_user (see the sketch below). That would make the roles more usable in isolation. I don't think this is critical, mind...
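For example, the call site might look roughly like this (the role name and the podman_tmpdir_user variable are invented here to illustrate the idea; only opendistro_user comes from the comment above):

- name: Create podman tmp directories for this service's user
  include_role:
    name: podman_tmpdir                             # hypothetical utility role
  vars:
    podman_tmpdir_user: "{{ opendistro_user }}"     # per-role user passed in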

@sjpb (Collaborator, author):

That would be possible. I felt like this is all a bit of a hack to work around systemd/rhel/podman limits and interactions, so I'd hope it disappears entirely when either a) we have user services working or b) the podman patch to remove /tmp/containers-users-* files on reboot lands.

@@ -15,6 +15,55 @@
tasks_from: config.yml
tags: config

- name: Define tmp directories on tmpfs
blockinfile:
path: /etc/tmpfiles.d/podman.conf
@jovial (Collaborator):

Is it worth adding this to another file so that it is easier to remove?
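e.g. a dedicated drop-in holding only the generated entries, so removal is just deleting one file (filename and usernames below are illustrative only):

# /etc/tmpfiles.d/podman-tmpdirs.conf -- hypothetical dedicated file
d /run/podman-tmp-opendistro 0770 opendistro opendistro
d /run/podman-tmp-kibana 0770 kibana kibana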

@@ -2,6 +2,23 @@

# Fail early if configuration is invalid

- name: Validate podman configuration
@jovial (Collaborator):

Put in podman role?

@sjpb (Collaborator, author):

See comment above as to why all of this isn't in the role.

jovial closed this Mar 26, 2021
jovial reopened this Mar 26, 2021
@jovial (Collaborator) commented Mar 26, 2021

Closing and reopening to re-run pull_request workflow with latest code on main.

@@ -15,3 +15,60 @@

- name: reset ssh connection to allow user changes to affect 'current login user'
meta: reset_connection

- name: Ensure podman users exist
@jovial (Collaborator):

I still reckon we should only do this in one place and assume the users exist in this role, but as this is essentially a no-op at the cost of running a few extra tasks, it's probably not one to bike-shed over; the overall patch looks good to me.
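For reference, the duplicated task would be roughly of this shape (a sketch reusing the loop variable visible in the diff earlier in this review, not the exact task in the PR):

- name: Ensure podman users exist
  user:
    name: "{{ item.name }}"
  loop: "{{ appliances_local_users_podman }}"
  become: yes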

sjpb merged commit d4cfa54 into main Mar 31, 2021
sjpb deleted the fix/containers branch March 31, 2021 06:12
sjpb mentioned this pull request Jan 16, 2024
Successfully merging this pull request may close these issues:

opendistro role not robust to node reboots