Automatically reboot instance if hung #3

Merged 7 commits on Sep 6, 2017
75 changes: 45 additions & 30 deletions README.md
@@ -8,9 +8,9 @@ Note: add `${var.ssh_key_pair}` private key to the `ssh agent`.

Include this repository as a module in your existing terraform code:

```
```terraform
module "admin_tier" {
source = "git::https://github.com/cloudposse/tf_instance.git?ref=tags/0.1.0"
source = "git::https://github.com/cloudposse/tf_instance.git?ref=master"
ansible_playbook = "${var.ansible_playbook}"
ansible_arguments = "${var.ansible_arguments}"
ssh_key_pair = "${var.ssh_key_pair}"
@@ -22,6 +22,9 @@ module "admin_tier" {
security_groups = ["${var.security_groups}"]
subnets = ["${var.subnets}"]
associate_public_ip_address = "${var.associate_public_ip_address}"
name = "${var.name}"
namespace = "${var.namespace}"
stage = "${var.stage}"
}
```

@@ -36,7 +39,7 @@ This module depends on these modules:
It is necessary to run `terraform get` to download those modules.

Now reference the label when creating an instance (for example):
```
```terraform
resource "aws_ami_from_instance" "example" {
name = "terraform-example"
source_instance_id = "${module.admin_tier.id}"
@@ -45,35 +48,47 @@ resource "aws_ami_from_instance" "example" {

## Variables

| Name | Default | Description | Required |
|:----------------------------:|:--------------:|:--------------------------------------------------------:|:---------------:|
| `namespace` | `global` | Namespace (e.g. `cp` or `cloudposse`) - required for `tf_label` module | Yes |
| `stage` | `default` | Stage (e.g. `prod`, `dev`, `staging` - required for `tf_label` module | Yes |
| `name` | `admin` | Name (e.g. `bastion` or `db`) - required for `tf_label` module | Yes |
| `ec2_ami` | `ami-cd0f5cb6` | By default it is an AMI provided by Amazon with Ubuntu 16.04 | No |
| `ssh_key_pair` | `` | SSH key pair to be provisioned on instance | Yes |
| `github_api_token` | `` | GitHub API token | Yes |
| `github_organization` | `` | GitHub organization name | Yes |
| `github_team` | `` | GitHub team | Yes |
| `ansible_playbook` | `` | Path to the playbook - required for `tf_ansible` (e.g. `./admin_tier.yml`)|Yes|
| `ansible_arguments` | [] | List of ansible arguments (e.g. `["--user=ubuntu"]`) | No |
| `instance_type` | `t2.micro` | The type of the creating instance (e.g. `t2.micro`) | No |
| `vpc_id` | `` | The id of the VPC that the creating instance security group belongs to | Yes |
| `security_groups` | [] | List of Security Group IDs allowed to connect to creating instance | Yes |
| `subnets` | [] | List of VPC Subnet IDs creating instance launched in | Yes |
| `associate_public_ip_address`| `true` | Associate a public ip address with the creating instance. Boolean value | No |
| Name | Default | Description | Required |
|:-----------------------------|:--------------------------------------------:|:---------------------------------------------------------------------------------|:--------:|
| `namespace` | `global` | Namespace (e.g. `cp` or `cloudposse`) - required for `tf_label` module | Yes |
| `stage` | `default` | Stage (e.g. `prod`, `dev`, `staging` - required for `tf_label` module | Yes |
| `name` | `admin` | Name (e.g. `bastion` or `db`) - required for `tf_label` module | Yes |
| `ec2_ami` | `ami-cd0f5cb6` | By default it is an AMI provided by Amazon with Ubuntu 16.04 | No |
| `ssh_key_pair` | `` | SSH key pair to be provisioned on instance | Yes |
| `github_api_token` | `` | GitHub API token | Yes |
| `github_organization` | `` | GitHub organization name | Yes |
| `github_team` | `` | GitHub team | Yes |
| `ansible_playbook` | `` | Path to the playbook - required for `tf_ansible` (e.g. `./admin_tier.yml`) | Yes |
| `ansible_arguments` | [] | List of ansible arguments (e.g. `["--user=ubuntu"]`) | No |
| `instance_type` | `t2.micro` | The type of the creating instance (e.g. `t2.micro`) | No |
| `vpc_id` | `` | The id of the VPC that the creating instance security group belongs to | Yes |
| `security_groups` | [] | List of Security Group IDs allowed to connect to creating instance | Yes |
| `subnets` | [] | List of VPC Subnet IDs creating instance launched in | Yes |
| `associate_public_ip_address`| `true` | Associate a public ip address with the creating instance. Boolean value | No |
| `comparison_operator` | `GreaterThanOrEqualToThreshold` | Arithmetic operation to use when comparing the specified Statistic and Threshold | Yes |
| `metric_name` | `StatusCheckFailed_Instance` | Name for the alarm's associated metric | Yes |
| `evaluation_periods` | `5` | Number of periods over which data is compared to the specified threshold | Yes |
| `metric_namespace` | `AWS/EC2` | Namespace for the alarm's associated metric | Yes |
| `applying_period` | `60` | Period in seconds over which the specified statistic is applied | Yes |
| `statistic_level` | `Maximum` | Statistic to apply to the alarm's associated metric | Yes |
| `metric_threshold` | `1` | Value against which the specified statistic is compared | Yes |
| `default_alarm_action` |`action/actions/AWS_EC2.InstanceId.Reboot/1.0`| String of action to execute when this alarm transitions into an ALARM state | Yes |


## Outputs

| Name | Decription |
|:-------------------:|:-----------------------:|
| `id` | Disambiguated ID |
| `public_hostname` | Normalized name |
| `public_ip` | Normalized namespace |
| `ssh_key_pair` | Name of used AWS SSH key|
| `security_group_id` | ID on the new AWS Security Group associated with creating instance|
| `role` | Name of AWS IAM Role associated with creating instance|


## Outputs

| Name | Description |
|:--------------------|:-------------------------------------------------------------------|
| `id` | Disambiguated ID |
| `public_hostname` | Normalized name |
| `public_ip` | Normalized namespace |
| `ssh_key_pair` | Name of used AWS SSH key |
| `security_group_id` | ID on the new AWS Security Group associated with creating instance |
| `role` | Name of AWS IAM Role associated with creating instance |
| `alarm` | CloudWatch Alarm ID |

## References
* Thanks to https://github.com/cloudposse/tf_bastion for the inspiration
* Thanks to https://github.com/cloudposse/tf_bastion for the inspiration
42 changes: 38 additions & 4 deletions main.tf
@@ -75,9 +75,9 @@ data "template_file" "user_data" {
template = "${file("${path.module}/user_data.sh")}"

vars {
user_data = "${join("\n", compact(concat(var.user_data, list(module.github_authorized_keys.user_data))))}"
welcome_message = "${var.welcome_message}"
ssh_user = "${var.ssh_user}"
user_data = "${join("\n", compact(concat(var.user_data, list(module.github_authorized_keys.user_data))))}"
welcome_message = "${var.welcome_message}"
ssh_user = "${var.ssh_user}"
}
}

@@ -88,7 +88,7 @@ resource "aws_instance" "default" {
user_data = "${data.template_file.user_data.rendered}"

vpc_security_group_ids = [
"${compact(concat(list(aws_security_group.default.id), var.security_groups))}"
"${compact(concat(list(aws_security_group.default.id), var.security_groups))}",
]

iam_instance_profile = "${aws_iam_instance_profile.default.name}"
@@ -118,3 +118,37 @@ module "ansible" {
envs = ["host=${aws_eip.default.public_ip}"]
playbook = "${var.ansible_playbook}"
}

# Restart dead or hung instance

data "aws_region" "default" {
current = true
}

data "aws_caller_identity" "default" {}

resource "null_resource" "check_alarm_action" {
triggers = {
action = "arn:aws:swf:${data.aws_region.default.name}:${data.aws_caller_identity.default.account_id}:${var.default_alarm_action}"
}
}

resource "aws_cloudwatch_metric_alarm" "default" {
alarm_name = "${module.label.id}"
comparison_operator = "${var.comparison_operator}"
evaluation_periods = "${var.evaluation_periods}"
metric_name = "${var.metric_name}"
namespace = "${var.metric_namespace}"
period = "${var.applying_period}"
statistic = "${var.statistic_level}"
threshold = "${var.metric_threshold}"
depends_on = ["null_resource.check_alarm_action"]

dimensions {
InstanceId = "${aws_instance.default.id}"
}

alarm_actions = [
"${null_resource.check_alarm_action.triggers.action}",
]
}
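
As a usage sketch (not part of this PR), a caller could tune the reboot alarm by overriding the new inputs when including the module. The input names come from `variables.tf` and the README Variables table in this PR. The values `"3"` and `"120"` are illustrative only, and every `var.*` referenced here is assumed to be declared in the calling configuration.

```terraform
module "admin_tier" {
  source = "git::https://github.com/cloudposse/tf_instance.git?ref=master"

  # Inputs that no longer have defaults after this PR
  namespace = "${var.namespace}"
  stage     = "${var.stage}"
  name      = "${var.name}"

  # Inputs marked as required in the README Variables table
  ssh_key_pair        = "${var.ssh_key_pair}"
  github_api_token    = "${var.github_api_token}"
  github_organization = "${var.github_organization}"
  github_team         = "${var.github_team}"
  ansible_playbook    = "${var.ansible_playbook}"
  vpc_id              = "${var.vpc_id}"
  security_groups     = ["${var.security_groups}"]
  subnets             = ["${var.subnets}"]

  # Hypothetical override: alarm after 3 consecutive 120-second periods
  # in which StatusCheckFailed_Instance reaches the threshold of 1
  # (the module defaults are 5 periods of 60 seconds).
  evaluation_periods = "3"
  applying_period    = "120"
}
```

With the default `default_alarm_action`, the action ARN assembled in `main.tf` would render along the lines of `arn:aws:swf:us-east-1:123456789012:action/actions/AWS_EC2.InstanceId.Reboot/1.0`, where the region and account ID shown are placeholders.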
4 changes: 4 additions & 0 deletions outputs.tf
@@ -21,3 +21,7 @@ output "security_group_id" {
output "role" {
value = "${aws_iam_role.default.name}"
}

output "alarm" {
value = "${aws_cloudwatch_metric_alarm.default.id}"
}
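
A minimal sketch of consuming the new `alarm` output from a calling configuration, assuming the module is included under the name `admin_tier` as in the README example. The output name `admin_tier_reboot_alarm` is hypothetical.

```terraform
# Re-export the CloudWatch alarm ID created by the module so it is
# visible in `terraform output` of the calling configuration.
output "admin_tier_reboot_alarm" {
  value = "${module.admin_tier.alarm}"
}
```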
53 changes: 43 additions & 10 deletions variables.tf
@@ -15,7 +15,7 @@ variable "associate_public_ip_address" {
}

variable "ansible_arguments" {
type = "list"
type = "list"
default = []
}

Review comment (Contributor): Remove default values for `name`, `stage` and `namespace` variables.

@@ -33,17 +33,11 @@ variable "subnets" {
type = "list"
}

variable "namespace" {
default = "global"
}
variable "namespace" {}

variable "stage" {
default = "default"
}
variable "stage" {}

variable "name" {
default = "admin"
}
variable "name" {}

variable "ec2_ami" {
default = "ami-cd0f5cb6"
@@ -61,3 +55,42 @@ variable "ssh_user" {
variable "welcome_message" {
default = ""
}

variable "comparison_operator" {
description = "The arithmetic operation to use when comparing the specified Statistic and Threshold. Possible values are: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanThreshold, LessThanOrEqualToThreshold."
Review comment (Contributor): Use these descriptions in README.md.

Reply (@SweetOps, Contributor Author, Sep 1, 2017): If these are added to the README, the lines will be too long for the table and the Markdown will become unreadable.
[screenshot 2017-09-01 18 54 22]

default = "GreaterThanOrEqualToThreshold"
}

variable "metric_name" {
description = "The name for the alarm's associated metric. Possible values you can find in https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ec2-metricscollected.html ."
default = "StatusCheckFailed_Instance"
}

variable "evaluation_periods" {
description = "The number of periods over which data is compared to the specified threshold."
default = "5"
}

variable "metric_namespace" {
description = "The namespace for the alarm's associated metric. Possible values you can find in https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-namespaces.html ."
default = "AWS/EC2"
}

variable "applying_period" {
description = "The period in seconds over which the specified statistic is applied."
default = "60"
}

variable "statistic_level" {
description = "The statistic to apply to the alarm's associated metric. Possible values are: SampleCount, Average, Sum, Minimum, Maximum"
default = "Maximum"
}

variable "metric_threshold" {
description = "The value against which the specified statistic is compared."
default = "1"
}

variable "default_alarm_action" {
default = "action/actions/AWS_EC2.InstanceId.Reboot/1.0"
}