Skip to content

Automatically reboot instance if hung #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 6, 2017
Merged

Automatically reboot instance if hung #3

merged 7 commits into from
Sep 6, 2017

Conversation

SweetOps
Copy link
Contributor

@SweetOps SweetOps commented Sep 1, 2017

What

  • Automatically reboot instance if hung

Why

screenshot 2017-09-01 16 41 46

@SweetOps SweetOps self-assigned this Sep 1, 2017
README.md Outdated
@@ -8,7 +8,7 @@ Note: add `${var.ssh_key_pair}` private key to the `ssh agent`.

Include this repository as a module in your existing terraform code:

```
```terraform
module "admin_tier" {
source = "git::https://github.com/cloudposse/tf_instance.git?ref=tags/0.1.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace tags/0.1.0 with master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add name, namespace and stage variables.

README.md Outdated
| `subnets` | [] | List of VPC Subnet IDs creating instance launched in | Yes |
| `associate_public_ip_address`| `true` | Associate a public ip address with the creating instance. Boolean value | No |
| Name | Default | Description | Required |
|:----------------------------:|:--------------------------------------------:|:--------------------------------------------------------------------------------:|:--------:|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix table indentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it's ok.

@@ -61,3 +61,46 @@ variable "ssh_user" {
variable "welcome_message" {
default = ""
}

variable "comparison_operator" {
description = "The arithmetic operation to use when comparing the specified Statistic and Threshold. Possible values are: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanThreshold, LessThanOrEqualToThreshold."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use these descriptions in README.md.

Copy link
Contributor Author

@SweetOps SweetOps Sep 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if add it to readme the line will be too long for table and markdown'll become unreadable
screenshot 2017-09-01 18 54 22

main.tf Outdated

resource "aws_cloudwatch_metric_alarm" "reboot" {
alarm_name = "${module.label.id}-status-check-failed"
comparison_operator = "${var.comparison_operator}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we hardcode these variables?
@osterman what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably would have just hard coded it, but since @SweetOps has already parameterized it, I'm okay with it so long as we rename reboot to default or something more general.

main.tf Outdated

resource "null_resource" "check_alarm" {
triggers = {
action = "arn:aws:swf:${var.aws_region}:${var.aws_account_id}:${var.default_alarm_action}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use a data source to retrieve aws_region and aws_account_id. No need for additional inputs.

README.md Outdated
| `applying_period` | `60` | Period in seconds over which the specified statistic is applied | Yes |
| `statistic_level` | `Maximum` | Statistic to apply to the alarm's associated metric | Yes |
| `metric_threshold` | `1` | Value against which the specified statistic is compared | Yes |
| `default_alarm_action` |`action/actions/AWS_EC2.InstanceId.Reboot/1.0`| List of actions to execute when this alarm transitions into an ALARM state | Yes |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main.tf Outdated
}

resource "aws_cloudwatch_metric_alarm" "reboot" {
alarm_name = "${module.label.id}-status-check-failed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this alarm_name should be default since everything is parameterized and it could mean anything.

@SweetOps
Copy link
Contributor Author

SweetOps commented Sep 5, 2017

screenshot 2017-09-05 12 52 22

README.md Outdated
## Outputs

| Name | Decription |
|:-------------------:|:------------------------------------------------------------------:|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not center justify name and description.

main.tf Outdated

# Restart dead or hung instance

data "aws_region" "current" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be called default

main.tf Outdated
current = true
}

data "aws_caller_identity" "current" {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it default

main.tf Outdated

data "aws_caller_identity" "current" {}

resource "null_resource" "check_alarm" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call it default

main.tf Outdated

resource "null_resource" "check_alarm" {
triggers = {
action = "arn:aws:swf:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:${var.default_alarm_action}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename action to check_alarm_action

osterman
osterman previously approved these changes Sep 5, 2017
@@ -15,7 +15,7 @@ variable "associate_public_ip_address" {
}

variable "ansible_arguments" {
type = "list"
type = "list"
default = []
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove default values for name, stage and namespace variables.

@const-bon
Copy link
Contributor

Tested.
1504714551599

@const-bon const-bon merged commit fd5c0b4 into master Sep 6, 2017
@const-bon const-bon deleted the alarm branch September 6, 2017 16:31
SweetOps added a commit that referenced this pull request Nov 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants