Merge pull request #1 from sparkfabrik/0000-add-cloud-sql-monitors
refs #0000 Add Cloud SQL monitoring
andypanix authored Jun 18, 2024
2 parents 0a529b7 + 900d387 commit cd804b2
Showing 10 changed files with 352 additions and 21 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -8,3 +8,9 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Add support for Cloud SQL monitoring:
  - CPU usage.
  - Storage usage.
  - Memory usage.
  - Storage high growth.
55 changes: 51 additions & 4 deletions README.md
@@ -1,9 +1,56 @@
# Terraform Module Template
# Terraform GCP Services Monitoring Module

This project can be used as a template for the initial stub of a Terraform
module.
This module creates a set of monitoring alerts for Google Cloud Platform services.

We suggest following Terraform best practices as described in https://www.terraform-best-practices.com/code-structure.
Supported services:

- Cloud SQL
  - CPU usage
  - Storage usage
  - Memory usage
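
A minimal usage sketch, based on the inputs documented below. The module label, project ID, notification channel ID, and instance names are hypothetical placeholders, and the thresholds are only illustrative:

```hcl
module "gcp_services_monitoring" {
  # Source taken from examples/main.tf; pin a ref as appropriate.
  source = "github.com/sparkfabrik/terraform-google-services-monitoring"

  project = "my-gcp-project" # hypothetical project ID
  notification_channels = [
    # Hypothetical channel, in the form projects/<project>/notificationChannels/<id>.
    "projects/my-gcp-project/notificationChannels/1234567890",
  ]

  cloud_sql = {
    instances = {
      # Empty object: keep the default CPU, memory and disk alerts.
      "my-prod-instance" = {}

      # Override the CPU alerts and disable the disk alerts.
      "my-stage-instance" = {
        cpu_utilization = [
          { severity = "WARNING", threshold = 0.75, duration = "600s" },
        ]
        disk_utilization = []
      }
    }
  }
}
```

Each entry in a `*_utilization` list becomes one alert policy, so an empty list creates none for that metric on that instance.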

<!-- BEGIN_TF_DOCS -->
## Providers

| Name | Version |
|------|---------|
| <a name="provider_google"></a> [google](#provider\_google) | >= 5.33 |

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.5 |
| <a name="requirement_google"></a> [google](#requirement\_google) | >= 5.33 |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_auto_close"></a> [auto\_close](#input\_auto\_close) | n/a | `string` | `"86400s"` | no |
| <a name="input_cloud_sql"></a> [cloud\_sql](#input\_cloud\_sql) | n/a | <pre>object({<br> project = optional(string, null)<br> auto_close = optional(string, null)<br> notification_channels = optional(list(string), [])<br> instances = optional(map(object({<br> cpu_utilization = optional(list(object({<br> severity = optional(string, "CRITICAL"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "120s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> severity = "WARNING",<br> threshold = 0.85,<br> duration = "1200s",<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 1,<br> duration = "300s",<br> alignment_period = "60s",<br> }<br> ])<br> memory_utilization = optional(list(object({<br> severity = optional(string, "CRITICAL"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "300s")<br> })), [<br> {<br> severity = "WARNING",<br> threshold = 0.80,<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.90,<br> }<br> ])<br> disk_utilization = optional(list(object({<br> severity = optional(string, "CRITICAL"),<br> threshold = optional(number, 0.90)<br> alignment_period = optional(string, "300s")<br> duration = optional(string, "600s")<br> })), [<br> {<br> severity = "WARNING",<br> threshold = 0.85,<br> },<br> {<br> severity = "CRITICAL",<br> threshold = 0.95, <br> }<br> ])<br> })), {})<br> })</pre> | n/a | yes |
| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | n/a | `list(string)` | `[]` | no |
| <a name="input_project"></a> [project](#input\_project) | n/a | `string` | `null` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_cloud_sql_cpu_utilization"></a> [cloud\_sql\_cpu\_utilization](#output\_cloud\_sql\_cpu\_utilization) | n/a |
| <a name="output_cloud_sql_disk_utilization"></a> [cloud\_sql\_disk\_utilization](#output\_cloud\_sql\_disk\_utilization) | n/a |
| <a name="output_cloud_sql_memory_utilization"></a> [cloud\_sql\_memory\_utilization](#output\_cloud\_sql\_memory\_utilization) | n/a |

## Resources

| Name | Type |
|------|------|
| [google_monitoring_alert_policy.cloud_sql_cpu_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_disk_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |
| [google_monitoring_alert_policy.cloud_sql_memory_utilization](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |

## Modules

No modules.


<!-- END_TF_DOCS -->
155 changes: 155 additions & 0 deletions cloud-sql.tf
@@ -0,0 +1,155 @@
# ----------------------
# CloudSQL
# ----------------------
locals {
  # Use the cloud_sql project if specified, otherwise use the project.
  cloud_sql_project = var.cloud_sql.project != null ? var.cloud_sql.project : var.project

  # Use the cloud_sql notification channels if specified, otherwise use the notification_channels.
  cloud_sql_notification_channels = length(var.cloud_sql.notification_channels) > 0 ? var.cloud_sql.notification_channels : var.notification_channels

  # Use the cloud_sql auto_close if specified, otherwise use the auto_close.
  cloud_sql_auto_close = var.cloud_sql.auto_close != null ? var.cloud_sql.auto_close : var.auto_close

  # Flatten the per-instance rule lists into one map per metric, keyed by
  # instance, severity and threshold, so each rule can drive a for_each.
  cloud_sql_cpu_utilization = {
    for item in flatten(
      [
        for instance, instance_config in var.cloud_sql.instances : [
          for cpu_utilization in instance_config.cpu_utilization :
          merge(
            {
              "instance" : instance,
            },
            cpu_utilization
          )
        ]
      ]
    ) : "${item.instance}--${item.severity}--${item.threshold}" => item
  }

  cloud_sql_memory_utilization = {
    for item in flatten(
      [
        for instance, instance_config in var.cloud_sql.instances : [
          for memory_utilization in instance_config.memory_utilization :
          merge(
            {
              "instance" : instance,
            },
            memory_utilization
          )
        ]
      ]
    ) : "${item.instance}--${item.severity}--${item.threshold}" => item
  }

  cloud_sql_disk_utilization = {
    for item in flatten(
      [
        for instance, instance_config in var.cloud_sql.instances : [
          for disk_utilization in instance_config.disk_utilization :
          merge(
            {
              "instance" : instance,
            },
            disk_utilization
          )
        ]
      ]
    ) : "${item.instance}--${item.severity}--${item.threshold}" => item
  }
}
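
# Illustration only, not part of the commit: a sketch of what the expression
# for cloud_sql_cpu_utilization above should evaluate to, assuming a single
# hypothetical instance named "db-main" that keeps the default cpu_utilization
# list from variables.tf. The nested optional() defaults fill in the
# attributes that a rule leaves out (here alignment_period on the WARNING rule).
#
# cloud_sql_cpu_utilization = {
#   "db-main--WARNING--0.85" = {
#     instance         = "db-main"
#     severity         = "WARNING"
#     threshold        = 0.85
#     alignment_period = "120s"
#     duration         = "1200s"
#   }
#   "db-main--CRITICAL--1" = {
#     instance         = "db-main"
#     severity         = "CRITICAL"
#     threshold        = 1
#     alignment_period = "60s"
#     duration         = "300s"
#   }
# }
#
# Because the key combines instance, severity and threshold, two rules for the
# same instance that share both severity and threshold would collide, and
# Terraform would reject the duplicate map key.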

# ----------------------
# CloudSQL CPU utilization
# ----------------------
resource "google_monitoring_alert_policy" "cloud_sql_cpu_utilization" {
  for_each = local.cloud_sql_cpu_utilization

  display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
  combiner     = "OR"
  severity     = each.value.severity

  conditions {
    condition_threshold {
      filter          = "resource.type = \"cloudsql_database\" AND resource.labels.database_id = \"${local.cloud_sql_project}:${each.value.instance}\" AND metric.type = \"cloudsql.googleapis.com/database/cpu/utilization\""
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.threshold
      duration        = each.value.duration
      trigger {
        count = 1
      }
      aggregations {
        alignment_period   = each.value.alignment_period
        per_series_aligner = "ALIGN_MEAN"
      }
    }
    display_name = "${local.cloud_sql_project} ${each.value.instance} - CPU utilization ${each.value.severity} ${each.value.threshold * 100}%"
  }
  alert_strategy {
    auto_close = local.cloud_sql_auto_close
  }
  notification_channels = local.cloud_sql_notification_channels
}

# ----------------------
# CloudSQL Memory utilization
# ----------------------
resource "google_monitoring_alert_policy" "cloud_sql_memory_utilization" {
  for_each = local.cloud_sql_memory_utilization

  display_name = "${local.cloud_sql_project} ${each.value.instance} - Memory utilization ${each.value.severity} ${each.value.threshold * 100}%"
  combiner     = "OR"
  severity     = each.value.severity
  conditions {
    display_name = "${local.cloud_sql_project} ${each.value.instance} - Memory utilization ${each.value.severity} ${each.value.threshold * 100}%"
    condition_threshold {
      filter          = "resource.type = \"cloudsql_database\" AND resource.labels.database_id = \"${local.cloud_sql_project}:${each.value.instance}\" AND metric.type = \"cloudsql.googleapis.com/database/memory/utilization\""
      duration        = each.value.duration
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.threshold

      aggregations {
        alignment_period   = each.value.alignment_period
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  alert_strategy {
    auto_close = local.cloud_sql_auto_close
  }

  notification_channels = local.cloud_sql_notification_channels
}

# ----------------------
# CloudSQL disk utilization
# ----------------------
resource "google_monitoring_alert_policy" "cloud_sql_disk_utilization" {
  for_each = local.cloud_sql_disk_utilization

  display_name = "${local.cloud_sql_project} ${each.value.instance} - Disk utilization ${each.value.severity} ${each.value.threshold * 100}%"
  combiner     = "OR"
  severity     = each.value.severity

  conditions {
    display_name = "${local.cloud_sql_project} ${each.value.instance} - Disk utilization ${each.value.severity} ${each.value.threshold * 100}%"
    condition_threshold {
      filter          = "resource.type = \"cloudsql_database\" AND resource.labels.database_id = \"${local.cloud_sql_project}:${each.value.instance}\" AND metric.type = \"cloudsql.googleapis.com/database/disk/utilization\""
      duration        = each.value.duration
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.threshold

      aggregations {
        alignment_period   = each.value.alignment_period
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  alert_strategy {
    auto_close = local.cloud_sql_auto_close
  }
  notification_channels = local.cloud_sql_notification_channels
}
46 changes: 44 additions & 2 deletions examples/main.tf
@@ -1,9 +1,51 @@
/*
# A simple example on how to use this module
*/

locals {
  # Enable all Cloud SQL monitoring on the selected instances, e.g.
  cloud_sql = {
    instances = {
      (google_sql_database_instance.master.name) = {}
      (google_sql_database_instance.stage.name)  = {}
    }
  }

  # Use custom Cloud SQL CPU monitoring on google_sql_database_instance.master.name
  # Use all default Cloud SQL monitoring on google_sql_database_instance.stage.name
  # cloud_sql = {
  #   instances = {
  #     (google_sql_database_instance.master.name) = {
  #       cpu_utilization = [{
  #         severity  = "ERROR" # must be one of CRITICAL, ERROR, WARNING
  #         threshold = 0.90
  #       }]
  #     }
  #     (google_sql_database_instance.stage.name) = {}
  #   }
  # }

  # Disable Cloud SQL monitoring
  # cloud_sql = {
  #   instances = {}
  # }

  # Enable default Cloud SQL monitoring on instance google_sql_database_instance.master.name
  # Disable CPU utilization monitoring on instance google_sql_database_instance.stage.name
  # cloud_sql = {
  #   instances = {
  #     (google_sql_database_instance.stage.name)  = { cpu_utilization = [] }
  #     (google_sql_database_instance.master.name) = {}
  #   }
  # }

}

module "example" {
source = "github.com/sparkfabrik/terraform-module-template"
source = "github.com/sparkfabrik/terraform-google-services-monitoring"
version = ">= 0.1.0"

name = var.name
notification_channels = var.notification_channels
project = var.project
cloud_sql = local.cloud_sql
}
7 changes: 6 additions & 1 deletion examples/test.tfvars
@@ -1 +1,6 @@
name = "SimpleExample"
project = "Simple project"

notification_channels = [
  "cloud_support_email",
  "slack-channel"
]
12 changes: 9 additions & 3 deletions examples/variables.tf
@@ -1,4 +1,10 @@
variable "name" {
type = string
description = "Describe what this variable is used for."

variable "project" {
  type    = string
  default = ""
}

variable "notification_channels" {
  type    = list(string)
  default = []
}
4 changes: 0 additions & 4 deletions main.tf
@@ -1,4 +0,0 @@
resource "google_storage_bucket" "example" {
name = var.name
location = "EU"
}
12 changes: 9 additions & 3 deletions outputs.tf
@@ -1,5 +1,11 @@
output "example" {
value = google_storage_bucket.example.name
description = "The name of the resource."
output "cloud_sql_disk_utilization" {
  value = { for k, v in google_monitoring_alert_policy.cloud_sql_disk_utilization : k => v.name }
}

output "cloud_sql_memory_utilization" {
  value = { for k, v in google_monitoring_alert_policy.cloud_sql_memory_utilization : k => v.name }
}

output "cloud_sql_cpu_utilization" {
  value = { for k, v in google_monitoring_alert_policy.cloud_sql_cpu_utilization : k => v.name }
}
74 changes: 71 additions & 3 deletions variables.tf
@@ -1,4 +1,72 @@
variable "name" {
type = string
description = "Describe what this variable is used for."
variable "project" {
  type    = string
  default = null
}

variable "notification_channels" {
  type    = list(string)
  default = []
}

variable "auto_close" {
  type    = string
  default = "86400s" # 24h
}

variable "cloud_sql" {
  type = object({
    project               = optional(string, null)
    auto_close            = optional(string, null)
    notification_channels = optional(list(string), [])
    instances = optional(map(object({
      cpu_utilization = optional(list(object({
        severity         = optional(string, "CRITICAL"),
        threshold        = optional(number, 0.90)
        alignment_period = optional(string, "120s")
        duration         = optional(string, "300s")
        })), [
        {
          severity  = "WARNING",
          threshold = 0.85,
          duration  = "1200s",
        },
        {
          severity         = "CRITICAL",
          threshold        = 1,
          duration         = "300s",
          alignment_period = "60s",
        }
      ])
      memory_utilization = optional(list(object({
        severity         = optional(string, "CRITICAL"),
        threshold        = optional(number, 0.90)
        alignment_period = optional(string, "300s")
        duration         = optional(string, "300s")
        })), [
        {
          severity  = "WARNING",
          threshold = 0.80,
        },
        {
          severity  = "CRITICAL",
          threshold = 0.90,
        }
      ])
      disk_utilization = optional(list(object({
        severity         = optional(string, "CRITICAL"),
        threshold        = optional(number, 0.90)
        alignment_period = optional(string, "300s")
        duration         = optional(string, "600s")
        })), [
        {
          severity  = "WARNING",
          threshold = 0.85,
        },
        {
          severity  = "CRITICAL",
          threshold = 0.95,
        }
      ])
    })), {})
  })
}