vague memory

Notes from someone struggling to get rid of half-remembered knowledge

Posting CloudWatch Logs message bodies to Slack with AWS Lambda (1)

This post covers log monitoring with CloudWatch Logs.

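AWS provides a Blueprint that posts the SNS message from an Alarm, created on a CloudWatch Logs metric filter, to Slack. However, that notification only tells you that the Alarm fired, so I wanted the log body itself to be included in the notification.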

With the constraint that everything except the notification destination (Slack) should stay within AWS, I try two patterns that use Lambda (Python).

[Figure: architecture diagram of patterns (1) and (2)]

  • The black arrows in the figure show the flow when the Blueprint is used as-is

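If you want the metric to keep a history of how often the error occurred, use (1). If a fixed notification destination is fine and you only need to know that an error occurred, (2) should be enough.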

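(2) is the simpler option because it uses fewer services and has fewer settings to configure.
Furthermore, if you want multiple notification destinations, I think you can get there by publishing from Lambda to SNS and adding Subscriptions. (Subscribing Slack by email requires Slack's Standard plan or higher.)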
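
The fan-out idea above (Lambda publishes to SNS, and each Subscription receives a copy) could look roughly like the sketch below. This is not taken from the original function: `publish_log_summary`, `FakeSnsClient`, and the topic ARN are hypothetical names for illustration. In a real Lambda you would pass `boto3.client('sns')`, whose `publish` call accepts `TopicArn`, `Subject`, and `Message`.

```python
def publish_log_summary(sns_client, topic_arn, subject, body):
    """Publish a log summary to an SNS topic so every subscriber gets a copy."""
    # SNS subjects must not contain line breaks and must be shorter than
    # 100 characters, so clean and truncate defensively.
    safe_subject = subject.replace('\n', ' ')[:99]
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=safe_subject,
        Message=body,
    )


class FakeSnsClient(object):
    """Hypothetical stand-in for boto3.client('sns') so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def publish(self, **kwargs):
        # Record the arguments instead of calling AWS.
        self.calls.append(kwargs)
        return {'MessageId': 'fake-message-id'}


fake = FakeSnsClient()
result = publish_log_summary(
    fake,
    'arn:aws:sns:ap-northeast-1:123456789012:log-alerts',  # hypothetical ARN
    '[CloudWatch Log Alarm] my-log-group',
    'LogMessages: ...',
)
print(result['MessageId'])
```

Passing the client in as an argument keeps the function easy to test without AWS credentials.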

(1) Slack notification via SNS

  • CloudWatch [Logs] -> Alarm -> SNS -> Lambda -> Slack
  • Based on the AWS-provided Slack Integration Blueprint, with the log body added
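
To make the flow above concrete, here is a minimal sketch of how the alarm arrives in Lambda: SNS wraps the CloudWatch Alarm JSON as a string inside `Records[0].Sns.Message`. The field names are the ones the function in this post reads; the sample values and the `parse_alarm` helper are made up for illustration.

```python
import json


def parse_alarm(event):
    """Extract the alarm fields used in this post from an SNS-triggered event."""
    # The alarm itself is a JSON string nested inside the SNS record.
    message = json.loads(event['Records'][0]['Sns']['Message'])
    return {
        'alarm_name': message['AlarmName'],
        'new_state': message['NewStateValue'],
        'reason': message['NewStateReason'],
        'metric': message['Trigger']['MetricName'],
        'state_change_time': message['StateChangeTime'],
    }


# Hypothetical sample event; real events carry many more fields.
sample_event = {
    'Records': [{
        'Sns': {
            'Message': json.dumps({
                'AlarmName': 'my-log-alarm',
                'NewStateValue': 'ALARM',
                'NewStateReason': 'Threshold Crossed',
                'StateChangeTime': '2016-05-04T14:18:03.123+0000',
                'Trigger': {'MetricName': 'my-log-group'},
            })
        }
    }]
}

alarm = parse_alarm(sample_event)
print(alarm['alarm_name'], alarm['new_state'], alarm['metric'])
```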

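Changes to the cloudwatch-alarm-to-slack-python blueprint

The Slack Integration and KMS setup described at the top of the Blueprint is omitted here.

Import libraries

Add the datetime and calendar libraries, which are used to specify the time range when fetching log events.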

 import boto3
 import json
 import logging
+import datetime
+import calendar

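Retrieval conditions

The retrieval conditions used here are as follows.

  • FILTER_PATTERN
    • Matches JSON log events whose log_level field is "ALERT"
  • OUTPUT_LIMIT
    • Fetch at most 5 lines (limit)
  • TIME_FROM_MIN
    • The start time for fetching logs (timefrom) is aligned with the CloudWatch Alarm Period (5 minutes here)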
+FILTER_PATTERN='{ $.log_level = "ALERT" }'
+OUTPUT_LIMIT=5
+TIME_FROM_MIN=5

ENCRYPTED_HOOK_URL = '<kmsEncryptedHookUrl>'  # Enter the base-64 encoded, encrypted key (CiphertextBlob)
SLACK_CHANNEL = '<slackChannel>'  # Enter the Slack channel to send a message to
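
As a rough local illustration of what FILTER_PATTERN selects, the sketch below checks whether a log line is a JSON object whose `log_level` equals "ALERT". This only approximates the intent of the pattern and is not CloudWatch Logs' own matcher; `matches_alert` is a hypothetical helper, and the sample lines are invented.

```python
import json


def matches_alert(line, field='log_level', expected='ALERT'):
    """Approximate { $.log_level = "ALERT" } for a single log line."""
    try:
        record = json.loads(line)
    except ValueError:
        # Lines that are not valid JSON cannot match a JSON filter pattern.
        return False
    return isinstance(record, dict) and record.get(field) == expected


sample_lines = [
    '{"log_level": "ALERT", "message": "disk full"}',
    '{"log_level": "INFO", "message": "started"}',
    'plain text line',
]
matched = [line for line in sample_lines if matches_alert(line)]
print(matched)
```

Running a few real log lines through a check like this before creating the metric filter can help confirm that the field name and value are spelled the way the application actually logs them.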

filter_log_events

Fetch the events from CloudWatch Logs.
This assumes that the log group name (logGroupName) is the same as the metric name (MetricName).

     reason = message['NewStateReason']
+    metric = message['Trigger']['MetricName']
+
+    # Search window in epoch milliseconds (UTC): it ends 1 minute after StateChangeTime and spans TIME_FROM_MIN minutes
+    timeto = datetime.datetime.strptime(message['StateChangeTime'][:19] ,'%Y-%m-%dT%H:%M:%S') + datetime.timedelta(minutes=1)
+    u_to = calendar.timegm(timeto.utctimetuple()) * 1000
+    timefrom = timeto - datetime.timedelta(minutes=TIME_FROM_MIN)
+    u_from = calendar.timegm(timefrom.utctimetuple()) * 1000
+    client = boto3.client('logs')
+    # Use the stream with the most recent event; without descending = True the oldest stream comes first
+    streams = client.describe_log_streams(logGroupName = metric, orderBy = 'LastEventTime', descending = True)
+    stream = str(streams['logStreams'][0]['logStreamName'])
+    logger.info("logGroupName = " + str(metric) + ", logStreamNames = [" + str(stream) + "], filterPattern = " + str(FILTER_PATTERN) + ", startTime = " + str(u_from) + ", endTime = " + str(u_to) + ", limit = " + str(OUTPUT_LIMIT))
+    response = client.filter_log_events(logGroupName = metric, logStreamNames = [stream], filterPattern = FILTER_PATTERN, startTime = u_from, endTime = u_to, limit = OUTPUT_LIMIT)
+    log_events = response['events']
+    log_message = '[CloudWatch Log Alarm] ' + str(metric) + ' / ' + str(stream) + '\nLogMessages:'
+    for e in log_events:
+        # e['timestamp'] is epoch milliseconds; convert to UTC, then shift to JST (UTC+9)
+        date = datetime.datetime.utcfromtimestamp(e['timestamp'] // 1000) + datetime.timedelta(hours=9)
+        log_message = log_message + '\n{"Timestamp":"' + str(date) + '","Message":' + e['message'] + '}'
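
The time-window arithmetic above can be checked in isolation. The sketch below repeats the same calculation as a standalone function: parse the first 19 characters of StateChangeTime as UTC, add a 1-minute margin for the end, go back TIME_FROM_MIN minutes for the start, and convert both to epoch milliseconds. The function name `alarm_window_ms` and its parameter names are hypothetical.

```python
import calendar
import datetime


def alarm_window_ms(state_change_time, minutes_back=5, margin_minutes=1):
    """Return (start_ms, end_ms) for filter_log_events from an alarm timestamp."""
    # Drop the fractional seconds and offset; the alarm timestamp is in UTC.
    changed = datetime.datetime.strptime(state_change_time[:19], '%Y-%m-%dT%H:%M:%S')
    end = changed + datetime.timedelta(minutes=margin_minutes)
    start = end - datetime.timedelta(minutes=minutes_back)

    def to_ms(dt):
        # calendar.timegm interprets the time tuple as UTC.
        return calendar.timegm(dt.utctimetuple()) * 1000

    return to_ms(start), to_ms(end)


start_ms, end_ms = alarm_window_ms('1970-01-01T00:10:00.000+0000')
print(start_ms, end_ms)  # 360000 660000
```

Note that the window starts 4 minutes before StateChangeTime and ends 1 minute after it, so its total length still equals the 5-minute Period.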

Add the events to the Slack message

Append the fetched events to the notification message.

     slack_message = {
         'channel': SLACK_CHANNEL,
-        'text': "%s state is now %s: %s" % (alarm_name, new_state, reason)
+        'text': "%s state is now %s: %s \n %s" % (alarm_name, new_state, reason, log_message)
     }
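
For reference, the sketch below builds the same payload as a standalone function and serializes it the way it is sent to the webhook. `build_slack_payload` is a hypothetical helper and the argument values are invented; the `channel` and `text` keys are the ones the Blueprint uses. The result is encoded to bytes on the assumption that it will be posted from Python 3, where `urlopen` expects a bytes body.

```python
import json


def build_slack_payload(channel, alarm_name, new_state, reason, log_message):
    """Build the JSON body for the Slack incoming webhook as UTF-8 bytes."""
    slack_message = {
        'channel': channel,
        # Same layout as the modified Blueprint: alarm summary, then the log lines.
        'text': "%s state is now %s: %s \n %s" % (alarm_name, new_state, reason, log_message),
    }
    return json.dumps(slack_message).encode('utf-8')


payload = build_slack_payload(
    '#alerts',
    'my-log-alarm',
    'ALARM',
    'Threshold Crossed',
    '[CloudWatch Log Alarm] my-log-group / stream-1\nLogMessages:',
)
print(payload.decode('utf-8'))
```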

Notification example

[Image: screenshot of the resulting Slack notification]


Pattern (2) follows in the next post. The full Lambda code is shown below.

cloudwatch-alarm-to-slack-addreturn.py

'''
Follow these steps to configure the webhook in Slack:

  1. Navigate to https://<your-team-domain>.slack.com/services/new

  2. Search for and select "Incoming WebHooks".

  3. Choose the default channel where messages will be sent and click "Add Incoming WebHooks Integration".

  4. Copy the webhook URL from the setup instructions and use it in the next section.


Follow these steps to encrypt your Slack hook URL for use in this function:

  1. Create a KMS key - http://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html.

  2. Encrypt the event collector token using the AWS CLI.
     $ aws kms encrypt --key-id alias/<KMS key name> --plaintext "<SLACK_HOOK_URL>"

     Note: You must exclude the protocol from the URL (e.g. "hooks.slack.com/services/abc123").

  3. Copy the base-64 encoded, encrypted key (CiphertextBlob) to the ENCRYPTED_HOOK_URL variable.

  4. Give your function's role permission for the kms:Decrypt action.
     Example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1443036478000",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "<your KMS key ARN>"
            ]
        }
    ]
}
'''
from __future__ import print_function

import boto3
import json
import logging
import datetime
import calendar

from base64 import b64decode
from urllib2 import Request, urlopen, URLError, HTTPError

FILTER_PATTERN='{ $.log_level = "ALERT" }'
OUTPUT_LIMIT=5
TIME_FROM_MIN=5

ENCRYPTED_HOOK_URL = ''  # Enter the base-64 encoded, encrypted key (CiphertextBlob)
SLACK_CHANNEL = '#'  # Enter the Slack channel to send a message to

HOOK_URL = "https://" + boto3.client('kms').decrypt(CiphertextBlob=b64decode(ENCRYPTED_HOOK_URL))['Plaintext']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("Event: " + str(event))
    message = json.loads(event['Records'][0]['Sns']['Message'])
    logger.info("Message: " + str(message))

    alarm_name = message['AlarmName']
    #old_state = message['OldStateValue']
    new_state = message['NewStateValue']
    reason = message['NewStateReason']
    metric = message['Trigger']['MetricName']

    # Search window in epoch milliseconds (UTC): it ends 1 minute after StateChangeTime and spans TIME_FROM_MIN minutes
    timeto = datetime.datetime.strptime(message['StateChangeTime'][:19] ,'%Y-%m-%dT%H:%M:%S') + datetime.timedelta(minutes=1)
    u_to = calendar.timegm(timeto.utctimetuple()) * 1000
    timefrom = timeto - datetime.timedelta(minutes=TIME_FROM_MIN)
    u_from = calendar.timegm(timefrom.utctimetuple()) * 1000
    client = boto3.client('logs')
    # Use the stream with the most recent event; without descending = True the oldest stream comes first
    streams = client.describe_log_streams(logGroupName = metric, orderBy = 'LastEventTime', descending = True)
    stream = str(streams['logStreams'][0]['logStreamName'])
    logger.info("logGroupName = " + str(metric) + ", logStreamNames = [" + str(stream) + "], filterPattern = " + str(FILTER_PATTERN) + ", startTime = " + str(u_from) + ", endTime = " + str(u_to) + ", limit = " + str(OUTPUT_LIMIT))
    response = client.filter_log_events(logGroupName = metric, logStreamNames = [stream], filterPattern = FILTER_PATTERN, startTime = u_from, endTime = u_to, limit = OUTPUT_LIMIT)
    log_events = response['events']
    log_message = '[CloudWatch Log Alarm] ' + str(metric) + ' / ' + str(stream) + '\nLogMessages:'
    for e in log_events:
        # e['timestamp'] is epoch milliseconds; convert to UTC, then shift to JST (UTC+9)
        date = datetime.datetime.utcfromtimestamp(e['timestamp'] // 1000) + datetime.timedelta(hours=9)
        log_message = log_message + '\n{"Timestamp":"' + str(date) + '","Message":' + e['message'] + '}'

    slack_message = {
        'channel': SLACK_CHANNEL,
        'text': "%s state is now %s: %s \n %s" % (alarm_name, new_state, reason, log_message)
    }

    req = Request(HOOK_URL, json.dumps(slack_message))
    try:
        response = urlopen(req)
        response.read()
        logger.info("Message posted to %s", slack_message['channel'])
    except HTTPError as e:
        logger.error("Request failed: %d %s", e.code, e.reason)
    except URLError as e:
        logger.error("Server connection failed: %s", e.reason)