2018-07-30

Datadog win32_event_log (v6, Japanese-locale environment)

Datadog Windows

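Event log forwarding previously did not work properly for me on Windows in a Japanese-locale environment, so this post checks how it behaves now (v6). Integration with Logs has also been added, so I check that as well.

Installing the Datadog Agent on Windows 10 (Japanese environment) - vague memory
- The version checked previously was 5.17

~~Result: Agent Version 6 has multi-locale support and could be used as-is.~~

Addendum (2019/01/09)

My verification of win32_event_log was wrong. On rechecking, Japanese is not supported, the same as in Agent Version 5. The check does not work when the system account's language setting is Japanese. The checks below were performed with only the logged-in user's language setting changed to Japanese.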

Environment
Service check
Logs
- Status Remap

Environment

The environments checked are as follows.

Win 2012

Agent Version: 6.3.3
Go Version: 1.10.3 
Platform: Windows Server 2012 R2 Standard
Platform Version: 6.3 Build 9600

Win 2016

Agent Version: 6.3.3
Go Version: 1.10.3 
Platform: Windows Server 2016 Datacenter
Platform Version: 10.0 Build 14393

Japanese localization

The Event Viewer is displayed in Japanese.

f:id:htnosm:20180730032744p:plain

Service check

Win 32 event log

Add the integration configuration in Datadog Agent Manager.

[Checks] -> [Manage Checks] -> [Add a Check]
- [win32_event_log]

The configuration file contains the following.

win32_event_log.d/conf.yaml

init_config:

instances:

  - log_file:
      - System
    type:
      - Error
    tags:
      - system

When an "Error" (エラー) event is generated in the Event Viewer, the notification can be confirmed in Datadog Events.

f:id:htnosm:20180730032741p:plain

Logs

I also check collection through Logs.

Windows Event Log

Add a logs directive to the configuration file (win32_event_log.d/conf.yaml).

logs:
  - type: windows_event
    channel_path: System
    source: System
    service: eventlog
    sourcecategory: windowsevent

Set channel_path to the LogName. LogName values can be listed with a PowerShell command.

Get-WinEvent -ListLog *

Windows Event Viewer

f:id:htnosm:20180730032737p:plain

Datadog Logs

f:id:htnosm:20180730032733p:plain

Status Remap

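Up to this point the setup went smoothly, but something still felt off.

Even for events shown as "Error" (エラー) or "Warning" (警告) in the Event Viewer, the Status in Datadog Logs is always "INFO".

There may be a better way, but adding the following settings made the Status recognizable. (Remapping the attribute directly seems like it should work, but it had no effect, so I added an intermediate step.)

First create an attribute known as a facet. Event.System.Level holds the error level that corresponds to the level shown in the Event Viewer.

[Logs] -> [Explorer]
- [ATTRIBUTES] in the details of the target log
  - Create facet for @Event.System.Level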

f:id:htnosm:20180730032730p:plain

Once logs are received after the facet is created, filter checkboxes appear in the left pane.

f:id:htnosm:20180730032831p:plain

Pipelines

Create a Pipeline for Windows Eventlog.

[Logs] -> [Pipelines] -> [New Pipeline]
- Filter: service:eventlog
- Name: windowsevent

f:id:htnosm:20180730032828p:plain

Category Processor

Create a Category Processor in the Pipeline above. For the facet "@Event.System.Level" created earlier, map each value as follows.

Name	MATCHING QUERY	Event Viewer EntryType
Critical	@Event.System.Level:2	Error (エラー)
Warning	@Event.System.Level:3	Warning (警告)
Info	@Event.System.Level:4	Information (情報)
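
As a rough illustration of what the Category Processor and the later Status Remapper do together, the mapping in the table can be sketched in Python. This is only a sketch of the logic, not Datadog's implementation: the function name `categorize` and the fallback to "Info" for unmapped levels are my own assumptions.

```python
# Sketch of the Level -> category mapping from the table above.
# LEVEL_TO_CATEGORY mirrors the article; categorize() is a hypothetical helper.
LEVEL_TO_CATEGORY = {
    2: "Critical",  # Event Viewer: Error
    3: "Warning",   # Event Viewer: Warning
    4: "Info",      # Event Viewer: Information
}


def categorize(log, default="Info"):
    """Return the category for a log shaped like {"Event": {"System": {"Level": 2}}}."""
    level = log.get("Event", {}).get("System", {}).get("Level")
    try:
        return LEVEL_TO_CATEGORY.get(int(level), default)
    except (TypeError, ValueError):
        # Missing or non-numeric level: fall back to the default (an assumption).
        return default
```

The Status Remapper then simply uses the resulting category string as the log status.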

f:id:htnosm:20180730032824p:plain

Status Remapper

Assign the created category to the status.

f:id:htnosm:20180730032821p:plain

Status Remap result

Once the status is recognized correctly, logs can be filtered by Status in Datadog Logs.

f:id:htnosm:20180730032818p:plain

2018-07-21

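Datadog Logs: archive feature and other additions

Datadog

Datadog Logs has been updated. Three main features were added.

Introducing Logging without Limits
- Limitless Logs
  - Separation of log ingestion from indexing (filtering)
- Archive Logs
  - Transfer to storage
- Live Tail
  - Real-time log stream

The official documentation is already published, but here is a summary of the key points.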

Limitless Logs
Archive Logs
Live Tail

f:id:htnosm:20180720143657p:plain

Limitless Logs

Logging without Limits

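Logs can be filtered in the Datadog UI instead of at the source. Logs excluded by a filter are also excluded from Datadog Logs billing. Because excluded logs are still covered by the Archive feature described below, you can use Datadog to only collect and store them. (Storage costs at the destination still apply.)

Excluded logs still appear in Live Tail, so the likely workflow is to watch Live Tail while building exclusion conditions.

Create exclusion filters from [Logs] -> [Pipelines] -> [INDEXES] -> [Add an Exclusion Filter].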

f:id:htnosm:20180720141750p:plain

You can define a query-based exclusion filter, specify a sampling rate, and toggle the filter on or off.

f:id:htnosm:20180720141746p:plain

If you want to filter at the source (the Datadog Agent) so that logs are never sent, use exclude_at_match or include_at_match in log_processing_rules.
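
To make the difference between the two rule types concrete, here is a small Python sketch of how I understand them to behave: exclude_at_match drops a line that matches the pattern, while include_at_match drops a line that does not match. The function name `should_send` and the sample rules are hypothetical, and Python's `re` module stands in for the Agent's own regular expression engine, so treat this as an approximation.

```python
import re


def should_send(message, rules):
    """Apply rules in order; reject the line as soon as one rule rejects it."""
    for rule in rules:
        matched = re.search(rule["pattern"], message) is not None
        if rule["type"] == "exclude_at_match" and matched:
            return False  # pattern found: do not send
        if rule["type"] == "include_at_match" and not matched:
            return False  # pattern missing: do not send
    return True


# Hypothetical rules, shaped like entries of log_processing_rules.
rules = [
    {"type": "exclude_at_match", "name": "drop_healthcheck", "pattern": r"GET /health"},
    {"type": "include_at_match", "name": "only_http_1_1", "pattern": r"HTTP/1\.1"},
]
```

With these rules, a health-check request line is dropped at the Agent, and any line that does not contain HTTP/1.1 is dropped as well.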

Archive Logs

Archives on AWS S3

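This feature archives logs to AWS S3. Logs can be forwarded to any S3 bucket, independent of the AWS account used for the AWS integration.

The logs transferred are those that have passed through PIPELINES (Processors). Whether INDEXES (Exclusion Filters) are configured makes no difference.

The official blog says (with support for other endpoints to come), so storage other than S3 will probably be added later, but for now only AWS S3 is supported.

Configuration requires Datadog administrator privileges (Admin User). Non-administrators can view the settings but cannot change them.

Specify the destination S3 bucket under [Logs] -> [Pipelines] -> [ARCHIVES].

Configure S3 Bucket

Apply the bucket policy given in the official documentation to the destination S3 bucket.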

https://docs.datadoghq.com/logs/s3/#create-and-configure-an-s3-bucket

Define Archive

Set the S3 Bucket and Path.

f:id:htnosm:20180720141741p:plain

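Once configured, ingested logs are forwarded to the S3 bucket after passing through the Pipeline. After waiting about 15 minutes, you can confirm that logs have been transferred to the S3 bucket.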

Changes have been made to this archive. It can take a few minutes before the next upload attempt.

Format

/my/s3/prefix/dt=20180515/hour=14/archive_143201.1234.7dq1a9mnSya3bFotoErfxl.json.gz

Gzipped JSON
Date and hour in Hive partition format
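
Because the key embeds the date and hour, locating the archive files for a given hour is mostly string handling. A minimal sketch, assuming keys always follow the dt=YYYYMMDD/hour=HH layout shown above; the helper names `archive_hour` and `hour_prefix` are my own.

```python
import re
from datetime import datetime

# Matches the Hive-style partition part of an archive key.
KEY_RE = re.compile(r"dt=(?P<dt>\d{8})/hour=(?P<hour>\d{2})/")


def archive_hour(key):
    """Extract the partition date and hour from an archive object key."""
    m = KEY_RE.search(key)
    if m is None:
        raise ValueError(f"not an archive key: {key!r}")
    return datetime.strptime(m["dt"] + m["hour"], "%Y%m%d%H")


def hour_prefix(base, when):
    """Build the key prefix that holds the archives for one hour."""
    return f"{base.rstrip('/')}/dt={when:%Y%m%d}/hour={when:%H}/"
```

A prefix built this way could be passed to an S3 listing call to enumerate only the files for that hour.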

Below is an example of the JSON output for an Nginx access.log. The attributes field reflects the parsing results.

{
  "_id": "AWS1oRxxxxxxxxx1bnTi",
  "date": "2018-07-ddThh:mm:ss.0000",
  "service": "nginx",
  "host": "i-xxxxxxxxxxxxxxxxx",
  "attributes": {
    "http": {
      "status_code": 200,
      "referer": "-",
      "useragent": "Datadog Agent/0.0.0",
      "method": "GET",
      "url": "/",
      "version": "1.1",
      "url_details": {
        "path": "/"
      },
・・・・
  },
  "source": "nginx",
  "message": "127.0.0.1 - - [dd/Jul/2018:hh:mm:ss.0000] \"GET / HTTP/1.1\" 200 396 \"-\" \"Datadog Agent/0.0.0\"",
  "status": "ok"
}

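The screen says Main Archive, so I thought multiple destinations might be possible, but for now only a single destination can be configured. (This may vary with the contract plan.)

Also, logs are not split into separate paths or files by facet (tag). To search logs that have been transferred to S3, you will probably need something like Athena.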
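
For a quick look at a single downloaded archive file without setting up Athena, something like the following Python sketch may be enough. It assumes each archive file contains one JSON event per line, which the article does not confirm (it only says gzipped JSON), and the function names are hypothetical.

```python
import gzip
import io
import json


def iter_events(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


def filter_status_code(fileobj, status_code):
    """Return events whose attributes.http.status_code equals status_code."""
    return [
        event for event in iter_events(fileobj)
        if event.get("attributes", {}).get("http", {}).get("status_code") == status_code
    ]


# In-memory sample standing in for a downloaded archive file.
sample = gzip.compress(
    b'{"service": "nginx", "attributes": {"http": {"status_code": 200}}}\n'
    b'{"service": "nginx", "attributes": {"http": {"status_code": 404}}}\n'
)
hits = filter_status_code(io.BytesIO(sample), 404)
```

For anything beyond a handful of files, a query engine over the dt=/hour= partitions is still the more practical route.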

Live Tail

Live Tail

Log events can be viewed in (near) real time.

After PIPELINES (Processors), before INDEXES (Exclusion Filters)
The stream display can be paused
You cannot go back to past logs

It feels like a real-time version of Explorer.

2018-05-14

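Change to the Slack user mention format (Datadog example)

Slack Datadog

This was announced quite a while ago, but I forget it quickly, so I am writing it down.
It is not specific to Datadog: because of a specification change on the Slack side, the way to mention an individual user on Slack is changing.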

A lingering farewell to the username | Slack

The undocumented approach to mentioning users via the API — <@username> — will no longer function after September 12, 2018. Please reference with the user ID format (<@U123>) instead

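The change has already started, and in some situations the old format reportedly no longer works.

I could no longer mention users via the Slack API, so I worked around it - Qiita

Updated 2023/03/06: added how to get the group ID and a link to the official documentation.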

Datadog Monitor → Slack

Configuring the Slack Integration enables notifications to Slack. With a notification to a Slack channel set through @slack-..., writing the following produces a user mention.

Old

Wrap @username in angle brackets (<>)

<@slackbot>

New

Wrap @userID in angle brackets (<>)

<@USLACKBOT>

A username_like_string form can apparently also be used, but when I tried it, it did not seem to work as an alias. Slack appears to discourage it, so it is better not to use it.

Both of the following appear as @slackbot in Slack.

<@USLACKBOT|slackbot> 
<@USLACKBOT|hoge>

Groups

`<!subteam^ID>`

Announcement mentions

Notifications with @here and @channel can also be configured.

<!here>
<!channel>

Channel ID links

To include a channel link such as #general in a notification, use # followed by the channel ID.
Specifying a nonexistent ID was rendered as #unknown-channel.

<#channelid>

Slack IDs

IDs can be obtained through the API or with the methods described below.

User ID (member ID)

From the client, it can be obtained as follows.

[View profile] -> [Copy member ID]

Channel ID

Right-click the target channel -> [Copy URL]

# XXXXXXXXX = Channel ID
https://*****.slack.com/messages/XXXXXXXXX

Group ID

[People & user groups] -> [User groups] -> right-click the target group -> [Copy]
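
The formats above are easy to mistype, so here is a small Python sketch of helpers that build them and pull a channel ID out of a copied URL. The function names and the sample IDs are hypothetical. Accepting an /archives/ path in addition to the /messages/ form shown above is my own assumption about other link styles Slack may produce.

```python
from urllib.parse import urlparse


def user_mention(user_id):
    """<@U...> mention for a member ID."""
    return f"<@{user_id}>"


def group_mention(group_id):
    """<!subteam^ID> mention for a user group ID."""
    return f"<!subteam^{group_id}>"


def announce(keyword):
    """<!here> or <!channel> announcement mention."""
    if keyword not in ("here", "channel"):
        raise ValueError(f"unsupported announcement: {keyword!r}")
    return f"<!{keyword}>"


def channel_link(channel_id):
    """<#C...> link to a channel."""
    return f"<#{channel_id}>"


def channel_id_from_url(url):
    """Return the channel ID from a URL such as https://x.slack.com/messages/<ID>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ("messages", "archives"):
        return parts[1]
    raise ValueError(f"no channel ID found in {url!r}")
```

For example, a monitor message could end with `user_mention("U0123ABCD")` followed by `channel_link(channel_id_from_url(copied_url))`.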

There will probably be a separate announcement, but note that the <@username> format is scheduled to be discontinued on 2018/09/12.

Official documentation: Formatting text for app surfaces | Slack

2018-05-04

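Templating Monitors with the Terraform Datadog Provider

Datadog

Datadog Monitor definitions can be managed with Terraform.

Provider: Datadog

However, the Datadog side is defined in JSON, which is somewhat awkward to write, and I wondered whether the parts repeated in every Monitor (notification body and recipients) could be templated. These are my notes on the result.

This is done with Terraform's Template Provider. It does not work with some versions, and there are probably better approaches and ways to write it.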

.
├── datadog_key.auto.tfvars      # API key definitions
├── datadog_monitor.auto.tfvars  # notification target definitions
├── datadog_monitor.tf           # provider definition
├── datadog_monitor_template.tf  # template definitions
├── ec2.tf
├── templates                    # template files
│   ├── message.tmpl             # for the notification body
│   └── notify.tmpl              # for the notification targets
├── terraform.tfstate
└── terraform.tfstate.backup

The contents of each file are described below, from top to bottom.

datadog_key.auto.tfvars

Set the Datadog API Key and Application Key values. If the files are managed with git, this one is a candidate for .gitignore.

datadog_api_key=""
datadog_app_key=""

datadog_monitor.auto.tfvars

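Set the lists of notification targets. Entries of the form @slack-... are the targets. (The example uses only Slack, but email addresses or other integrations work too.)
This allows the targets to vary by alert level.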

# all
notify_all = [
  "@slack-alert0",
  "@slack-alert-all",
]
# only alert
notify_is_alert = [
  "@slack-alert1",
  "@slack-alert-only",
]
notify_is_alert_recovery = []
・・・

datadog_monitor.tf

Define the Datadog Provider. This follows the official documentation.

Provider: Datadog - Terraform by HashiCorp

# Variables
variable "datadog_api_key" {}
variable "datadog_app_key" {}

# Configure the Datadog provider
provider "datadog" {
  version = "~> 1.0"
  api_key = "${var.datadog_api_key}"
  app_key = "${var.datadog_app_key}"
}

datadog_monitor_template.tf

This is the template definition, the core of this setup. The notification target list variables defined earlier are joined and passed to the template.

# Notification target list variables
variable "notify_all" { type = "list" }
variable "notify_is_alert" { type = "list" }
variable "notify_is_alert_recovery" { type = "list" }
・・・
# Convert the notification target lists to strings
locals {
  notify_all_join = "${ length(var.notify_all) == 0 ? "" : join(" ", var.notify_all) }"
  notify_all = " ${local.notify_all_join} "
  # is
  notify_is_alert_join = "${ length(var.notify_is_alert) == 0 ? "" : join(" ", var.notify_is_alert) }"
  notify_is_alert = " {{#is_alert}} ${local.notify_is_alert_join} {{/is_alert}} "
  # Uses the recovery list here (the original referenced notify_is_alert by mistake)
  notify_is_alert_recovery_join = "${ length(var.notify_is_alert_recovery) == 0 ? "" : join(" ", var.notify_is_alert_recovery) }"
  notify_is_alert_recovery = " {{#is_alert_recovery}} ${local.notify_is_alert_recovery_join} {{/is_alert_recovery}} "
・・・
}
# Define the template files
## Without variables
data "template_file" "message" {
  template = "${file("./templates/message.tmpl")}"
}
## With variables
data "template_file" "notify" {
  template = "${file("./templates/notify.tmpl")}"
  vars {
    notify_all = "${ local.notify_all_join == "" ? "" : local.notify_all }"
    # is
    notify_is_alert = "${ local.notify_is_alert_join == "" ? "" : local.notify_is_alert }"
    notify_is_alert_recovery = "${ local.notify_is_alert_recovery_join == "" ? "" : local.notify_is_alert_recovery }"
・・・
  }
}

templates/

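The example contains only two, but more templates can be added by adding definitions.

message.tmpl

A template for the notification body. This is the example without variables passed in.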

Metric Value: {{value}} {{comparator}} threshold: {{#is_warning}}{{warn_threshold}}{{/is_warning}}{{#is_warning_recovery}}{{warn_threshold}}{{/is_warning_recovery}}{{#is_alert}}{{threshold}}{{/is_alert}}{{#is_alert_recovery}}{{threshold}}{{/is_alert_recovery}}

- Host: {{host.name}}

notify.tmpl

An example template for the notification targets. Variables are passed in so that the target list is set according to the alert level.

More information: [Ops Guide](http://example.com)

Notify:${notify_all}${notify_is_alert}${notify_is_alert_recovery}${notify_is_warning}${notify_is_warning_recovery}${notify_is_recovery}${notify_is_no_data}${notify_is_not_alert}${notify_is_not_alert_recovery}${notify_is_not_warning}${notify_is_not_warning_recovery}${notify_is_not_recovery}${notify_is_not_no_data}
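
To see what the rendered Notify line ends up looking like, the locals and the template substitution can be imitated in Python. This is only an emulation of the logic, not Terraform itself: `string.Template` happens to use the same `${name}` placeholder syntax, the helper name `wrap` is hypothetical, and only three of the placeholders are included for brevity.

```python
from string import Template


def wrap(targets, condition=None):
    """Join targets with spaces; optionally wrap them in {{#cond}} ... {{/cond}}.

    An empty list yields "", mirroring the length(...) == 0 checks in the locals.
    """
    if not targets:
        return ""
    joined = " ".join(targets)
    if condition is None:
        return " %s " % joined
    return " {{#%s}} %s {{/%s}} " % (condition, joined, condition)


# Shortened stand-in for notify.tmpl.
notify_tmpl = Template("Notify:${notify_all}${notify_is_alert}${notify_is_alert_recovery}")

rendered = notify_tmpl.substitute(
    notify_all=wrap(["@slack-alert0", "@slack-alert-all"]),
    notify_is_alert=wrap(["@slack-alert1", "@slack-alert-only"], "is_alert"),
    notify_is_alert_recovery=wrap([], "is_alert_recovery"),
)
```

An empty list contributes nothing to the line, so no empty {{#is_alert_recovery}} block is left in the monitor message.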

Example

Define the datadog_monitor resources in ec2.tf.
Set the templates (the data.template_file.*.rendered parts) in message and escalation_message.

The example defines two resources, and each message combines resource-specific text with the template output.

resource "datadog_monitor" "ec2_cpuutilization" {
  type = "query alert"
  name = "[TEST] EC2 CPU Utilization"
  query = "max(last_5m):max:aws.ec2.cpuutilization{name:test-instance-al2-0} by {host,name,region} > 99"
  message = "# [TEST] EC2 CPU Utilization\n${data.template_file.message.rendered}${data.template_file.notify.rendered}"
  escalation_message = "**Renotify**\n# [TEST] EC2 CPU Utilization\n${data.template_file.message.rendered}${data.template_file.notify.rendered}"
・・・
}

resource "datadog_monitor" "ec2_status_check_failed" {
  type = "query alert"
  name = "[TEST] EC2 StatusCheckFailed"
  query = "min(last_15m):max:aws.ec2.status_check_failed{name:test-instance-al2-0} by {host,name,region} > 0"
  message = "# [TEST] EC2 StatusCheckFailed\n${data.template_file.message.rendered}${data.template_file.notify.rendered}"
  escalation_message = "**Renotify**\n# [hoge] EC2 StatusCheckFailed\n${data.template_file.message.rendered}${data.template_file.notify.rendered}"
・・・
}

Result

f:id:htnosm:20180503180952p:plain f:id:htnosm:20180503180953p:plain

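Part of the notification body is now shared and can be updated in bulk.
One concern is that Terraform is updated frequently, so a specification change could break this.
I would like Datadog to provide templating and notification-group settings as standard features.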

2018-05-01

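Nagios NTP checks

Nagios

When monitoring time synchronization with Nagios, there are several plugins, and the help text alone did not make the differences clear to me, so here is a brief summary.

f:id:htnosm:20180430175450j:plain

The official plugin collection is used as the basis.

Nagios Plugins | The home of the official Nagios® Plugins

check_ntp_peer = checks the health of an NTP server
check_ntp_time = checks time synchronization using the NTP protocol

There is also a plugin called check_time, which appears to check time synchronization using the time protocol. (I have never encountered an environment that synchronizes time with the time service, so I skip it here.)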

check_ntp_peer

nagios-plugins/check_ntp_peer.c at master · nagios-plugins/nagios-plugins
- Checks the health of the NTP server
- Does not check the time difference between localhost (the monitored host) and the NTP server

/usr/lib64/nagios/plugins/check_ntp_peer -H localhost -w 1 -c 2

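-H specifies the host on which the monitored ntpd is running. In the example above, the check inspects the synchronization state of the ntpd running on localhost.
It assumes that ntpd is running. The comparison is against the NTP server that ntpd synchronizes with.

Even if that NTP server returns incorrect values (time that differs from the actual time), for example because a private NTP server is referenced, the check reports OK as long as ntpd is in sync with the server it references.

chrony not supported

chrony is not supported. check_ntp_peer is implemented with mode 6 (NTP control message) packets, and chronyd does not support mode 6.
The official plugins currently do not seem to include a health-check plugin for chronyd.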

          +-------------------+-------------------+------------------+
          |  Association Mode | Assoc. Mode Value | Packet Mode Value|
          +-------------------+-------------------+------------------+
          | Symmetric Active  |         1         | 1 or 2           |
          | Symmetric Passive |         2         | 1                |
          | Client            |         3         | 4                |
          | Server            |         4         | 3                |
          | Broadcast Server  |         5         | 5                |
          | Broadcast Client  |         6         | N/A              |
          +-------------------+-------------------+------------------+
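
Note that the table lists association modes, whereas the "mode 6" used by check_ntp_peer is the packet mode value 6 (NTP control message, the kind of query ntpq sends). The packet mode sits in the low 3 bits of the first header byte, next to the leap indicator and version number. A minimal Python sketch of that bit layout, with hypothetical helper names:

```python
def first_byte(leap, version, mode):
    """Pack LI (2 bits), VN (3 bits) and Mode (3 bits) into the first NTP header byte."""
    return (leap & 0x3) << 6 | (version & 0x7) << 3 | (mode & 0x7)


def mode_of(first):
    """Extract the packet mode from the first header byte."""
    return first & 0x7


client_v4 = first_byte(0, 4, 3)   # ordinary client request (mode 3)
control_v2 = first_byte(0, 2, 6)  # control message (mode 6)
```

A server that only answers mode 3 requests can still be checked with check_ntp_time, which is why that plugin does not care which daemon is running.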

check_ntp_time (check_ntp)

nagios-plugins/check_ntp_time.c at master · nagios-plugins/nagios-plugins
- Checks the time difference between localhost (the monitored host) and the NTP server

/usr/lib64/nagios/plugins/check_ntp_time -H ntp.nict.jp -w 1 -c 2

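-H specifies the NTP server to compare the monitored host against. Specifying localhost compares against the host's own NTP server, which is mostly meaningless.

It does not matter whether a time synchronization service (ntpd, chronyd, etc.) is running on the monitored host.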
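
As a rough model of how -w 1 -c 2 turns a measured offset into a Nagios state, the comparison can be sketched in Python. This assumes the plugin compares the absolute offset in seconds against each threshold and treats a value equal to the threshold as still within range; I have not verified every edge case against the plugin source, and the function name is hypothetical.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def offset_state(offset_seconds, warn=1.0, crit=2.0):
    """Map a clock offset to a Nagios state using simple upper-bound thresholds."""
    magnitude = abs(offset_seconds)  # the sign of the offset does not matter
    if magnitude > crit:
        return CRITICAL
    if magnitude > warn:
        return WARNING
    return OK
```

So a host that is 1.5 seconds behind the reference server would be reported as WARNING with the options used above.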

Reference URLs

2018-04-16

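CloudFormation template for the Datadog AWS integration

Datadog AWS

It seemed like something that should already exist but did not, so I made one. (Perhaps I just could not find it.)

github.com

Notes

This is a rework of "Datadog AWS Integration setup (IAM Role) - Qiita". I wanted to be able to build the role by copying and pasting the permissions section, and I wanted to keep track of what gets updated, so I put it in a repository. It is mostly for my own use.

The required permissions change according to the needs of both AWS and Datadog, so the permissions section is updated fairly often. It would be good if the official side distributed a template or policy document in addition to updating the documentation.

CloudFormation YAML supports a short-form syntax for function names, but I deliberately avoid it because some third-party tools do not support it. With the short form, things did not behave as expected, and I spent the better part of an hour puzzling over it.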

2018-04-14

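Receiving AWS SNS in Datadog (RDS/ElastiCache events)

Datadog AWS

Datadog can subscribe directly to an AWS SNS topic. It works as documented, but I am recording what the notifications look like.

Official: AWS SNS

It is also possible to issue a receiving Email address and receive events that way.

Previous post: Receiving AWS RDS event notifications - vague memory

Setup
- SNS
Event configuration examples
- RDS Events
- ElastiCache Events
Received examples
- Example notification from Event Monitor to Slack

Setup

Prerequisites
- The AWS integration is already configured in Datadog
  - https://app.datadoghq.com/account/settings#integrations/amazon_web_services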

SNS

Create a Topic and a Subscription in AWS SNS.
Specify the Datadog Webhook URL as the Endpoint.

https://app.datadoghq.com/intake/webhook/sns?api_key=<API KEY>

Get the API Key from [Integrations]->[APIs] in Datadog
- https://app.datadoghq.com/account/settings#api
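
If the subscription is created by script rather than in the console, the endpoint URL and subscription parameters can be assembled as in the Python sketch below. The helper names are hypothetical, and the `site` default simply reuses the host shown above. The resulting dictionary is shaped for an SNS subscribe call (for example boto3's `sns.subscribe(**params)`), which is left out so the sketch runs without AWS credentials.

```python
from urllib.parse import urlencode


def sns_webhook_endpoint(api_key, site="app.datadoghq.com"):
    """Build the Datadog SNS webhook URL for the given API key."""
    if not api_key:
        raise ValueError("api_key is required")
    return f"https://{site}/intake/webhook/sns?" + urlencode({"api_key": api_key})


def subscribe_params(topic_arn, api_key):
    """Keyword arguments for an HTTPS subscription to the given topic."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "https",
        "Endpoint": sns_webhook_endpoint(api_key),
    }
```

Since the API key is part of the URL, the endpoint string should be handled like any other secret.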

f:id:htnosm:20180413151001p:plain

Event configuration examples

Let's send RDS/ElastiCache Events. (The Datadog AWS integration already reports these to Events, but here the notification is sent explicitly via SNS -> Datadog.)

f:id:htnosm:20180413151002p:plain

RDS Events

Set the Topic in [Event subscriptions].

f:id:htnosm:20180413151003p:plain

ElastiCache Events

Unlike RDS, there is no global notification setting; the Topic is set on each Cluster individually.

f:id:htnosm:20180413151004p:plain

Received examples

f:id:htnosm:20180413152131p:plain

Example notification from Event Monitor to Slack

Monitor

Variables are embedded for confirmation purposes; the message is easier to read without them. f:id:htnosm:20180413151006p:plain

f:id:htnosm:20180413151007p:plain

ElastiCache

f:id:htnosm:20180413151008p:plain

Environment