Kube-Eventer Power Tricks
This article is reposted from the WeChat official account 「運維開發故事」, written by 沒有文案的夏老師. To repost it, please contact the 運維開發故事 official account.
Offline event alerting
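kube-eventer is an offline event collector for Kubernetes (k8s), open-sourced by Alibaba. The project, including the webhook sink documentation used in this article, is at:
https://github.com/AliyunContainerService/kube-eventer/blob/master/docs/en/webhook-sink.md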
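Kubernetes has two types of events. A Warning event means the state transition that produced it happened between unexpected states. A Normal event means the desired state and the current state are consistent.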
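Take the events produced by NPD (node-problem-detector) as an example. These events describe transient problems that affect a node, yet they are valuable for system diagnosis. NPD relies on Kubernetes' reporting mechanism: it inspects system logs (for example the journal on CentOS) and reports the errors it finds to the Kubernetes node. Such logs (kernel logs, for instance) are very noisy, so NPD extracts the useful information and turns it into offline events. That way we can see the events occurring on a node and handle them promptly.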
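A standard Kubernetes event has several important attributes that make it easier to diagnose problems and alert on them:
- Namespace: the namespace of the object that produced the event.
- Kind: the type of the object the event is bound to, for example Node, Pod, Namespace, Component, and so on.
- Timestamp: when the event occurred.
- Reason: why the event was produced.
- Message: a detailed description of the event.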
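To make these attributes concrete, the sketch below pulls them out of a single event shaped like the output of `kubectl get events -o json`. The sample event is made up for illustration. The field names (`type`, `reason`, `message`, `lastTimestamp`, `involvedObject`) are those of the core v1 Event object.

```python
import json

# A made-up event for illustration; real ones come from the API server,
# e.g. via `kubectl get events -o json`.
SAMPLE_EVENT = json.loads("""
{
  "type": "Warning",
  "reason": "BackOff",
  "message": "Back-off restarting failed container",
  "lastTimestamp": "2021-01-01T08:00:00Z",
  "involvedObject": {"kind": "Pod", "namespace": "demo", "name": "web-0"}
}
""")


def summarize(event: dict) -> dict:
    """Pick out the attributes that matter most for alerting."""
    obj = event.get("involvedObject", {})
    return {
        # Cluster-scoped objects such as Node carry no namespace.
        "Namespace": obj.get("namespace", ""),
        "Kind": obj.get("kind", ""),
        "Timestamp": event.get("lastTimestamp", ""),
        "Reason": event.get("reason", ""),
        "Message": event.get("message", ""),
        "Type": event.get("type", ""),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_EVENT).items():
        print(f"{key}: {value}")
```

Namespace and Kind live on the object the event refers to (`involvedObject`), not on the event itself.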
The currently supported sinks are roughly as follows:
Sink Name | Description |
---|---|
dingtalk | sink to dingtalk bot |
sls | sink to alibaba cloud sls service |
elasticsearch | sink to elasticsearch |
honeycomb | sink to honeycomb |
influxdb | sink to influxdb |
kafka | sink to kafka |
mysql | sink to mysql database |
wechat | sink to wechat |
Today's topic is a set of power tricks for the webhook sink. First, the parameters it supports:
- level - Level of event (optional. default: Warning. Options: Warning and Normal)
- namespaces - Namespaces to filter (optional. default: all namespaces,use commas to separate multi namespaces, namespace filter doesn't support regexp)
- kinds - Kinds to filter (optional. default: all kinds,use commas to separate multi kinds. Options: Node,Pod and so on.)
- reason - Reason to filter (optional. default: empty, Regexp pattern support). You can use multi reason fields in query.
- method - Method to send request (optional. default: GET)
- header - Header in request (optional. default: empty). You can use multi header field in query.
- custom_body_configmap - The configmap name of request body template. You can use Template to customize request body. (optional.)
- custom_body_configmap_namespace - The configmap namespace of request body template.
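If each project namespace maps one-to-one to an owner, you can tie the two together with a ConfigMap and a sink. Rollouts are when events are most likely to appear, and events let you quickly spot problems such as a wrong image tag or a misconfigured image.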
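As a rough sketch of that namespace-to-owner pairing, the snippet below generates one webhook sink flag per namespace, using only the parameters listed above. The `OWNERS` mapping, the `webhook-body-<namespace>` ConfigMap naming scheme, and the `monitoring` namespace are all hypothetical choices for illustration. Each ConfigMap would hold the message template for that namespace's owner.

```python
WEBHOOK = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx"

# Hypothetical namespace -> owner mapping; replace with your own.
OWNERS = {"team-a": "alice", "team-b": "bob"}


def sink_flag(namespace: str, configmap_namespace: str) -> str:
    """Build one --sink flag whose body template is specific to a namespace."""
    params = [
        ("level", "Warning"),
        ("namespaces", namespace),
        ("header", "Content-Type=application/json"),
        # Hypothetical naming scheme: one ConfigMap per namespace.
        ("custom_body_configmap", f"webhook-body-{namespace}"),
        ("custom_body_configmap_namespace", configmap_namespace),
        ("method", "POST"),
    ]
    # Values are joined as-is, without percent-encoding.
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"--sink=webhook:{WEBHOOK}&{query}"


if __name__ == "__main__":
    for namespace, owner in OWNERS.items():
        print(f"# alerts for {namespace} go to {owner}")
        print(sink_flag(namespace, "monitoring"))
```

Generating the flags from one mapping keeps the namespace filter and the ConfigMap name in step, which is easy to get wrong when the sinks are written by hand.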
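Start with the ConfigMap: the value of custom_body_configmap selects which body template is used. A few small touches make the message clearer.
Adding Cluster:name (with name replaced by your cluster's name) shows which cluster the event came from.
Adding "mentioned_list":["wangqing","@all"] @-mentions the responsible owner.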

    ---
    apiVersion: v1
    data:
      content: >-
        {"msgtype": "text","text": {"content": "Cluster:name\nEventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\nEventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
    kind: ConfigMap
    metadata:
      name: custom-webhook-body
      namespace: namespace
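Before deploying a body template, it can help to check that it still yields valid JSON once the placeholders are filled in. The sketch below is only a rough stand-in for Go's text/template: the `render` helper is hypothetical, handles nothing beyond simple `{{ .A.B }}` lookups, and the sample event is made up.

```python
import json
import re

# The body template from the ConfigMap (raw strings keep \n as backslash + n).
TEMPLATE = (
    r'{"msgtype": "text","text": {"content": "Cluster:name\n'
    r'EventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\n'
    r'EventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\n'
    r'EventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\n'
    r'EventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}'
)

# A made-up event, keyed by the field names the template refers to.
EVENT = {
    "Type": "Warning",
    "InvolvedObject": {"Namespace": "demo", "Kind": "Pod", "Name": "web-0"},
    "Reason": "BackOff",
    "LastTimestamp": "2021-01-01 08:00:00",
    "Message": "Back-off restarting failed container",
}

PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z.]+)\s*\}\}")


def render(template: str, event: dict) -> str:
    """Replace each {{ .A.B }} placeholder with the matching nested value."""
    def lookup(match: re.Match) -> str:
        value = event
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    body = json.loads(render(TEMPLATE, EVENT))  # raises if the JSON is broken
    print(body["text"]["content"])
    print(body["text"]["mentioned_list"])
```

If `json.loads` raises, the template has a quoting or bracket mistake that would also break the request body sent to the webhook.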
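Tricks for the command section
The --sink flag can be repeated, so you can add as many sinks as you need.
The examples here use the webhook sink to send notifications to WeChat Work (企業微信). Note that reason accepts regular expressions. Together with the ConfigMap, this completes event alerting for the k8s cluster.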

    command:
      - "/kube-eventer"
      - "--source=kubernetes:https://kubernetes.default"
      ## e.g. webhook sinks posting to a WeChat Work group bot
      - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=xxxx&method=POST
      - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
      - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
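The reason patterns are worth testing locally, because `[^Unhealthy]` is a negated character class rather than "anything except the word Unhealthy". It matches any reason that contains at least one character outside the set U, n, h, e, a, l, t, y. The sketch below assumes the filter is applied as an unanchored search, which is how Go's `regexp.MatchString` behaves. Python's `re` is used here only as a stand-in, and the `reason_matches` helper is hypothetical.

```python
import re


def reason_matches(pattern: str, reason: str) -> bool:
    """Unanchored search: True if the pattern matches anywhere in the reason."""
    return re.search(pattern, reason) is not None


if __name__ == "__main__":
    for reason in ["Unhealthy", "BackOff", "Failed", "FailedScheduling"]:
        print(reason, reason_matches(r"[^Unhealthy]", reason))
```

Under that assumption, `[^Unhealthy]` rejects Unhealthy (every letter is in the set) and accepts BackOff and Failed, so it works as an exclusion trick here. It would also reject any other reason spelled only with those letters. Likewise, an unanchored `Failed` also matches FailedScheduling, so use `^Failed$` if only the exact reason should pass.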
Example:
Create a bot in a WeChat Work group chat. Its webhook URL looks like https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        name: kube-eventer
      name: kube-eventer
      namespace: namespace
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kube-eventer
      template:
        metadata:
          labels:
            app: kube-eventer
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          dnsPolicy: ClusterFirstWithHostNet
          serviceAccount: kube-eventer
          containers:
            - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
              name: kube-eventer
              command:
                - "/kube-eventer"
                - "--source=kubernetes:https://kubernetes.default"
                ## e.g. webhook sink posting to a WeChat Work group bot
                - --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=xxxx&method=POST
                #- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
                #- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
              env:
                # If TZ is assigned, set the TZ value as the time zone
                - name: TZ
                  value: "Asia/Shanghai"
              volumeMounts:
                - name: localtime
                  mountPath: /etc/localtime
                  readOnly: true
                - name: zoneinfo
                  mountPath: /usr/share/zoneinfo
                  readOnly: true
              resources:
                requests:
                  cpu: 200m
                  memory: 100Mi
                limits:
                  cpu: 500m
                  memory: 250Mi
          volumes:
            - name: localtime
              hostPath:
                path: /etc/localtime
            - name: zoneinfo
              hostPath:
                path: /usr/share/zoneinfo
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-eventer
    rules:
      - apiGroups:
          - ""
        resources:
          - events
          - configmaps
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-eventer
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-eventer
    subjects:
      - kind: ServiceAccount
        name: kube-eventer
        namespace: namespace
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-eventer
      namespace: namespace
    ---
    apiVersion: v1
    data:
      content: >-
        {"msgtype": "text","text": {"content": "Cluster:name\nEventType:{{ .Type }}\nEventNamespace:{{ .InvolvedObject.Namespace }}\nEventKind:{{ .InvolvedObject.Kind }}\nEventObject:{{ .InvolvedObject.Name }}\nEventReason:{{ .Reason }}\nEventTime:{{ .LastTimestamp }}\nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
    kind: ConfigMap
    metadata:
      # name and namespace must match custom_body_configmap and
      # custom_body_configmap_namespace in the sink above
      name: custom-webhook-body0
      namespace: namespace
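This gives you a simple way to decide who gets alerted and who handles the problem. With event alerting in place, service and cluster issues can be discovered and fixed promptly.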