Prometheus
- Prometheus
- SRE operations framework: https://zhuanlan.zhihu.com/p/303003637
-
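Basics: quick-start section
- Overview of Prometheus fundamentals
- The Prometheus data model, the time series model, and PromQL.
- Custom monitoring and alerting rules, and Alertmanager
- Use cases and usage of the common exporters; writing custom exporters in Java/Golang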
-
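Advanced usage
- "You can't fix what you can't see": visual monitoring with Grafana
- Service discovery: letting Prometheus automatically find the resources and services to monitor (cloud platforms, containerized environments)
- Prometheus high availability and scalability
- Building a container cloud monitoring system with Prometheus, and elastic scaling of applications.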
-
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
-
Prometheus configuration in detail: https://www.jianshu.com/p/efa8b46f46c6
# Global settings for Prometheus, such as the scrape interval and the scrape timeout.
global:
  # Scrape interval
  [ scrape_interval: <duration> | default = 1m ]

  # Scrape timeout
  [ scrape_timeout: <duration> | default = 10s ]

  # Rule evaluation interval
  [ evaluation_interval: <duration> | default = 1m ]

  # External labels
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files. Prometheus evaluates these rules and pushes the resulting alerts to Alertmanager.
rule_files:
  [ - <filepath_glob> ... ]

# Scrape configuration. All data collection is configured in this section.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting configuration, mainly the addresses of the Alertmanager instances that Prometheus pushes alerts to.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Write API addresses of remote storage backends.
remote_write:
  [ - <remote_write> ... ]

# Read API addresses of remote storage backends.
remote_read:
  [ - <remote_read> ... ]
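Prometheus + Grafana installation and configuration
- Prometheus installation and configuration: https://www.k8stech.net/post/prometheus-monitor-2/
Covers how to install and configure Grafana for graphical dashboards and how to wire in Email, DingTalk, and WeChat alerts. Through Alertmanager, Prometheus supports Email, Slack, and WeChat alerts directly, and DingTalk through a webhook.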
-
System / software versions
- System: Ubuntu 18.04
- Prometheus 2.13.0
- Node_Exporter 0.18.0
- Alertmanager 0.18.0
- Dingtalk-webhook 0.3.0
- Grafana 6.4.0
-
Binary installation
PROM_PATH='/data/prometheus'
mkdir -p ${PROM_PATH}
mkdir -p ${PROM_PATH}/{data,conf,logs,bin}
useradd prometheus

cd /usr/local/src
wget https://github.com/prometheus/prometheus/releases/download/v2.13.0/prometheus-2.13.0.linux-amd64.tar.gz
tar -xvf prometheus-2.13.0.linux-amd64.tar.gz
cd prometheus-2.13.0.linux-amd64/
cp prometheus promtool ${PROM_PATH}/bin/
# Copy the default config into the conf/ directory created above
cp prometheus.yml ${PROM_PATH}/conf/
chown -R prometheus:prometheus /data/prometheus

# Setting Variables
# The quoted 'EOF' keeps $PATH and $HOME literal, so they are expanded at login time
cat >> /etc/profile <<'EOF'
export PATH=/data/prometheus/bin:$PATH:$HOME/bin
EOF
-
Create the Prometheus systemd service
# Use '>' rather than '>>' so that re-running this does not append a second copy of the unit
cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data/prometheus/bin/prometheus --config.file=/data/prometheus/conf/prometheus.yml --storage.tsdb.path=/data/prometheus/data --web.external-url=http://prom.k8stech.net --storage.tsdb.retention.time=90d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Make systemd pick up the new unit file, then enable and start the service
systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service
systemctl status prometheus.service
# Check that Prometheus is listening on port 9090
netstat -antup|grep 9090
-
Prometheus configuration file
# Directories and files for the alerting rules
CONF_PATH='/data/prometheus/conf'
# These directories must be created in advance, otherwise the Prometheus service fails to start
mkdir -p ${CONF_PATH}/rule/{op,ssl,prod}
mkdir -p ${CONF_PATH}/prod/domain_config

# prometheus conf file
cat > /data/prometheus/conf/prometheus.yml << EOF
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/
# Global configuration
global:
  scrape_interval: 30s # Scrape every 30 seconds; the default is 1 minute
  scrape_timeout: 30s
  evaluation_interval: 60s # Evaluate rules every 60 seconds; the default is 1 minute
  #scrape_timeout: 60s # Global scrape timeout; commented out.

# Alertmanager configuration: list IP and port under targets; hostnames and domain names also work
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']

# Rule files are evaluated at the global 'evaluation_interval'; several entries can be listed.
rule_files:
  - "/data/prometheus/conf/rule/prod/*.yml"
  - "/data/prometheus/conf/rule/op/*.yml"
  - "/data/prometheus/conf/rule/ssl/*.yml"
  # - "second_rules.yml"

# Scrape configuration
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for a HTTP 200 response.
    scrape_interval: 30s
    file_sd_configs:
      - files:
          - /data/prometheus/conf/prod/domain_config/*.yml
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # The blackbox exporter's real hostname:port.

  - job_name: 'prom'
    #honor_labels: true
    scrape_interval: 10s
    static_configs:
      - targets: ['172.26.42.229:9100']
        labels:
          op_region: 'cn-north-1'
          app: 'Prometheus'
          env: 'Server'
EOF
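The configuration above points at rule files under rule/{prod,op,ssl} and at file_sd target files under prod/domain_config, but shows neither. A minimal sketch of both might look like the following. The file names, the probed URL, the alert name, the 1m duration, and the label values are all illustrative assumptions and are not taken from the original setup. The files are drafted in a scratch directory so they can be reviewed before being copied under /data/prometheus/conf.

```shell
# Draft both files in a scratch directory first (assumption: they are only
# copied under /data/prometheus/conf after review).
WORK_DIR="$(mktemp -d)"

# file_sd target list for the 'blackbox' job. Each target is a URL to probe;
# the relabel_configs in prometheus.yml pass it on as the ?target= parameter.
# File name, URL and label are hypothetical.
cat > "${WORK_DIR}/example_targets.yml" <<'EOF'
- targets:
    - https://example.com
  labels:
    env: prod
EOF

# Alerting rule of the kind matched by rule/prod/*.yml. probe_success is the
# blackbox exporter's result metric (1 = probe passed, 0 = probe failed).
# Alert name, duration and severity are hypothetical.
cat > "${WORK_DIR}/example_rules.yml" <<'EOF'
groups:
  - name: example-blackbox
    rules:
      - alert: ProbeFailed
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
EOF

# Validate the rule file when promtool is installed; skip quietly otherwise.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "${WORK_DIR}/example_rules.yml"
fi

echo "Drafts written to ${WORK_DIR}"
```

After review, the target file would go into /data/prometheus/conf/prod/domain_config/ and the rule file into /data/prometheus/conf/rule/prod/. Prometheus picks up changes to file_sd files on its own, while new or changed rule files take effect only after a configuration reload.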
-
Configure Basic Auth access with Nginx
# Install the Apache utilities package (provides htpasswd)
apt install apache2-utils
htpasswd -bc /etc/nginx/.prom_htpasswd admin admin

# nginx conf
# The quoted 'EOF' keeps $uri literal instead of letting the shell expand it to an empty string
cat > /etc/nginx/conf.d/prom.conf <<'EOF'
server {
    listen 80;
    server_name prom.k8stech.net;

    auth_basic "Please input password";
    auth_basic_user_file /etc/nginx/.prom_htpasswd;

    location / {
        try_files $uri @prom;
    }

    location @prom {
        internal;
        proxy_pass http://localhost:9090;
    }
}
EOF
-
Open in a browser: http://prom.k8stech.net
- user: admin
- pass: admin
-
Binary installation of Node_exporter
# Install on the Prometheus server
NODE_PATH='/data/prometheus/node_exporter/'
cd /usr/local/src/
mkdir -p ${NODE_PATH}
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz && tar xvf node_exporter-0.18.0.linux-amd64.tar.gz
cp node_exporter-0.18.0.linux-amd64/node_exporter ${NODE_PATH}
chown -R prometheus:prometheus ${NODE_PATH}

# Install on each monitored node
NODE_PATH='/data/prometheus/node_exporter/'
useradd prometheus && mkdir -p ${NODE_PATH}
cd /usr/local/src/
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz && tar xvf node_exporter-0.18.0.linux-amd64.tar.gz
cp node_exporter-0.18.0.linux-amd64/node_exporter ${NODE_PATH}
chown -R prometheus:prometheus ${NODE_PATH}
-
Create the Node_exporter systemd service
# Create the unit file (on CentOS 7 the path is /usr/lib/systemd/)
cat > /lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Enable at boot and start the service
systemctl enable node_exporter.service
systemctl start node_exporter.service
systemctl status node_exporter.service

# Check that the port is listening
netstat -anplt|grep 9100
tcp        0      0 172.26.42.229:58364     172.26.42.229:9100      ESTABLISHED 32220/prometheus
tcp6       0      0 :::9100                 :::*                    LISTEN      972/node_exporter
tcp6       0      0 172.26.42.229:9100      172.26.42.229:58364     ESTABLISHED 972/node_exporter
Viewing node metrics: open port 9100 on the node; the exporter serves its metrics at the /metrics path.
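For a quick check from the command line rather than a browser, the metrics page can be filtered for a single value. The helper below is a small sketch: the function name metric_value is hypothetical, and the sample input is hand-written rather than real exporter output. It only handles metrics without labels, such as node_load1.

```shell
# Print the value of one unlabelled metric from Prometheus text-format
# input on stdin. '# HELP' and '# TYPE' lines never match because their
# first field is '#'.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Self-contained demonstration on a hand-written sample (value is made up).
printf '%s\n' \
  '# HELP node_load1 1m load average.' \
  '# TYPE node_load1 gauge' \
  'node_load1 0.42' | metric_value node_load1
```

Against a running exporter the same helper would be used as `curl -s http://localhost:9100/metrics | metric_value node_load1`. Labelled series, whose first field includes a `{...}` label set, do not match a bare metric name and would need a different filter.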
- Using blackbox_exporter: https://www.jianshu.com/p/4425cb6a48df
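For orientation, the relabel_configs of the 'blackbox' job in the Prometheus configuration above turn every listed target into a request against the exporter's /probe endpoint. The sketch below builds that request URL by hand. The function name probe_url and the probed site are assumptions for illustration, and the target is not URL-encoded here, so this only suits simple URLs.

```shell
# Build the URL that is requested for one blackbox target:
#   http://<exporter>/probe?module=<module>&target=<target>
# Arguments: exporter address, module name, target to probe.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# 127.0.0.1:9115 and http_2xx come from the Prometheus configuration;
# the probed site is a placeholder.
probe_url 127.0.0.1:9115 http_2xx https://example.com
```

With the exporter running, `curl -s "$(probe_url 127.0.0.1:9115 http_2xx https://example.com)" | grep probe_success` shows whether a probe passes before the target is added to a file_sd file.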