Prometheus

Prometheus+Grafana安装及配置

deploy_study_2024

Zabbix: https://www.zabbix.com/documentation/6.4/zh/manual/installation/containers

https://github.com/zabbix/zabbix-docker
docker compose方式部署Zabbix 7.0 LTS: https://blog.csdn.net/islandstar/article/details/140105767

Zabbix在Docker中的应用和监控:https://www.cnblogs.com/evescn/p/12374756.html

Zabbix server 5.0 (tags: alpine-5.0-latest, ubuntu-5.0-latest, ol-5.0-latest)
Zabbix server 5.0.* (tags: alpine-5.0.*, ubuntu-5.0.*, ol-5.0.*)
Zabbix server 6.0 (tags: alpine-6.0-latest, ubuntu-6.0-latest, ol-6.0-latest)
Zabbix server 6.0.* (tags: alpine-6.0.*, ubuntu-6.0.*, ol-6.0.*)
Zabbix server 6.4 (tags: alpine-6.4-latest, ubuntu-6.4-latest, ol-6.4-latest)
Zabbix server 6.4.* (tags: alpine-6.4.*, ubuntu-6.4.*, ol-6.4.*)
Zabbix server 7.0 (tags: alpine-7.0-latest, ubuntu-7.0-latest, ol-7.0-latest, alpine-latest, ubuntu-latest, ol-latest, latest)
Zabbix server 7.0.* (tags: alpine-7.0.*, ubuntu-7.0.*, ol-7.0.*)
Zabbix server 7.2 (tags: alpine-trunk, ubuntu-trunk, ol-trunk)

docker pull zabbix/zabbix-server-mysql:alpine-6.4-latest
zabbix/zabbix-web-nginx-mysql

docker run --name some-zabbix-server-mysql 
-e DB_SERVER_HOST="some-mysql-server" \
-e MYSQL_USER="some-user" \
-e MYSQL_PASSWORD="some-password" \
--init -d \
zabbix/zabbix-server-mysql:alpine-6.4-latest

Prometheus
- https://www.prometheus.wang/
- https://www.k8stech.net
SRE运维体系:https://zhuanlan.zhihu.com/p/303003637

基础:快速掌握部分
- Prometheus基础的综述
- Prometheus的数据模型、时间序列模型、PrmQL。
- 自定义监控告警规则及AlertManager
- 常用的Exporter的使用场景以及使用方法，Java/Golang实现自定义
高级用法
- "You can't fix what you can't see"。可视化监控Grafana
- Prometheus自动的发现那些需要监控的资源和服务(云平台、容量化)
- Prometheus高可用、扩展能力
- 通过Prometheus构建容器云监控系统，应用程序的弹性伸缩。

prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

prometheus配置详解:https://www.jianshu.com/p/efa8b46f46c6

# 此片段指定的是prometheus的全局配置， 比如采集间隔，抓取超时时间等.
global:
  # 抓取间隔
  [ scrape_interval: <duration> | default = 1m ]

  # 抓取超时时间
  [ scrape_timeout: <duration> | default = 10s ]

  # 评估规则间隔
  [ evaluation_interval: <duration> | default = 1m ]

  # 外部一些标签设置
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# 此片段指定报警规则文件， prometheus根据这些规则信息，会推送报警信息到alertmanager中。
rule_files:
  [ - <filepath_glob> ... ]

# 此片段指定抓取配置，prometheus的数据采集通过此片段配置。
scrape_configs:
  [ - <scrape_config> ... ]

# 此片段指定报警配置， 这里主要是指定prometheus将报警规则推送到指定的alertmanager实例地址。
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# 指定后端的存储的写入api地址。
remote_write:
  [ - <remote_write> ... ]

# 指定后端的存储的读取api地址。
remote_read:
  [ - <remote_read> ... ]

Prometheus+Grafana安装及配置

Prometheus安装及配置: https://www.k8stech.net/post/prometheus-monitor-2/

讲解如何安装配置使用Grafana图形展示，并接入Email、Dingtalk、Wechat警报，Prometheus已经完美的支持Email、Slack、Dingtalk、Wechat警报。

系统/软件版本

System：Ubuntu 18.04
Prometheus 2.13.0
Node_Exporter 1.18.0
Alaermanager 1.18.0
Dingtalk-webhook 0.3.0
Grafana 6.4.0

二进制安装

PROM_PATH='/data/prometheus'
mkdir -p ${PROM_PATH}
mkdir -p ${PROM_PATH}/{data,conf,logs,bin}
useradd prometheus
cd /usr/local/src
wget https://github.com/prometheus/prometheus/releases/download/v2.13.0/prometheus-2.13.0.linux-amd64.tar.gz
tar -xvf prometheus-2.13.0.linux-amd64.tar.gz
cd prometheus-2.13.0.linux-amd64/
cp prometheus promtool ${PROM_PATH}/bin/
cp prometheus.yml ${PROM_PATH}/config/
chown -R prometheus.prometheus /data/prometheus
# Setting Variables
cat >> /etc/profile <<EOF
PATH=/data/prometheus/bin:$PATH:$HOME/bin
EOF

创建Systemd Prometheus服务

cat >>/etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data/prometheus/bin/prometheus --config.file=/data/prometheus/conf/prometheus.yml --storage.tsdb.path=/data/prometheus/data --web.external-url=http://prom.k8stech.net --storage.tsdb.retention=90d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl enable prometheus.service
systemctl start prometheus.service
systemctl status prometheus.service

netstat -antup|grep 9090

Prometheus配置文件

# Alertmanager Rule 目录 与 文件
CONF_PATH='/data/prometheus/conf'
# 目录必须提前创建，否则Prometheus服务会无法启动
mkdir -p ${CONF_PATH}/rule/{op,ssl,prod}
mkdir -p ${CONF_PATH}/prod/domain_config

# prometheus conf file
cat > /data/prometheus/conf/prometheus.yml << EOF
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/
# 全局配置
global:
  scrape_interval:     30s # 每15秒抓取一次数据，默认值为1分钟
  scrape_timeout: 30s
  evaluation_interval: 60s # 每15分钟检测一次可用性，默认值为1分钟
  #scrape_timeout: 60s # 全局设置超时时间，这个注掉了。

# Alertmanager配置，需要在targets添加ip和端口，也可以使用主机名和域名
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']
    
# 根据全局文件 'evaluation_interval' 的时间，根据 rule 文件进行检查，可配置多个。
rule_files:
  - "/data/prometheus/conf/rule/prod/*.yml"
  - "/data/prometheus/conf/rule/op/*.yml"
  - "/data/prometheus/conf/rule/ssl/*.yml"
  # - "second_rules.yml"
# 抓取配置配置
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    scrape_interval: 30s
    file_sd_configs:
      - files:
        - /data/prometheus/conf/prod/domain_config/*.yml
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
  - job_name: 'prom'
    #honor_labels: true
    scrape_interval: 10s
    static_configs:
    - targets: ['172.26.42.229:9100']
      labels:
         op_region: 'cn-north-1'
         app:   'Prometheus'
         env:  'Server'
EOF

Nginx配置Basic_Auth访问

# 安装 Apache工具包
apt install apache2-utils
htpasswd -bc /etc/nginx/.prom_htpasswd admin admin
# nginx conf
cat > /etc/nginx/conf.d/prom.conf <<EOF
server {
    listen       80;
    server_name  prom.k8stech.net;
        auth_basic "Please input password";
        auth_basic_user_file /etc/nginx/.prom_htpasswd;
    location / {
            try_files $uri @prom;
    }
    location @prom {
            internal;
            proxy_pass http://localhost:9090;
    }
}
EOF

使用浏览器访问: http://prom.k8stech.net
- user:admin
- pass:admin

二进制安装Node_exporter

# prom server 安装
NODE_PATH='/data/prometheus/node_exporter/'
cd /usr/local/src/
mkdir -p ${NODE_PATH}
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz && tar xvf node_exporter-0.18.0.linux-amd64.tar.gz
cp node_exporter-0.18.0.linux-amd64/node_exporter ${NODE_PATH}
chown -R prometheus.prometheus ${NODE_PATH}

# node节点安装
NODE_PATH='/data/prometheus/node_exporter/'
useradd prometheus && mkdir -p ${NODE_PATH}
cd /usr/local/src/
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz && tar xvf node_exporter-0.18.0.linux-amd64.tar.gz
cp node_exporter-0.18.0.linux-amd64/node_exporter ${NODE_PATH}
chown -R prometheus.prometheus ${NODE_PATH}

创建Systemd Node_exporter服务

# 创建配置文件 Centos7 路径是/usr/lib/systemd/
cat > /lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=prometheus
ExecStart=/data/prometheus/node_exporter/node_exporter
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF

# 开机启动并运行服务
systemctl enable node_exporter.service
systemctl start node_exporter.service
systemctl status node_exporter.service

# 查看端口是否正常
netstat -anplt|grep 9100
tcp        0      0 172.26.42.229:58364     172.26.42.229:9100      ESTABLISHED 32220/prometheus
tcp6       0      0 :::9100                 :::*                    LISTEN      972/node_exporter
tcp6       0      0 172.26.42.229:9100      172.26.42.229:58364     ESTABLISHED 972/node_exporter

Node Metrics查看访问9100端口即可。

blackbox_exporter的使用:https://www.jianshu.com/p/4425cb6a48df