Monitoring Linux Hosts with Nagios via NRPE
I. Introduction
1. About NRPE
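NRPE is an extension for Nagios that runs plugins on remote Linux/Unix hosts. Once NRPE and the Nagios plugins are installed on a remote server, that server can report local conditions to the Nagios monitoring platform, such as CPU load, memory usage, and disk usage. In this article, the monitoring machine is called the Nagios server and the remote monitored host is called the Nagios client.
Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH, and NSCA. This article covers monitoring a remote Linux host through NRPE.
NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on remote servers. It lets the Nagios server securely trigger check commands on a remote host and then returns the results to the server. Its overhead is much lower than that of SSH-based checks. The checks also do not require a system account on the remote host, which makes NRPE more secure than the SSH approach.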
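2. How NRPE works
NRPE consists of two parts:
check_nrpe plugin: located on the monitoring server
nrpe daemon: runs on the remote host and usually acts as the agent on the monitored side
Note: the nrpe daemon requires nagios-plugins. Without them, the daemon cannot perform any checks.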
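How NRPE works, step by step
When Nagios needs to monitor a service or resource on a remote Linux host:
First, Nagios runs the check_nrpe plugin and tells it what to check;
Second, check_nrpe connects to the NRPE daemon on the remote host over SSL (TCP port 5666 by default);
Third, the NRPE daemon runs the corresponding Nagios plugin to perform the check;
Finally, the NRPE daemon returns the check result to check_nrpe, which hands it to Nagios for processing.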
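On the client side, the third step is essentially a name lookup. The command name sent by check_nrpe is matched against the command[...] entries in nrpe.cfg, and the command line found there is what gets executed. The bash sketch below imitates only that lookup, for illustration. nrpe_lookup_command is a hypothetical helper, not part of NRPE, and it ignores argument substitution and SSL entirely.

```shell
# nrpe_lookup_command CFG NAME
# Hypothetical helper: print the command line that nrpe.cfg maps to NAME,
# imitating how the NRPE daemon resolves the name sent by check_nrpe.
# Returns 1 if there is no "command[NAME]=" line (commented lines don't count).
nrpe_lookup_command() {
    local cfg=$1 name=$2 line
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        # The quoted part is matched literally, so [ and ] are not glob classes
        if [[ $line == "command[$name]="* ]]; then
            # Strip everything up to the first "=" to get the command line
            printf '%s\n' "${line#*=}"
            return 0
        fi
    done < "$cfg"
    return 1
}
```

For example, nrpe_lookup_command /usr/local/nagios/etc/nrpe.cfg check_users would print the check_users line configured later in this article.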
II. Install nagios-plugins and NRPE on the monitored host
1. Add the nagios user
- [root@ClientNrpe ~]# useradd -s /sbin/nologin nagios
2. Install nagios-plugins, which NRPE depends on
- [root@ClientNrpe ~]# yum -y install gcc gcc-c++ make openssl openssl-devel
- [root@ClientNrpe ~]# tar xf nagios-plugins-2.0.3.tar.gz
- [root@ClientNrpe ~]# cd nagios-plugins-2.0.3
- [root@ClientNrpe nagios-plugins-2.0.3]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
- [root@ClientNrpe nagios-plugins-2.0.3]# make && make install
- # Note: to monitor MySQL, add --with-mysql to the ./configure options
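Because the nrpe daemon can do nothing without the plugins, it helps to confirm that make install actually put the expected plugins in place before moving on. This is a minimal bash sketch. missing_plugins is a hypothetical helper, and the plugin names are simply the ones used by the nrpe.cfg commands later in this article.

```shell
# missing_plugins DIR PLUGIN...
# Hypothetical helper: print each PLUGIN that is not an executable regular
# file under DIR; succeed only when none are missing.
missing_plugins() {
    local dir=$1 p missing=0
    shift
    for p in "$@"; do
        if [[ ! -f $dir/$p || ! -x $dir/$p ]]; then
            printf '%s\n' "$p"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: missing_plugins /usr/local/nagios/libexec check_users check_load check_disk check_procs prints nothing and returns 0 when the install succeeded.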
3. Install NRPE
- [root@ClientNrpe ~]# tar xf nrpe-2.15.tar.gz
- [root@ClientNrpe ~]# cd nrpe-2.15
- [root@ClientNrpe nrpe-2.15]# ./configure --with-nrpe-user=nagios \
- > --with-nrpe-group=nagios \
- > --with-nagios-user=nagios \
- > --with-nagios-group=nagios \
- > --enable-command-args \
- > --enable-ssl
- [root@ClientNrpe nrpe-2.15]# make all
- [root@ClientNrpe nrpe-2.15]# make install-plugin
- [root@ClientNrpe nrpe-2.15]# make install-daemon
- [root@ClientNrpe nrpe-2.15]# make install-daemon-config
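4. Configure NRPE
The effective settings are shown below. The # notes after server_port and allowed_hosts are annotations only. NRPE treats only lines that begin with # as comments, so do not copy these notes into nrpe.cfg. Make sure allowed_hosts contains the Nagios server's IP address.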
- [root@ClientNrpe ~]# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg |sed '/^$/d'
- log_facility=daemon
- pid_file=/var/run/nrpe.pid
- server_port=5666 # listening port
- nrpe_user=nagios
- nrpe_group=nagios
- allowed_hosts=192.168.0.105 # allowed addresses, usually the Nagios server
- dont_blame_nrpe=0
- allow_bash_command_substitution=0
- debug=0
- command_timeout=60
- connection_timeout=300
- command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
- command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
- command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
- command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
- command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
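allowed_hosts is a comma-separated list, and connections from addresses not in the list are refused. A quick way to check that the Nagios server (192.168.0.105 here) is listed is the bash sketch below. nrpe_host_allowed is a hypothetical helper. It does a literal string comparison only, so it does not understand CIDR ranges or host names.

```shell
# nrpe_host_allowed CFG IP
# Hypothetical helper: succeed if IP appears literally in the comma-separated
# allowed_hosts list of an nrpe.cfg file. If allowed_hosts appears more than
# once, this sketch looks only at the last occurrence.
nrpe_host_allowed() {
    local cfg=$1 ip=$2 list entry
    local -a entries
    list=$(sed -n 's/^allowed_hosts=//p' "$cfg" | tail -n 1)
    IFS=',' read -r -a entries <<< "$list"
    for entry in "${entries[@]}"; do
        entry=${entry//[[:space:]]/}   # drop spaces around each entry
        [[ $entry == "$ip" ]] && return 0
    done
    return 1
}
```

Usage: nrpe_host_allowed /usr/local/nagios/etc/nrpe.cfg 192.168.0.105 && echo allowed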
5. Start NRPE
- # Start as a standalone daemon
- [root@ClientNrpe ~]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- [root@ClientNrpe ~]# netstat -tulpn | grep nrpe
- tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 22597/nrpe
- tcp 0 0 :::5666 :::* LISTEN 22597/nrpe
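The netstat check above can be turned into a yes/no test, which is handy in scripts. The bash sketch below assumes the column layout of net-tools netstat output as shown here (the local address in column 4 and the state in column 6). port_listening is a hypothetical helper, and it will not parse ss output, whose columns differ.

```shell
# port_listening PORT
# Hypothetical helper: read `netstat -tlnp` (or -tulpn) output on stdin and
# succeed if a TCP socket is in LISTEN state on PORT, IPv4 or IPv6.
port_listening() {
    awk -v p="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is the text after the last ":" of the local address
            n = split($4, a, ":")
            if (a[n] == p) found = 1
        }
        END { exit !found }'
}
```

Usage: netstat -tulpn | port_listening 5666 && echo "NRPE is listening"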
NRPE has two run modes, which give two ways to manage the service:
- -i # Run as a service under inetd or xinetd
- -d # Run as a standalone daemon
You can write an init script so that NRPE runs in standalone mode:
- [root@ClientNrpe ~]# cat /etc/init.d/nrped
- #!/bin/bash
- # chkconfig: 2345 88 12
- # description: NRPE DAEMON
- NRPE=/usr/local/nagios/bin/nrpe
- NRPECONF=/usr/local/nagios/etc/nrpe.cfg
- case "$1" in
- start)
- echo -n "Starting NRPE daemon..."
- $NRPE -c $NRPECONF -d
- echo " done."
- ;;
- stop)
- echo -n "Stopping NRPE daemon..."
- pkill -u nagios nrpe
- echo " done."
- ;;
- restart)
- $0 stop
- sleep 2
- $0 start
- ;;
- *)
- echo "Usage: $0 start|stop|restart"
- exit 1
- ;;
- esac
- exit 0
- [root@ClientNrpe ~]# chmod +x /etc/init.d/nrped
- [root@ClientNrpe ~]# chkconfig --add nrped
- [root@ClientNrpe ~]# chkconfig nrped on
- [root@ClientNrpe ~]# service nrped start
- Starting NRPE daemon... done.
- [root@ClientNrpe ~]# netstat -tnlp
- Active Internet connections (only servers)
- Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
- tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1031/sshd
- tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1108/master
- tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 22597/nrpe
- tcp 0 0 :::22 :::* LISTEN 1031/sshd
- tcp 0 0 ::1:25 :::* LISTEN 1108/master
- tcp 0 0 :::5666 :::* LISTEN 22597/nrpe
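The init script above has no status action, and its stop action matches processes by name. A status branch could instead rely on the pid_file set in nrpe.cfg (/var/run/nrpe.pid in this article). The following is a minimal bash sketch under that assumption. nrpe_status is a hypothetical helper, and it should run as root, as the init script does, because kill -0 fails on another user's process for unprivileged callers.

```shell
# nrpe_status PIDFILE
# Hypothetical helper: succeed if PIDFILE holds the PID of a running process.
# kill -0 sends no signal; it only checks that the process exists and that
# we are allowed to signal it.
nrpe_status() {
    local pidfile=$1 pid
    [[ -r $pidfile ]] || return 1
    read -r pid < "$pidfile"
    [[ $pid =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null
}
```

In the init script, this could back a new case branch such as: status) nrpe_status /var/run/nrpe.pid && echo "NRPE is running" || echo "NRPE is stopped" ;;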
III. Install NRPE on the monitoring server
1. Install NRPE
- [root@Nagios ~]# tar xf nrpe-2.15.tar.gz
- [root@Nagios ~]# cd nrpe-2.15
- [root@Nagios nrpe-2.15]# ./configure \
- > --with-nrpe-user=nagios \
- > --with-nrpe-group=nagios \
- > --with-nagios-user=nagios \
- > --with-nagios-group=nagios \
- > --enable-command-args \
- > --enable-ssl
- [root@Nagios nrpe-2.15]# make all
- [root@Nagios nrpe-2.15]# make install-plugin
- # After installation, the check_nrpe plugin is created in the libexec directory of the Nagios installation
- [root@Nagios ~]# cd /usr/local/nagios/libexec/
- [root@Nagios libexec]# ll -d check_nrpe
- -rwxrwxr-x. 1 nagios nagios 76769 Sep 28 08:07 check_nrpe
2. Using check_nrpe
- [root@Nagios libexec]# ./check_nrpe -h
- NRPE Plugin for Nagios
- Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
- Version: 2.15
- Last Modified: 09-06-2013
- License: GPL v2 with exemptions (-l for more info)
- SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
- Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
- Options:
- -n = Do no use SSL
- -u = Make socket timeouts return an UNKNOWN state instead of CRITICAL
- <host> = The address of the host running the NRPE daemon
- <bindaddr> = bind to local address
- -4 = user ipv4 only
- -6 = user ipv6 only
- [port] = The port on which the daemon is running (default=5666)
- [timeout] = Number of seconds before connection times out (default=10)
- [command] = The name of the command that the remote daemon should run
- [arglist] = Optional arguments that should be passed to the command. Multiple
- arguments should be separated by a space. If provided, this must be
- the last option supplied on the command line.
- Note:
- This plugin requires that you have the NRPE daemon running on the remote host.
- You must also have configured the daemon to associate a specific plugin command
- with the [command] option you are specifying here. Upon receipt of the
- [command] argument, the NRPE daemon will run the appropriate plugin command and
- send the plugin output and return code back to *this* plugin. This allows you
- to execute plugins on remote hosts and 'fake' the results to make Nagios think
- the plugin is being run locally.
- check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
- [root@Nagios libexec]# ./check_nrpe -H 192.168.0.81
- NRPE v2.15
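Running check_nrpe with only -H returns the daemon's version, which confirms that the server can reach the client. With -c, check_nrpe passes back the remote plugin's output and return code, as the help text says. Under the standard Nagios plugin convention, Nagios reads return codes 0, 1, 2 and 3 as OK, WARNING, CRITICAL and UNKNOWN. The bash sketch below illustrates that mapping. nagios_state is a hypothetical helper, and it deliberately rejects codes outside 0 to 3 rather than guessing how Nagios would treat them.

```shell
# nagios_state CODE
# Hypothetical helper: translate a plugin return code into the state name
# Nagios displays. Codes outside 0-3 are not valid plugin states.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "invalid plugin return code: $1" >&2; return 1 ;;
    esac
}
```

Usage: ./check_nrpe -H 192.168.0.81 -c check_load; nagios_state $?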
3. Define the command
- [root@Nagios ~]# cd /usr/local/nagios/etc/objects/
- [root@Nagios objects]# vim commands.cfg
- # Append at the end of the file
- define command{
- command_name check_nrpe
- command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"
- }
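When a service uses check_command check_nrpe!check_users, Nagios splits the string on "!" and substitutes the parts into this command_line. The part after the first "!" becomes $ARG1$. $HOSTADDRESS$ comes from the host's address, and $USER1$ comes from resource.cfg (normally /usr/local/nagios/libexec). The bash sketch below is a simplified imitation of that substitution, for illustration only. expand_check_command is a hypothetical helper and does not cover the many other Nagios macros.

```shell
# expand_check_command CHECK_COMMAND COMMAND_LINE HOSTADDRESS USER1
# Hypothetical, simplified imitation of Nagios macro expansion: split
# CHECK_COMMAND on "!" and substitute $ARG1$, $ARG2$, ..., $HOSTADDRESS$
# and $USER1$ into COMMAND_LINE.
expand_check_command() {
    local check=$1 line=$2 host=$3 user1=$4 i
    local -a args
    IFS='!' read -r -a args <<< "$check"
    # Quoted patterns are matched literally, so "$" needs no escaping here
    line=${line//'$HOSTADDRESS$'/$host}
    line=${line//'$USER1$'/$user1}
    # args[0] is the command name; args[1], args[2], ... are $ARG1$, $ARG2$, ...
    for ((i = 1; i < ${#args[@]}; i++)); do
        line=${line//"\$ARG$i\$"/${args[i]}}
    done
    printf '%s\n' "$line"
}
```

For this article's values, expand_check_command 'check_nrpe!check_users' '$USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"' 192.168.0.81 /usr/local/nagios/libexec yields the check_nrpe command line that Nagios would run for the CHECK USER service.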
4. Define the services
- [root@Nagios objects]# cp windows.cfg linhost.cfg
- [root@Nagios objects]# grep -v '^#' linhost.cfg |sed '/^$/d'
- define host{
- use linux-server
- host_name linhost
- alias My Linux Server
- address 192.168.0.81
- }
- define service{
- use generic-service
- host_name linhost
- service_description CHECK USER
- check_command check_nrpe!check_users
- }
- define service{
- use generic-service
- host_name linhost
- service_description Load
- check_command check_nrpe!check_load
- }
- define service{
- use generic-service
- host_name linhost
- service_description SDA1
- check_command check_nrpe!check_hda1
- }
- define service{
- use generic-service
- host_name linhost
- service_description Zombie
- check_command check_nrpe!check_zombie_procs
- }
- define service{
- use generic-service
- host_name linhost
- service_description Total procs
- check_command check_nrpe!check_total_procs
- }
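An important point: the commands used in the service definitions on the Nagios server come entirely from the check commands defined in NRPE on the monitored host. The name after check_nrpe! must match a command[...] entry in nrpe.cfg, as shown in the figure below.
5. Register the defined commands and services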
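Because the two sides must agree on command names, a mismatch only shows up as a failing check after Nagios restarts. The bash sketch below compares the two files ahead of time. It assumes you have a copy of the client's nrpe.cfg on the server. unmatched_nrpe_checks is a hypothetical helper, and it does not skip commented-out check_command lines in the object file.

```shell
# unmatched_nrpe_checks SERVICES_CFG NRPE_CFG
# Hypothetical helper: print every NAME used as check_nrpe!NAME in a Nagios
# object file that has no "command[NAME]=" line in the client's nrpe.cfg.
unmatched_nrpe_checks() {
    local services=$1 nrpe=$2 name
    # Extract "check_nrpe!NAME" (stopping at the next "!" or whitespace)
    grep -o 'check_nrpe![^![:space:]]*' "$services" | cut -d'!' -f2 | sort -u |
    while IFS= read -r name; do
        # index(...) == 1 requires the entry at the start of a line, so
        # commented-out "#command[...]" lines do not count as defined
        awk -v n="command[$name]=" 'index($0, n) == 1 { f = 1 } END { exit !f }' "$nrpe" ||
            printf '%s\n' "$name"
    done
}
```

Usage: unmatched_nrpe_checks /usr/local/nagios/etc/objects/linhost.cfg ./nrpe.cfg prints nothing when every service has a matching NRPE command.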
- [root@Nagios ~]# vim /usr/local/nagios/etc/nagios.cfg
- # Add the following line
- cfg_file=/usr/local/nagios/etc/objects/linhost.cfg
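If this step is scripted, appending the cfg_file line only when it is missing keeps nagios.cfg free of duplicates when the script runs more than once. The bash sketch below uses grep -x (whole-line match) and -F (fixed string). add_cfg_file is a hypothetical helper, and it assumes nagios.cfg already ends with a newline.

```shell
# add_cfg_file MAIN_CFG OBJECT_CFG
# Hypothetical helper: append "cfg_file=OBJECT_CFG" to MAIN_CFG (e.g.
# nagios.cfg) unless an identical line is already there.
add_cfg_file() {
    local main=$1 line="cfg_file=$2"
    grep -qxF -- "$line" "$main" || printf '%s\n' "$line" >> "$main"
}
```

Usage: add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/linhost.cfg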
6. Check the configuration syntax
- [root@Nagios ~]# service nagios configtest
- Nagios Core 4.0.7
- Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
- Copyright (c) 1999-2009 Ethan Galstad
- Last Modified: 06-03-2014
- License: GPL
- Website: http://www.nagios.org
- Reading configuration data...
- Read main config file okay...
- Read object config files okay...
- Running pre-flight check on configuration data...
- Checking objects...
- Checked 20 services.
- Checked 3 hosts.
- Checked 2 host groups.
- Checked 0 service groups.
- Checked 1 contacts.
- Checked 1 contact groups.
- Checked 26 commands.
- Checked 5 time periods.
- Checked 0 host escalations.
- Checked 0 service escalations.
- Checking for circular paths...
- Checked 3 hosts
- Checked 0 service dependencies
- Checked 0 host dependencies
- Checked 5 timeperiods
- Checking global event handlers...
- Checking obsessive compulsive processor commands...
- Checking misc settings...
- Total Warnings: 0
- Total Errors: 0
- Things look okay - No serious problems were detected during the pre-flight check
- Object precache file created:
- /usr/local/nagios/var/objects.precache
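The line that matters in this output is "Total Errors: 0". When the check is scripted, the restart can be gated on that line, as in the bash sketch below. config_has_no_errors is a hypothetical helper that simply searches the text, whether it comes from service nagios configtest or from nagios -v /usr/local/nagios/etc/nagios.cfg.

```shell
# config_has_no_errors
# Hypothetical helper: read Nagios pre-flight check output on stdin and
# succeed only if it contains a "Total Errors: 0" line.
config_has_no_errors() {
    grep -qE '^[[:space:]]*Total Errors:[[:space:]]*0[[:space:]]*$'
}
```

Usage: service nagios configtest 2>&1 | config_has_no_errors && service nagios restart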
7. Restart the Nagios service
- [root@Nagios ~]# service nagios restart
- Running configuration check...
- Stopping nagios: done.
- Starting nagios: done.
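8. Open the Nagios web interface
1) First, click [Hosts] and check that the monitored host's status is UP.
2) Then, click [Services] and check that each monitored service's status is OK.
Note: for the newly added host linhost, one service showed CRITICAL with the message "No such file or directory". The check_hda1 command points check_disk at /dev/hda1, but the disk partition on this client is /dev/sda1. The fix is as follows: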
- ### On the monitored host: edit the NRPE configuration file and restart NRPE
- [root@ClientNrpe etc]# vim nrpe.cfg
- command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
- [root@ClientNrpe etc]# service nrped restart
- ### On the monitoring server: edit linhost.cfg, then restart nagios and httpd
- [root@Nagios objects]# vim linhost.cfg
- # Note: this was originally check_hda1; it is now changed to check_sda1
- define service{
- use generic-service
- host_name linhost
- service_description SDA1
- check_command check_nrpe!check_sda1
- }
- [root@Nagios ~]# service nagios restart
- Running configuration check...
- Stopping nagios: done.
- Starting nagios: done.
- [root@Nagios ~]# service httpd restart
- Stopping httpd: [  OK  ]
- Starting httpd: [  OK  ]
Click [Services] again to refresh the page; the result is shown in the figure below:
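The CRITICAL state came from pointing check_disk at a device name that does not exist on the client. Device names vary between systems (hda on old IDE setups, sda on SATA/SCSI disks), so it is worth checking which device actually holds a filesystem before writing the command. check_disk's -p option also accepts a mount point such as /, which avoids depending on device names altogether. The bash sketch below reads df -P output. device_for_mount is a hypothetical helper, and it assumes mount points contain no spaces.

```shell
# device_for_mount MOUNTPOINT
# Hypothetical helper: read `df -P` output on stdin and print the device
# mounted at MOUNTPOINT (last column), e.g. to choose the check_disk -p value.
device_for_mount() {
    awk -v m="$1" 'NR > 1 && $NF == m { print $1; found = 1 } END { exit !found }'
}
```

Usage on the client: df -P | device_for_mount /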