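Monitoring Network Servers and Network Services with Nagios: Configuration
Nagios configuration
1: Configure the web interface
This assumes Apache is already running. If it is not, see: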
http://localhost/upload/blog.php?do-showone-tid-18.html
vi /usr/local/apache2/conf/httpd.conf
Add the following:
- ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
- <Directory "/usr/local/nagios/sbin">
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
- Alias /nagios /usr/local/nagios/share
- <Directory "/usr/local/nagios/share">
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
When you are done, save the file and restart Apache:
/usr/local/apache2/bin/apachectl restart
2: Configure Apache Basic authentication:
Generate the password file:
/usr/local/apache2/bin/htpasswd -bc /usr/local/nagios/etc/htpasswd.users nagios nagios
The Apache side is now configured.
Now configure Nagios:
cd /usr/local/nagios/etc/
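/usr/local/nagios/etc contains the Nagios sample configuration templates (the files ending in -sample). Copy every .cfg-sample file to a .cfg file of the same name.
For example: cp nagios.cfg-sample nagios.cfg
Do this for all of them.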
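The copy step can be done in one pass. The following is a minimal sketch, assuming a POSIX shell; the function name copy_samples and the NAGIOS_ETC variable are illustrative and not part of Nagios. It skips any .cfg file that already exists so that edited files are not overwritten.

```shell
#!/bin/sh
# Copy every *.cfg-sample in a directory to the matching *.cfg,
# leaving any existing *.cfg untouched.
copy_samples() {
    dir="$1"
    for sample in "$dir"/*.cfg-sample; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$sample" ] || continue
        target="${sample%-sample}"
        [ -e "$target" ] || cp "$sample" "$target"
    done
}

# NAGIOS_ETC is a hypothetical override; the default is the path used here.
copy_samples "${NAGIOS_ETC:-/usr/local/nagios/etc}"
```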
vi minimal.cfg
Comment out every command definition:
To comment out a definition, put a "#" at the start of each of its lines.
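Doing this by hand is tedious when there are many definitions. As a rough sketch, assuming each block starts with a "define command{" line and ends with a line whose first non-blank character is "}", a sed range can add the "#" prefix. The function name comment_out_commands is made up for this example.

```shell
#!/bin/sh
# Read a Nagios object file on stdin and prefix every line of each
# "define command{ ... }" block with "#". Other definitions pass through.
comment_out_commands() {
    sed '/^[[:space:]]*define[[:space:]]\{1,\}command[[:space:]]*{/,/^[[:space:]]*}/s/^/#/'
}

# Example use (writes a new file; review it before replacing the original):
#   comment_out_commands < minimal.cfg > minimal.cfg.new
```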
Edit cgi.cfg
Change use_authentication=1 to use_authentication=0, which turns off authentication for the CGIs. Otherwise some pages will not be displayed.
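The same change can be scripted. This is only a sketch: disable_cgi_auth is a hypothetical helper name, and it goes through a temporary file because the in-place option of sed differs between systems. The rewritten file gets default permissions, so adjust them afterwards if needed.

```shell
#!/bin/sh
# Rewrite use_authentication=1 to use_authentication=0 in the given file.
disable_cgi_auth() {
    cfg="$1"
    tmp="$cfg.tmp.$$"
    sed 's/^use_authentication=1[[:space:]]*$/use_authentication=0/' "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}

# Example use:
#   disable_cgi_auth /usr/local/nagios/etc/cgi.cfg
```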
Now check the configuration files for syntax errors:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If everything is correct, the output ends with:
Total Warnings: 0
Total Errors: 0
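Otherwise, fix the configuration files according to the messages. We will come back to the configuration files later. For now, start Nagios: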
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
So that Nagios is restarted automatically if it exits abnormally, we start it under daemontools:
Install daemontools:
- mkdir -p /package
- chmod 1755 /package
- cd /package
- fetch http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
- tar -xzpf daemontools-0.76.tar.gz
- cd admin/daemontools-0.76/
- package/install
Check that the svscan process is running:
- ps aux | grep svscan
- root 376 0.0 0.0 1636 0 con- IW - 0:00.00 /bin/sh /command/svscanboot
- root 411 0.0 0.0 1224 208 con- S 8Jul06 0:42.50 svscan /service
OK, it started normally.
- cd /service
- mkdir nagios
- chmod 1755 nagios
- cd nagios
- touch ./run
- chmod 755 ./run
- vi run
- #!/bin/sh
- PATH=/usr/local/bin:/usr/bin:/bin
- export PATH
- exec env - PATH=$PATH \
- /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- mkdir log
- cd log
- touch ./run
- chmod 755 ./run
- vi ./run
- #!/bin/sh
- exec setuidgid logadmin multilog t s1000000 n100 ./main
- mkdir main
- chmod 777 main
- chown nagios:nagios main
- touch status
- chown nagios:nagios status
- svc -u /service/nagios/
- svstat /service/nagios/
- root@## ps auxww | grep nagios
- root 23276 0.0 0.1 1176 488 ?? I 5:00PM 0:01.71 supervise nagios
- nagios 34251 0.0 0.3 2316 1552 ?? S 6:06PM 0:00.10 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- root@##
OK, Nagios now runs as a supervised service that starts automatically. Use the svc command to start or stop it.
- ---------------------------------------------------------------------------------
- svc opts services
- opts is a series of getopt-style options. services consists of any number of arguments, each argument naming a directory used by supervise.
- -u: Up. If the service is not running, start it. If the service stops, restart it.
- -d: Down. If the service is running, send it a TERM signal and then a CONT signal. After it stops, do not restart it.
- -o: Once. If the service is not running, start it. Do not restart it if it stops.
- -p: Pause. Send the service a STOP signal.
- -c: Continue. Send the service a CONT signal.
- -h: Hangup. Send the service a HUP signal.
- -a: Alarm. Send the service an ALRM signal.
- -i: Interrupt. Send the service an INT signal.
- -t: Terminate. Send the service a TERM signal.
- -k: Kill. Send the service a KILL signal.
- -x: Exit. supervise will exit as soon as the service is down. If you use this option on a stable system, you're doing something wrong; supervise is designed to run forever.
- ---------------------------------------------------------------------------------
For example:
Stop nagios: svc -d /service/nagios/
Restart nagios: svc -t /service/nagios/
Start nagios: svc -u /service/nagios/
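After any of these commands, svstat shows whether the service is actually up. A small sketch for scripting that check, assuming svstat prints its usual "<dir>: up (pid N) S seconds" status line; service_is_up is a made-up helper name.

```shell
#!/bin/sh
# Succeeds if the svstat status line on stdin reports the service as up.
service_is_up() {
    grep -q ': up (pid [0-9][0-9]*)'
}

# Example use:
#   svstat /service/nagios/ | service_is_up && echo "nagios is running"
```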
Of course, you can also use an init (rc.d) script instead:
/usr/local/etc/rc.d/nagios start/stop
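That's it; daemontools is a powerful tool. Now open http://localhost/nagios/ and you will be pleasantly surprised: the status of the server and its services is clearly visible. At this point Nagios monitors only one host, itself (localhost). We will add other hosts and their services shortly.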
Nagios in action
1) Add a service to a host
To monitor the qmail service on the localhost host, proceed as follows:
- vi minimal.cfg
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_qmail
- }
You can simply copy an existing definition and modify it; this one was copied from the existing check_local_disk service. Change host_name, service_description, check_command, and so on.
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop3
- }
Modify it by following the same pattern, then edit:
- vi checkcommands.cfg
- #'check_qmail' command definition
- define command{
- command_name check_qmail
- command_line $USER1$/check_smtp -H 127.0.0.1
- }
- define command{
- command_name check_pop3
- command_line $USER1$/check_pop -H 127.0.0.1
- }
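Before relying on a new command definition, it helps to run the plugin by hand and look at its exit status, because Nagios derives the service state from that status (0, 1, 2, and 3 are the standard plugin return codes). The sketch below assumes $USER1$ points at /usr/local/nagios/libexec, which depends on your resource.cfg; plugin_state is an illustrative helper name.

```shell
#!/bin/sh
# Map a plugin exit status to the state name Nagios assigns to it.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Example use (the plugin path is an assumption, see above):
#   /usr/local/nagios/libexec/check_smtp -H 127.0.0.1
#   plugin_state $?
```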
Save, then check the configuration files:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, you will see:
Total Warnings: 0
Total Errors: 0
If there are errors, fix them according to the messages.
Restart Nagios:
svc -d /service/nagios/ && svc -u /service/nagios/
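The check and the restart can be tied together so that a broken configuration never replaces a running Nagios. This is a sketch under the paths used in this article; report_is_clean and verify_and_restart are hypothetical helper names, and the restart uses svc -t so that supervise brings Nagios back up.

```shell
#!/bin/sh
# Succeeds only when the verification report read from stdin
# contains the line "Total Errors: 0".
report_is_clean() {
    grep -Eq '^Total Errors:[[:space:]]+0[[:space:]]*$'
}

# Run the pre-flight check and restart the supervised service only
# if the report shows no errors.
verify_and_restart() {
    if /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | report_is_clean; then
        svc -t /service/nagios/
    else
        echo "configuration has errors; nagios was not restarted" >&2
        return 1
    fi
}

# Example use:
#   verify_and_restart
```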
Check the result in the Nagios web interface:
http://10.5.1.153/nagios/
Click "Service Detail".
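2) Add a host and its services
We will monitor this host's load, disk usage, and other server state that is not exposed through a network port, as well as its services such as Apache, MySQL, qmail, and NTP. Can Nagios directly monitor things that have no port? No. That is why NRPE must be installed on both hosts: NRPE listens on port 5666 and passes the results of the local checks back to the central monitoring host.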
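For orientation, the monitoring host normally queries NRPE through the check_nrpe plugin. A minimal sketch of such a call follows. It assumes that check_nrpe is installed in /usr/local/nagios/libexec and that the remote nrpe.cfg defines a command named check_load; neither is set up in this article, and nrpe_command_line is only an illustrative helper that prints the command line.

```shell
#!/bin/sh
# Print the check_nrpe command line for a host and a remote command name.
# -H is the remote host, -p the NRPE port, -c the command name defined
# in the remote nrpe.cfg.
nrpe_command_line() {
    host="$1"
    remote_cmd="$2"
    echo "/usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c $remote_cmd"
}

# check_load is an assumed remote command name.
nrpe_command_line 10.5.1.156 check_load
```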
OK, let's add Apache, MySQL, qmail, and NTP first. This time we put the monitored host and its services in a new file:
- cd /usr/local/nagios/etc/
- touch 10_5_1_156.cfg
- vi nagios.cfg
- cfg_file=/usr/local/nagios/etc/10_5_1_156.cfg
- vi 10_5_1_156.cfg
Define a host:
- define host{
- use generic-host ; Name of host template to use
- host_name test_nrpe
- alias client
- address 10.5.1.156
- check_command check-host-alive
- max_check_attempts 1
- check_period 24x7
- notification_interval 120
- notification_period 24x7
- notification_options d,r
- contact_groups admins
- }
Define the services to check on the host:
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description PING
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ping!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description apache
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_http!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description mysql
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_mysql!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description ntp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ntp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_smtp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop!100.0,20%!500.0,60%
- }
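The service blocks above differ only in service_description and check_command, so they can also be generated. A minimal sketch, assuming a POSIX shell; emit_service is a made-up helper that prints one definition with the shared settings, and the arguments are taken from the blocks above.

```shell
#!/bin/sh
# Print one service definition. Arguments: host name, service
# description, check command. All other settings are the shared ones.
emit_service() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        is_volatile             0
        check_period            24x7
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_options    w,u,c,r
        notification_interval   960
        notification_period     24x7
        check_command           $3
        }
EOF
}

emit_service test_nrpe PING 'check_ping!100.0,20%!500.0,60%'
emit_service test_nrpe apache 'check_http!100.0,20%!500.0,60%'
```

Redirecting the output into 10_5_1_156.cfg gives the same definitions as typing them by hand; run the configuration check afterwards as usual.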
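That completes the service definitions in the Nagios configuration. Does Nagios now show an additional host with its services underneath? It certainly should. If you run into problems while adding hosts and services, see the other articles in this series: concepts, installation, and troubleshooting.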