Ansible for Server Monitoring

More Ansible goodness this week. We’ve been working on a basic playbook to set up the innovatively-named Monit monitoring tool to keep an eye on our webservers and give them a kick up the backside if they’re misbehaving.

It’s based on a very useful Ansible Galaxy role, pgolm’s Monit, which installs and configures the tool. However, the role’s documentation doesn’t necessarily make it obvious how to get the best from Monit, so here’s an example playbook for monitoring PHP, MySQL and Nginx to get things started:

- hosts: hosts
  roles:
    - pgolm.monit
  vars:
    # Seconds between Monit check cycles
    monit_cycle: 60
    # Enable Monit's built-in web interface on port 2812
    monit_webinterface_enabled: true
    monit_webinterface_port: 2812
    monit_services:
      # No rules here, so php5-fpm is only checked via its pid file
      - name: php5-fpm
        type: process
        target: /var/run/php5-fpm.pid
        start: "/etc/init.d/php5-fpm start"
        stop: "/etc/init.d/php5-fpm stop"
      - name: nginx
        type: process
        target: /var/run/nginx.pid
        start: "/etc/init.d/nginx start"
        stop: "/etc/init.d/nginx stop"
        rules:
          # Alert after 3 cycles of high CPU, restart after 6
          - "if totalcpu > 80% for 3 cycles then alert"
          - "if totalcpu > 80% for 6 cycles then restart"
      - name: mysql
        type: process
        target: /var/run/mysqld/mysqld.pid
        start: "/etc/init.d/mysql start"
        stop: "/etc/init.d/mysql stop"
        rules:
          - "if totalcpu > 80% for 3 cycles then alert"
          - "if totalcpu > 80% for 6 cycles then restart"
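
The role has to be installed from Ansible Galaxy before the playbook will run. One way to do that is with a requirements file; this is only a minimal sketch, and the filename requirements.yml is a common convention rather than anything the role demands:

```yaml
# requirements.yml (assumed filename; any name works)
# Lists the Galaxy role the playbook depends on.
- src: pgolm.monit
```

Running `ansible-galaxy install -r requirements.yml` should then pull the role down, after which the playbook can be run with `ansible-playbook` as usual.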

A quick explanation of the variables:

monit_cycle: time between monit runs (in seconds)
monit_webinterface_enabled: monit comes with a GUI which can be used to manage tasks
monit_webinterface_port: the port used to access the GUI
monit_services:
- name: the name of the service to monitor
- type: the kind of check. The types include process, system, host, and filesystem. We’re running Monit on the servers themselves rather than monitoring them remotely at the moment, so we’re only using process in this example.
- target: the pid file of a process
- start: the command to start the process
- stop: the command to stop the process
- rules: the important bit. This is where you tell Monit what to do when the process misbehaves. In this case we’ve instructed Monit to send an alert when the process (together with its children, which is what totalcpu measures) uses more than 80% of the CPU for 3 consecutive cycles, and to restart it if that continues for 6 cycles. With monit_cycle set to 60, that works out at roughly 3 and 6 minutes respectively.
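
The php5-fpm entry in the playbook has no rules of its own. As a rough sketch of how it might be extended, here it is with the same CPU thresholds plus a memory rule. The totalmem rule and its 500 MB figure are our own illustrative assumptions, not something taken from the role’s documentation, so tune or drop them to suit:

```yaml
monit_services:
  - name: php5-fpm
    type: process
    target: /var/run/php5-fpm.pid
    start: "/etc/init.d/php5-fpm start"
    stop: "/etc/init.d/php5-fpm stop"
    rules:
      # Same thresholds as the nginx and mysql entries
      - "if totalcpu > 80% for 3 cycles then alert"
      - "if totalcpu > 80% for 6 cycles then restart"
      # Illustrative assumption: restart if memory use stays high
      - "if totalmem > 500 MB for 5 cycles then restart"
```

Each rule string is passed through to Monit, so anything Monit accepts for a process check should be usable here.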

Rules can be padded with any of the following noise keywords, which Monit’s parser ignores but which make rules much easier to read: if, and, with(in), has, us(ing|e), on(ly), then, for, of.
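
To illustrate, the two rules below differ only in the noise keyword "for", so going by the list above Monit should treat them identically. This is a sketch we haven’t verified against every Monit version, and the service entry is just the nginx one from the playbook:

```yaml
monit_services:
  - name: nginx
    type: process
    target: /var/run/nginx.pid
    rules:
      # As written in the playbook, with the "for" noise keyword
      - "if totalcpu > 80% for 3 cycles then alert"
      # Same rule with "for" left out; harder to read, same meaning
      - "if totalcpu > 80% 3 cycles then alert"
```

In practice there’s little reason to leave the keywords out; the readable form is the one to keep.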

Hopefully that should be enough to get started using pgolm’s Ansible Monit role. We have, however, only scratched the surface of what can be achieved using Monit. Other interesting areas include monitoring checksums of important files to detect changes, and testing the outputs of scripts. Ansible makes it beautifully simple to do all of this stuff.
