Running Nagios as a System Service on RancherOS


Nagios is a fantastic monitoring tool, and I wanted to see if I could get the agent to run as a system container on RancherOS, in order to monitor the host and any Docker containers running on it. It turned out to be incredibly easy. In this blog post, I’ll walk through how to launch the Nagios agent as a system container in RancherOS. Specifically, I’ll use two Vagrant boxes to cover:

  1. Provisioning a server with the Rancher control plane
  2. Adding a second server running RancherOS
  3. Installing a Nagios agent as a system container on the second server
  4. Connecting the Nagios agent to the Nagios management server

System Containers in RancherOS

First, for anyone who isn’t familiar with RancherOS, it is a minimal Linux distribution designed specifically to run Docker. RancherOS runs a Docker daemon as PID 1, a role that most distributions give to an init system such as systemd. This daemon, called system docker, runs essential system services like SSH, syslog or NTP as containers.

A second Docker daemon, called user docker, is launched as a container. This is where any new containers started by the user are created, as well as containers placed by Rancher or other management services.

To give the Nagios agent access to all of the data from the server, as well as the system and user containers, it should run in the system docker instance. I will run this setup in two Vagrant virtual machines.

Set up Rancher

Even though we could monitor RancherOS with Nagios directly, I’m going to set up Rancher in this deployment to manage the containers we create. The Rancher team provides a Vagrantfile to run RancherOS in a VM here: https://github.com/rancher/os-vagrant and another Vagrantfile for Rancher here: https://github.com/rancher/rancher. Since I want both in one Vagrant setup, I merged the two Vagrantfiles into one and added the option to run multiple RancherOS instances from it.

You can find my new Vagrantfile here: https://github.com/buster/rancher-tutorial

The first step (after installing Vagrant, of course) is to clone this repository and edit the following lines of the Vagrantfile to match your network:

# The number of VMs will be added to the following string,
# so Rancher will be on 192.168.0.200, the first RancherOS instance on 192.168.0.201, etc.
$rancher_ip_start = "192.168.0.20"
$rancherui_ip = $rancher_ip_start + "0"
# the number of rancher instances
$n_rancher = 1

Leave `$n_rancher` at 1 for now.
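
To make the addressing scheme concrete, here is a minimal Ruby sketch of how these settings translate into addresses. It assumes the Vagrantfile simply appends the instance number to the prefix string, as the comment above describes. The helper names (`rancher_ui_ip`, `rancheros_ip`, `all_ips`) are made up for illustration and are not taken from the actual Vagrantfile.

```ruby
# Assumption: addresses are built by plain string concatenation,
# as the Vagrantfile comment suggests. Helper names are hypothetical.
RANCHER_IP_START = "192.168.0.20"

# The Rancher server gets the prefix with "0" appended.
def rancher_ui_ip(prefix)
  prefix + "0"
end

# The i-th RancherOS instance (counting from 1) gets the prefix
# with its instance number appended.
def rancheros_ip(prefix, i)
  prefix + i.to_s
end

# Rancher server first, followed by n RancherOS instances.
def all_ips(prefix, n)
  [rancher_ui_ip(prefix)] + (1..n).map { |i| rancheros_ip(prefix, i) }
end

puts all_ips(RANCHER_IP_START, 2)
# 192.168.0.200
# 192.168.0.201
# 192.168.0.202
```

If the addresses really are built this way, a prefix like 192.168.0.20 only yields valid addresses for instance numbers 1 to 9, so pick a different prefix if you ever need more instances.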

After editing this file, run `vagrant up`.

Vagrant will first set up the Rancher VM: it downloads the VirtualBox image and starts it, and Docker then downloads and runs the Rancher Server and the Rancher Agent. Afterwards, the second VM, which hosts our RancherOS instance, is started, and the RancherOS instance registers itself with the Rancher Server.

When finished, browse to the Rancher IP (http://192.168.0.200:8080/ in my case) and observe your shiny new VMs.

Adding a System Container to Rancher

The next task is to set up the Nagios Agent on the RancherOS instance.

For that you will need to log in to the server, which you do by running `vagrant ssh rancher1`.

There you will have access to the user docker (by calling `docker`) and to the system docker by calling `sudo system-docker`.

A system container is no different from your usual Docker container, except that it is run by the system docker and that it has no networking by default. Thus, it needs to inherit the network of the host (the `--net=host` parameter):

sudo system-docker run -d --net=host --name nagios-agent buster/nagios-agent

This Nagios agent container comes with a minimal configuration to check the load on the second RancherOS instance.

Deploying the Nagios Server to Rancher

In order for the Nagios agent to make any sense, we will also need a Nagios Server which polls the Nagios Agent.

This is as easy as any other Rancher deployment, by clicking on “Add Container” in the Rancher UI.

There we will use the existing Nagios Server Docker image from https://registry.hub.docker.com/u/cpuguy83/nagios/. Also, don’t forget to go to the `Ports` tab and map host port 8081 to container port 80 so that you can log in to Nagios.

Add this container and after a while, the Nagios Server will be up and running! Browse to http://192.168.0.200:8081/ and observe the Nagios UI running. The default username is nagiosadmin and the password is nagios.

Configure Nagios Server

The Nagios Server only knows about itself right now, so we will need to configure it to poll the Nagios Agent.

This can be done in /opt/nagios/etc/conf.d/rancher1.cfg, for example.

Rancher offers a very nice terminal into the running containers, which you can reach by clicking on the container and then on the “execute shell” link.

Now, you can edit the config file by running `nano /opt/nagios/etc/conf.d/rancher1.cfg`.

Add the following lines to the file:

# Run a check on the remote host through NRPE; $ARG1$ is the remote command name
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }
# The RancherOS instance running the Nagios agent
define host{
    use linux-server
    host_name rancher1
    address 192.168.0.201
    }
# Services polled on rancher1 via NRPE
define service{
    use generic-service
    host_name rancher1
    service_description Current Users
    check_command check_nrpe!check_users
    }
define service{
    use generic-service
    host_name rancher1
    service_description Current Load
    check_command check_nrpe!check_load
    }
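
If you later raise `$n_rancher` and want to monitor more than one RancherOS instance, typing these blocks by hand gets repetitive. The following Ruby sketch renders the host and service definitions for a list of hosts. It is only an illustration: the function name `nagios_host_block` and the second host `rancher2` are hypothetical, and it assumes the `linux-server` host template and the `generic-service` service template from the stock Nagios sample configuration exist in your Nagios Server image.

```ruby
# Render one host definition plus one NRPE-backed service per check.
# Assumption: the templates "linux-server" (host) and "generic-service"
# (service) are defined in the Nagios Server's configuration.
def nagios_host_block(name, address, checks)
  host = <<~CFG
    define host{
        use linux-server
        host_name #{name}
        address #{address}
        }
  CFG
  services = checks.map do |description, nrpe_command|
    <<~CFG
      define service{
          use generic-service
          host_name #{name}
          service_description #{description}
          check_command check_nrpe!#{nrpe_command}
          }
    CFG
  end
  host + services.join
end

# Service description => command name configured on the NRPE agent.
checks = { "Current Users" => "check_users", "Current Load" => "check_load" }

# "rancher2" is a hypothetical second instance, used for illustration only.
hosts = { "rancher1" => "192.168.0.201", "rancher2" => "192.168.0.202" }

puts hosts.map { |name, address| nagios_host_block(name, address, checks) }.join("\n")
```

You could redirect the output into a file under /opt/nagios/etc/conf.d/ next to the `check_nrpe` command definition. Each check name still has to be defined in the agent's NRPE configuration for the service to return a result.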

Afterwards you can check if the configuration file format is correct by running `nagios -v /opt/nagios/etc/nagios.cfg`.

To check that the NRPE server on the second host is running, you can also run a check manually: `/opt/nagios/libexec/check_nrpe -H 192.168.0.201 -c check_load`.
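
If the manual check fails, it helps to find out whether the agent's port is reachable at all. Here is a small Ruby sketch for that, assuming the agent container listens on the default NRPE port 5666 (I have not verified which port the image actually uses). The helper name `port_open?` is my own, and the check only tells you that something accepts TCP connections; it does not speak the NRPE protocol.

```ruby
require "socket"

# Returns true if a TCP connection to host:port succeeds within the timeout.
# This only proves that something is listening on the port.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, host unreachable, timeout, or name resolution failure.
  false
end
```

Calling `port_open?("192.168.0.201", 5666)` from a machine that can reach the RancherOS VM should return true once the agent container is up. If it returns false, look at the container on the RancherOS host before debugging the Nagios Server configuration.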

After you have verified that the Nagios setup works, you simply need to restart the Nagios Server container by clicking on its restart symbol in the Rancher UI.

Now you can log in to Nagios again and see the Nagios plugins doing their work.

Conclusion

Using Nagios to monitor multiple RancherOS servers is as easy as running a preconfigured, publicly available Docker container from https://registry.hub.docker.com.

Starting a system docker container requires a few additional steps compared to running a user container, but hopefully we’ve explained them clearly here.

In the next few weeks RancherOS will ship 0.3, which includes support for predefined system services. That will make configuration of new agents in the Nagios server as easy as executing a docker run command.

If you’d like to get started with RancherOS, you can download it from GitHub. Also, we’re always demoing new features and answering lots of questions at each month’s Rancher Meetup, which you can find a link to below.

Sebastian Schulze is a Technology Consultant from Germany, with experience in Linux, Solaris, Docker, and Vagrant. You can contact him via GitHub at: https://github.com/buster
