Implementing service monitoring in Rancher with Prometheus


I have blogged about monitoring Docker deployments a couple of times now (here & here); however, up to this point we have been monitoring container stats without looking at the bigger picture: how do these containers fit into a larger unit, and how do we get insight into the deployment as a whole rather than into individual containers? In this post I will cover leveraging Docker labels and Rancher's projects and services support to provide monitoring information that understands the deployment structure. I will be using Prometheus for service monitoring because in our earlier survey we found it to be the best self-hosted solution. However, you could use the same approach of inspecting labels to integrate with any monitoring system.

Deployment Setup

Before we start monitoring a deployment, we need to bring one up to monitor. For this purpose I will be creating a mock deployment with a Web service and a Database service. The web service consists of nginx containers behind a load balancer, and the database service will be a set of mysql containers. We will create two projects, Development and Production, each containing the web and database services. Let's get started by provisioning a Rancher management container. First, create the Rancher server using the command shown below. Note that we will be using a tagged version (0.21.5-rc1) of Rancher, as labels support is not yet available in the default version. Keep in mind that Rancher is still in alpha and changing quickly; this version has an updated UI and a handful of new features.

docker run -d --restart=always -p 8080:8080 rancher/server:v0.21.5-rc1

After bringing up your Rancher server (and optionally securing it with GitHub OAuth), browse to the Infrastructure tab, select the Hosts sub-tab, and click the Add Host button to add Rancher compute nodes. Choose one of the supported cloud infrastructure providers, or select Custom to register an existing machine as a compute node. Create at least two compute nodes in order to spread your containers and their associated workload.
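If you select the Custom option, the Add Host screen displays a docker run command, including a one-time registration token, to execute on the machine you want to add. The exact image tag and token are specific to your installation, so treat the following only as a rough sketch of that command and copy the real one from your own Rancher UI.

# Run on each machine to be registered as a Rancher compute node.
# SERVER_IP and <registration-token> are placeholders; the actual command
# is generated for you on the Add Host screen.
sudo docker run -d --privileged \
    -v /var/run/docker.sock:/var/run/docker.sock \
    rancher/agent http://SERVER_IP:8080/v1/scripts/<registration-token>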

[Screenshot: adding hosts in the Rancher UI]

Once your hosts have come up and appear in the Rancher server UI, we can define and launch our services on top of them. To create our services, we must first create the projects into which they will be launched. Click the Services tab, then the Projects sub-tab, and click Add Project. Create a project called Development. Repeat this process to create a second project called Production. These two projects help to segment our services in monitoring as well as in the Rancher UI.

[Screenshot: the Development and Production projects in the Rancher UI]

Now we are going to launch services into our projects. Click the Add Service button in the Development project to define a new service. We will call our first service Web, scale it to two containers, and specify that we will be using the nginx image for this service. Similarly, we are going to launch a second service called DB, with a scale of two instances and mysql as the image.

[Screenshot: defining a new service]

Additionally, we need to create a load balancer to route traffic to our web service. We can add load balancers (LBs) by clicking the Add Load Balancer button. We will name our balancer WebLB, specify that we want two containers to handle the balancing, and route traffic to port 80 (the default port of the nginx container) of the web service. We can now hit Create, and our LB should be ready in a few moments. Back in the Development section of the Services tab, you can now start your two services and the load balancer in order to bring up your containers.

[Screenshot: adding a load balancer]

You should repeat these steps in the Production project to create and launch the containers there as well. Once you have launched services in both projects, you can go to the Hosts tab and see your ten or so containers running. They should be evenly distributed across your compute nodes.

[Screenshot: containers distributed across the hosts]

Rancher Labels

Now that we have our Rancher containers up, we can inspect the labels that Rancher has added to them. Select one of the containers, pull up its menu, and select View in API to show information about it. Look for the labels section; each container will be tagged with labels like the following. These labels are also available through the Docker Remote API via the containers/[id]/json endpoint, and through the Docker command line using the docker inspect command. You can retrieve this information from either of these sources and feed it into the monitoring system of your choice.

"labels": {
    "io.rancher.container.ip": "11.42.78.168/16",
    "io.rancher.service.deployment.unit": "5710823b-0823-40b9-a004-feef6ad5f074",
    "io.rancher.scheduler.affinity:container_label:io.rancher.service.....": "",
    "io.rancher.container.uuid": "92ebab6d-2c9a-4854-86d7-d3be128c9f27",
    "io.rancher.service.name": "WebLB",
    "io.rancher.container.system": "LoadBalancerAgent",
    "io.rancher.environment.name": "Development",
}
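As a quick sanity check, you can also read these labels directly on a compute node with docker inspect; the container ID below is a placeholder for one of your own containers.

# List the containers on this node, then dump the labels of one of them
docker ps
docker inspect --format '{{json .Config.Labels}}' <container-id> | python -m json.tool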

Starting Monitoring Agents

For this article we will be using a modified Prometheus agent (container-exporter) to collect metrics. Prometheus should be familiar to you if you have read my earlier articles about Docker monitoring; I am using it because in our investigation it was the best of the self-managed monitoring solutions. To start collecting metrics, launch the container-exporter on all your Rancher compute nodes using the usman/container-exporter image. This is a modified version of the prom/container-exporter image which has been updated to capture the Docker labels shown above and use them to tag Prometheus metrics. You need to expose port 9104 using the Port Map section. Also click Advanced Options > Volumes and mount the following two paths into the container: the cgroups filesystem (/sys/fs/cgroup:/cgroup) and the Docker socket (/var/run/docker.sock:/var/run/docker.sock). The settings required to launch the container are shown below.

[Screenshot: launch settings for the container-exporter in the Rancher UI]
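If you prefer the Docker CLI over the Rancher UI, an equivalent launch command, sketched from the port and volume settings above, would look roughly like this:

# Run on each Rancher compute node: expose the metrics port and mount the
# cgroup hierarchy and Docker socket so the exporter can read container stats
docker run -d --name container-exporter -p 9104:9104 \
    -v /sys/fs/cgroup:/cgroup \
    -v /var/run/docker.sock:/var/run/docker.sock \
    usman/container-exporter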

After launching the agent container, browse to http://RANCHER_COMPUTE_HOST_IP:9104/metrics, where RANCHER_COMPUTE_HOST_IP is the publicly accessible IP of the host running the Rancher compute agent. In the example above the two IPs are 104.236.124.240 and 45.55.228.18. I have copied an example of the full output from my deployment here, but you can see snippets below. The first snippet shows the memory usage for a container belonging to the Database (DB) service running in the Development environment. You can also see that this is a container of type user.

container_memory_max_usage_bytes{
    component="none",
    container_type="user",
    environment="Development",
    id="ff7b1b6e5b62a9dfcd95717faec45a7eca50ad3661812eaff89188c324088c2b",
    image="mysql:latest",
    name="98403716-9dc2-4995-b35c-e67c526832ad",
    service="DB"
},970752

The second snippet shows the same metric for a system container, i.e. one of the containers launched by Rancher rather than by the user. In addition to the same fields as before (environment, service, and container type), we also see that the component field is set to LoadBalancerAgent, which means that this is one of the proxy instances providing load balancing for our cluster. All system containers have a component value that identifies which Rancher component launched the container; other possible values include rancher-agent and network-agent.

container_memory_max_usage_bytes{
 component="LoadBalancerAgent",
 container_type="system",
 environment="Production",
 id="249f11521e0f9a160272a7e5942c906b5bfadc010812811c050fa0f593ddc3eb",
 image="rancher/agent-instance:v0.3.1",
 name="07262e6d-43ea-4d4f-96d7-5495cf6e9868",
 service="WebLb"
},3.702784e+06
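You do not have to read this output in a browser; the same samples can be pulled from the command line and filtered down to a single metric name, for example:

# Fetch the raw metrics and keep only the max memory usage samples
curl -s http://RANCHER_COMPUTE_HOST_IP:9104/metrics | grep container_memory_max_usage_bytes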


Starting Prometheus Server

Now that we have our agents set up, all that remains is to launch our Prometheus server to aggregate the metrics and provide a graphical interface over them. For this we can follow the steps outlined in our earlier monitoring article. Create a file called prometheus.conf and add the following text to it. This specifies that our Prometheus server will scrape metrics every 5s from the container-exporters running on the two Rancher compute nodes.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  labels:
    monitor: rancher-metrics

rule_files:

scrape_configs:
- job_name: prometheus
  scrape_interval: 5s

  target_groups:
    # These endpoints are scraped via HTTP.
    - targets: ['RANCHER_COMPUTE1_HOST_IP:9104', 'RANCHER_COMPUTE2_HOST_IP:9104']

Now launch a prometheus-server container, mounting this file into it, using the command below.

docker run -d --name prometheus-server -p 9090:9090 \
    -v $PWD/prometheus.conf:/prometheus.conf prom/prometheus \
    -config.file=/prometheus.conf

Since there is no easy way to create files on the Rancher compute nodes, I have created a wrapper container which makes it easy to specify your targets from the Rancher UI. Use the usman/prometheus image and specify a field called TARGETS with a space-separated list of Prometheus agent URLs, e.g. http://RANCHER_COMPUTE1_HOST_IP:9104/metrics http://RANCHER_COMPUTE2_HOST_IP:9104/metrics. The complete configuration set is shown below. Once you launch this container, you can browse to http://RANCHER_COMPUTE_HOST_IP:9090/, where RANCHER_COMPUTE_HOST_IP is the public IP of the host on which you launched the Prometheus server.
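If you would rather start the wrapper from the Docker CLI, and assuming the TARGETS field is passed to the container as an environment variable of the same name, the launch command would look roughly like this:

# A sketch: pass the agent URLs to the wrapper via the TARGETS variable
docker run -d --name prometheus-server -p 9090:9090 \
    -e TARGETS="http://RANCHER_COMPUTE1_HOST_IP:9104/metrics http://RANCHER_COMPUTE2_HOST_IP:9104/metrics" \
    usman/prometheus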

[Screenshot: Prometheus server launch configuration in the Rancher UI]

Monitoring Rancher Containers

Now that we have our metrics agents up and reporting to our Prometheus server, we can finally start looking at graphs. Bring up the Prometheus web UI and click through to the Graphs tab. With this setup we can filter data on more than just the container names and images we were limited to in the previous article about Prometheus monitoring. The graphs below show the amount of memory used by system containers and the number of load balancer containers in the Production deployment. We can retrieve these graphs using the following queries:

# Memory used by system containers
sum(container_memory_usage_bytes{container_type="system"})

# The count of Production load balancers
count(container_last_seen{environment="Production", component="LoadBalancerAgent"})

[Screenshots: memory used by system containers and count of Production load balancers]
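Because every sample carries the environment and service labels, you can also break a metric down per service within an environment. For example, a query along these lines sums memory usage by service in Development:

# Memory used by each service in the Development environment
sum(container_memory_usage_bytes{environment="Development"}) by (service)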

We have already covered how to set up Prometheus dashboards and alerts in the earlier article, so I will not go over those topics again; for more details about the query syntax and available functions, see the Prometheus documentation. With the additional labels that we get from our custom Prometheus agent, we can now divide our metrics into logical sets rather than just by image. For example, you may want to run MySQL databases for two separate projects on the same Rancher cluster. Previously you would not have been able to separate metrics from containers belonging to each project. Likewise, you would not have been able to group metrics by environment or service, and hence could not set up alerts on these groups. For instance, if you set up an alert to notify you when fewer than the expected number of web (nginx) containers are running, you certainly want to limit the scope of that alert to your production containers. Currently Rancher tags containers with environment, service, type, and component (for system containers); as more labels get integrated into Rancher, we will be able to filter our metrics with even more precision.
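As a sketch of what such a scoped alert could look like, using the alerting rule syntax of the Prometheus releases current at the time of writing (the alert name, threshold, and severity label here are illustrative and assume the Production web service is named Web with a scale of two), the rule below fires when the Web service drops below its expected scale:

ALERT ProductionWebScaledDown
  IF count(container_last_seen{environment="Production", service="Web"}) < 2
  FOR 5m
  WITH { severity = "critical" }
  SUMMARY "Fewer than two Web containers running in Production"
  DESCRIPTION "The Production Web service is running below its expected scale of two containers."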

To see how service discovery and application management are being used in Rancher, please join us for our upcoming online meetup: Deploying your first Application with Rancher.

Usman is a server and infrastructure engineer, with experience in building large-scale distributed services on top of various cloud platforms. You can read more of his work at techtraits.com, or follow him on Twitter @usman_ismail or on GitHub.
