Lessons Learned Building a Deployment Pipeline with Docker, Docker Compose and Rancher (Part 4)


In this post, we'll discuss how we implemented Consul for service discovery with Rancher. John Patterson (@cantrobot) and Chris Lunsford run This End Out, an operations and infrastructure services company. You can find them online at https://www.thisendout.com and follow them on Twitter @thisendout.

If you haven't already, please read the previous posts in this series: Part 1: Getting started with CI/CD and Docker, Part 2: Moving to Compose blueprints, and Part 3: Adding Rancher for Orchestration.

In this final post of the series on building a deployment pipeline, we will explore some of the challenges we faced when transitioning to Rancher for cluster scheduling. In the previous article, we removed the operator from the process of choosing where a container would run by allowing Rancher to perform the scheduling. With this new scheme, we must address how the rest of our environment learns where the scheduler places these services and how they can be reached. We will also talk about manipulating the scheduler with labels to adjust where containers are placed and to avoid port binding conflicts. Lastly, we will optimize our upgrade process by taking advantage of Rancher's rollback capability.

Before the introduction of Rancher, our environment was fairly static. We always deployed containers to the same hosts, and deploying to a different host meant updating a few config files to reflect the new location. For example, if we added one more instance of the ‘java-service-1’ application, we would also need to update the load balancer to point to the IP of the additional instance. Now that we employ a scheduler, we lose predictability of where our containers get deployed, so our environment configuration needs to become dynamic and adapt to changes automatically. To do this, we make use of service registration and discovery.

A service registry gives us a single source of truth about where our applications are in the environment. Rather than hard-coding service locations, our applications can query the service registry through an API and automatically reconfigure themselves when the environment changes. Rancher provides service discovery out of the box using the Rancher DNS and metadata services (there is a good write-up on service discovery on the Rancher blog). However, because we have a mix of Docker and non-Docker applications, we couldn't rely purely on Rancher to handle service discovery. We needed an independent tool to track the locations of all our services, and Consul fit that bill.

We won't detail how to set up Consul in your environment; instead, we'll briefly describe the way we use Consul at ABC Inc. In each environment, we have a Consul cluster deployed as containers. On each host in the environment, we deploy a Consul agent, and if the host runs Docker, we also deploy a registrator container. Registrator monitors the Docker events API for each daemon and automatically updates Consul during lifecycle events: after a new container is deployed, registrator registers the service in Consul, and when the container is removed, registrator deregisters it. (A sketch of how these per-host pieces might be launched appears after the consul-template example below.)

[Image: Consul service listing]

With all of our services registered in Consul, we can run consul-template in our load balancer to dynamically populate a list of upstreams based on the service data stored in Consul. For our NGINX load balancer, we can create a template for populating the backends for the ‘java-service-1’ application:

# /etc/nginx/upstreams.conf.tmpl
upstream java-service-1 {
{{range service "java-service-1"}}
    server {{.Address}}:{{.Port}};
{{else}}
    server 127.0.0.1:65535; # force a 502
{{end}}
}

This template looks for a list of services registered in Consul as ‘java-service-1’. It then loops through that list, adding a server line with the IP address and port of each application instance. If there aren't any ‘java-service-1’ instances registered in Consul, we default to a server entry that forces a 502, avoiding an empty (and therefore invalid) upstream block in NGINX. We can run consul-template in daemon mode, causing it to monitor Consul for changes, re-render the template when a change occurs, and then reload NGINX to apply the new configuration.

TEMPLATE_FILE=/etc/nginx/upstreams.conf.tmpl
RELOAD_CMD="/usr/sbin/nginx -s reload"
consul-template -consul consul.stage.abc.net:8500 \
    -template "${TEMPLATE_FILE}:${TEMPLATE_FILE//.tmpl/}:${RELOAD_CMD}"
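
To make this setup more concrete, here is a rough sketch of the per-host registration pieces described earlier, followed by a quick way to inspect the data consul-template consumes. The image names, tags, and addresses are illustrative assumptions, not our exact configuration.

# Run a Consul agent on the host and join it to the environment's Consul cluster.
# HOST_IP is a hypothetical variable holding this host's private IP.
HOST_IP=10.1.2.3
docker run -d --name consul-agent --net=host \
    consul agent -bind=${HOST_IP} -retry-join=consul.stage.abc.net

# Run registrator so containers are registered and deregistered in Consul
# automatically as Docker lifecycle events occur.
docker run -d --name registrator --net=host \
    -v /var/run/docker.sock:/tmp/docker.sock \
    gliderlabs/registrator:latest consul://localhost:8500

# Verify what consul-template will see for a service by querying the Consul catalog API.
curl -s http://consul.stage.abc.net:8500/v1/catalog/service/java-service-1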

With our load balancer set up to change dynamically as the rest of the environment changes, we can fully rely on the Rancher scheduler to make the complex decisions about where our services should run. However, our ‘java-service-1’ application binds TCP port 8080 on the Docker host, so if more than one of the application containers were scheduled on the same host, the result would be a port binding conflict and a failed deployment. To avoid this situation, we can influence the scheduler by way of scheduling rules.

Rancher lets us manipulate the scheduler by imposing conditions using container labels in our docker-compose.yml file. Conditions can include affinity rules, negation, and even “soft” enforcement (meaning avoid if possible). In our case with the ‘java-service-1’ application, we know only one instance can run on a host at a given time, so we can set an anti-affinity rule based on the container name. This causes the scheduler to look for a Docker host that isn't already running a container with the name ‘java-service-1’. Our docker-compose.yml file then looks like the following:

java-service-1:
  image: registry.abc.net/java-service-1:${VERSION}
  container_name: java-service-1
  ports:
    - 8080:8080
  labels:
    io.rancher.scheduler.affinity:container_label_ne: io.rancher.stack_service.name=java-service-1

Notice the introduction of the “labels” key. All scheduling rules are added as labels. Labels can be added to both Docker hosts and containers. When we register our hosts in Rancher, we have the ability to associate labels with them, which we can later key off of when scheduling deployments. For example, if we had a set of Docker hosts that were storage-optimized with SSD drives, we could add the host label storage=ssd.

[Image: Rancher host labels]

Containers needing to take advantage of the optimized storage hosts can then add a label that forces the scheduler to deploy them only on hosts that match. We'll update our ‘java-service-1’ application to deploy only on the storage-optimized hosts (a sketch of supplying the host label at registration time follows the compose example):

java-service-1:
  image: registry.abc.net/java-service-1:${VERSION}
  container_name: java-service-1
  ports:
    - 8080:8080
  labels:
    io.rancher.scheduler.affinity:container_label_ne: io.rancher.stack_service.name=java-service-1
    io.rancher.scheduler.affinity:host_label: storage=ssd
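
Host labels can be applied through the Rancher UI when a host is added, and the Rancher agent also accepts them at registration time via the CATTLE_HOST_LABELS environment variable. The command below is only a hypothetical sketch: the agent version, server URL, and registration token are placeholders, and the real command is generated by your own Rancher server when you add a host.

# Hypothetical host registration that applies the storage=ssd label.
sudo docker run -d --privileged \
    -e CATTLE_HOST_LABELS='storage=ssd' \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/lib/rancher:/var/lib/rancher \
    rancher/agent:v1.2.11 \
    http://rancher.abc.net:8080/v1/scripts/<registration-token>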

Using labels, we can finely tune where our applications are deployed, allowing us to think in terms of desired capacity rather than which individual hosts run a specific set of containers. Labels also make it possible to hand all cluster scheduling over to Rancher even if you still have applications that must run on specific hosts. Lastly, we can optimize our service upgrades by utilizing Rancher's rollback capability. In our deployment workflow, a service is deployed by calling rancher-compose, which instructs Rancher to perform an upgrade on that service stack. The upgrade process roughly looks like the following:

  1. The upgrade starts by pulling a new image for the service
  2. One by one, existing containers are stopped and new containers are started
  3. The upgrade is complete when the deployer logs into the UI and selects “Finish Upgrade”
  4. The old, stopped service containers are removed

[Image: Rancher upgrade]

This workflow is fine when there are very few deployments taking place for a given service. However, while a service is in the “Upgraded” state (before the deployer selects “Finish Upgrade”), any new upgrades to the same service will be blocked until “Finish Upgrade” or “Rollback” is selected. The rancher-compose utility gives us the option to select which action to perform programmatically instead of requiring action on behalf of the deployer. For example, if you have automated testing for your services, you can run those tests after the rancher-compose upgrade returns. Depending on the results, rancher-compose can be called again, this time telling the stack to either “Finish Upgrade” or “Rollback”. A primitive example with our deployment Jenkins job could look like the following:

# for the full job, see part 3 of this series
/usr/local/bin/rancher-compose --verbose \
    -f ${docker_dir}/docker-compose.yml \
    -r ${docker_dir}/rancher-compose.yml \
    up -d --upgrade

JAVA_SERVICE_1_URL=http://java-service-1.stage.abc.net:8080/api/v1/status
if curl -s ${JAVA_SERVICE_1_URL} | grep -q "OK"; then
    # looks good, confirm or "finish" the upgrade
    /usr/local/bin/rancher-compose --verbose \
        -f ${docker_dir}/docker-compose.yml \
        -r ${docker_dir}/rancher-compose.yml \
        up --confirm-upgrade
else
    # looks like there's an error, roll the containers back
    # to the previously deployed version
    /usr/local/bin/rancher-compose --verbose \
        -f ${docker_dir}/docker-compose.yml \
        -r ${docker_dir}/rancher-compose.yml \
        up --rollback
fi

This logic calls our application endpoint to perform a simple status check. If “OK” is in the output, we finish the upgrade; otherwise, we roll back to the previously deployed version. If you do not have automated testing, another option is to simply always finish, or “confirm”, the upgrade:

# for the full job, see part 3 of this series
/usr/local/bin/rancher-compose --verbose \
    -f ${docker_dir}/docker-compose.yml \
    -r ${docker_dir}/rancher-compose.yml \
    up -d --upgrade --confirm-upgrade

If you determine later down the road that a rollback is necessary, simply redeploy the previous version using the same deployment job. This is not quite as friendly as Rancher's upgrade and rollback capabilities, but it unblocks future upgrades by not leaving the stack in the “Upgraded” state.

When a service is rolled back in Rancher, the containers are redeployed at the previous version. This can have unintended consequences when deploying services with generic tags like ‘latest’ or ‘master’. For example, let's assume the ‘java-service-1’ application was previously deployed with the tag ‘latest’. A change is made to the image, it is pushed to the registry, and the Docker tag ‘latest’ is updated to point to this new image. We proceed with an upgrade, using the tag ‘latest’, and after testing it is decided the application needs to be rolled back. Rolling the stack back with Rancher would still redeploy the newest image, because the tag ‘latest’ hasn't been updated to point to the previous image. The rollback may be successful in purely technical terms, but the intended effect of deploying the last known working copy is missed entirely.

We avoid this at ABC Inc. by always using specific tags that correlate with the version of the application. So instead of deploying our ‘java-service-1’ application using the tag ‘latest’, we use a version tag such as ‘1.0.1-22-7e56158’. This guarantees that rollbacks will always point to the last working deployment of our application in an environment. A sketch of how such a tag might be constructed is shown below.
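
The post doesn't spell out how that tag is built, so treat the following as an assumption: the format reads like an application version, a Jenkins build number, and a short git commit hash joined with dashes. A minimal sketch of a Jenkins shell step under that assumption, feeding the same ${VERSION} variable used in the docker-compose.yml above, might look like this:

# Assumed tag layout: <app version>-<Jenkins build number>-<short git sha>
# APP_VERSION is hypothetical (e.g. read from a VERSION file in the repo);
# BUILD_NUMBER is provided by Jenkins for every build.
APP_VERSION=$(cat VERSION)
GIT_SHA=$(git rev-parse --short HEAD)
export VERSION="${APP_VERSION}-${BUILD_NUMBER}-${GIT_SHA}"

# Build and push an immutable, uniquely tagged image, then deploy that exact tag.
docker build -t registry.abc.net/java-service-1:${VERSION} .
docker push registry.abc.net/java-service-1:${VERSION}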

We hope sharing our experience at ABC Inc. has been helpful. It was valuable for us to take a methodical journey in adopting Docker, steadily improving our processes and allowing our team to get comfortable with the concepts. Making incremental changes towards a more automated deployment workflow allows the organization to realize the benefits of automation sooner and lets deployment teams make more pragmatic decisions about what they need in a pipeline. Our journey led us to implementing Rancher, which proved to be one of the biggest wins for visibility, automation, and even team collaboration. We hope that sharing these lessons learned from our Docker adoption will help you in your own process of adoption. We wish you luck on your journey!

All four parts of the series are now live; you can find them here: Part 1: Getting started with CI/CD and Docker, Part 2: Moving to Compose blueprints, Part 3: Adding Rancher for Orchestration, Part 4: Completing the Cycle with Service Discovery. Please also download your free copy of “Continuous Integration and Deployment with Docker and Rancher”, a detailed eBook that walks through leveraging containers throughout your CI/CD process.

John Patterson (@cantrobot) and Chris Lunsford run This End Out, an operations and infrastructure services company. You can find them online at https://www.thisendout.com and follow them on Twitter @thisendout.