RancherVM Live Migration with Shared Storage


With the latest release of RancherVM, we’ve added the ability to schedule virtual machines (guests) to specific Kubernetes Nodes (hosts).

This declarative placement (in Kubernetes terms: required node affinity) can be modified at any time. For stopped VMs, no change will be observed until the VM starts. For running VMs, the VM enters a migrating state, and RancherVM migrates the running guest machine from the old host to the new one. Upon completion, the VM returns to the running state and the old host's VM pod is deleted. Active noVNC sessions will disconnect for a few seconds before auto-reconnecting; SSH sessions will not disconnect, though a sub-second pause in communication may be observed.
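In Kubernetes terms, this placement maps to required node affinity on the pod wrapping the guest. As a generic illustration (standard Kubernetes pod spec syntax, not necessarily RancherVM's exact manifest), a pod pinned to node1 carries something like:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1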

Shared Storage

Migration of guest machines (live or offline) requires some form of shared storage. Because RancherVM uses the virtio-blk-pci para-virtualized block device driver, which stores each virtual block device as a file on the host filesystem, NFS works nicely.

Note: You are welcome to install RancherVM before configuring shared storage, but do not create any VM Instances yet. If you already created some instances, delete them before proceeding.

Install/Configure NFS server

Let’s walk through NFS server installation and configuration on an Ubuntu host. This can be a dedicated host or one of the Nodes in your RancherVM cluster.

Install the required package:

sudo apt-get install -y nfs-kernel-server

Create the directory that will be shared:

sudo mkdir -p /var/lib/rancher/vm-shared

Append the following line to /etc/exports:

/var/lib/rancher/vm-shared      *(rw,sync,no_subtree_check,no_root_squash)

This allows any host IP to mount the NFS share. If your machines are public facing, you may want to restrict * to an internal subnet such as 192.168.100.0/24, or add firewall rules.
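For example, to limit the export to that subnet, the line would instead read:

/var/lib/rancher/vm-shared      192.168.100.0/24(rw,sync,no_subtree_check,no_root_squash)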

The directory will now be exported during the boot sequence. To export the directory without rebooting, run the following command:

sudo exportfs -a
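To confirm the directory is exported, list the active export table:

sudo exportfs -v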

From one of the RancherVM nodes, query for registered RPC programs. Replace <nfs_server_ip> with the (private) IP address of your NFS server:

rpcinfo -p <nfs_server_ip>

You should see program 100003 (NFS service) present, for example:

program vers proto   port  service
 100000    4   tcp    111  portmapper
 100000    3   tcp    111  portmapper
 100000    2   tcp    111  portmapper
 100000    4   udp    111  portmapper
 100000    3   udp    111  portmapper
 100000    2   udp    111  portmapper
 100005    1   udp  47321  mountd
 100005    1   tcp  33684  mountd
 100005    2   udp  47460  mountd
 100005    2   tcp  45270  mountd
 100005    3   udp  34689  mountd
 100005    3   tcp  51773  mountd
 100003    2   tcp   2049  nfs
 100003    3   tcp   2049  nfs
 100003    4   tcp   2049  nfs
 100227    2   tcp   2049
 100227    3   tcp   2049
 100003    2   udp   2049  nfs
 100003    3   udp   2049  nfs
 100003    4   udp   2049  nfs
 100227    2   udp   2049
 100227    3   udp   2049
 100021    1   udp  49239  nlockmgr
 100021    3   udp  49239  nlockmgr
 100021    4   udp  49239  nlockmgr
 100021    1   tcp  45624  nlockmgr
 100021    3   tcp  45624  nlockmgr
 100021    4   tcp  45624  nlockmgr

The NFS server is now ready to use. Next we’ll configure RancherVM nodes to mount the exported file system.

Install/Configure NFS clients

Follow this procedure on each host participating as a RancherVM node, including the NFS server if that machine is also a node in the RancherVM cluster.

Install the required package:

sudo apt-get install -y nfs-common

Create the directory that will be mounted:

sudo mkdir -p /var/lib/rancher/vm

Be careful to use this exact path. Append the following line to /etc/fstab. Replace <nfs_server_ip> with the (private) IP address of your NFS server:

<nfs_server_ip>:/var/lib/rancher/vm-shared        /var/lib/rancher/vm     nfs     auto    0       0

The exported directory will now be mounted to /var/lib/rancher/vm during the boot sequence. To mount the directory without rebooting, run the following command:

sudo mount -a

This should return quickly without output. Verify the mount succeeded by checking the mount table:

mount | grep /var/lib/rancher/vm
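You should see a line similar to the following (the exact mount options will vary by system):

<nfs_server_ip>:/var/lib/rancher/vm-shared on /var/lib/rancher/vm type nfs4 (rw,relatime,...)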

If an error occurred, re-run the rpcinfo command from the previous section, then check the firewall settings on both the NFS server and the client.
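For example, on Ubuntu hosts running ufw, you could allow NFS traffic (TCP port 2049 for NFSv4) from your internal subnet; the subnet below is an example and should match your environment:

sudo ufw allow from 192.168.100.0/24 to any port 2049 proto tcp

NFSv3 clients additionally need to reach the portmapper (port 111) and the mountd ports shown by rpcinfo.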

Let’s ensure we can read/write to the shared directory. On one client, touch a file:

sudo touch /var/lib/rancher/vm/read-write-test

On another client, look for the file:

ls /var/lib/rancher/vm | grep read-write-test

If the file exists, you’re good to go.
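Optionally, clean up the test file:

sudo rm /var/lib/rancher/vm/read-write-test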

Live Migration

Now that shared storage is configured, we are ready to create and migrate VM instances. Install RancherVM into your Kubernetes cluster if you haven’t already.

Usage

You will need at least two ready hosts with sufficient resources to run your instance.

Hosts

We create an Ubuntu Xenial server instance with 1 vCPU and 1GB RAM and explicitly assign it to node1.

Create Instance

After waiting a bit, our instance enters the running state and is assigned an IP address.

Instance Running

Now, let's trigger the live migration by clicking the dropdown under the Node Name column. To the left is the requested node; to the right is the currently scheduled node.

Instance Node Dropdown

Our instance enters the migrating state. This does not pause execution; the migration is mostly transparent to the end user.

Instance Migrating

Once migration completes, the instance returns to the running state. The currently scheduled node now reflects node2, which matches the desired node.

Instance Migrated

That's all there is to it. Migrating instances off a node for maintenance or decommissioning is now a breeze.

How It Works

Live migration is a three-step process (sketched in pseudocode after this list):

  1. Start the new instance on the desired node and configure an incoming socket to expect memory pages from the old instance.
  2. Initiate the transfer of memory pages, in order, from the old instance to the new one. Changes to already-transferred memory pages are tracked and sent after the current sequential pass completes. This process repeats until the remaining changed pages can be streamed within a configurable expected downtime (300ms by default).
  3. Stop the old instance, transfer the remaining memory pages and start the new instance. The migration is complete.
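The following is a minimal pseudocode sketch of this pre-copy loop; the src and dst handles and their methods are hypothetical stand-ins for the hypervisor's migration machinery, not RancherVM APIs:

# Pre-copy live migration, sketched for illustration only.
def live_migrate(src, dst, downtime_budget=0.300):
    dst.listen()                          # step 1: destination expects incoming pages
    pages = src.all_pages()
    while src.estimated_transfer_time(pages) > downtime_budget:
        for page in pages:                # sequential pass over the page set
            dst.receive(page, src.read(page))
        pages = src.dirty_pages()         # pages rewritten during the pass
    src.pause()                           # step 3: stop the old instance,
    for page in pages:                    # transfer the remaining pages,
        dst.receive(page, src.read(page))
    dst.start()                           # and start the new instance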

Moving Forward

We’ve covered manually configuring a shared filesystem and demonstrated the capability to live migrate guest virtual machines from one node to another. This brings us one step closer to achieving a fault tolerant, maintainable virtual machine cloud.

Next up, we plan to integrate RancherVM with Project Longhorn, a distributed block storage system that runs on Kubernetes. Longhorn brings performant, replicated block devices to the table and includes valuable features such as snapshotting. Stay tuned!

James Oliver
Tools and Automation Engineer
Prior to Rancher, James' first exposure to cluster management was writing frameworks on Apache Mesos, predating the release of DC/OS. A self-proclaimed jack of all trades, James loves reverse engineering complex software solutions as well as building systems at scale. A proponent of FOSS, he has made it his personal goal to automate the complexities of creating, deploying, and maintaining scalable systems to empower hobbyists and corporations alike. James has a B.S. in Computer Engineering from the University of Arizona.