Like many of us in the Kubernetes space, I’m excited to check out the shiny new thing. To be fair, we’re all working with an amazing product that is younger than my pre-school aged daughter. The shiny new thing at KubeCon Europe was a new container runtime authored by Google named gVisor. Like a cat to catnip, I had to check this out and share it with you.
gVisor is a sandboxed container runtime that acts as a user-space kernel. During KubeCon, Google announced that it had open-sourced the project to the community. Its goal is to use paravirtualization to isolate containerized applications from the host system, without the heavyweight resource allocation that comes with virtual machines.
Should you use it? No. If you're running production workloads, don't even think about it! Right now, this is a metaphorical science experiment. That's not to say you won't want to use it as it matures. I have no problem with the way it's trying to solve process isolation, and I think it's a good idea. There are also alternatives you should take the time to explore before adopting this technology in the future.
That being said, if you want to learn more about it, when you’ll want to use it, and the problems it seeks to solve, keep reading.
As an operator, you'll want to use gVisor to isolate application containers that aren't entirely trusted. This could be a new version of an open source project your organization has trusted in the past, a new project your team has yet to completely vet, or anything else you aren't entirely sure can be trusted in your cluster. After all, if you're running an open source project you didn't write (which is all of us), it is good security and good engineering to properly isolate it and protect your environment against any as-yet-unknown vulnerabilities.
Sandboxing is a software management strategy that enforces isolation between the software running on a machine, the host operating system, and other software running on the same machine. The purpose is to constrain applications to specific parts of the host's memory and filesystem, and not allow them to break out and affect other parts of the operating system.
The virtual machine (VM) is a great way to isolate applications from the underlying hardware. An entire hardware stack is virtualized to protect applications and the host kernel from malicious applications.
As stated before, the problem is that VMs are heavy. They require set amounts of memory and disk space. If you've worked in enterprise IT, I'm sure you've noticed the resource waste.
Some projects are looking to solve this with lightweight, OCI-compliant VM implementations. Projects like Kata Containers are bringing this to the container space on top of runV, a hypervisor-based runtime.
Microsoft is using a similar technique to isolate workloads using a very-lightweight Hyper-V virtual machine when using Windows Server Containers with Hyper-V isolation.
This feels like a best-of-both-worlds approach to isolation. Time will tell. Most of the market is still running the Docker engine under the covers, and I don't see this changing any time soon. Open containers and container runtimes will certainly begin taking over a share of the market; as that happens, adopting multiple container runtimes will become an option for the enterprise.
gVisor intends to solve this problem. It acts as a kernel between the containerized application and the host kernel, using various mechanisms to limit syscalls, proxy the file system, and mediate network access. These mechanisms are a form of paravirtualization that provides virtual-machine-like isolation without the fixed resource cost of each virtual machine.
The gVisor runtime is a binary named runsc (run sandboxed container). It is an alternative to runc, or to runV if you've worked with Kata Containers in the past.
gVisor isn't the only way to isolate your workloads and protect your infrastructure. Technologies like SELinux, seccomp, and AppArmor solve a lot of these problems (as well as others). It would behoove you as an operator and an engineer to get well acquainted with these technologies. It's a lot to learn; I'm certainly no expert, although I aspire to be. Don't be a lazy engineer: learn your tools, learn your OS, and do right by your employer and your users. If you want to know more, go read the man pages and follow Jessie Frazelle. She is an expert in this area of computing and has written a treasure trove on it.
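To give a flavor of what's already available, Docker can apply a seccomp or AppArmor profile to a single container at run time. A minimal sketch, assuming a seccomp profile file at ./my-seccomp.json and an AppArmor profile named my-profile already loaded on the host (both names are placeholders):

# Run a container under a custom seccomp profile (path is a placeholder)
docker run --security-opt seccomp=./my-seccomp.json nginx
# Run a container under a pre-loaded AppArmor profile (name is a placeholder)
docker run --security-opt apparmor=my-profile nginx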
As Docker supports multiple runtimes, it will work with runsc. To use it, one must build and install the runsc container runtime binary and configure Docker's /etc/docker/daemon.json file to register the gVisor runtime.
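A minimal sketch of that configuration, assuming the runsc binary was installed to /usr/local/bin/runsc (adjust the path to wherever you installed it, and merge the snippet into any existing daemon.json rather than overwriting it):

# Register runsc as an additional Docker runtime
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
EOF
# Restart the daemon so it picks up the new runtime
sudo systemctl restart docker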
From there, a user may run a container with the runsc runtime by passing the --runtime flag to the docker run command:
docker run --runtime=runsc hello-world
Kubernetes support for gVisor is experimental and implemented via CRI-O, an implementation of the Kubernetes Container Runtime Interface (CRI). CRI-O's goal is to allow Kubernetes to use any OCI-compliant container runtime (such as runc and runsc). To use it, one must install runsc on the Kubernetes nodes, then configure CRI-O to run untrusted workloads with runsc in CRI-O's /etc/crio/crio.conf file. Once configured, any pod without the io.kubernetes.cri-o.TrustedSandbox annotation (or with the annotation set to false) will be run with runsc. This is an alternative to using the Docker engine to power the containers inside Kubernetes pods.
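As a rough sketch of what that looks like (the runsc path is an assumption, and the crio.conf option names should be checked against your CRI-O version):

# In /etc/crio/crio.conf, the [crio.runtime] section points untrusted workloads at runsc, e.g.:
#
#   [crio.runtime]
#   runtime_untrusted_workload = "/usr/local/bin/runsc"
#   default_workload_trust = "trusted"
#
# Restart CRI-O, then mark a pod as untrusted via the annotation:
sudo systemctl restart crio
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-nginx
  annotations:
    io.kubernetes.cri-o.TrustedSandbox: "false"
spec:
  containers:
  - name: nginx
    image: nginx
EOF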
Will your application work under gVisor? It depends. Currently gVisor only supports single-container pods, and the gVisor project maintains a list of applications that have been tested and are known to work.
Ultimately support for any given application will depend on whether the syscalls used by the application are supported.
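One quick, admittedly rough way to see what your application asks of the kernel is to count its syscalls with strace before trying it under runsc (./my-app is a placeholder for your binary):

# Summarize which syscalls the application makes, following any child processes
strace -c -f -o syscalls.txt ./my-app
# syscalls.txt then lists each syscall and how often it was called;
# compare that against the syscalls gVisor currently implements.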
How is performance? Again, this depends. gVisor's "Sentry" process is responsible for limiting syscalls and requires a platform to implement context switching and memory mapping. Currently gVisor supports ptrace and KVM, which implement these functions differently, are configured differently, and support different node configurations. Each affects performance differently.
The architecture of gVisor suggests it could enable greater application density than VMM-based configurations, but it may suffer higher performance penalties in syscall-rich applications.
A quick note about network access and performance: network access is provided by netstack, an L3 userland networking stack developed as a gVisor subproject. This functionality can be bypassed in favor of the host network to increase performance.
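Both the platform choice and the network bypass surface as flags on runsc, which Docker can pass through via runtimeArgs in daemon.json. A sketch under that assumption (the flag names follow the gVisor docs of the time and should be verified against your version):

# Example daemon.json runtime entry selecting the KVM platform and host networking;
# merge this into your existing "runtimes" block rather than copying it verbatim:
#
#   "runsc-kvm": {
#     "path": "/usr/local/bin/runsc",
#     "runtimeArgs": ["--platform=kvm", "--network=host"]
#   }
#
docker run --runtime=runsc-kvm hello-world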
Rancher currently cannot be used to provision CRI-O-backed Kubernetes clusters, as it relies heavily on the Docker engine. However, you can certainly manage CRI-O-backed clusters with Rancher. Rancher will manage any Kubernetes server, because we leverage the Kubernetes API and our components are Kubernetes Custom Resources.
We'll continue to monitor gVisor as it matures, and we'll add more support for it in Rancher as the need arises. Like the evolution of Windows Server Containers in Kubernetes, this project will soon become part of the fabric of Kubernetes in the enterprise.