Last time we talked about PV, PVC, Storage Class and Provisioner.
To quickly recap:
PV was originally designed as a piece of storage pre-allocated by an administrator. Since the introduction of Storage Class and Provisioner, users can also provision PVs dynamically.
PVC is a request for a PV. When used with Storage Class, it will trigger the dynamic provisioning of a matching PV.
A PV and a PVC always map one to one.
Provisioner is a plugin used to provision PV for users. It helps to remove the administrator from the critical path of creating a workload that needs persistent storage.
Storage Class is a classification of PVs. PVs in the same Storage Class can share some properties. In most cases, when used with a Provisioner, a Storage Class can be seen as the Provisioner with predefined properties. So when users request it, it can dynamically provision PVs with those predefined properties.
But those are not the only ways to use persistent storage in Kubernetes.
In the previous article, I mentioned that there is also a concept of Volume in Kubernetes. In order to differentiate Volume from Persistent Volume, people sometimes call it In-line Volume, or Ephemeral Volume.
Let me quote the definition of Volume here:
A Kubernetes volume … has an explicit lifetime - the same as the Pod that encloses it. Consequently, a volume outlives any Containers that run within the Pod, and data is preserved across Container restarts. Of course, when a Pod ceases to exist, the volume will cease to exist, too. Perhaps more importantly than this, Kubernetes supports many types of volumes, and a Pod can use any number of them simultaneously.
At its core, a volume is just a directory, possibly with some data in it, which is accessible to the Containers in a Pod. How that directory comes to be, the medium that backs it, and the contents of it are determined by the particular volume type used.
One important property of Volume is that it has the same lifecycle as the Pod it belongs to. It will be gone if the Pod is gone. That’s different from Persistent Volume, which will continue to exist in the system until users delete it. Volume can also be used to share data between containers inside the same Pod, but this isn’t the primary use case, since users normally only have one container per Pod.
So it’s easier to treat Volume as a property of Pod, instead of as a standalone object. As the definition said, it represents a directory inside the pod, and Volume type defines what’s in the directory. For example, Config Map Volume type will create configuration files from the API server in the Volume directory; PVC Volume type will mount the filesystem from the corresponding PV in the directory, etc. In fact, Volume is almost the only way to use storage natively inside Pod.
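To make the "Volume is a property of Pod" idea concrete, here is a minimal sketch that builds a Pod manifest as a plain Python dict and prints it as JSON. The names `demo`, `app-config`, `app-data`, the `nginx` image, and the mount paths are all hypothetical; the point is only that each entry under `spec.volumes` has a name plus exactly one key that selects the Volume type.

```python
import json

# All object names, the image, and the mount paths below are hypothetical.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "nginx",
                # Each mount refers to a Volume of this Pod by name.
                "volumeMounts": [
                    {"name": "config", "mountPath": "/etc/app"},
                    {"name": "data", "mountPath": "/var/lib/app"},
                ],
            }
        ],
        # Volumes live inside the Pod spec; they are not standalone objects.
        "volumes": [
            {"name": "config", "configMap": {"name": "app-config"}},
            {"name": "data", "persistentVolumeClaim": {"claimName": "app-data"}},
        ],
    },
}


def volume_type(volume):
    """Return the one key besides "name", which identifies the Volume type."""
    (kind,) = [key for key in volume if key != "name"]
    return kind


print(json.dumps(pod, indent=2))
print([volume_type(v) for v in pod["spec"]["volumes"]])
```

The first Volume is filled with files from a Config Map, the second with the filesystem of whatever PV is bound to the claim; the container sees both simply as directories.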
It’s easy to get confused between Volume, Persistent Volume and Persistent Volume Claim. If you imagine the data flow, it looks like this: PV -> PVC -> Volume. The PV contains the real data and is bound to a PVC, which is in turn used as a Volume in the Pod.
However, Volume is also confusing in the sense that besides PVC, it can be backed by pretty much any type of storage supported by Kubernetes directly.
Remember we already have Persistent Volume, which supports different kinds of storage solutions. We also have Provisioner, which supports a similar (but not exactly the same) set of solutions. And we have different types of Volume as well.
So, how are they different? And how do you choose between them?
Take AWS EBS for example. Let’s start counting the ways of persisting data in Kubernetes.
awsElasticBlockStore is a Volume type.
You can create a Pod, specify a volume as awsElasticBlockStore, specify the volumeID, then use your existing EBS volume in the Pod.
The EBS volume must exist before you use it with Volume directly.
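As an illustration of the Volume way, the sketch below builds a Pod manifest that mounts an existing EBS volume directly. The Pod name, image, mount path, and the volume ID `vol-0123456789abcdef0` are placeholders; `awsElasticBlockStore`, `volumeID` and `fsType` are the field names of the in-tree Volume type.

```python
import json


def ebs_volume_pod(pod_name, volume_id):
    """Build a Pod manifest that uses an existing EBS volume as an in-line Volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "nginx",  # hypothetical image
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    # The EBS volume must already exist; the Pod only references it.
                    "awsElasticBlockStore": {"volumeID": volume_id, "fsType": "ext4"},
                }
            ],
        },
    }


# Placeholder ID: replace with the ID of a real, pre-created EBS volume.
pod = ebs_volume_pod("ebs-volume-demo", "vol-0123456789abcdef0")
print(json.dumps(pod, indent=2))
```

Note that the storage details sit inside the Pod spec itself, so whoever writes the Pod has to know the EBS volume ID.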
AWSElasticBlockStore is also a PV type.
So you can create a PV that represents an EBS volume (assuming you have the privilege to do that), then create a PVC bound to it. Finally, use it in your Pod by specifying the PVC as a volume.
As with the Volume way, the EBS volume must exist before you create the PV.
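The PV way can be sketched as three pieces: a PV that points at the existing EBS volume, a PVC that binds to that PV, and a Pod volume entry that references the PVC. The names `ebs-pv` and `ebs-pvc`, the 10Gi size, and the volume ID are hypothetical. Setting `storageClassName` to an empty string and `volumeName` to the PV name is one common way to pin a claim to a specific pre-created PV, and is an assumption of this sketch rather than something the steps above require.

```python
import json

# Placeholder ID: the EBS volume has to be created beforehand.
EBS_VOLUME_ID = "vol-0123456789abcdef0"

pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "ebs-pv"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
        "storageClassName": "",
        "awsElasticBlockStore": {"volumeID": EBS_VOLUME_ID, "fsType": "ext4"},
    },
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "ebs-pvc"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "",  # empty string: do not trigger dynamic provisioning
        "volumeName": pv["metadata"]["name"],  # bind to the PV defined above
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# What goes under spec.volumes in the Pod: only the claim name, no EBS details.
pod_volume = {
    "name": "data",
    "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
}

for manifest in (pv, pvc, pod_volume):
    print(json.dumps(manifest, indent=2))
```

Compared with the Volume way, the Pod no longer carries the EBS volume ID; that detail has moved into the PV object.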
kubernetes.io/aws-ebs is also a Kubernetes built-in Provisioner for EBS.
You can create a Storage Class with Provisioner kubernetes.io/aws-ebs, then create a PVC using the Storage Class. Kubernetes will automatically create the matching PV for you. Then you can use it in your Pod by specifying the PVC as a volume.
In this case, you don’t need to create EBS volume before you use it. The EBS Provisioner will create it for you.
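For the Provisioner way, a minimal sketch needs only a Storage Class and a PVC that names it. The class name `ebs-gp2`, the claim name, the size, and the `type: gp2` parameter value are illustrative choices; `kubernetes.io/aws-ebs` is the built-in Provisioner named above.

```python
import json

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "ebs-gp2"},  # hypothetical class name
    "provisioner": "kubernetes.io/aws-ebs",
    "parameters": {"type": "gp2"},  # predefined properties for every PV in this class
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "dynamic-ebs-pvc"},  # hypothetical claim name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        # Naming the Storage Class is what asks for dynamic provisioning.
        "storageClassName": storage_class["metadata"]["name"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

for manifest in (storage_class, pvc):
    print(json.dumps(manifest, indent=2))
```

There is no PV manifest and no EBS volume ID anywhere in this sketch: both the EBS volume and the matching PV are expected to be created by the Provisioner once the claim exists.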
All the options listed above are the built-in options of Kubernetes. There are also some third-party implementations of EBS in the format of Flexvolume driver, to help you hook it up to Kubernetes if you’re not yet satisfied by any options above.
And there are CSI drivers for the same purpose if Flexvolume doesn’t work for you. (Why? More on this later.)
If you’re using StatefulSet, congratulations! You now have one more way to use an EBS volume with your workload: VolumeClaimTemplate.
VolumeClaimTemplate is a StatefulSet spec property. It provides a way to create matching PVs and PVCs for the Pods that the StatefulSet creates. Those PVCs are created using a Storage Class, so they can be provisioned automatically when the StatefulSet scales up. When a StatefulSet is scaled down, the extra PVs/PVCs are kept in the system. So when the StatefulSet scales up again, they are reused by the new Pods created by Kubernetes. We will talk more about StatefulSet later.
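A rough sketch of where VolumeClaimTemplate sits in a StatefulSet manifest follows. The StatefulSet name `www`, the template name `data`, the `nginx` image, the mount path, the 1Gi size, and the Storage Class `ebs-gp2` are all assumptions made for illustration.

```python
import json

statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "www"},
    "spec": {
        "serviceName": "www",  # assumes a headless Service with this name exists
        "replicas": 3,
        "selector": {"matchLabels": {"app": "www"}},
        "template": {
            "metadata": {"labels": {"app": "www"}},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx",
                        # Mounts the per-Pod claim generated from the template below.
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/usr/share/nginx/html"}
                        ],
                    }
                ]
            },
        },
        # One PVC is stamped out from this template for every Pod.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": "ebs-gp2",
                    "resources": {"requests": {"storage": "1Gi"}},
                },
            }
        ],
    },
}

print(json.dumps(statefulset, indent=2))
```

The container refers to the template by its name in `volumeMounts`, so there is no separate entry under the Pod template's `volumes` for it.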
As an example, let’s say you create a StatefulSet named www with 3 replicas and a VolumeClaimTemplate named data. Kubernetes will create 3 Pods, named www-0, www-1 and www-2. Kubernetes will also create PVC data-www-0 for Pod www-0, data-www-1 for www-1, and data-www-2 for www-2 (each PVC is named after the template, followed by the Pod name). If you scale the StatefulSet to 5, Kubernetes will create www-3, data-www-3, www-4 and data-www-4. If you then scale the StatefulSet down to 1, Pods www-1 to www-4 will be deleted, but PVCs data-www-1 to data-www-4 will remain in the system. So when you decide to scale up to 5 again, Pods www-1 to www-4 will be recreated, and PVC data-www-1 will again serve Pod www-1, data-www-2 will serve www-2, and so on. That’s because the identity of each Pod is stable in a StatefulSet, so the names and relationships are predictable.
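The scale-up and scale-down behavior can be modeled in a few lines. This is a toy simulation, not Kubernetes code: the class `StatefulSetModel` is hypothetical, and it only encodes two rules, namely that Pods are named `<StatefulSet name>-<ordinal>`, and that each claim is named `<template name>-<Pod name>` and is never deleted when the set shrinks.

```python
class StatefulSetModel:
    """Toy model of Pod and PVC naming in a StatefulSet (not real Kubernetes code)."""

    def __init__(self, name, claim_template):
        self.name = name
        self.claim_template = claim_template
        self.pods = []
        self.claims = set()

    def claim_for(self, pod):
        # Claim name: <template name>-<Pod name>
        return f"{self.claim_template}-{pod}"

    def scale(self, replicas):
        # Pods are recreated with stable, ordinal-based names.
        self.pods = [f"{self.name}-{i}" for i in range(replicas)]
        # Claims are only ever added; scaling down leaves them in place.
        for pod in self.pods:
            self.claims.add(self.claim_for(pod))


model = StatefulSetModel("www", "data")
for replicas in (3, 5, 1):
    model.scale(replicas)

print(model.pods)
print(sorted(model.claims))

# Scaling back up finds the old claims under the same names.
model.scale(5)
print(model.claim_for("www-1") in model.claims)
```

After scaling to 3, then 5, then 1, only Pod `www-0` is left, while all five claims are still present, so the Pods recreated on the next scale-up pick up the same data as before.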
VolumeClaimTemplate is important for block storage solutions like EBS and Longhorn. Because those solutions are inherently ReadWriteOnce, you cannot share a volume between Pods. A Deployment won’t work well with them if you have more than one Pod running with persistent data. So VolumeClaimTemplate provides a way for a block storage solution to scale horizontally with a Kubernetes workload.
As you see, there are built-in Volume types, PV types, Provisioner types, plus external plugins using Flexvolume and/or CSI. The most confusing part is that they provide largely the same, yet slightly different, functionality.
I thought, at least, there should be a guideline somewhere on how to choose between them.
But I couldn’t find one anywhere.
So I’ve plowed through code and documentation to bring you a comparison matrix, and the guideline that makes the most sense to me.
Comparison of Volume, Persistent Volume and Provisioner
Here I only covered the in-tree support from Kubernetes. There are some official out-of-tree Provisioners you can use as well.
As you see here, Volume, Persistent Volume and Provisioner are different in some nuanced ways.
* A side note about EmptyDir with PV:
Back in 2015, Clayton Coleman raised an issue to support EmptyDir with PV. It could be very helpful for workloads that need persistent storage but only have local volumes available. But it didn’t get much traction. Without scheduler support, it was too hard to do at the time. Now, in 2018, scheduler and PV node affinity support have been added for Local Volume in Kubernetes v1.11. But there is still no EmptyDir PV. And the Local Volume feature is not exactly what I expected, since it cannot create new volumes with new directories on the node. So I wrote Local Path Provisioner, which uses the scheduler and PV node affinity changes to dynamically provision Host Path type PVs for the workload.
So which way should users choose?
In my opinion, users should stick to one principle:
Choose Provisioner over Persistent Volume, Persistent Volume over Volume when possible.
To elaborate:
The rationale behind this guideline is simple. When operating inside Kubernetes, an object (PV) is easier to manage than a property (Volume), and creating a PV automatically (Provisioner) is much easier than creating it manually.
There is an exception: if you prefer to operate storage outside of Kubernetes, it’s better to stick with Volume. In that case, though, you will need to handle creation and deletion through another set of APIs. You will also lose the ability to scale storage automatically with StatefulSet, since there is no VolumeClaimTemplate. I don’t think this will be the choice for most Kubernetes users.
This question was one of the first things that came to my mind when I started working with Kubernetes storage. The lack of consistent and intuitive design makes Kubernetes storage look like an afterthought. I’ve tried to research the history behind those design decisions, but it’s hard to find anything before 2016.
In the end, I tend to believe this is due to a few initial design decisions made very early, possibly combined with the urgent need for vendor support, which resulted in Volume getting far more responsibility than it should have. In my opinion, all those built-in volume plugins that duplicate PV shouldn’t be there.
While researching the history, I realized dynamic provisioning was already an alpha feature in Kubernetes v1.2 release in early 2016. It took two release cycles to become beta, another two to become stable, which is very reasonable.
There is also a huge ongoing effort by SIG Storage (which drives Kubernetes storage development) to move Volume plugins out of tree using Provisioner and CSI. I think it will be a big step towards a more consistent and less complex system.
Unfortunately, I don’t think the different Volume types will go away. It’s kind of the flip side of Silicon Valley’s unofficial motto: move fast and break things. Sometimes it’s just too hard to fix the legacy design left behind by a fast-moving project. We can only live with these designs, work around them cautiously, and avoid heralding them in the wrong way.
We will talk about the mechanisms for extending the Kubernetes storage system, namely Flexvolume and CSI, in the next part of the series. A hint: as you may have noticed already, I am not a fan of Flexvolume. And it’s not the storage subsystem’s fault.
[To be continued]
[You can join the discussion here]