Provisioning stateful apps in Kubernetes

Tomáš Sapák
4 min read · Feb 6, 2022

“Wait a minute Tom. Stateful? Are you serious? My app is storing lots of data to the disk, is complicated to provision, can’t be dockerized, have no time ….”

If you’ve ever tried to convince an old-timer to start deploying to Kubernetes, I bet you’ve heard some of the excuses above. To be honest, Kubernetes was initially focused on stateless applications but adopted many features over time, making it not only good but great for stateful services today.

Deploying a dockerized stateful application usually consists of these steps (ideally automated by a configuration management tool like Ansible; see the rough sketch after the list):

  1. Copy secrets and configuration files to the target host.
  2. Start the dockerized service and map a local directory into the Docker container for persistent storage.
  3. Provision initial resources (LDAP accounts, content of the database, initial data in the application).
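
For context, here is a rough Ansible-flavored sketch of those three steps. The paths, image name, and provisioning script are hypothetical placeholders, not taken from any real project:

    # Hypothetical Ansible tasks for the traditional, non-Kubernetes approach
    - name: Copy secrets and configuration files to the target host
      ansible.builtin.copy:
        src: files/app.conf
        dest: /etc/myapp/app.conf
        mode: "0600"

    - name: Start the dockerized service with a local disk mapped in
      community.docker.docker_container:
        name: myapp
        image: registry.example.com/myapp:1.0
        volumes:
          - /var/lib/myapp:/data        # persistent storage on the local disk
        restart_policy: always

    - name: Provision initial resources (accounts, initial data)
      ansible.builtin.command: docker exec myapp /usr/local/bin/bootstrap.sh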

Let’s have a look at how we can handle these steps in Kubernetes. Since the first step is handled the same way as for stateless applications, we can skip it.

Persistent storage in Kubernetes

When you need the data in a container to persist on disk, the first thing you will probably try is mounting a local folder from the host, the same way you would with Docker. Although this is possible with the hostPath volume type, there are several reasons to avoid it:

  • the documentation¹ does not recommend it
  • you will have to pin your pods to a specific node in the cluster
  • you will lose the ability to automatically failover your pod to a different node in the cluster
  • you will lose your data in case the hosting node goes down

Even though explaining to your boss why all the company data went down the toilet sounds like a fun Monday morning, you will probably need something better.

Luckily, Kubernetes integrates with many storage classes² that mount remote persistent storage into the pod, making the pod independent of the node it runs on. Usually, the best choice is a platform-native solution (gcePersistentDisk for GCP, azureDisk for Azure, …). If you deploy Kubernetes on your own hardware, you can leverage the open-source project Longhorn³, which creates highly available persistent storage right on your Kubernetes cluster. Unlike using local storage directly, you can easily replace Longhorn with public cloud-native storage when deploying your application elsewhere, simply by changing the storage driver in the configuration.
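
As a minimal sketch, a PersistentVolumeClaim backed by Longhorn and a pod mounting it could look like this. The claim name, size, and image are placeholders for the example; moving to a public cloud mostly means changing storageClassName:

    # Hypothetical PVC + pod; names, size, and image are placeholders
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: myapp-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn      # swap for a cloud storage class elsewhere
      resources:
        requests:
          storage: 10Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0
          volumeMounts:
            - name: data
              mountPath: /data        # data survives rescheduling to another node
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: myapp-data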

Provisioning initial resources

Even a properly configured application might not be usable after the first start. Imagine pointing your devs to an empty Git server or giving them an empty database server without any accounts. You can expect a crowd of angry developers with pitchforks at your doors the next day.

Luckily, as usual, Kubernetes has a few tricks up its sleeve.

Container Lifecycle Hooks⁴

A mechanism to run code at specific points in the container lifecycle:

  • after container start — PostStart
  • before container stops — PreStop

We can utilize the PostStart hook to run either shell commands or HTTP requests. With the shell executor, all commands run inside the started container immediately after it is created. The situation can get tricky if the hook performs actions that conflict with the entrypoint script, which might still be running while the PostStart hook executes.
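
A minimal sketch of the PostStart shell variant in a pod spec might look like this (the image and the bootstrap script are placeholders for the example):

    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0
          lifecycle:
            postStart:
              exec:
                # Runs inside the container right after it is created,
                # possibly in parallel with the still-running entrypoint
                command: ["/bin/sh", "-c", "/usr/local/bin/bootstrap.sh"]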

Chart Hooks⁵

Chart Hooks are not a feature of Kubernetes but of its package manager, Helm. Helm has nine hooks; the most relevant one for provisioning applications is the post-install hook, which runs immediately after your Helm Chart is installed. Unlike Container Lifecycle Hooks, Helm hooks are associated not with containers but with Kubernetes resources, allowing you to create Kubernetes objects at specific points in the Helm release lifecycle (in this case, during Chart installation). There are some advantages and some limitations compared to Container Lifecycle Hooks (a minimal example follows the list below):

  • the hook is executed just once, after Chart installation, so you don’t need to check whether the resources already exist
  • you can run the container from a different image containing tools that your application image might not have (curl to access an API, a psql client to connect to a database, …)
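
For illustration, here is a minimal post-install hook sketch, written as a Job template inside the chart. The job name, image, and command are placeholders, and database credentials are omitted for brevity:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: myapp-provision
      annotations:
        "helm.sh/hook": post-install                  # run once, right after installation
        "helm.sh/hook-delete-policy": hook-succeeded  # clean up the Job when it succeeds
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: provision
              image: postgres:16     # a different image with the tools we need (psql)
              command: ["psql", "-h", "myapp-db", "-c", "CREATE DATABASE myapp;"]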

Which one is better?

It depends on the case. If you need to run some actions every time you start a new container, you have to use Container Lifecycle Hooks. For the rest, Chart hooks are a better option:

  • you don’t need to deal with conflicts with already existing resources
  • you can use a different Docker image containing the tools you need for provisioning
  • the code is executed just once, at installation time, so services come up more quickly after container restarts

Conclusion

This article demonstrated Kubernetes tools for storing persistent data and bootstrapping stateful applications. If you still wonder whether your application is too complicated to completely get rid of your favorite configuration management tool and leave everything to Kubernetes and Helm, I recommend checking out the openstack-helm project⁶. Openstack-helm deploys the OpenStack cloud platform on top of Kubernetes solely via Helm, and there are not many beasts as complicated as OpenStack.

