Deployment vs. StatefulSet for stateful applications

May 16, 2022

Our application consists of multiple stateless microservices and a set of legacy applications used mostly as various kinds of data storage. Without reading the documentation properly, I automatically assumed I would use Deployment for stateless services and StatefulSet for stateful applications. And boy, was I wrong.

Deployment vs. StatefulSet

Before diving in, let’s have a look at the main differences:

Deployment

all replicas are interchangeable: pods get random names and cannot hold unique data on persistent storage
common persistent storage for all replicas: all replicated pods use the same PVC and the same volume
the PVC must be created separately and referenced by the Deployment
a load-balancing service is necessary to access the pods
implemented with ReplicaSets: on upgrade, a new ReplicaSet is created, and pods are scaled up in the new ReplicaSet and down in the old one according to the selected strategy. Rollback is supported by switching back to the old ReplicaSet
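To make the list above concrete, here is a minimal sketch of the three objects a Deployment with shared storage needs, written as plain Python dictionaries and dumped as JSON (which kubectl accepts alongside YAML). The application name, claim name, image, ports, and storage size are all placeholders assumed for illustration, not values from a real setup.

```python
import json

APP = "legacy-app"          # hypothetical application name
CLAIM = "legacy-app-data"   # hypothetical PVC name
LABELS = {"app": APP}

# The PVC is a separate object; the Deployment only references it by name.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": CLAIM},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [{
                    "name": APP,
                    "image": "example.org/legacy-app:1.0",  # placeholder image
                    "ports": [{"containerPort": 8080}],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                # Every pod created from this template mounts the same claim.
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": CLAIM},
                }],
            },
        },
    },
}

# A regular Service load-balances across whichever pods match the labels.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {"selector": LABELS, "ports": [{"port": 80, "targetPort": 8080}]},
}


def as_list(*objects):
    """Wrap several manifests into one List object that can be applied at once."""
    return {"apiVersion": "v1", "kind": "List", "items": list(objects)}


if __name__ == "__main__":
    print(json.dumps(as_list(pvc, deployment, service), indent=2))
```

The output can be piped into `kubectl apply -f -`. Note that nothing in the pod template is specific to a replica: scaling this Deployment up would simply give more pods the same claim.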

StatefulSet

each replica has a stable, predictable name: {StatefulSet name}-{index}
unique persistent storage for each replica
a PVC is created automatically for each replica but is not deleted automatically (automatic deletion is an alpha feature as of Kubernetes 1.23)
a headless service is necessary to create a stable DNS name for each pod
unlike a Deployment, a StatefulSet manages its pods directly, with no ReplicaSet in between. Because of a known issue [1], automatic rollback after a failed upgrade is not possible
upgrades and terminations are done sequentially, from the pod with the highest index down to the pod with index 0
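The naming rules in this list are mechanical enough to sketch in a few lines of Python. The function names below are made up for illustration, and the DNS helper assumes the default cluster domain `cluster.local` and the `default` namespace unless told otherwise.

```python
def pod_names(statefulset: str, replicas: int) -> list[str]:
    """Pod names are the StatefulSet name plus an ordinal starting at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns(pod: str, service: str, namespace: str = "default",
            cluster_domain: str = "cluster.local") -> str:
    """Stable per-pod DNS name provided through the headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def pvc_name(claim_template: str, pod: str) -> str:
    """Each pod gets its own PVC, named after the claim template and the pod."""
    return f"{claim_template}-{pod}"


def termination_order(statefulset: str, replicas: int) -> list[str]:
    """Pods are upgraded or terminated from the highest ordinal down to 0."""
    return list(reversed(pod_names(statefulset, replicas)))


if __name__ == "__main__":
    for pod in pod_names("db", 3):
        print(pod, pod_dns(pod, "db"), pvc_name("data", pod))
    print("termination order:", termination_order("db", 3))
```

Because a replacement pod gets the same name, it also gets the same DNS record and is reattached to the same PVC, which is what gives each replica a persistent identity.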

Deployment use cases

Stateless services

This one is obvious and is here just for completeness. I don’t believe there is a use case for stateless applications using StatefulSet.

Stateful services with a single replica

There are multiple reasons why you would run a single replica pod:

you don’t run a production-grade workload
the application can’t be replicated (almost any legacy app)
the application can be replicated, but replication causes more outages than a single replica scenario (I remember some fun outages with Shibboleth and Terracotta back in the day)

You won’t gain anything useful from the StatefulSet controller with single replica workloads. Deployment handles upgrades better thanks to ReplicaSets, and its rollback actually works. Just remember to switch the deployment strategy to Recreate. All pods in the Deployment share the same storage, and with the default RollingUpdate strategy the new pod is started before the old one is stopped. You probably don’t want two pods to access the same storage simultaneously, even with a non-production PostgreSQL instance.
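As a small sketch of that strategy switch, the helper below takes a Deployment manifest as a Python dictionary and returns a copy set to one replica with the Recreate strategy. The helper name and the example manifest (`postgres-dev`) are hypothetical.

```python
import copy


def use_recreate(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with one replica and strategy Recreate."""
    patched = copy.deepcopy(deployment)
    spec = patched.setdefault("spec", {})
    spec["replicas"] = 1
    # Recreate stops the old pod before starting the new one. Replacing the
    # whole strategy dict also drops any rollingUpdate settings, which the
    # API rejects when the strategy type is Recreate.
    spec["strategy"] = {"type": "Recreate"}
    return patched


# Hypothetical manifest fragment using rolling update settings.
example = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "postgres-dev"},
    "spec": {
        "replicas": 1,
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
        },
    },
}

patched = use_recreate(example)
```

The trade-off is a short downtime during every upgrade, since the old pod is gone before the new one is ready. For a single-replica stateful workload that is usually preferable to two processes writing to the same volume.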

StatefulSet use case

The last remaining case, stateful services with multiple replicas, is the one where the StatefulSet shines. A typical use case is a legacy application that supports clustering, where each replica needs to hold unique data. Good examples are:

SQL databases, which continuously synchronize all replicas so that they hold the same data
storage systems like Elasticsearch or Ceph, which spread copies of data across the cluster, making each node unique

These applications additionally need a unique node identity, provided via the headless service, and lifecycle events (upgrades, adding or removing nodes) performed in a strict sequence.
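A minimal sketch of such a setup, again as Python dictionaries dumped to JSON, could look like the following. The cluster name, image, port, mount path, and storage size are assumptions chosen for illustration only.

```python
import json


def headless_service(name: str, port: int) -> dict:
    """A Service with clusterIP set to "None" gives every pod its own DNS record."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": name},
            "ports": [{"port": port}],
        },
    }


def stateful_set(name: str, image: str, replicas: int, port: int,
                 size: str = "1Gi") -> dict:
    """A StatefulSet whose pods each get their own PVC from a claim template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # must match the headless service name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One PVC per pod is created from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


# Hypothetical three-node cluster; name, image, and port are placeholders.
svc = headless_service("search", 9200)
sts = stateful_set("search", "example.org/search-node:1.0", replicas=3, port=9200)

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [svc, sts]}, indent=2))
```

Compared with a Deployment, the pod template carries no `volumes` entry pointing at a shared claim. The storage comes from `volumeClaimTemplates`, so each replica ends up with its own volume.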

Conclusion

Choosing between Deployment and StatefulSet is pretty straightforward. If you have any doubts, check the two diagrams above, and you should have a clear picture even without reading the rest of the article.

PS: Thanks to https://medium.com/@andy1609 for recommending https://github.com/mingrammer/diagrams to me. Both schemas from this article were coded easily like this:

[1] https://github.com/kubernetes/kubernetes/issues/67250