Discover how containerisation with orchestrators like Kubernetes handles persistent storage and data protection. Plus, what key vendors offer to support enterprise-grade stateful workloads
Containerisation offers the promise of rapid application development and deployment. But it was not originally a technology that played well with the persistent storage and data protection that enterprise users need. Those challenges have largely been solved, and containerisation – often, but not only, in the Kubernetes orchestration environment – is now in use in most large organisations, some of which use it exclusively for cloud-native applications.
Here we look at containerisation, its challenges in storage and data protection, how we get persistent storage and enterprise data protection for containers, and what container management tools are offered by the key storage players.
What is containerisation?
Containerisation is a form of application virtualisation that has become central to cloud-native development. Unlike traditional server virtualisation (via hypervisors such as VMware ESXi or Nutanix AHV), which creates virtual machines atop a hypervisor layer, containers strip away that layer and run on the host operating system directly. A container packages an application together with its dependencies, libraries and runtime environment, enabling rapid creation, scaling, cloning and termination.
Containers are “lighter” than virtual machines since they avoid the overhead of full OS stacks for each instance. They are highly portable across on-premises, cloud and hybrid environments. They are naturally suited to workloads exhibiting massive demand spikes (for example web services) because of their fast spin-up and tear-down characteristics. Moreover, containers map very well to microservices architecture: small, discrete components of application logic interact via APIs instead of large, monolithic applications. This paradigm fits neatly with DevOps methodologies, enabling continuous integration/continuous delivery (CI/CD), frequent updates and agile services.
How does Kubernetes fit into containerisation?
Kubernetes (K8s) is the leading container orchestration platform. It is not the only option – alternatives include Apache Mesos, Docker Swarm, HashiCorp Nomad and cloud-native services such as AWS ECS, while managed services such as Azure Kubernetes Service are built on Kubernetes itself – but Kubernetes holds the overwhelming market share, with some estimates putting it above 97% of container orchestration deployments.
As a container orchestrator, Kubernetes handles creation, deployment, management, automation, scaling, load balancing and the relationships between containers and underlying infrastructure (including compute, network and storage). In Kubernetes terms, a pod is a group of one or more containers that share resources and networking. Kubernetes thereby becomes the engine managing containers in nested structures, orchestrating real-world applications.
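To make the pod idea concrete, here is a minimal sketch of a pod definition built as a Python dict, using Kubernetes API field names. The pod, container and image names are illustrative, not taken from any real deployment.

```python
import json

# A minimal two-container pod: both containers share the pod's network
# namespace and any volumes declared at pod level.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-frontend", "labels": {"app": "web"}},
    "spec": {
        "containers": [
            {
                "name": "nginx",                 # main application container
                "image": "nginx:1.25",
                "ports": [{"containerPort": 80}],
            },
            {
                "name": "log-sidecar",           # helper container in the same pod
                "image": "busybox:1.36",
                "command": ["sh", "-c", "tail -f /var/log/app.log"],
            },
        ]
    },
}

print(json.dumps(pod_manifest, indent=2))
```

In practice such a definition is written in YAML and applied with kubectl; the structure is the same either way, which is the point of the sketch.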
How is Kubernetes organised?
Understanding Kubernetes requires knowing its core abstractions: containers, pods, nodes, clusters and control-plane components.
- A container houses the application runtime plus dependencies. Containers are stateless by default—they should not assume they retain prior data between restarts. This statelessness keeps containers portable, but it also introduces a challenge around stateful workloads.
- A pod encapsulates one or more containers on a node; containers within a pod share local networking and storage resources.
- A node is either a physical server or a virtual machine that hosts pods. Nodes are designated as worker nodes (which run application workloads) or master nodes (which run the control plane).
- The control plane (on master nodes) includes components such as the API server (which is the external interface for the cluster), the scheduler (which decides what pods run on which nodes), the controller-manager (which enforces desired state, e.g., number of replicas) and etcd (a key-value store holding cluster state).
- Worker nodes host components such as “kubelet” (agent that communicates with master), “kube-proxy” (for networking) and the container runtime (which actually runs containers).
Clusters group one or more nodes into a unified environment, enabling workloads to be scheduled, managed and scaled across infrastructure.
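The controller-manager's job of "enforcing desired state" can be sketched as a reconciliation loop: compare the declared replica count with what is actually running, and create or remove pods until they match. This toy Python version ignores the API server, scheduling and failures, and exists only to illustrate the desired-state idea.

```python
def reconcile(desired_replicas, running_pods):
    """Return the pod list after one reconciliation pass."""
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append(f"pod-{len(pods)}")   # create a missing replica
    while len(pods) > desired_replicas:
        pods.pop()                        # terminate a surplus replica
    return pods

state = reconcile(3, ["pod-0"])   # scale up from 1 to 3 replicas
state = reconcile(2, state)       # scale down from 3 to 2 replicas
print(state)                      # → ['pod-0', 'pod-1']
```

Real controllers run this loop continuously against cluster state held in etcd, which is why Kubernetes self-heals: a crashed pod simply shows up as a gap between desired and actual state.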
What is the challenge with storage in Kubernetes?
While Kubernetes excels at stateless workloads, many enterprise applications require persistent storage—data that lives beyond the lifetime of an individual container/pod. Out-of-the-box Kubernetes volumes are typically ephemeral: data stored inside a container or pod is lost when the pod is terminated or rescheduled. This ephemeral nature works for stateless web services, but not for databases, analytics, or stateful systems.
The challenge, then, is how to provide storage that is persistent, meaning durable across pod lifecycles, portable across nodes, and integrated with Kubernetes abstractions such as pods and containers. In practical terms, enterprise installations demand block, file and object-based storage for persistent application state, and structured methods to manage this.
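The ephemeral-versus-persistent distinction shows up directly in a pod spec. The sketch below contrasts an emptyDir volume, which is deleted with the pod, against a volume backed by a PersistentVolumeClaim, which survives it. Field names follow the Kubernetes API; the claim name "db-data" and the database image are illustrative.

```python
# Ephemeral: scratch space that disappears when the pod is removed.
ephemeral_volume = {"name": "scratch", "emptyDir": {}}

# Persistent: a reference to a claim whose backing storage outlives the pod.
persistent_volume_ref = {
    "name": "data",
    "persistentVolumeClaim": {"claimName": "db-data"},
}

pod_spec = {
    "containers": [{
        "name": "db",
        "image": "postgres:16",   # illustrative stateful workload
        "volumeMounts": [
            {"name": "scratch", "mountPath": "/tmp/work"},
            {"name": "data", "mountPath": "/var/lib/postgresql/data"},
        ],
    }],
    "volumes": [ephemeral_volume, persistent_volume_ref],
}
```

If this pod is rescheduled to another node, anything under /tmp/work is gone, but the claim-backed mount is reattached with its data intact.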
How does Kubernetes provide persistent storage?
Kubernetes deals with persistent storage via objects called PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs), decoupling application code from underlying infrastructure.
- A PV is a resource in the cluster representing real storage: capacity, performance class, volume plugin, access modes, paths, etc.
- A PVC is an application-centric claim for storage (size, performance, access mode) and is bound to an available PV.
- Storage classes define categories of storage (e.g., “fast SSD”, “archive”, “cloud disk”) and can map to underlying plugins or arrays. A default storage class may be defined so that pods automatically obtain storage if not explicitly specified.
- To support pods that require persistent storage via PVCs, Kubernetes uses the abstraction of stateful workloads (for example via a StatefulSet) that preserves identity and storage across rescheduling.
Together, the PV/PVC/storage class model allows Kubernetes to allocate and manage persistent storage without hard-coding infrastructure details in application definitions.
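The binding step in that model can be sketched as a matching problem: a claim asks for a size and a storage class, and the cluster binds it to an available PV that satisfies the request. This toy version matches on class and capacity only; real binding also considers access modes, volume plugins and node affinity.

```python
def bind_claim(claim, volumes):
    """Bind a PVC to the smallest unbound PV that satisfies it."""
    candidates = [
        v for v in volumes
        if not v["bound"]
        and v["class"] == claim["class"]
        and v["capacity_gi"] >= claim["request_gi"]
    ]
    if not candidates:
        return None          # claim stays pending until a PV appears
    pv = min(candidates, key=lambda v: v["capacity_gi"])
    pv["bound"] = True
    return pv["name"]

pvs = [
    {"name": "pv-small", "class": "fast-ssd", "capacity_gi": 10, "bound": False},
    {"name": "pv-large", "class": "fast-ssd", "capacity_gi": 100, "bound": False},
]
pvc = {"class": "fast-ssd", "request_gi": 8}
print(bind_claim(pvc, pvs))   # → pv-small
```

With dynamic provisioning, a storage class's provisioner skips the matching step entirely and creates a fresh PV sized to the claim.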
What is the Container Storage Interface (CSI)?
The Container Storage Interface (CSI) is an API specification that allows third-party storage systems to integrate with Kubernetes (and other container orchestration systems) as persistent storage providers.
Before CSI, storage plugins were “in-tree” (i.e., part of the Kubernetes codebase), making vendor development slow and tightly coupled to the Kubernetes release cycle. CSI shifted this by allowing storage providers to ship independent drivers (CSI drivers) that expose storage capabilities via standardised gRPC APIs.
With CSI, storage arrays and cloud providers can deliver drivers to support dynamic provisioning, mounting/unmounting, snapshots, expansions, cloning and more—all without modifying Kubernetes core code.
Today there are 130+ CSI drivers for file, block and object storage across hardware and cloud.
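To show the division of responsibilities a CSI driver takes on, here is a sketch modelled loosely on a few calls from the spec's Controller and Node gRPC services. A real driver implements these as gRPC handlers against an actual backend; this is an in-memory fake, and the paths and names are invented for illustration.

```python
import uuid

class InMemoryCsiDriver:
    """A fake CSI-style driver tracking volumes and mounts in memory."""

    def __init__(self):
        self.volumes = {}        # volume_id -> size in GiB
        self.published = set()   # volume_ids currently mounted on this node

    # Controller service: cluster-wide operations against the storage backend.
    def create_volume(self, name, required_gib):
        vol_id = f"vol-{uuid.uuid4().hex[:8]}"
        self.volumes[vol_id] = required_gib
        return vol_id

    def delete_volume(self, vol_id):
        self.volumes.pop(vol_id, None)

    # Node service: per-host operations such as mounting into a pod.
    def node_publish_volume(self, vol_id, target_path):
        if vol_id not in self.volumes:
            raise ValueError("unknown volume")
        self.published.add(vol_id)   # a real driver would mount at target_path

driver = InMemoryCsiDriver()
vid = driver.create_volume("db-data", 10)
driver.node_publish_volume(vid, "/var/lib/kubelet/pods/demo/volumes/mount")
```

The split matters operationally: controller calls run once per cluster against the array or cloud API, while node calls run on whichever host the pod lands on.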
What do storage suppliers offer for Kubernetes storage and data protection?
Major storage vendors have embraced container-native storage by building management platforms, CSI drivers and data protection services tailored for Kubernetes. Below are several representative examples:
- Dell Technologies offers Container Storage Modules (CSMs) built on CSI drivers that enable automation and advanced data services—replication, observability, encryption, snapshots—directly from storage arrays.
- IBM, via its acquisition of Red Hat, offers OpenShift, which provides storage support via PVCs and CSI drivers across on-premise and cloud environments (AWS EBS, Azure Managed Disks, Google Persistent Disk, etc.).
- Hewlett Packard Enterprise (HPE) offers Ezmeral Runtime Enterprise which supports Kubernetes at scale across bare-metal, virtualised and cloud settings; its data fabric includes persistent container storage, HA, backup/restore and edge support.
- Hitachi Vantara introduces Hitachi Kubernetes Service (HKS) which manages container storage in hybrid and public-cloud environments via CSI drivers and integrates compute + storage under Kubernetes control.
- NetApp’s Astra platform includes Astra Control, Astra Trident (CSI for provisioning) and supports Kubernetes application lifecycle, data management, hybrid and multi-cloud workloads.
- Pure Storage offers Portworx, providing container-native storage, connectivity and performance configuration for Kubernetes clusters, supporting block/file/object, backup, DR, migration and scalability.
These examples illustrate how vendors are delivering turnkey storage stacks—with advanced data protection, automation and policy-based management—for Kubernetes deployments.
The takeaway
Containerisation and Kubernetes have revolutionised how applications are built and deployed. Yet stateful workloads – with persistent data, protection, scaling and mobility requirements – pose additional storage challenges.
The persistent-volume model in Kubernetes combined with the Container Storage Interface (CSI) creates the foundation to connect containerised applications with enterprise-grade storage systems.
Storage suppliers have responded by offering container-native platforms, CSI drivers and orchestrated data services that make it easier to deliver persistent, policy-driven storage in Kubernetes.
For organisations moving beyond stateless microservices, the right storage strategy is essential: pick the right class of storage, integrate PV/PVC abstractions via CSI, and use vendor platforms that support automation, protection and hybrid cloud operations.
Read more about data management
Data protection: Snapshots, replication and backups explained. Discover how snapshots, replication and backups work together to protect your data. Learn the benefits, limitations and best practices for a layered data protection strategy.
Backup: Don’t leave it to hope. Build a solid data protection strategy. Discover how modern backup strategies protect against ransomware, cover cloud and container environments, and ensure business continuity with RPO and RTO.