Storage
Understanding Kubernetes storage concepts, volumes, and persistence
Kubernetes Storage
Kubernetes provides various storage options to manage data persistence for containers and pods. Storage management in Kubernetes is designed to abstract the underlying storage infrastructure, allowing applications to consume storage resources without being tightly coupled to the specific storage technology.
The Kubernetes storage architecture is built around the concept of volumes, which provide a way for containers to access and persist data beyond their lifecycle. This architecture addresses key challenges in containerized environments, including data persistence, sharing data between containers, and managing the lifecycle of storage resources.
Storage Concepts
Volumes
- Basic storage abstraction in Kubernetes
- Pod-level storage that exists for the pod's lifetime
- Multiple types available (emptyDir, hostPath, configMap, secret, etc.)
- Lifecycle tied to pod (created when pod is created, deleted when pod is deleted)
- Can be shared between containers in the same pod
- Supports various backend storage systems
- Used for both ephemeral and persistent storage needs
- Defined in the pod specification
Persistent Volumes (PV)
- Cluster-level storage resource
- Independent of pods and their lifecycle
- Admin provisioned or dynamically created via StorageClass
- Reusable resources that can be claimed and reclaimed
- Defined by capacity, access modes, storage class, reclaim policy
- Represents actual physical storage in the infrastructure
- Supports different volume plugins (AWS EBS, Azure Disk, NFS, etc.)
- Can exist before and after pods that use them
Persistent Volume Claims (PVC)
- Storage requests made by users
- User abstraction that hides storage implementation details
- Binds to PV that meets the requirements
- The interface pods use to consume storage
- Defined by storage capacity, access modes, and storage class
- Acts as an intermediary between pods and PVs
- Can request specific storage class or amount of storage
- Enables separation of concerns between users and administrators
Volume Types
EmptyDir
An empty directory that's created when a Pod is assigned to a node and exists as long as the Pod runs on that node. When a Pod is removed, the data in the emptyDir is deleted permanently.
Use cases for emptyDir:
- Scratch space for temporary files
- Checkpoint data for long computations
- Sharing files between containers in a pod
- Cache space for application data
- Holding data processed by one container and used by another
HostPath
Mounts a file or directory from the host node's filesystem into your Pod. This type of volume presents significant security risks and should be used with caution.
Important considerations for hostPath:
- Data is not portable between nodes
- Pods using the same path on different nodes will see different data
- Security risk due to potential access to host filesystem
- Often used for system-level pods that need access to host resources
- Not suitable for most applications; prefer PVs for persistent data
Persistent Volumes
PV Definition
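A minimal PV manifest might look like the following; the NFS server address, path, and capacity are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  nfs:
    server: nfs.example.com   # illustrative server
    path: /exports/data
```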
PVC Definition
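A matching claim requests capacity and access modes; Kubernetes binds it to a suitable PV:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
```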
Using PVC in Pod
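The pod then references the claim by name in its volumes section (image and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-example
```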
PV and PVC Lifecycle
- Provisioning: Static (admin creates PVs) or Dynamic (via StorageClass)
- Binding: PVC is bound to a suitable PV
- Using: Pod uses the PVC as a volume
- Reclaiming: When PVC is deleted, PV is reclaimed according to policy
- Retain: PV and data are kept (admin must clean up manually)
- Delete: PV and associated storage are deleted
- Recycle: Basic scrub (rm -rf) before reuse (deprecated)
Storage Classes
StorageClasses enable dynamic provisioning of Persistent Volumes. They abstract the underlying storage provider details and allow administrators to define different "classes" of storage with varying performance characteristics, reclaim policies, and other parameters.
Different provisioner examples:
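As a sketch, two StorageClasses backed by different CSI provisioners (the NFS server and share are illustrative):

```yaml
# AWS EBS via the EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
---
# NFS via the NFS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs.example.com   # illustrative
  share: /exports
```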
Access Modes
Available access modes:
- ReadWriteOnce (RWO)
- Volume can be mounted as read-write by a single node
- Multiple pods on the same node can use the volume
- Most common for block storage like AWS EBS, Azure Disk, GCE PD
- Example use case: Database storage where only one instance needs access
- ReadOnlyMany (ROX)
- Volume can be mounted as read-only by many nodes simultaneously
- Multiple pods across different nodes can read from the volume
- Useful for shared configuration or static content
- Supported by NFS, CephFS, some cloud providers
- Example use case: Configuration data or static website content
- ReadWriteMany (RWX)
- Volume can be mounted as read-write by many nodes simultaneously
- Multiple pods across different nodes can read/write to the volume
- Less commonly supported; typically available with NFS, CephFS, GlusterFS
- Not supported by most cloud block storage (EBS, Azure Disk)
- Example use case: Shared media storage or development environments
- ReadWriteOncePod (RWOP) (Kubernetes 1.22+)
- Volume can be mounted as read-write by only one pod
- Stricter than RWO which allows multiple pods on same node
- Ensures exclusive access for a single pod
- Example use case: Critical workloads requiring exclusive access
Volume Snapshots
Volume snapshots provide a way to create point-in-time copies of persistent volumes. This feature is particularly useful for backup, disaster recovery, and creating copies of data for testing or development environments.
Volume Snapshot Class
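A VolumeSnapshotClass names the CSI driver to use, and a VolumeSnapshot points at the source PVC; the driver shown is an assumption:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass
driver: ebs.csi.aws.com      # illustrative CSI driver
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pvc-example   # illustrative claim
```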
Restoring from a Snapshot
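To restore, create a new PVC whose dataSource references the snapshot:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restored
spec:
  storageClassName: standard
  dataSource:
    name: data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```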
Snapshot operations require:
- The VolumeSnapshot CRD (CustomResourceDefinition)
- The snapshot controller
- A CSI driver that supports snapshots
- VolumeSnapshotClass configuration
Dynamic Provisioning
Storage Class
- Defines storage type and characteristics
- Enables automatic provisioning of storage resources
- Supports different storage providers (AWS, GCP, Azure, on-premises)
- Configures quality of service parameters (IOPS, throughput)
- Can be set as the default for the cluster
- Defines volume binding mode (immediate or wait for consumer)
- Configures reclaim policy for created PVs
- Enables or disables volume expansion
Automatic PV Creation
- Triggered based on PVC request
- Uses StorageClass to determine how to provision storage
- Provider-specific parameters control the resulting storage
- Handles resource management automatically
- Creates appropriately sized volumes based on PVC request
- Labels and annotations from StorageClass are applied to PV
- Example flow:
- User creates PVC with storageClassName
- Dynamic provisioner watches for new PVCs
- Provisioner creates actual storage in infrastructure
- Provisioner creates PV object in Kubernetes
- Kubernetes binds PVC to newly created PV
- Pod can now use the PVC
Example Dynamic Provisioning
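The flow above can be sketched with a StorageClass and a PVC that references it; when the PVC is created, the provisioner creates the backing volume and PV automatically (provisioner and sizes are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com   # illustrative
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```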
Best Practices
- Use PVs and PVCs for persistence
- Separate storage concerns from application deployment
- Leverage the abstraction provided by PVCs to decouple apps from storage implementation
- Use standardized PVC requests across applications and environments
- Consider using Helm charts or operators to manage related resources together
- Example:
- Implement proper backup strategies
- Use volume snapshots for point-in-time backups
- Implement application-consistent backups where possible
- Automate backup processes with CronJobs
- Store backups in multiple locations/regions
- Regularly test restore procedures
- Consider using Velero or other Kubernetes-native backup solutions
- Example CronJob for database backup:
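One possible shape for such a CronJob (image, database names, and schedule are illustrative; credentials are omitted for brevity):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"        # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:16
              command: ["sh", "-c", "pg_dump -h db -U app appdb > /backup/appdb-$(date +%F).sql"]
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc   # illustrative claim
```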
- Consider storage performance requirements
- Match storage class to application performance needs
- Use SSD-backed storage for I/O intensive workloads
- Consider IOPS, throughput, and latency requirements
- Test with realistic workloads before production
- Use different storage classes for different requirements
- Monitor I/O metrics to detect bottlenecks
- Example storage class for high-performance workloads:
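A sketch of a high-IOPS class, assuming the AWS EBS CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-io
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "10000"                  # illustrative target
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```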
- Plan capacity requirements carefully
- Start with accurate capacity estimates
- Implement monitoring for storage utilization
- Configure alerts for high usage thresholds
- Use volume expansion for growing needs
- Consider storage quotas per namespace
- Document capacity planning process
- Example ResourceQuota:
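A namespace-scoped quota might look like this (namespace and limits are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a
spec:
  hard:
    requests.storage: 500Gi
    persistentvolumeclaims: "20"
    # per-StorageClass cap
    standard.storageclass.storage.k8s.io/requests.storage: 200Gi
```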
- Use appropriate access modes
- Choose access modes based on application requirements
- Understand the limitations of your storage provider
- Use ReadWriteOnce for most stateful applications
- Use ReadWriteMany only when truly needed
- Document access mode decisions
- Example:
- Monitor storage usage and health
- Set up Prometheus monitoring for storage metrics
- Monitor PV/PVC status and events
- Track storage utilization trends
- Monitor latency and throughput
- Set up alerts for storage-related issues
- Regularly audit unused PVs and PVCs
- Example Prometheus query for PVC usage:
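One way to express PVC usage, based on the kubelet's volume stats metrics:

```promql
# Percent of each claim's capacity in use
100 * kubelet_volume_stats_used_bytes
    / kubelet_volume_stats_capacity_bytes
```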
- Implement proper security measures
- Use encryption for sensitive data
- Implement appropriate RBAC for storage resources
- Consider using Security Contexts with fsGroup
- Secure storage provider credentials
- Use network policies to protect storage services
- Example security context:
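A sketch of a pod-level security context; fsGroup makes the mounted volume group-writable by the pod's containers:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000        # group ownership applied to mounted volumes
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-example   # illustrative claim
```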
- Consider disaster recovery scenarios
- Design for zone and region failures
- Implement cross-region backup strategies
- Document and test recovery procedures
- Consider using storage replication where available
- Define Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
Common Storage Providers
Cloud Providers
- AWS EBS (Elastic Block Store)
- Block storage for AWS EC2 instances
- Provides ReadWriteOnce access mode
- Multiple volume types (gp3, io2, st1, sc1)
- Supports snapshots and encryption
- Region-specific and availability zone bound
- Example:
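A gp3 class via the EBS CSI driver, with encryption enabled:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
```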
- Azure Disk
- Block storage for Azure
- Provides ReadWriteOnce access
- Multiple SKUs (Standard_LRS, Premium_LRS, UltraSSD_LRS)
- Supports managed and unmanaged disks
- Example:
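A Premium_LRS class via the Azure Disk CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
```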
- Google Persistent Disk
- Block storage for GCP
- pd-standard (HDD) and pd-ssd options
- Regional or zonal deployment
- Automatic encryption
- Example:
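An SSD class via the GCE PD CSI driver, provisioned regionally:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
```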
- OpenStack Cinder
- Block storage for OpenStack
- Various volume types based on configuration
- Integration with OpenStack authentication
- Example:
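A minimal Cinder CSI class; available parameters depend on your OpenStack configuration:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-standard
provisioner: cinder.csi.openstack.org
parameters:
  availability: nova   # illustrative availability zone
```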
On-Premise
- NFS (Network File System)
- Provides ReadWriteMany access
- Good for shared file access
- Widely supported in enterprise environments
- Simple to set up and manage
- Example:
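A statically provisioned NFS PV (server and export path are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com
    path: /exports/shared
```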
- iSCSI (Internet Small Computer Systems Interface)
- Block storage protocol
- Widely supported in enterprise storage
- Provides ReadWriteOnce access
- Requires iSCSI initiator configuration
- Example:
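An iSCSI PV referencing a target portal and IQN (both illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.0.0.10:3260
    iqn: iqn.2001-04.com.example:storage.target00
    lun: 0
    fsType: ext4
```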
- Ceph
- Distributed storage system
- Provides block (RBD), file (CephFS), and object storage
- Highly scalable and resilient
- Supports ReadWriteMany with CephFS
- Example RBD (Rados Block Device):
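A sketch of an RBD StorageClass using the ceph-csi driver; clusterID and pool come from your Ceph deployment, and the required secret parameters are omitted for brevity:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: my-ceph-cluster   # illustrative
  pool: kubernetes
  # ceph-csi also requires provisioner/node secret parameters
```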
- GlusterFS
- Distributed file system
- Provides ReadWriteMany access
- Scales horizontally
- Good for large file storage
- Example:
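A GlusterFS PV using the in-tree plugin; note that this plugin has been removed from recent Kubernetes releases, so current clusters typically need a CSI alternative:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster   # Endpoints object listing Gluster nodes
    path: gv0
```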
CSI Drivers (Container Storage Interface)
- Standardized interface for storage providers
- Enables third-party storage systems to work with Kubernetes
- Dynamically provisioned volumes
- Support for advanced features like snapshots
- Plugin model for extending storage capabilities
- Examples include:
- Dell EMC PowerFlex
- NetApp Trident
- Pure Storage (Pure Service Orchestrator)
- Portworx
- VMware vSphere CSI Driver
Troubleshooting
Common issues and solutions:
- PVC stuck in pending state
- Issue: PVC remains in pending state and doesn't bind to a PV
- Diagnosis:
- Common causes:
- No matching PV available
- StorageClass doesn't exist
- Access mode incompatibility
- Capacity requirements not met
- Volume binding mode set to WaitForFirstConsumer
- Solution:
- Check if the StorageClass exists and has a provisioner
- Verify PVC is requesting supported access modes
- Check if dynamic provisioning is enabled
- Ensure cloud provider permissions are correct
- Create a matching PV manually if using static provisioning
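The diagnosis steps above might use commands like the following (claim name is illustrative):

```shell
# Events usually explain why binding failed
kubectl describe pvc my-claim
# Is there an Available PV that matches the request?
kubectl get pv
# Does the requested StorageClass exist and have a provisioner?
kubectl get storageclass
```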
- Volume mount failures
- Issue: Pod can't start because of volume mount problems
- Diagnosis:
- Common causes:
- PV is mounted on another node (for RWO volumes)
- Filesystem issues or corruption
- Wrong permissions on the mounted volume
- Network issues with NFS or other network storage
- Solution:
- Check if volume is already mounted elsewhere
- Verify storage backend health and connectivity
- Check filesystem with fsck if applicable
- Ensure PV and node are in the same zone for zonal storage
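Mount failures are usually visible in the pod's events (pod name is illustrative):

```shell
# Mount errors appear in the pod's event stream
kubectl describe pod my-pod
# Or filter cluster events for the pod
kubectl get events --field-selector involvedObject.name=my-pod
```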
- Permission issues
- Issue: Container can't write to mounted volume
- Diagnosis:
- Common causes:
- Mismatched user/group IDs
- Read-only filesystem
- SELinux/AppArmor restrictions
- Solution:
- Add securityContext with appropriate fsGroup
- Use an initContainer to set permissions
- Modify container to run as the correct user
- Example:
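A sketch combining fsGroup with an initContainer that fixes ownership before the app starts (image and UID are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-perms
spec:
  securityContext:
    fsGroup: 1000            # group ownership applied to the volume
  initContainers:
    - name: fix-perms
      image: busybox
      command: ["sh", "-c", "chown -R 1000:1000 /data"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-example   # illustrative claim
```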
- Storage capacity issues
- Issue: Insufficient storage space on PV
- Diagnosis:
- Common causes:
- Initial capacity too small
- Application consuming more space than expected
- Temporary files not being cleaned up
- Log files growing unchecked
- Solution:
- Use volume expansion if supported
- Implement log rotation and cleanup routines
- Move data to a larger volume
- Configure monitoring and alerts for storage usage
Storage Management
Volume Expansion
Volume expansion allows you to increase the size of a PVC without disrupting applications:
Steps for volume expansion:
- Verify StorageClass supports expansion (allowVolumeExpansion: true)
- Edit the PVC to increase storage request
- Wait for resize to complete (may require pod restart depending on storage provider)
- Verify new size is available to the pod
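The steps above might look like this on the command line (class and claim names are illustrative):

```shell
# Confirm the class allows expansion
kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
# Request a larger size on the claim
kubectl patch pvc data-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Watch the Resizing / FileSystemResizePending conditions
kubectl describe pvc data-pvc
```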
Data Migration
When you need to move data to a different storage class or region:
- Create volume snapshot of the source PVC
- Create new PV from snapshot
- Update pod configuration to use the new PVC
- Verify data integrity after migration
StatefulSet Volume Management
StatefulSets have special handling for persistent storage:
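A sketch of a StatefulSet with a volume claim template (image and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```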
The volumeClaimTemplates field creates a PVC for each pod in the StatefulSet with predictable names: www-web-0, www-web-1, and so on.
Monitoring and Maintenance
Regular Tasks
- Monitor usage: Set up Prometheus metrics for volume utilization
- Check capacity: Implement automated checks for storage thresholds
- Verify backups: Regularly test restore procedures from snapshots
- Update policies: Review and adjust retention policies and quotas
- Performance analysis: Monitor I/O metrics for storage bottlenecks
Health Checks
- Volume status: Regularly check PV and PVC status
- Mount points: Verify volumes are correctly mounted in pods
- I/O performance: Monitor read/write latency and throughput
- Error logs: Check for volume-related errors in pod and system logs
- Storage events: Monitor Kubernetes events for storage issues
Automated Storage Management
Create automated processes for common storage tasks:
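For instance, a CronJob that prunes old backup files from a shared volume (paths and retention are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-prune
spec:
  schedule: "30 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: prune
              image: busybox
              command: ["sh", "-c", "find /backup -name '*.sql' -mtime +14 -delete"]
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc   # illustrative claim
```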
Local Persistent Volumes
For high-performance workloads or specific hardware requirements, Kubernetes supports local persistent volumes that are directly attached to a specific node:
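A local PV must pin the volume to a node with nodeAffinity (node name and disk path are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
```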
Key considerations for local volumes:
- Data is not replicated or protected at the storage level
- Pods using local volumes will be scheduled to specific nodes
- If the node fails, pod can't be rescheduled elsewhere
- Use for high-performance requirements or data that can be regenerated
- Consider using StatefulSets with anti-affinity for distributed workloads
Advanced Storage Patterns
Sidecar Containers for Storage Management
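As a sketch, a sidecar can share a volume with the main container, e.g. reading logs the app writes to a shared emptyDir (images and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "while true; do date >> /var/log/app/app.log; sleep 5; done"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper        # sidecar reads the same volume
      image: busybox
      command: ["sh", "-c", "tail -F /var/log/app/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}
```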
Init Containers for Data Preparation
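An init container can populate a volume before the main container starts; here a file is seeded into a shared emptyDir that nginx then serves (content is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-seeded
spec:
  initContainers:
    - name: seed
      image: busybox
      command: ["sh", "-c", "echo 'hello' > /work/index.html"]
      volumeMounts:
        - name: content
          mountPath: /work
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
  volumes:
    - name: content
      emptyDir: {}
```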
Ephemeral Volumes
For non-persistent data that should exist for the pod's lifetime but requires more flexibility than emptyDir:
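Generic ephemeral volumes let a pod request a PVC-backed scratch volume whose lifecycle follows the pod (class and size are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard
            resources:
              requests:
                storage: 5Gi
```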