Cluster API represents a significant paradigm shift in Kubernetes cluster management, providing a declarative, Kubernetes-native approach to deploying, scaling, and managing the lifecycle of Kubernetes clusters. The framework applies the principles of infrastructure as code directly to Kubernetes infrastructure:
- Kubernetes-native abstractions: Define clusters using familiar Custom Resource Definitions (CRDs)
- Provider-agnostic design: Support for multiple infrastructure providers through common interfaces
- Declarative lifecycle management: Create, scale, upgrade, and delete clusters using Kubernetes-style objects
- Consistent experience: Standardized approach across different environments and cloud providers
- GitOps friendly: Resources can be managed through standard GitOps workflows

This guide explores the architecture, components, and implementation patterns of Cluster API, enabling you to adopt a consistent, declarative approach to managing Kubernetes clusters across any environment.
Cluster API is built around a collection of controllers and custom resources:
- Cluster API Manager: Central controller managing core cluster objects
- Bootstrap Provider: Handles node initialization and kubelet configuration
- Control Plane Provider: Manages the control plane components
- Infrastructure Provider: Interfaces with specific infrastructure platforms (AWS, Azure, vSphere, etc.)
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│               │      │               │      │               │
│  Cluster API  │◄────►│   Bootstrap   │◄────►│ Control Plane │
│    Manager    │      │   Provider    │      │   Provider    │
│               │      │               │      │               │
└───────┬───────┘      └───────────────┘      └───────────────┘
        │
        ▼
┌───────────────┐
│               │
│Infrastructure │
│   Provider    │
│               │
└───────────────┘
Each component manages specific Custom Resource Definitions (CRDs) that together define a Kubernetes cluster.
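On a management cluster these CRDs can be listed directly; a quick check after installation:

# List the CRDs installed by Cluster API and its providers
kubectl get crds | grep cluster.x-k8s.io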
Cluster API defines a hierarchy of resources that represent clusters and their components:
# Core Cluster API Resource
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: example-cluster
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: example-cluster-control-plane
The key resources include:
- Cluster: Top-level representation of a Kubernetes cluster
- Machine: Individual node in a cluster (control plane or worker)
- MachineDeployment: Declarative management of a group of machines (similar to Deployments for Pods)
- MachineSet: Ensures a specified number of machines exist (similar to ReplicaSets)
- MachineHealthCheck: Automates remediation of unhealthy machines

Each core resource references provider-specific resources through infrastructure references.
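Because these are ordinary Kubernetes objects, the whole hierarchy can be inspected with kubectl:

# Inspect the Cluster API resource hierarchy on the management cluster
kubectl get clusters,machinedeployments,machinesets,machines --all-namespaces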
To begin using Cluster API, install the clusterctl command-line tool:
# Download and install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.3.5/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl
sudo mv clusterctl /usr/local/bin/
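A quick sanity check confirms the binary is on the PATH:

# Verify the installation
clusterctl version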
Initialize Cluster API with your chosen infrastructure provider:
# Initialize Cluster API with AWS provider
clusterctl init --infrastructure aws
This installs the core Cluster API components and the specified infrastructure provider into your management cluster.
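Before creating workload clusters, confirm the controllers are healthy; the namespaces below reflect the default installation layout (capi-system for the core components, capa-system for the AWS provider):

# Verify that the core and provider controllers are running
kubectl get pods -n capi-system
kubectl get pods -n capa-system

Note that the AWS provider expects credentials at init time, typically supplied via the AWS_B64ENCODED_CREDENTIALS environment variable generated with the clusterawsadm helper.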
Create a cluster configuration file:
# cluster-config.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: my-cluster
    namespace: default
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
    namespace: default
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.25.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: my-cluster-control-plane
      namespace: default
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
        kubeletExtraArgs:
          cloud-provider: aws
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: my-ssh-key
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  clusterName: my-cluster
  replicas: 2
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      pool: worker-pool-1
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: my-cluster
        pool: worker-pool-1
    spec:
      clusterName: my-cluster
      version: v1.25.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
          namespace: default
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: my-cluster-md-0
        namespace: default
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: my-ssh-key
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          name: '{{ ds.meta_data.local_hostname }}'
          kubeletExtraArgs:
            cloud-provider: aws
Alternatively, use clusterctl to generate provider-specific configurations rather than writing them by hand:
# Generate a cluster configuration for AWS
clusterctl generate cluster my-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 > my-cluster.yaml
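clusterctl generate fills in provider templates from environment variables; for the default AWS template the commonly used ones look like the following (values are placeholders, and the exact set varies by provider version). Set these before running the generate command above:

# Variables consumed by the default AWS cluster template (placeholder values)
export AWS_REGION=us-west-2
export AWS_SSH_KEY_NAME=my-ssh-key
export AWS_CONTROL_PLANE_MACHINE_TYPE=t3.large
export AWS_NODE_MACHINE_TYPE=t3.large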
Apply the configuration to create your cluster:
# Create the cluster
kubectl apply -f my-cluster.yaml
# Get the kubeconfig for the new cluster
clusterctl get kubeconfig my-cluster > my-cluster.kubeconfig
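Provisioning takes several minutes. Progress can be watched from the management cluster, and the new nodes will report NotReady until a CNI plugin is installed in the workload cluster:

# Watch provisioning progress from the management cluster
clusterctl describe cluster my-cluster

# Inspect the workload cluster (nodes stay NotReady until a CNI is installed)
kubectl --kubeconfig=my-cluster.kubeconfig get nodes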
The AWS provider enables Cluster API to create and manage clusters on Amazon Web Services:
# AWS-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-east-1
  sshKeyName: production-key
  networkSpec:
    vpc:
      cidrBlock: 10.0.0.0/16
    subnets:
      - cidrBlock: 10.0.1.0/24
        availabilityZone: us-east-1a
        isPublic: true
      - cidrBlock: 10.0.2.0/24
        availabilityZone: us-east-1b
        isPublic: false
Key AWS provider features include:
- VPC management: Create or use existing VPCs
- ELB integration: Automatic load balancer provisioning
- IAM support: Create necessary IAM roles and policies
- Spot instance support: Use EC2 Spot instances for worker nodes (see the sketch after this list)
- Multi-AZ deployment: Distribute nodes across availability zones
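As a sketch of the Spot instance feature, a worker AWSMachineTemplate can request Spot capacity through spotMarketOptions; the template name and price here are illustrative assumptions:

# Worker machine template using EC2 Spot instances (illustrative)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: spot-workers
spec:
  template:
    spec:
      instanceType: t3.large
      spotMarketOptions:
        maxPrice: "0.05"  # optional bid ceiling; omit to cap at the on-demand price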
The Azure provider manages clusters on Microsoft Azure:

# Azure-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: production-cluster
spec:
  resourceGroup: production-rg
  location: eastus
  networkSpec:
    vnet:
      name: production-vnet
      cidrBlocks:
        - 10.0.0.0/16
    subnets:
      - name: control-plane-subnet
        cidrBlocks:
          - 10.0.1.0/24
      - name: node-subnet
        cidrBlocks:
          - 10.0.2.0/24
Azure provider capabilities include:
- Resource group management: Create or use existing resource groups
- VNET/subnet configuration: Flexible networking options
- Managed identity integration: Simplified authentication
- AKS compatibility: Options for managed or self-managed control planes
- Availability sets: High availability configurations
The vSphere provider manages clusters on VMware vSphere environments:

# vSphere-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: production-cluster
spec:
  server: vcenter.example.com
  thumbprint: "01:23:45:67:89:AB:CD:EF:01:23:45:67:89:AB:CD:EF:01:23:45:67"
  cloudProviderConfiguration:
    global:
      secretName: vsphere-credentials
      secretNamespace: default
    virtualCenter:
      vcenter.example.com:
        datacenters: DATACENTER
    network:
      name: VM_NETWORK
    disk:
      scsiControllerType: pvscsi
    workspace:
      server: vcenter.example.com
      datacenter: DATACENTER
      datastore: DATASTORE
      folder: FOLDER
vSphere provider features include:
- VM template management: Use existing VM templates for nodes
- Resource pool integration: Place nodes in specific resource pools
- Datastore selection: Choose appropriate storage for nodes
- Folder organization: Maintain VM organization structure
- vSphere CSI driver integration: Automatic storage provisioning
Scale worker nodes by modifying the MachineDeployment:

# Scale worker nodes
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  replicas: 5  # Increase from 2 to 5 workers
Apply the change:
kubectl apply -f machine-deployment.yaml
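For one-off adjustments, the scale subresource offers an imperative equivalent (the apply-based flow above fits GitOps better):

# Imperative scaling of the worker pool
kubectl scale machinedeployment my-cluster-md-0 --replicas=5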
Perform cluster upgrades by updating the Kubernetes version:
# Upgrade control plane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.26.0  # Upgrade from v1.25.0 to v1.26.0
  # ...rest of the spec remains unchanged
Apply the control plane upgrade:
kubectl apply -f control-plane-upgrade.yaml
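Control plane machines are replaced one at a time to preserve etcd quorum; the rollout can be observed from the management cluster:

# Watch control plane machines roll to the new version
kubectl get machines -l cluster.x-k8s.io/control-plane --watch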
Then upgrade worker nodes:
# Upgrade workers
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      version: v1.26.0  # Upgrade from v1.25.0 to v1.26.0
Apply the worker upgrade:
kubectl apply -f worker-upgrade.yaml
Manage multiple clusters from a single management cluster:
# Create clusters in different regions
clusterctl generate cluster east-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 --target-namespace=east-prod > east-cluster.yaml
clusterctl generate cluster west-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 --target-namespace=west-prod > west-cluster.yaml
# Apply configurations
kubectl apply -f east-cluster.yaml
kubectl apply -f west-cluster.yaml
# List all managed clusters
kubectl get clusters --all-namespaces
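The target namespaces must exist before the generated manifests are applied:

# Create the target namespaces first
kubectl create namespace east-prod
kubectl create namespace west-prod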
Implement automatic remediation of unhealthy nodes:
# MachineHealthCheck configuration
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
  namespace: default
spec:
  clusterName: my-cluster
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      pool: worker-pool-1
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
  maxUnhealthy: 40%
  nodeStartupTimeout: 10m
This resource automatically replaces machines that remain unhealthy for the specified duration. The maxUnhealthy threshold acts as a circuit breaker: if more than 40% of the selected machines are unhealthy at once, remediation pauses rather than compounding a larger outage.
ClusterClass provides templating for cluster configurations:
# ClusterClass definition
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: aws-standard
  namespace: default
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-standard-control-plane
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: aws-standard-control-plane
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: aws-standard-cluster
  workers:
    machineDeployments:
      - class: general-purpose
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: aws-standard-md-general-purpose
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: AWSMachineTemplate
              name: aws-standard-md-general-purpose
Create a cluster using the ClusterClass:
# Cluster using ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  topology:
    class: aws-standard
    version: v1.25.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: general-purpose
          name: md-0
          replicas: 3
ClusterClass enables:
- Standardized templates: Consistent cluster configurations
- Simplified management: Create clusters with minimal configuration
- Centralized updates: Change templates to affect all derived clusters
- Variable substitution: Customize templates with variables
Integrate Cluster API with GitOps workflows using tools like Flux:

# Flux Kustomization for cluster management
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: clusters
  namespace: flux-system
spec:
  interval: 10m
  path: "./clusters/"
  prune: true
  sourceRef:
    kind: GitRepository
    name: cluster-configs
  validation: client
  healthChecks:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      name: production-cluster
      namespace: default
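The Kustomization references a GitRepository source that would be defined separately; a minimal sketch, assuming the Flux source controller's v1beta2 API and a placeholder repository URL:

# Git source referenced by the Kustomization above (URL is a placeholder)
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: cluster-configs
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example-org/cluster-configs
  ref:
    branch: main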
This approach enables:
- Declarative cluster management: Define clusters as code
- Version-controlled configurations: Track all changes in Git
- Automated reconciliation: Apply changes automatically
- Drift detection: Identify and correct configuration drift
- Auditable changes: Full history of cluster modifications

Follow these design patterns for production deployments:
- Management cluster isolation: Separate management clusters from workload clusters
- Highly available management: Ensure the management cluster has multiple control plane nodes
- Infrastructure separation: Use separate infrastructure accounts/projects for different environments
- Failure domain distribution: Spread nodes across multiple availability zones
- Cluster segregation: Create purpose-specific clusters (dev, test, prod)

Implement these security best practices:
- Least privilege IAM: Minimize permissions for infrastructure providers
- Network isolation: Implement proper security groups and network policies
- Node hardening: Apply security baselines to node templates
- Credential rotation: Regularly rotate infrastructure credentials
- RBAC for Cluster API: Restrict access to cluster management functions
# RBAC for Cluster API users
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-api-user
rules:
  - apiGroups: ["cluster.x-k8s.io"]
    resources: ["clusters", "machinedeployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["cluster.x-k8s.io"]
    resources: ["clusters", "machinedeployments"]
    verbs: ["update", "patch", "delete"]
    # resourceNames must be exact matches (wildcards such as dev-* are not
    # supported) and cannot gate create requests; list dev clusters explicitly
    # and grant create rights through a namespace-scoped binding instead
    resourceNames: ["dev-cluster-1", "dev-cluster-2"]
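To grant these permissions to a team, bind the role within a namespace; the subject and namespace names below are illustrative:

# Bind the role to a team within a specific namespace (names are illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-cluster-admins
  namespace: dev-clusters
subjects:
  - kind: Group
    name: dev-platform-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-api-user
  apiGroup: rbac.authorization.k8s.io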
Adopt these operational best practices:
- Backup management cluster: Regularly back up Cluster API resources (see the clusterctl move sketch after this list)
- Staged upgrades: Implement canary or rolling upgrade strategies
- Testing process: Test cluster configurations in lower environments first
- Monitoring integration: Monitor both Cluster API components and managed clusters
- Documentation: Maintain clear documentation of cluster configurations
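One practical mechanism for the backup point is clusterctl move, which relocates Cluster API resources to another management cluster and doubles as a recovery path; a minimal sketch, assuming a standby management cluster whose kubeconfig filename is illustrative:

# Relocate Cluster API resources to a standby management cluster
clusterctl move --to-kubeconfig=standby-mgmt.kubeconfig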
Cluster API represents a transformative approach to Kubernetes cluster management, bringing the declarative, Kubernetes-native methodology to infrastructure itself. By treating clusters as declarative resources, teams can standardize processes, automate lifecycle operations, and manage infrastructure using the same tools and practices as application deployments.

As Cluster API continues to mature, it's becoming an essential tool for organizations operating Kubernetes at scale across multiple environments. The ability to define consistent cluster configurations, automate upgrades, and integrate with GitOps workflows makes it particularly valuable for platform engineering teams seeking to provide reliable, self-service infrastructure capabilities.
By adopting the patterns and practices outlined in this guide, you can leverage Cluster API to implement a robust, scalable approach to Kubernetes infrastructure management that aligns with modern DevOps and platform engineering principles.