Kubernetes Cluster API

Comprehensive guide to Kubernetes Cluster API for declarative infrastructure management and cluster lifecycle operations

Introduction to Cluster API

Cluster API represents a significant shift in how Kubernetes clusters are managed, providing a declarative, Kubernetes-native approach to deploying, scaling, and managing their entire lifecycle. The framework applies infrastructure-as-code principles directly to the clusters themselves:

  • Kubernetes-native abstractions: Define clusters using familiar Custom Resource Definitions (CRDs)
  • Provider agnostic design: Support for multiple infrastructure providers through common interfaces
  • Declarative lifecycle management: Create, scale, upgrade, and delete clusters using Kubernetes-style objects
  • Consistent experience: Standardized approach across different environments and cloud providers
  • GitOps friendly: Resources can be managed through standard GitOps workflows

This guide explores the architecture, components, and implementation patterns of Cluster API, enabling you to adopt a consistent, declarative approach to managing Kubernetes clusters across any environment.

Cluster API Architecture

Core Components

Cluster API is built around a collection of controllers and custom resources:

  1. Cluster API Manager: Central controller managing core cluster objects
  2. Bootstrap Provider: Handles node initialization and kubelet configuration
  3. Control Plane Provider: Manages the control plane components
  4. Infrastructure Provider: Interfaces with specific infrastructure platforms (AWS, Azure, vSphere, etc.)
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│               │      │               │      │               │
│  Cluster API  │◄────►│   Bootstrap   │◄────►│ Control Plane │
│    Manager    │      │   Provider    │      │   Provider    │
│               │      │               │      │               │
└───────┬───────┘      └───────────────┘      └───────────────┘
        │
        ▼
┌───────────────┐
│               │
│Infrastructure │
│   Provider    │
│               │
└───────────────┘

Each component manages specific Custom Resource Definitions (CRDs) that together define a Kubernetes cluster.
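
Once the components are installed (see the setup steps below), you can confirm which CRDs each one owns; the exact list depends on the providers you initialize:

# List the CRDs installed by Cluster API and its providers
kubectl get crds | grep cluster.x-k8s.io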

Resource Model

Cluster API defines a hierarchy of resources that represent clusters and their components:

# Core Cluster API Resource
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: example-cluster
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: example-cluster-control-plane

The key resources include:

  • Cluster: Top-level representation of a Kubernetes cluster
  • Machine: Individual node in a cluster (control plane or worker)
  • MachineDeployment: Declarative management of a group of machines (similar to Deployments for Pods)
  • MachineSet: Ensures a specified number of machines exist (similar to ReplicaSets)
  • MachineHealthCheck: Automates remediation of unhealthy machines

Each core resource references provider-specific resources through infrastructure references.
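
On a management cluster you can inspect this hierarchy directly; the resource names below are the standard Cluster API kinds, and the machine name is a placeholder:

# View the resource hierarchy across all namespaces
kubectl get clusters,machinedeployments,machinesets,machines -A

# Show the owner chain for an individual machine
kubectl get machine <machine-name> -o jsonpath='{.metadata.ownerReferences}'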

Getting Started with Cluster API

Installation and Setup

To begin using Cluster API, install the clusterctl command line tool:

# Download and install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.3.5/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl
sudo mv clusterctl /usr/local/bin/

Initialize Cluster API with your chosen infrastructure provider:

# Initialize Cluster API with AWS provider
clusterctl init --infrastructure aws

This installs the core Cluster API components and the specified infrastructure provider into your management cluster.
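
You can verify the controllers came up; with default settings the core components and the AWS provider typically run in the namespaces shown below, though names can vary between releases:

# Verify the Cluster API controllers are running
kubectl get pods -n capi-system
kubectl get pods -n capi-kubeadm-bootstrap-system
kubectl get pods -n capi-kubeadm-control-plane-system
kubectl get pods -n capa-system   # AWS infrastructure provider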

Creating Your First Cluster

Create a cluster configuration file:

# cluster-config.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: my-cluster
    namespace: default
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
    namespace: default
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  region: us-west-2
  sshKeyName: my-ssh-key
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.25.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: my-cluster-control-plane
      namespace: default
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
        kubeletExtraArgs:
          cloud-provider: aws
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: my-ssh-key
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  clusterName: my-cluster
  replicas: 2
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      pool: worker-pool-1
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: my-cluster
        pool: worker-pool-1
    spec:
      clusterName: my-cluster
      version: v1.25.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
          namespace: default
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: my-cluster-md-0
        namespace: default
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      instanceType: t3.large
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: my-ssh-key
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          name: '{{ ds.meta_data.local_hostname }}'
          kubeletExtraArgs:
            cloud-provider: aws

Rather than writing this configuration by hand, you can use clusterctl to generate an equivalent provider-specific configuration:

# Generate a cluster configuration for AWS
clusterctl generate cluster my-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 > my-cluster.yaml

Apply the configuration to create your cluster:

# Create the cluster
kubectl apply -f my-cluster.yaml

# Get the kubeconfig for the new cluster
clusterctl get kubeconfig my-cluster > my-cluster.kubeconfig
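
Nodes in the new cluster generally stay NotReady until a CNI plugin is installed in the workload cluster. You can track provisioning from the management cluster and then apply a CNI of your choice (Calico is shown purely as an example; check the project's documentation for a current manifest URL):

# Watch provisioning progress from the management cluster
clusterctl describe cluster my-cluster
kubectl get kubeadmcontrolplane,machines

# Install a CNI into the workload cluster (example manifest)
kubectl --kubeconfig=my-cluster.kubeconfig apply -f https://docs.projectcalico.org/manifests/calico.yaml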

Working with Infrastructure Providers

AWS Provider

The AWS provider enables Cluster API to create and manage clusters on Amazon Web Services:

# AWS-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: production-cluster
spec:
  region: us-east-1
  sshKeyName: production-key
  networkSpec:
    vpc:
      cidrBlock: 10.0.0.0/16
    subnets:
    - cidrBlock: 10.0.1.0/24
      availabilityZone: us-east-1a
      isPublic: true
    - cidrBlock: 10.0.2.0/24
      availabilityZone: us-east-1b
      isPublic: false

Key AWS provider features include:

  1. VPC management: Create or use existing VPCs
  2. ELB integration: Automatic load balancer provisioning
  3. IAM support: Create necessary IAM roles and policies
  4. Spot instance support: Use EC2 Spot instances for worker nodes
  5. Multi-AZ deployment: Distribute nodes across availability zones
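
Before the AWS provider can manage infrastructure, it needs IAM prerequisites and encoded credentials. A typical bootstrap flow using the clusterawsadm helper looks roughly like this; exact steps can differ between provider versions:

# Create the IAM resources the AWS provider expects (CloudFormation stack)
clusterawsadm bootstrap iam create-cloudformation-stack

# Encode local AWS credentials for the provider and initialize it
export AWS_REGION=us-east-1
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
clusterctl init --infrastructure aws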

Azure Provider

The Azure provider manages clusters on Microsoft Azure:

# Azure-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: production-cluster
spec:
  resourceGroup: production-rg
  location: eastus
  networkSpec:
    vnet:
      name: production-vnet
      cidrBlocks: 
      - 10.0.0.0/16
    subnets:
    - name: control-plane-subnet
      cidrBlocks: 
      - 10.0.1.0/24
    - name: node-subnet
      cidrBlocks:
      - 10.0.2.0/24

Azure provider capabilities include:

  1. Resource group management: Create or use existing resource groups
  2. VNET/subnet configuration: Flexible networking options
  3. Managed identity integration: Simplified authentication
  4. AKS compatibility: Options for managed or self-managed control planes
  5. Availability sets: High availability configurations
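
Initializing the Azure provider requires service principal credentials exported as environment variables before running clusterctl init; the values below are placeholders, and newer provider versions also expect an AzureClusterIdentity resource:

# Provide Azure service principal credentials (placeholder values)
export AZURE_SUBSCRIPTION_ID="<subscription-id>"
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<client-id>"
export AZURE_CLIENT_SECRET="<client-secret>"

# Initialize the Azure infrastructure provider
clusterctl init --infrastructure azure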

vSphere Provider

The vSphere provider manages clusters on VMware vSphere environments:

# vSphere-specific configuration
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: production-cluster
spec:
  server: vcenter.example.com
  thumbprint: "01:23:45:67:89:AB:CD:EF:01:23:45:67:89:AB:CD:EF:01:23:45:67"
  cloudProviderConfiguration:
    global:
      secretName: vsphere-credentials
      secretNamespace: default
    virtualCenter:
      vcenter.example.com:
        datacenters: DATACENTER
    network:
      name: VM_NETWORK
    disk:
      scsiControllerType: pvscsi
    workspace:
      server: vcenter.example.com
      datacenter: DATACENTER
      datastore: DATASTORE
      folder: FOLDER

vSphere provider features include:

  1. VM template management: Use existing VM templates for nodes
  2. Resource pool integration: Place nodes in specific resource pools
  3. Datastore selection: Choose appropriate storage for nodes
  4. Folder organization: Maintain VM organization structure
  5. vSphere CSI driver integration: Automatic storage provisioning
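
The vSphere provider and its cluster templates are driven by vCenter credentials and placement settings supplied as environment variables; the two shown below are the minimum for initialization, and variable names can vary between provider releases:

# vCenter credentials consumed by the vSphere provider (placeholder values)
export VSPHERE_USERNAME="administrator@vsphere.local"
export VSPHERE_PASSWORD="<password>"

# Initialize the vSphere infrastructure provider
clusterctl init --infrastructure vsphere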

Cluster Lifecycle Management

Scaling Clusters

Scale worker nodes by modifying the MachineDeployment:

# Scale worker nodes
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  replicas: 5  # Increase from 2 to 5 workers

Apply the change:

kubectl apply -f machine-deployment.yaml
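
Because MachineDeployments expose a scale subresource, you can also scale imperatively without editing the manifest:

# Scale the worker pool directly
kubectl scale machinedeployment my-cluster-md-0 --replicas=5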

Upgrading Clusters

Perform cluster upgrades by updating the Kubernetes version:

# Upgrade control plane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.26.0  # Upgrade from v1.25.0 to v1.26.0
  # ...rest of the spec remains unchanged

Apply the control plane upgrade:

kubectl apply -f control-plane-upgrade.yaml

Then upgrade worker nodes:

# Upgrade workers
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
  namespace: default
spec:
  template:
    spec:
      version: v1.26.0  # Upgrade from v1.25.0 to v1.26.0

Apply the worker upgrade:

kubectl apply -f worker-upgrade.yaml
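
Upgrades roll out as machine replacements: new machines running the target version are created, and old ones are drained and deleted. You can follow the rollout from the management cluster:

# Watch machines being replaced during the upgrade
kubectl get machines -l cluster.x-k8s.io/cluster-name=my-cluster -w

# Confirm reported versions once the rollout settles
kubectl get machines -o custom-columns=NAME:.metadata.name,VERSION:.spec.version,PHASE:.status.phase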

Multi-Cluster Management

Manage multiple clusters from a single management cluster:

# Create clusters in different regions
clusterctl generate cluster east-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 --target-namespace=east-prod > east-cluster.yaml
clusterctl generate cluster west-cluster --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 --target-namespace=west-prod > west-cluster.yaml

# Apply configurations
kubectl apply -f east-cluster.yaml
kubectl apply -f west-cluster.yaml

# List all managed clusters
kubectl get clusters --all-namespaces
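
Each workload cluster's kubeconfig is retrieved from its own namespace on the management cluster:

# Retrieve kubeconfigs for clusters in different namespaces
clusterctl get kubeconfig east-cluster -n east-prod > east-cluster.kubeconfig
clusterctl get kubeconfig west-cluster -n west-prod > west-cluster.kubeconfig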

Advanced Cluster API Patterns

Machine Health Checks

Implement automatic remediation of unhealthy nodes:

# MachineHealthCheck configuration
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
  namespace: default
spec:
  clusterName: my-cluster
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      pool: worker-pool-1
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s
  maxUnhealthy: 40%
  nodeStartupTimeout: 10m

This resource automatically replaces machines that remain unhealthy for the specified duration.
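
Remediation activity can be observed from the management cluster; the MachineHealthCheck status reports how many matched machines are expected and currently healthy:

# Check health check status and watch remediated machines being replaced
kubectl get machinehealthchecks
kubectl get machines -l cluster.x-k8s.io/cluster-name=my-cluster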

ClusterClass and Managed Topologies

ClusterClass provides templating for cluster configurations:

# ClusterClass definition
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: aws-standard
  namespace: default
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: aws-standard-control-plane
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: aws-standard-control-plane
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSClusterTemplate
      name: aws-standard-cluster
  workers:
    machineDeployments:
    - class: general-purpose
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: aws-standard-md-general-purpose
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: AWSMachineTemplate
            name: aws-standard-md-general-purpose

Create a cluster using the ClusterClass:

# Cluster using ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  topology:
    class: aws-standard
    version: v1.25.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: general-purpose
        name: md-0
        replicas: 3

ClusterClass enables:

  1. Standardized templates: Consistent cluster configurations
  2. Simplified management: Create clusters with minimal configuration
  3. Centralized updates: Change templates to affect all derived clusters
  4. Variable substitution: Customize templates with variables
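
Variables are declared on the ClusterClass and given values per cluster under spec.topology.variables; in a complete ClusterClass, patches reference the variables to modify the templates. The sketch below is illustrative, and the variable name instanceType is an assumption rather than part of the Cluster API schema:

# Illustrative variable declared on the ClusterClass
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: aws-standard
spec:
  # ...controlPlane, infrastructure, and workers as shown above...
  variables:
  - name: instanceType          # illustrative variable name
    required: true
    schema:
      openAPIV3Schema:
        type: string
        default: t3.large
---
# Per-cluster value supplied in the Cluster topology
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
spec:
  # ...clusterNetwork as shown above...
  topology:
    class: aws-standard
    version: v1.25.0
    variables:
    - name: instanceType
      value: t3.xlarge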

GitOps Integration

Integrate Cluster API with GitOps workflows using tools like Flux:

# Flux Kustomization for cluster management
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: clusters
  namespace: flux-system
spec:
  interval: 10m
  path: "./clusters/"
  prune: true
  sourceRef:
    kind: GitRepository
    name: cluster-configs
  validation: client
  healthChecks:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: production-cluster
    namespace: default
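
The Kustomization above references a GitRepository source named cluster-configs, which is defined separately; a minimal sketch with a placeholder repository URL:

# Flux GitRepository source referenced by the Kustomization above
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: cluster-configs
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example-org/cluster-configs   # placeholder repository
  ref:
    branch: main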

This approach enables:

  1. Declarative cluster management: Define clusters as code
  2. Version-controlled configurations: Track all changes in Git
  3. Automated reconciliation: Apply changes automatically
  4. Drift detection: Identify and correct configuration drift
  5. Auditable changes: Full history of cluster modifications

Best Practices and Patterns

Infrastructure Design Patterns

Follow these design patterns for production deployments:

  1. Management cluster isolation: Separate management clusters from workload clusters
  2. Highly available management: Ensure management cluster has multiple control plane nodes
  3. Infrastructure separation: Use separate infrastructure accounts/projects for different environments
  4. Failure domain distribution: Spread nodes across multiple availability zones
  5. Cluster segregation: Create purpose-specific clusters (dev, test, prod)

Security Considerations

Implement these security best practices:

  1. Least privilege IAM: Minimize permissions for infrastructure providers
  2. Network isolation: Implement proper security groups and network policies
  3. Node hardening: Apply security baselines to node templates
  4. Credential rotation: Regularly rotate infrastructure credentials
  5. RBAC for Cluster API: Restrict access to cluster management functions, as in the example below

# RBAC for Cluster API users
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-api-user
rules:
- apiGroups: ["cluster.x-k8s.io"]
  resources: ["clusters", "machinedeployments"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["cluster.x-k8s.io"]
  resources: ["clusters", "machinedeployments"]
  verbs: ["create", "update", "patch", "delete"]
  # Note: resourceNames cannot use wildcards and cannot restrict create requests.
  # To limit users to development clusters, keep those clusters in a dedicated
  # namespace and grant this role there with a RoleBinding.
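
One way to realize the namespace scoping described in the comment above is to keep development clusters in a dedicated namespace and grant the role only there; the namespace and group names below are illustrative:

# Grant write access only in the namespace holding development clusters
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-api-user-dev
  namespace: dev-clusters          # illustrative namespace for dev cluster objects
subjects:
- kind: Group
  name: platform-developers        # illustrative group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-api-user
  apiGroup: rbac.authorization.k8s.io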

Operational Considerations

Adopt these operational best practices:

  1. Backup management cluster: Regularly back up Cluster API resources (see the sketch after this list)
  2. Staged upgrades: Implement canary or rolling upgrade strategies
  3. Testing process: Test cluster configurations in lower environments first
  4. Monitoring integration: Monitor both Cluster API components and managed clusters
  5. Documentation: Maintain clear documentation of cluster configurations
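
For the backup practice above, a coarse point-in-time export of Cluster API objects can complement a dedicated backup tool, and clusterctl move can relocate those objects to another management cluster during recovery; note that a plain export does not capture provider secrets or every provider-specific resource:

# Export core Cluster API objects as a point-in-time snapshot (coarse backup)
kubectl get clusters,machinedeployments,machinesets,machines,kubeadmcontrolplanes -A -o yaml > capi-backup.yaml

# Move Cluster API objects to another management cluster (e.g. during recovery)
clusterctl move --to-kubeconfig=target-management.kubeconfig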

Conclusion

Cluster API represents a transformative approach to Kubernetes cluster management, bringing the declarative, Kubernetes-native methodology to infrastructure itself. By treating clusters as declarative resources, teams can standardize processes, automate lifecycle operations, and manage infrastructure using the same tools and practices as application deployments.

As Cluster API continues to mature, it's becoming an essential tool for organizations operating Kubernetes at scale across multiple environments. The ability to define consistent cluster configurations, automate upgrades, and integrate with GitOps workflows makes it particularly valuable for platform engineering teams seeking to provide reliable, self-service infrastructure capabilities.

By adopting the patterns and practices outlined in this guide, you can leverage Cluster API to implement a robust, scalable approach to Kubernetes infrastructure management that aligns with modern DevOps and platform engineering principles.