Kubernetes Custom Resource Definitions

Comprehensive guide to extending Kubernetes with Custom Resource Definitions (CRDs) for declarative application management

Introduction to Custom Resource Definitions

Custom Resource Definitions (CRDs) are one of Kubernetes' most powerful extension mechanisms, enabling platform engineers and developers to extend the Kubernetes API with custom objects that model application-specific concepts and resources:

  • API extension: Add domain-specific objects to the Kubernetes API server
  • Declarative management: Manage custom resources using the same kubectl and API machinery
  • Kubernetes-native patterns: Apply GitOps, RBAC, and other Kubernetes patterns to custom resources
  • Operator foundations: Form the basis for building advanced Kubernetes operators
  • Platform building blocks: Create composable abstractions for self-service platforms

This guide explores the architecture, implementation patterns, and best practices for creating, managing, and leveraging Custom Resource Definitions in Kubernetes, so you can build declarative workflows for your applications and platforms.

CRD Fundamentals

Core Concepts

Custom Resource Definitions extend the Kubernetes API by defining new resource types:

# Example CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    listKind: WidgetList
    plural: widgets
    singular: widget
    shortNames:
      - wg
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: [small, medium, large]
                color:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
              required: [size, replicas]

The key components of a CRD include:

  1. API Group: A logical collection of related resources (e.g., example.com)
  2. Resource Names: Kind, plural, singular, and shorthand names
  3. Scope: Namespaced or cluster-wide resources
  4. Versions: API versions with schema definitions
  5. Schema: OpenAPI v3 schema defining the structure of the custom resource

Using Custom Resources

Once a CRD is defined, you can create and manage custom resources:

# Example Widget custom resource
apiVersion: example.com/v1
kind: Widget
metadata:
  name: my-widget
spec:
  size: medium
  color: blue
  replicas: 3

Interact with custom resources using standard kubectl commands:

# List all widgets
kubectl get widgets

# Get details about a specific widget
kubectl describe widget my-widget

# Edit a widget
kubectl edit widget my-widget

# Delete a widget
kubectl delete widget my-widget

Schema Definition and Validation

OpenAPI Schema

Define precise validation rules using OpenAPI v3 schema:
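For example, the following schema fragment (field names are illustrative, extending the Widget example) combines enums, regular-expression patterns, numeric bounds, and array constraints:

```yaml
# Fragment of an openAPIV3Schema with common validation keywords
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        size:
          type: string
          enum: [small, medium, large]    # restrict to a fixed set
        color:
          type: string
          pattern: "^#[0-9a-fA-F]{6}$"    # hex color such as "#1a2b3c"
        replicas:
          type: integer
          minimum: 1
          maximum: 100                    # upper bound catches typos
        tags:
          type: array
          maxItems: 10
          items:
            type: string
            maxLength: 63
      required: [size, replicas]
```

Invalid objects are rejected by the API server at admission time, before any controller sees them.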

Structural Schemas

Structural schemas provide additional guarantees for CRD validation:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          # A schema is "structural" when a type is declared at every level
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
              # Note: properties and additionalProperties are mutually
              # exclusive in apiextensions.k8s.io/v1; unknown fields are
              # pruned automatically rather than rejected

A structural schema meets these requirements:

  1. Explicit types: Every schema node specifies a non-empty type
  2. Marked embedded resources: Arbitrary embedded Kubernetes objects must be flagged with x-kubernetes-embedded-resource
  3. Restricted logical junctors: allOf, anyOf, oneOf, and not may only refine validation and are disallowed at the root and for metadata
  4. Complete property definitions: Every property has a defined schema

Default Values and Pruning

Configure default values and unknown field handling:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  # Default value applied when not specified
                  default: medium
                replicas:
                  type: integer
                  default: 1
                # In apiextensions.k8s.io/v1, unknown fields are pruned
                # (silently dropped) by default on create and update.
                # A node can opt out of pruning explicitly:
                extraConfig:  # illustrative free-form field
                  type: object
                  x-kubernetes-preserve-unknown-fields: true
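With the defaults above (size: medium, replicas: 1), a Widget submitted with an empty spec is persisted with the defaulted fields filled in. A sketch of what the API server stores:

```yaml
# Submitted by the user
apiVersion: example.com/v1
kind: Widget
metadata:
  name: defaulted-widget
spec: {}
---
# Persisted by the API server after defaulting
apiVersion: example.com/v1
kind: Widget
metadata:
  name: defaulted-widget
spec:
  size: medium
  replicas: 1
```

Defaults are applied on create and update, and also when reading objects stored before the default was added.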

Versioning and Conversion

Multiple API Versions

Define multiple versions of your custom resource:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                numberOfReplicas:  # Old field name
                  type: integer
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                replicas:  # New field name
                  type: integer

API version configuration includes:

  1. Multiple versions: Define several API versions simultaneously
  2. Served flag: Control which versions are exposed in the API
  3. Storage flag: Specify which version is used for persistence
  4. Version-specific schemas: Define different schemas for each version

Webhook Conversion

Implement conversion between versions using webhooks:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object  # v1alpha1 schema elided for brevity
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object  # v1 schema elided for brevity
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          namespace: default
          name: widget-converter
          path: /convert
        caBundle: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0t..."
      conversionReviewVersions: ["v1"]

The conversion webhook receives a ConversionReview object containing the objects to convert and responds with the converted objects:

// Example conversion handler in Go
func convertHandler(w http.ResponseWriter, r *http.Request) {
    var review apiextensionsv1.ConversionReview
    
    // Decode the request
    if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    
    // Process each object for conversion
    convertedObjects := make([]runtime.RawExtension, len(review.Request.Objects))
    for i, rawObj := range review.Request.Objects {
        // The source version is the object's own apiVersion; the
        // target version comes from the request
        destVersion := review.Request.DesiredAPIVersion
        
        var obj map[string]interface{}
        json.Unmarshal(rawObj.Raw, &obj)
        srcVersion, _ := obj["apiVersion"].(string)
        
        // Example conversion: rename numberOfReplicas to replicas
        if srcVersion == "example.com/v1alpha1" && destVersion == "example.com/v1" {
            spec := obj["spec"].(map[string]interface{})
            if replicas, exists := spec["numberOfReplicas"]; exists {
                spec["replicas"] = replicas
                delete(spec, "numberOfReplicas")
            }
        } else if srcVersion == "example.com/v1" && destVersion == "example.com/v1alpha1" {
            spec := obj["spec"].(map[string]interface{})
            if replicas, exists := spec["replicas"]; exists {
                spec["numberOfReplicas"] = replicas
                delete(spec, "replicas")
            }
        }
        
        // Set the apiVersion to the desired version
        obj["apiVersion"] = destVersion
        
        // Convert back to raw JSON
        convertedJSON, _ := json.Marshal(obj)
        convertedObjects[i] = runtime.RawExtension{Raw: convertedJSON}
    }
    
    // Prepare the response
    review.Response = &apiextensionsv1.ConversionResponse{
        UID:              review.Request.UID,
        ConvertedObjects: convertedObjects,
        Result:           metav1.Status{Status: "Success"},
    }
    review.Request = nil
    
    // Send the response
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(review)
}
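The field-rename logic above can be factored into a small standalone function (names here are illustrative) so it can be unit-tested without running a webhook server:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// convertWidget renames the replicas field between v1alpha1
// (numberOfReplicas) and v1 (replicas) and stamps the target apiVersion,
// mirroring the webhook handler above.
func convertWidget(obj map[string]interface{}, destVersion string) map[string]interface{} {
	srcVersion, _ := obj["apiVersion"].(string)
	if spec, ok := obj["spec"].(map[string]interface{}); ok {
		switch {
		case srcVersion == "example.com/v1alpha1" && destVersion == "example.com/v1":
			if v, exists := spec["numberOfReplicas"]; exists {
				spec["replicas"] = v
				delete(spec, "numberOfReplicas")
			}
		case srcVersion == "example.com/v1" && destVersion == "example.com/v1alpha1":
			if v, exists := spec["replicas"]; exists {
				spec["numberOfReplicas"] = v
				delete(spec, "replicas")
			}
		}
	}
	obj["apiVersion"] = destVersion
	return obj
}

func main() {
	raw := []byte(`{"apiVersion":"example.com/v1alpha1","kind":"Widget","spec":{"numberOfReplicas":3}}`)
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		panic(err)
	}
	out := convertWidget(obj, "example.com/v1")
	fmt.Println(out["apiVersion"], out["spec"].(map[string]interface{})["replicas"])
}
```

Keeping conversion as a pure function over decoded maps makes round-trip tests (v1alpha1 to v1 and back) straightforward.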

Status Subresource

Status Configuration

Enable the status subresource to separate spec from status:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      # Enable status subresource
      subresources:
        status: {}
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                # Spec fields here
            status:
              type: object
              properties:
                phase:
                  type: string
                  enum: [Pending, Running, Failed, Succeeded]
                availableReplicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                        enum: ["True", "False", "Unknown"]  # quoted so YAML keeps them as strings
                      lastTransitionTime:
                        type: string
                      reason:
                        type: string
                      message:
                        type: string

With the status subresource enabled:

  1. Status updates: Update status independently from spec
  2. Validation separation: Apply different validation rules to status
  3. RBAC separation: Grant permission on the widgets/status subresource separately
  4. Generation tracking: Status-only updates don't increment metadata.generation, so controllers can ignore them

Using Status Updates

Create and update custom resources with status:

# Create a Widget with spec only
apiVersion: example.com/v1
kind: Widget
metadata:
  name: my-widget
spec:
  size: medium
  replicas: 3

Update the status subresource separately:

# Get the current resource
kubectl get widget my-widget -o yaml > widget.yaml

# Edit the status field in widget.yaml
# Then update using status subresource
kubectl replace --subresource=status -f widget.yaml

Programmatically update status using the Kubernetes API:

// Update status using Go client
import (
    "context"
    "encoding/json"
    "time"
    
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/rest"
)

func updateStatus(name, namespace string) error {
    config, err := rest.InClusterConfig()
    if err != nil {
        return err
    }
    
    // Create a RESTClient for our CRD
    crdConfig := *config
    crdConfig.ContentConfig.GroupVersion = &schema.GroupVersion{Group: "example.com", Version: "v1"}
    crdConfig.APIPath = "/apis"
    
    client, err := rest.RESTClientFor(&crdConfig)
    if err != nil {
        return err
    }
    
    // Create a status patch
    statusPatch := map[string]interface{}{
        "status": map[string]interface{}{
            "phase": "Running",
            "availableReplicas": 3,
            "conditions": []map[string]interface{}{
                {
                    "type":               "Available",
                    "status":             "True",
                    "lastTransitionTime": metav1.Now().Format(time.RFC3339),
                    "reason":             "MinimumReplicasAvailable",
                    "message":            "Widget has minimum availability.",
                },
            },
        },
    }
    
    patchBytes, _ := json.Marshal(statusPatch)
    
    // Apply the patch to the status subresource
    result := client.Patch(types.MergePatchType).
        Namespace(namespace).
        Resource("widgets").
        Name(name).
        SubResource("status").
        Body(patchBytes).
        Do(context.Background())
    
    return result.Error()
}

Controllers and Operators

Controller Pattern

Implement the controller pattern for your custom resources:

// Simplified controller example in Go
package main

import (
    "context"
    "fmt"
    "time"
    
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/dynamic/dynamicinformer"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
)

func main() {
    // Set up Kubernetes client
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        panic(err)
    }
    
    client, err := dynamic.NewForConfig(config)
    if err != nil {
        panic(err)
    }
    
    // Define the Widget resource
    widgetGVR := schema.GroupVersionResource{
        Group:    "example.com",
        Version:  "v1",
        Resource: "widgets",
    }
    
    // Create an informer factory
    factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(
        client, time.Minute, metav1.NamespaceAll, nil,
    )
    
    // Get an informer for Widgets
    informer := factory.ForResource(widgetGVR).Informer()
    
    // Create a work queue
    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
    
    // Add event handlers
    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            key, err := cache.MetaNamespaceKeyFunc(obj)
            if err == nil {
                queue.Add(key)
            }
        },
        UpdateFunc: func(old, new interface{}) {
            key, err := cache.MetaNamespaceKeyFunc(new)
            if err == nil {
                queue.Add(key)
            }
        },
        DeleteFunc: func(obj interface{}) {
            key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
            if err == nil {
                queue.Add(key)
            }
        },
    })
    
    // Start the informer
    stopCh := make(chan struct{})
    defer close(stopCh)
    factory.Start(stopCh)
    
    // Wait for caches to sync
    factory.WaitForCacheSync(stopCh)
    
    // Process items from the queue
    for {
        // Get an item from the queue
        key, shutdown := queue.Get()
        if shutdown {
            break
        }
        
        // Process the item
        func() {
            defer queue.Done(key)
            
            // Get namespace and name from key
            namespace, name, err := cache.SplitMetaNamespaceKey(key.(string))
            if err != nil {
                queue.Forget(key)
                return
            }
            
            // Get the Widget resource
            obj, err := client.Resource(widgetGVR).Namespace(namespace).Get(
                context.Background(), name, metav1.GetOptions{},
            )
            if err != nil {
                queue.Forget(key)
                return
            }
            
            // Extract spec fields
            spec, found, err := unstructured.NestedMap(obj.Object, "spec")
            if err != nil || !found {
                queue.Forget(key)
                return
            }
            
            size, _ := spec["size"].(string)
            replicas, _ := spec["replicas"].(int64)
            
            fmt.Printf("Processing Widget %s/%s: size=%s, replicas=%d\n", 
                       namespace, name, size, replicas)
            
            // Implement reconciliation logic here
            // ...
            
            // Update status
            status := map[string]interface{}{
                "phase": "Running",
                "availableReplicas": replicas,
            }
            unstructured.SetNestedMap(obj.Object, status, "status")
            
            _, err = client.Resource(widgetGVR).Namespace(namespace).
                UpdateStatus(context.Background(), obj, metav1.UpdateOptions{})
            if err != nil {
                // Handle error, maybe requeue
                queue.AddRateLimited(key)
                return
            }
            
            // Successfully processed
            queue.Forget(key)
        }()
    }
}

The controller pattern consists of these components:

  1. Informers: Watch for changes to resources
  2. Work Queue: Queue resources for processing
  3. Reconciliation Loop: Process resources to align actual state with desired state
  4. Status Updates: Report current state back to the custom resource

Building Operators

Create full-featured operators using frameworks like Operator SDK:

# Install Operator SDK
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.25.0
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}
chmod +x operator-sdk_${OS}_${ARCH}
sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk

# Create a new operator
mkdir widget-operator
cd widget-operator
operator-sdk init --domain example.com --repo github.com/example/widget-operator

# Create an API (CRD)
operator-sdk create api --group widgets --version v1 --kind Widget --resource --controller

# Define the Widget spec and status in api/v1/widget_types.go
# Implement the controller in controllers/widget_controller.go

# Generate CRD manifests
make manifests

# Build and deploy the operator
make docker-build docker-push
make deploy

Operator frameworks provide:

  1. Scaffolding: Generate initial code structure
  2. API Generation: Create typed APIs for your CRDs
  3. Controller Framework: Handle common controller patterns
  4. Testing Tools: Simplify testing of controllers
  5. Deployment Tools: Package and deploy operators

RBAC and Security

Role-Based Access Control

Configure RBAC for custom resources:

# ClusterRole for read-only access to Widgets
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-viewer
rules:
- apiGroups: ["example.com"]
  resources: ["widgets"]
  verbs: ["get", "list", "watch"]

---
# ClusterRole for full access to Widgets
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-admin
rules:
- apiGroups: ["example.com"]
  resources: ["widgets", "widgets/status"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

---
# RoleBinding to grant a user access to Widgets in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: widget-viewer-binding
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: widget-viewer
  apiGroup: rbac.authorization.k8s.io

RBAC considerations for CRDs:

  1. Resource-level permissions: Control access to the custom resource type
  2. Subresource permissions: Separate permissions for status updates
  3. Namespace scoping: Restrict access to specific namespaces
  4. Verb restrictions: Limit which operations users can perform
  5. Aggregated roles: Create role hierarchies for different access levels
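Item 5, aggregated roles, works by labeling contributing ClusterRoles so an aggregating role picks up their rules automatically. A sketch (label keys are illustrative):

```yaml
# Aggregating ClusterRole: its rules become the union of all
# ClusterRoles matching the selector
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-aggregate-admin
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.example.com/aggregate-to-widget-admin: "true"
rules: []  # filled in automatically by the controller manager
---
# Contributing role: picked up by the selector above
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-editor
  labels:
    rbac.example.com/aggregate-to-widget-admin: "true"
rules:
- apiGroups: ["example.com"]
  resources: ["widgets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

New CRDs can then extend existing personas (admin, edit, view) without editing the aggregating role itself.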

Securing Controllers

Implement secure controllers with least privilege:

# Service account for the controller
apiVersion: v1
kind: ServiceAccount
metadata:
  name: widget-controller
  namespace: system

---
# Role with minimum required permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-controller-role
rules:
- apiGroups: ["example.com"]
  resources: ["widgets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["example.com"]
  resources: ["widgets/status"]
  verbs: ["update", "patch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

---
# Bind the role to the controller's service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: widget-controller-rolebinding
subjects:
- kind: ServiceAccount
  name: widget-controller
  namespace: system
roleRef:
  kind: ClusterRole
  name: widget-controller-role
  apiGroup: rbac.authorization.k8s.io

Controller security best practices:

  1. Least privilege: Grant only permissions needed for reconciliation
  2. Namespace isolation: Run controllers in dedicated namespaces
  3. Resource limits: Set appropriate CPU and memory limits
  4. RBAC auditing: Regularly review and tighten permissions
  5. Secure communication: Use TLS for webhook communication
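Practices 2 and 3 above translate directly into the controller's Deployment manifest. A sketch (the image name is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: widget-controller
  namespace: system            # dedicated namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: widget-controller
  template:
    metadata:
      labels:
        app: widget-controller
    spec:
      serviceAccountName: widget-controller    # bound to the minimal role above
      containers:
      - name: manager
        image: example.com/widget-controller:v0.1.0   # illustrative image
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 128Mi
        securityContext:
          runAsNonRoot: true
          allowPrivilegeEscalation: false
```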

Advanced CRD Features

Finalizers

Implement finalizers for proper resource cleanup:

# Widget with a finalizer
apiVersion: example.com/v1
kind: Widget
metadata:
  name: my-widget
  finalizers:
  - widget.example.com/finalizer
spec:
  size: medium
  replicas: 3

Controller code to handle finalizers:

// Finalizer handling in a controller
func reconcile(widget *unstructured.Unstructured) error {
    // Get the finalizers
    finalizers, found, err := unstructured.NestedStringSlice(
        widget.Object, "metadata", "finalizers",
    )
    if err != nil {
        return err
    }
    
    // Check if the resource is being deleted
    deletionTimestamp, found, err := unstructured.NestedString(
        widget.Object, "metadata", "deletionTimestamp",
    )
    if err != nil {
        return err
    }
    
    // Resource is being deleted
    if found && deletionTimestamp != "" {
        if contains(finalizers, "widget.example.com/finalizer") {
            // Perform cleanup operations
            err := performCleanup(widget)
            if err != nil {
                return err
            }
            
            // Remove finalizer once cleanup is complete
            finalizers = removeString(finalizers, "widget.example.com/finalizer")
            err = unstructured.SetNestedStringSlice(
                widget.Object, finalizers, "metadata", "finalizers",
            )
            if err != nil {
                return err
            }
            
            // Update the resource to remove the finalizer
            _, err = client.Resource(widgetGVR).Namespace(widget.GetNamespace()).
                Update(context.Background(), widget, metav1.UpdateOptions{})
            return err
        }
        return nil
    }
    
    // Resource is not being deleted, ensure finalizer exists
    if !contains(finalizers, "widget.example.com/finalizer") {
        finalizers = append(finalizers, "widget.example.com/finalizer")
        err = unstructured.SetNestedStringSlice(
            widget.Object, finalizers, "metadata", "finalizers",
        )
        if err != nil {
            return err
        }
        
        // Update the resource to add the finalizer
        _, err = client.Resource(widgetGVR).Namespace(widget.GetNamespace()).
            Update(context.Background(), widget, metav1.UpdateOptions{})
        return err
    }
    
    // Normal reconciliation logic here
    return nil
}
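The contains and removeString helpers referenced above are not defined in the snippet; one possible stdlib-only implementation:

```go
package main

import "fmt"

// contains reports whether s is present in slice.
func contains(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a copy of slice with every occurrence of s removed.
func removeString(slice []string, s string) []string {
	result := make([]string, 0, len(slice))
	for _, item := range slice {
		if item != s {
			result = append(result, item)
		}
	}
	return result
}

func main() {
	finalizers := []string{"widget.example.com/finalizer", "other/finalizer"}
	fmt.Println(contains(finalizers, "widget.example.com/finalizer")) // true
	fmt.Println(removeString(finalizers, "widget.example.com/finalizer"))
}
```

removeString returns a new slice rather than mutating in place, which keeps the original finalizer list intact if the subsequent update call fails and must be retried.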

Finalizers ensure:

  1. Resource protection: Prevent premature deletion
  2. Cleanup operations: Perform required cleanup before deletion
  3. Resource dependencies: Handle dependent resource cleanup
  4. External resource management: Clean up external resources

Owner References

Establish ownership relationships between resources:

# Parent resource
apiVersion: example.com/v1
kind: Widget
metadata:
  name: parent-widget
spec:
  size: large
  replicas: 3

---
# Child resource with owner reference
apiVersion: apps/v1
kind: Deployment
metadata:
  name: widget-deployment
  ownerReferences:
  - apiVersion: example.com/v1
    kind: Widget
    name: parent-widget
    uid: d9607e19-f88f-11e6-a518-42010a800195
    controller: true
    blockOwnerDeletion: true
spec:
  replicas: 3
  # ... rest of Deployment spec

Setting owner references in controller code:

// Set owner reference in Go (pointer is k8s.io/utils/pointer)
func setOwnerReference(owner, object *unstructured.Unstructured) {
    object.SetOwnerReferences([]metav1.OwnerReference{
        {
            APIVersion:         owner.GetAPIVersion(),
            Kind:               owner.GetKind(),
            Name:               owner.GetName(),
            UID:                owner.GetUID(),
            Controller:         pointer.Bool(true),
            BlockOwnerDeletion: pointer.Bool(true),
        },
    })
}

Owner references provide:

  1. Garbage collection: Automatic deletion of dependent resources
  2. Ownership tracking: Clear relationship between parent and child resources
  3. Cascading deletions: Orderly cleanup of resource hierarchies
  4. Dependency visualization: Makes resource relationships explicit

Scale Subresource

Enable the scale subresource for HPA integration:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      subresources:
        status: {}
        scale:
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.availableReplicas
          labelSelectorPath: .status.selector
      schema:
        openAPIV3Schema:
          type: object  # full schema elided for brevity

Use Horizontal Pod Autoscaler with custom resources:

# HPA targeting a Widget
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: widget-hpa
spec:
  scaleTargetRef:
    apiVersion: example.com/v1
    kind: Widget
    name: my-widget
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

The scale subresource enables:

  1. HPA integration: Scale based on metrics
  2. kubectl scale: Use kubectl scale command with custom resources
  3. Consistent scaling API: Standard interface for scaling operations
  4. Scale status reporting: Report current scale status

Testing and Validation

Unit Testing Controllers

Write effective unit tests for your controllers:

// Example unit test for a widget controller
package controllers

import (
    "context"
    "testing"
    
    "github.com/stretchr/testify/assert"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes/scheme"
    "sigs.k8s.io/controller-runtime/pkg/client/fake"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
    
    widgetsv1 "github.com/example/widget-operator/api/v1"
)

func TestWidgetReconciler(t *testing.T) {
    // Register widget types with the runtime scheme
    s := scheme.Scheme
    s.AddKnownTypes(widgetsv1.GroupVersion, &widgetsv1.Widget{})
    
    // Create a widget instance
    widget := &widgetsv1.Widget{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-widget",
            Namespace: "default",
        },
        Spec: widgetsv1.WidgetSpec{
            Size:     "medium",
            Replicas: 3,
        },
    }
    
    // Create a fake client with the widget
    client := fake.NewClientBuilder().
        WithScheme(s).
        WithObjects(widget).
        Build()
    
    // Create the reconciler with the fake client
    reconciler := &WidgetReconciler{
        Client: client,
        Scheme: s,
    }
    
    // Create a request to reconcile the widget
    req := reconcile.Request{
        NamespacedName: types.NamespacedName{
            Name:      "test-widget",
            Namespace: "default",
        },
    }
    
    // Call the reconciler
    _, err := reconciler.Reconcile(context.Background(), req)
    assert.NoError(t, err)
    
    // Verify the widget's status was updated
    updatedWidget := &widgetsv1.Widget{}
    err = client.Get(context.Background(), req.NamespacedName, updatedWidget)
    assert.NoError(t, err)
    
    // Verify expected status fields
    assert.Equal(t, "Running", updatedWidget.Status.Phase)
    assert.Equal(t, int32(3), updatedWidget.Status.AvailableReplicas)
}

Unit testing approaches include:

  1. Fake clients: Use in-memory clients for fast testing
  2. Mocked dependencies: Mock external dependencies
  3. Scenario testing: Test different reconciliation scenarios
  4. Edge cases: Test error conditions and edge cases
  5. Webhook testing: Validate webhook implementations

Integration Testing

Implement integration tests with the Kubernetes API:

// Integration test using envtest
package controllers

import (
    "context"
    "path/filepath"
    "testing"
    "time"
    
    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
    
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/scheme"
    "k8s.io/client-go/rest"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/envtest"
    logf "sigs.k8s.io/controller-runtime/pkg/log"
    
    widgetsv1 "github.com/example/widget-operator/api/v1"
)

var cfg *rest.Config
var k8sClient client.Client
var testEnv *envtest.Environment

func TestControllers(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Controller Suite")
}

var _ = BeforeSuite(func() {
    logf.SetLogger(GinkgoLogr)
    
    By("bootstrapping test environment")
    testEnv = &envtest.Environment{
        CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
    }
    
    var err error
    cfg, err = testEnv.Start()
    Expect(err).NotTo(HaveOccurred())
    
    err = widgetsv1.AddToScheme(scheme.Scheme)
    Expect(err).NotTo(HaveOccurred())
    
    k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
    Expect(err).NotTo(HaveOccurred())
    
    // Start the controller manager
    mgr, err := ctrl.NewManager(cfg, ctrl.Options{Scheme: scheme.Scheme})
    Expect(err).NotTo(HaveOccurred())
    
    err = (&WidgetReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr)
    Expect(err).NotTo(HaveOccurred())
    
    go func() {
        err = mgr.Start(ctrl.SetupSignalHandler())
        Expect(err).NotTo(HaveOccurred())
    }()
})

var _ = AfterSuite(func() {
    By("tearing down the test environment")
    err := testEnv.Stop()
    Expect(err).NotTo(HaveOccurred())
})

var _ = Describe("Widget controller", func() {
    const (
        widgetName      = "test-widget"
        widgetNamespace = "default"
    )
    
    Context("When creating a Widget", func() {
        It("Should update the status with replicas information", func() {
            By("Creating a new Widget")
            ctx := context.Background()
            widget := &widgetsv1.Widget{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      widgetName,
                    Namespace: widgetNamespace,
                },
                Spec: widgetsv1.WidgetSpec{
                    Size:     "medium",
                    Replicas: 3,
                },
            }
            Expect(k8sClient.Create(ctx, widget)).Should(Succeed())
            
            By("Checking if the status is updated")
            createdWidget := &widgetsv1.Widget{}
            Eventually(func() string {
                err := k8sClient.Get(ctx, client.ObjectKey{Name: widgetName, Namespace: widgetNamespace}, createdWidget)
                if err != nil {
                    return ""
                }
                return createdWidget.Status.Phase
            }, time.Second*10, time.Millisecond*250).Should(Equal("Running"))
            
            Expect(createdWidget.Status.AvailableReplicas).Should(Equal(int32(3)))
        })
    })
})

Integration testing approaches include:

  1. envtest: Run tests against a temporary control plane
  2. End-to-end scenarios: Test complete workflows
  3. Real resource creation: Exercise actual resource creation and reconciliation against the API server
  4. Asynchronous testing: Wait for controller reconciliation loops
  5. Test cleanup: Ensure proper resource cleanup after tests

Best Practices and Patterns

Schema Design

Follow these best practices for CRD schemas:

  1. Descriptive fields: Use clear, descriptive field names
  2. Validation: Add comprehensive validation to prevent errors
  3. Required fields: Mark essential fields as required
  4. Defaults: Provide sensible defaults where appropriate
  5. Documentation: Add descriptions to all fields
# Example of a well-designed schema
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      description: "Widget specification"
      properties:
        size:
          type: string
          description: "Size of the widget (small, medium, large)"
          enum: [small, medium, large]
          default: medium
        replicas:
          type: integer
          description: "Number of widget replicas to create"
          minimum: 1
          default: 1
        resources:
          type: object
          description: "Resource requirements for widget instances"
          properties:
            memoryLimit:
              type: string
              description: "Memory limit in Kubernetes resource format (e.g., 512Mi)"
              pattern: '^[0-9]+(Ki|Mi|Gi|Ti|Pi|Ei)?$'
            cpuLimit:
              type: string
              description: "CPU limit in Kubernetes resource format (e.g., 500m)"
              pattern: '^[0-9]+m?$'
      required: [size]
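With a schema like the one above, a user-facing manifest can stay minimal: only the required `size` field must be set, and the API server fills in the declared defaults (group and kind here follow the earlier Widget example):

```yaml
# Minimal Widget manifest valid against the schema above
apiVersion: example.com/v1
kind: Widget
metadata:
  name: minimal-widget
spec:
  size: small
  # replicas defaults to 1; resources is optional
```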

Controller Implementation

Implement controllers following these patterns:

  1. Idempotency: Ensure multiple reconciliations produce the same result
  2. Eventual consistency: Design for eventual consistency rather than immediate convergence
  3. Error handling: Properly handle and report errors
  4. Exponential backoff: Use appropriate retry mechanisms
  5. State management: Use status to track and report state
// Example of a well-structured reconcile function
func (r *WidgetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("widget", req.NamespacedName)
    
    // Fetch the Widget instance
    widget := &widgetsv1.Widget{}
    err := r.Get(ctx, req.NamespacedName, widget)
    if err != nil {
        if errors.IsNotFound(err) {
            // Widget was deleted, nothing to do
            return ctrl.Result{}, nil
        }
        log.Error(err, "Failed to get Widget")
        return ctrl.Result{}, err
    }
    
    // Handle finalizers
    if widget.DeletionTimestamp != nil {
        return r.handleDeletion(ctx, widget)
    }
    
    // Ensure finalizer
    if !containsString(widget.Finalizers, widgetFinalizerName) {
        widget.Finalizers = append(widget.Finalizers, widgetFinalizerName)
        if err := r.Update(ctx, widget); err != nil {
            log.Error(err, "Failed to update Widget with finalizer")
            return ctrl.Result{}, err
        }
        // Return here to avoid processing a stale object
        return ctrl.Result{Requeue: true}, nil
    }
    
    // Check if Deployment exists, create if not
    deployment := &appsv1.Deployment{}
    err = r.Get(ctx, types.NamespacedName{Name: widget.Name, Namespace: widget.Namespace}, deployment)
    if err != nil && errors.IsNotFound(err) {
        // Create the Deployment
        deployment = r.deploymentForWidget(widget)
        log.Info("Creating a new Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
        err = r.Create(ctx, deployment)
        if err != nil {
            log.Error(err, "Failed to create Deployment")
            return ctrl.Result{}, err
        }
        // Deployment created, requeue to check its status
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        log.Error(err, "Failed to get Deployment")
        return ctrl.Result{}, err
    }
    
    // Update the Deployment if it doesn't match the Widget spec
    if widget.Spec.Replicas != *deployment.Spec.Replicas {
        deployment.Spec.Replicas = &widget.Spec.Replicas
        err = r.Update(ctx, deployment)
        if err != nil {
            log.Error(err, "Failed to update Deployment")
            return ctrl.Result{}, err
        }
        // Requeue to check status after update
        return ctrl.Result{Requeue: true}, nil
    }
    
    // Update Widget status
    if widget.Status.AvailableReplicas != deployment.Status.AvailableReplicas {
        widget.Status.AvailableReplicas = deployment.Status.AvailableReplicas
        widget.Status.Phase = "Running"
        err = r.Status().Update(ctx, widget)
        if err != nil {
            log.Error(err, "Failed to update Widget status")
            return ctrl.Result{}, err
        }
    }
    
    // Requeue periodically to ensure state remains consistent
    return ctrl.Result{RequeueAfter: time.Minute * 5}, nil
}
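The reconcile function above calls a `containsString` helper that is not shown. A minimal implementation is sketched below; newer codebases can use `slices.Contains` from the standard library or `controllerutil.ContainsFinalizer` from controller-runtime instead:

```go
package main

import "fmt"

// containsString reports whether slice contains s; used above to check
// whether the Widget already carries our finalizer.
func containsString(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}

func main() {
	finalizers := []string{"widgets.example.com/finalizer"}
	fmt.Println(containsString(finalizers, "widgets.example.com/finalizer")) // true
	fmt.Println(containsString(finalizers, "other"))                         // false
}
```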

Resource Management

Follow these resource management best practices:

  1. Declarative design: Design resources declaratively, not imperatively
  2. Single responsibility: Each CRD should have a clear, focused purpose
  3. Logical grouping: Group related fields and functionalities
  4. Progressive disclosure: Simple defaults with optional advanced configuration
  5. Composition over inheritance: Compose resources rather than building hierarchies
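Composition over inheritance can be expressed by having one resource reference others by name rather than modeling "subtypes" of a resource. A hypothetical sketch (the `WidgetSet` kind and `widgetRefs` field are illustrative, not from the earlier CRD):

```yaml
# Composition: a WidgetSet that aggregates existing Widgets by reference,
# instead of defining a Widget subclass hierarchy
apiVersion: example.com/v1
kind: WidgetSet
metadata:
  name: dashboard-widgets
spec:
  widgetRefs:
    - name: traffic-widget
    - name: latency-widget
```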

Conclusion

Custom Resource Definitions represent a powerful extension point in Kubernetes, enabling platform teams to create domain-specific abstractions that leverage the full capabilities of the Kubernetes API machinery. By following the patterns and best practices outlined in this guide, you can create robust, user-friendly custom resources that integrate seamlessly with the Kubernetes ecosystem.

The journey from basic CRDs to sophisticated controllers and operators allows for incremental complexity, starting with simple resource definitions and gradually adding advanced features like validation, conversion, status management, and complex reconciliation logic. This progressive approach helps teams build production-ready extensions to Kubernetes that meet their specific application and platform needs.

As Kubernetes continues to evolve as a cloud-native application platform, CRDs and controllers have become essential tools for creating higher-level abstractions, enabling GitOps workflows, and building self-service internal developer platforms that increase productivity while maintaining operational excellence.