Orkestra Project — March 2026
Abstract
Kubernetes operators encode domain knowledge as reconciliation logic. Every major operator framework to date requires this logic to be written in a programming language, compiled into a binary, and deployed as a separate long-running process — one per CRD. The result, at scale, is operator sprawl: dozens of binaries each with independent lifecycle, observability, and resource consumption.
This paper presents a different model. We argue that the operator pattern is fundamentally correct, but its implementation has been conflated with its mechanism. The pattern requires one reconciler per CRD with proper isolation. It does not require one binary per CRD. When a shared runtime provides the isolation, each CRD becomes a complete, independent operator — with its own informer, worker pool, workqueue, health endpoint, and metrics — while the operational overhead of running them collapses to a single process.
We describe Orkestra, a runtime for declarative operators built on this principle. Users declare CRDs and their reconcile behavior in a YAML Katalog. The runtime interprets these declarations, provides full operator lifecycle, and composes multiple Katalogs through a Komposer model. We demonstrate that this approach eliminates code generation, build pipelines, and per-operator deployments while preserving and strengthening the isolation properties that operator frameworks seek.
1. The Operator Pattern and Its Costs
1.1 The Original Model
The operator pattern, introduced in 2016, proposed encoding operational knowledge as a reconciliation loop: a controller that watches a custom resource and continuously drives the cluster toward the desired state declared in it. The pattern was correct. Its implementation required intimate familiarity with client-go internals — informers, workqueues, REST mappers, and scheme registration. The business logic was a small fraction of the total code. The remainder was identical across every operator ever written.
1.2 Frameworks Address Boilerplate
Kubebuilder, Operator SDK, and controller-runtime addressed the boilerplate by generating the common parts. The operator developer could focus on the reconcile function rather than its surrounding infrastructure. This was genuine progress.
The cost did not disappear. The generated project still required Go, a build pipeline, an image registry, and a deployment manifest. Adding a new CRD meant adding a new Go type, running code generation, rebuilding the binary, pushing the image, and rolling the deployment. The development loop was compressed but not eliminated.
More significantly, the framework design encoded an assumption that would compound over time: one operator per CRD, one binary per operator.
1.3 The Operator Sprawl Problem
The one-binary-per-CRD assumption produces predictable operational overhead at scale. A production cluster running Prometheus, Cert Manager, External Secrets, an Ingress controller, a service mesh, and a collection of internal CRDs routinely runs twenty to fifty operator processes. Each consumes memory and CPU even when idle. Each maintains its own informer cache, duplicating watch traffic against the API server. Each has its own RBAC configuration, its own metrics endpoint (if any), its own upgrade cadence, and its own health story.
Platform engineers managing these clusters do not have one observability problem. They have fifty. Understanding why an application failed requires consulting multiple dashboards, multiple log streams, multiple health endpoints — each with its own format, its own conventions, and its own operational vocabulary.
This is operator sprawl. It is not a consequence of the operator pattern. It is a consequence of the assumption that operator equals binary.
2. Reframing: The Super-Operator Model
The Kubernetes community has long held that the correct design is one operator per CRD. This paper agrees — and argues that Orkestra fulfills this principle more completely than previous frameworks.
The confusion lies in what “one operator” means. In traditional frameworks, it means one process, one binary, one deployment. The per-CRD isolation is an architectural intention, but it is implemented at the process boundary — not within the runtime.
Orkestra makes the isolation explicit and structural. Each CRD in Orkestra receives its own isolated operator stack:
- Informer — watches exactly one GVK with its own resync interval
- Workqueue — independent depth, backoff, and rate limiting
- Worker pool — dedicated goroutines; no other CRD can consume them
- Reconciler — interprets this CRD’s templates and hooks only
- Health endpoint — /katalog/{crd}/health per CRD
- Metrics — five Prometheus metrics, all labeled by GVK
- Failure domain — a panic in one reconciler is recovered; others continue
These components are hosted by a shared runtime that provides the infrastructure — API server connections, leader election, dependency ordering, the informer factory, the queue registry. The runtime is infrastructure. The operator stack per CRD is the tenant.
This is the super-operator model: each CRD becomes a complete, production-grade operator. They share infrastructure the way microservices share a Kubernetes cluster — not by merging their logic, but by running on common platforms.
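The tenant/infrastructure split can be made concrete with a sketch. The struct and field names below (crdStack, sharedRuntime) are illustrative, not Orkestra's exported API; only the client-go types are real:

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// crdStack is one plausible shape for the per-CRD tenant: every field here
// belongs to exactly one CRD and is reachable from no other.
type crdStack struct {
	gvk       schema.GroupVersionKind
	informer  cache.SharedIndexInformer       // watches exactly one GVK
	queue     workqueue.RateLimitingInterface // independent depth, backoff, rate limits
	workers   int                             // dedicated goroutines
	reconcile func(ctx context.Context, key string) error
}

// sharedRuntime is the landlord: it owns the cross-cutting infrastructure
// and hosts one complete stack per CRD.
type sharedRuntime struct {
	stacks map[schema.GroupVersionKind]*crdStack
	// shared: REST config, leader election, informer factory, health server.
}
```

Everything inside a crdStack is tenant state; everything alongside the map is infrastructure.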
3. Orkestra
3.1 The Katalog
A Katalog is a YAML document that declares one or more CRDs and how they should be reconciled. It is the unit of operator definition in Orkestra.
```yaml
apiVersion: orkestra.orkspace.io/v1
kind: Katalog
metadata:
  name: website-operator
spec:
  crds:
    - name: website
      workers: 3
      resync: 30s
      apiTypes:
        group: demo.orkestra.io
        version: v1alpha1
        kind: Website
        plural: websites
      operatorBox:
        default: true
        onCreate:
          deployments:
            - image: "{{ .spec.image }}"
              replicas: "{{ .spec.replicas }}"
              reconcile: true
          services:
            - port: "80"
              targetPort: "{{ .spec.port }}"
              reconcile: true
```
This is a complete operator declaration. `ork run --file katalog.yaml` starts the runtime. Every Website CR triggers a reconcile that creates and drift-corrects a Deployment and Service. Deletion cascades via owner references. Finalizers ensure cleanup completes before CR removal.
The field reconcile: true on each resource means it is also corrected on
every reconcile cycle — not just created. Drift is detected and corrected
without additional configuration.
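A minimal sketch of what such a create-then-correct cycle can look like against the dynamic client. The function name and the whole-spec comparison are our simplifications, not Orkestra's implementation; real drift detection must also tolerate server-side defaulting, for example by comparing only templated fields or by using server-side apply:

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// ensureResource creates the templated resource if absent, and pushes the
// desired spec back if the live object has drifted from it.
func ensureResource(ctx context.Context, cl dynamic.Interface, gvr schema.GroupVersionResource, desired *unstructured.Unstructured) error {
	res := cl.Resource(gvr).Namespace(desired.GetNamespace())
	live, err := res.Get(ctx, desired.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = res.Create(ctx, desired, metav1.CreateOptions{}) // onCreate path
		return err
	}
	if err != nil {
		return err
	}
	// Drift path: desired was rebuilt from templates this cycle; if the live
	// spec has moved away from it, correct it.
	if !equality.Semantic.DeepEqual(desired.Object["spec"], live.Object["spec"]) {
		live.Object["spec"] = desired.Object["spec"]
		_, err = res.Update(ctx, live, metav1.UpdateOptions{})
	}
	return err
}
```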
3.2 The Dynamic Client Model
Orkestra operates on unstructured CRDs by default — the same *unstructured.Unstructured representation Kubernetes uses internally. The apiTypes.location field is optional, needed only when Go hooks require concrete type assertions. This distinction matters for three reasons.
First, it eliminates the code generation step for the common case. The cluster already holds the CRD schema. Orkestra reads it at startup via the discovery API. The user does not need to replicate it in Go structs.
Second, it enables watching any CRD — including ones the operator does not
own. A Katalog entry with kind: Deployment and no apiTypes.location is
sufficient for Orkestra to watch all Deployments in the cluster. The cluster
knows the schema. Orkestra asks for it.
Third, it is the mechanism that makes multi-version CRDs tractable. Each version of a CRD is registered as a separate Katalog entry. Each version gets its own complete operator stack. The conversion logic is declared alongside the reconcile logic, interpreted by the same template resolver.
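The second point is worth making concrete: watching an arbitrary GVR without Go types is plain client-go, independent of Orkestra. A sketch, using apps/v1 Deployments as the example:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// No Go structs, no scheme registration: a GVR triple is enough to watch.
	gvr := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 30*time.Second)
	informer := factory.ForResource(gvr).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Printf("observed: %v\n", obj) },
	})
	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // block; a real runtime would wire this to shutdown
}
```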
3.3 The Template Resolver
Orkestra’s template resolver evaluates Go text/template expressions
against the live CR object at reconcile time. This is the mechanism
through which declarative templates become Kubernetes API calls.
```yaml
deployments:
  - name: "{{ .metadata.name }}"
    image: "{{ .spec.image }}"
    replicas: "{{ .spec.replicas }}"
    reconcile: true
```
The resolver evaluates each field against the CR’s unstructured map before calling the OrkestraRegistry. The OrkestraRegistry handles the Kubernetes API calls — create, update, delete, owner references, idempotency — for each resource type. Adding a new resource type to the registry is a single file addition.
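The resolution step itself needs nothing beyond Go's text/template. A minimal sketch; the function name resolveField is ours:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// resolveField evaluates one template expression against the CR's
// unstructured map, failing loudly on missing fields.
func resolveField(expr string, cr map[string]interface{}) (string, error) {
	tmpl, err := template.New("field").Option("missingkey=error").Parse(expr)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, cr); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	cr := map[string]interface{}{
		"metadata": map[string]interface{}{"name": "my-site"},
		"spec":     map[string]interface{}{"image": "nginx:1.27"},
	}
	v, _ := resolveField("{{ .spec.image }}", cr)
	fmt.Println(v) // nginx:1.27
}
```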
3.4 Dependency Ordering
CRDs can declare dependencies:
```yaml
crds:
  - name: project
    dependsOn: []
  - name: namespace
    dependsOn: [project]
  - name: application
    dependsOn: [project, namespace]
```
Orkestra computes the topological order from the dependency graph and starts CRDs in that order. Each CRD waits for its dependencies to signal readiness before its workers start. Missing CRDs — declared but not yet installed in the cluster — are retried in the background without blocking healthy CRDs.
Separate operators have no practical way to provide this: independent processes share no startup coordination mechanism. Orkestra provides it as a declared property of the Katalog.
Shutdown runs in reverse dependency order. No partial reconciliations. No orphaned resources.
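Startup ordering reduces to a topological sort of the dependsOn graph. A sketch using Kahn's algorithm, assuming for brevity that every referenced CRD is declared (Orkestra retries missing ones in the background):

```go
package main

import "fmt"

// startupOrder returns a dependency-respecting start order; shutdown walks
// the same slice in reverse.
func startupOrder(dependsOn map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(dependsOn))
	dependents := make(map[string][]string)
	for crd, deps := range dependsOn {
		indegree[crd] = len(deps)
		for _, d := range deps {
			dependents[d] = append(dependents[d], crd)
		}
	}
	var ready, order []string
	for crd, n := range indegree {
		if n == 0 {
			ready = append(ready, crd)
		}
	}
	for len(ready) > 0 {
		crd := ready[0]
		ready = ready[1:]
		order = append(order, crd) // this CRD's stack may start now
		for _, dep := range dependents[crd] {
			indegree[dep]--
			if indegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle among CRDs")
	}
	return order, nil
}

func main() {
	order, _ := startupOrder(map[string][]string{
		"project": {}, "namespace": {"project"}, "application": {"project", "namespace"},
	})
	fmt.Println(order) // [project namespace application]
}
```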
3.5 The Komposer
A Komposer composes Katalogs from multiple sources into one runtime:
```yaml
apiVersion: orkestra.orkspace.io/v1
kind: Komposer
metadata:
  name: platform-komposer
imports:
  files:
    - ./katalogs/website.yaml
    - https://platform.myorg.io/crds/database.yaml
    - url: https://private.myorg.io/crds/internal.yaml
      auth:
        type: bearer
        fromEnv: PLATFORM_TOKEN
  helm:
    - repo: https://charts.myorg.io
      chart: platform-crds
      version: 2.1.0
  registry:
    - katalog:
        application:
          version: v1.4.0
spec:
  crds:
    # Inline override — wins on name conflict with any source
    - name: application
      workers: 8
```
Orkestra’s built-in merger resolves all sources, deduplicates by CRD name, and produces one validated configuration. Inline spec.crds entries are merged last and override source definitions — the mechanism for environment-specific configuration without forking source Katalogs.
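A sketch of the merge rule. The CRDEntry type is a stand-in for the real Katalog entry, and whole-entry replacement is a simplification; the actual merger also validates the result and may merge field-by-field:

```go
package sketch

// CRDEntry stands in for one Katalog CRD entry; only the fields needed for
// the sketch are shown.
type CRDEntry struct {
	Name    string
	Workers int
}

// mergeKatalogs applies sources in import order and inline entries last;
// the last writer wins on a name conflict.
func mergeKatalogs(sources [][]CRDEntry, inline []CRDEntry) []CRDEntry {
	byName := make(map[string]CRDEntry)
	var order []string
	upsert := func(e CRDEntry) {
		if _, seen := byName[e.Name]; !seen {
			order = append(order, e.Name)
		}
		byName[e.Name] = e
	}
	for _, src := range sources { // files, URLs, Helm charts, registry
		for _, e := range src {
			upsert(e)
		}
	}
	for _, e := range inline { // inline spec.crds overrides
		upsert(e)
	}
	out := make([]CRDEntry, 0, len(order))
	for _, name := range order {
		out = append(out, byName[name])
	}
	return out
}
```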
This is the pattern that Helm brought to deployment manifests, applied to operator behavior. Platform teams publish Katalogs. Application teams compose and selectively override.
3.6 Observability
Every CRD managed by Orkestra automatically exposes:
| Endpoint | Returns |
|---|---|
| GET /katalog | All CRDs — health, dependency graph, stats |
| GET /katalog/{crd} | Single CRD — config, reconcile stats |
| GET /katalog/{crd}/health | 200 healthy / 503 degraded |
| GET /metrics | Prometheus metrics for all CRDs |
Five metrics, all per-CRD, all labeled by full GVK:
```text
controller_reconcile_total{crd, result}
controller_reconcile_duration_seconds{crd}
controller_queue_depth{crd}
controller_workers_active{crd}
controller_resource_count{crd}
```
This unified observability is a structural consequence of the single-runtime model. Separate operators cannot provide it.
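For orientation, this is roughly how such per-CRD metric vectors can be declared with Prometheus's client_golang (three of the five shown). The registration code is a sketch, not Orkestra's source:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Per-CRD metric vectors; the label sets mirror the list above, so each
// CRD's series remain independent.
var (
	reconcileTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "controller_reconcile_total", Help: "Reconciles by CRD and result."},
		[]string{"crd", "result"},
	)
	reconcileDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "controller_reconcile_duration_seconds", Help: "Reconcile latency by CRD."},
		[]string{"crd"},
	)
	queueDepth = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{Name: "controller_queue_depth", Help: "Workqueue depth by CRD."},
		[]string{"crd"},
	)
)

func init() {
	prometheus.MustRegister(reconcileTotal, reconcileDuration, queueDepth)
}

// Usage inside a reconcile wrapper:
//   timer := prometheus.NewTimer(reconcileDuration.WithLabelValues(crd))
//   defer timer.ObserveDuration()
//   reconcileTotal.WithLabelValues(crd, "success").Inc()
```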
4. Multi-Version CRDs: Declarative Conversion
4.1 The Standard Approach
Kubernetes CRDs support multiple API versions through a conversion webhook.
When a client requests a version different from the storage version, the API
server sends a ConversionReview request to the webhook. The webhook must
return the converted objects.
The standard implementation requires writing conversion functions in Go, deploying a separate webhook server, managing TLS certificates, configuring the CRD to point to the webhook, and maintaining conversion logic as versions evolve. For a change as simple as adding a field, this infrastructure overhead frequently exceeds the development cost of the change itself.
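For contrast, this is the protocol a hand-written webhook must implement. The handler shape below is a sketch using the real apiextensions v1 types; convert is a hypothetical placeholder for the per-version-pair functions the developer must write and maintain:

```go
package sketch

import (
	"encoding/json"
	"net/http"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// convert is hypothetical: real webhooks rewrite the object's apiVersion and
// fields here, one hand-written function per version pair.
func convert(obj runtime.RawExtension, desiredAPIVersion string) runtime.RawExtension {
	return obj
}

func handleConvert(w http.ResponseWriter, r *http.Request) {
	var review apiextensionsv1.ConversionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	resp := &apiextensionsv1.ConversionResponse{
		UID:    review.Request.UID,
		Result: metav1.Status{Status: metav1.StatusSuccess},
	}
	for _, obj := range review.Request.Objects {
		resp.ConvertedObjects = append(resp.ConvertedObjects, convert(obj, review.Request.DesiredAPIVersion))
	}
	review.Response = resp
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(review)
}
```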
4.2 Conversion as a Consequence of the Super-Operator Model
The super-operator model makes declarative conversion architecturally natural. Each version of a CRD is a separate Katalog entry with its own complete operator stack — its own informer watching that specific version, its own workers, its own reconciler. The version boundary is a first-class concept in the runtime.
Conversion rules are declared alongside reconcile templates using the same template resolver:
```yaml
- name: website-v1
  apiTypes:
    group: demo.orkestra.io
    version: v1
    kind: Website
  conversion:
    storageVersion: v1
    paths:
      - from: v1alpha1
        to: v1
        spec:
          image: "{{ .spec.image }}"
          replicas: "{{ .spec.replicas }}"
          seo:
            enabled: false # default — v1alpha1 has no seo field
      - from: v1
        to: v1alpha1
        spec:
          image: "{{ .spec.image }}"
          replicas: "{{ .spec.replicas }}"
          theme: "default" # default — v1 has no theme field
```
Orkestra’s HTTPS server serves the /convert endpoint. The conversion
handler resolves each path’s spec template against the source object and
returns the converted objects. The CRD’s conversion block points to this
endpoint.
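A sketch of how a declared path could be interpreted, passing in the resolveField shape from Section 3.3 as the resolver. The names are ours, and the sketch flattens resolved values to strings, which a real resolver would not:

```go
package sketch

// applyPath resolves every templated spec field of one conversion path
// against the source object and writes it into a fresh object at the
// desired apiVersion.
func applyPath(src map[string]interface{}, desiredAPIVersion string,
	specTemplates map[string]string,
	resolve func(expr string, obj map[string]interface{}) (string, error),
) (map[string]interface{}, error) {
	out := map[string]interface{}{
		"apiVersion": desiredAPIVersion,
		"kind":       src["kind"],
		"metadata":   src["metadata"], // metadata passes through unchanged
		"spec":       map[string]interface{}{},
	}
	spec := out["spec"].(map[string]interface{})
	for field, expr := range specTemplates {
		v, err := resolve(expr, src)
		if err != nil {
			return nil, err
		}
		spec[field] = v // literal defaults resolve to themselves
	}
	return out, nil
}
```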
4.3 Production Results
The following is from a live deployment managing both versions of the Website CRD:
```json
{
  "name": "website-v1alpha1",
  "conversion": {
    "enabled": true,
    "total": 62,
    "success": 62,
    "failures": 0,
    "avgLatencyMs": 0.5,
    "p95LatencyMs": 1.2
  }
}
```

```text
orkestra_conversion_requests_total{kind="Website",from="v1alpha1",to="v1",result="success"} 14
orkestra_conversion_requests_total{kind="Website",from="v1",to="v1alpha1",result="success"} 17
orkestra_conversion_duration_seconds_sum{from="v1alpha1",to="v1"} 0.007
```
62 successful conversions. Zero failures. Sub-millisecond average latency. Zero lines of Go written for the conversion. Zero additional deployments. Zero TLS certificates to manage.
5. Addressing the One-Operator-Per-CRD Principle
5.1 The Principle is Correct
The Operator SDK best practices document states: “Avoid a design solution where more than one Kind is reconciled by the same controller.” The concerns articulated are encapsulation, Single Responsibility, cohesion, and preventing unexpected side effects between CRDs.
These concerns are valid. Orkestra does not disagree with them.
5.2 The Principle Refers to Reconciler Logic, Not Process Boundaries
The Operator SDK’s concerns are about reconciler logic — that a reconciler
for Website should not contain logic for Database. Orkestra enforces
this more strictly than any previous framework.
In Orkestra, the reconciler for Website is a closure that closes over
exactly the Katalog entry for Website. It has no access to the reconciler
for Database. It cannot affect the Database workqueue. It cannot observe
the Database informer. The isolation is structural.
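The isolation can be read directly from the construction. A sketch, with KatalogEntry standing in for the real entry type:

```go
package sketch

import "context"

// KatalogEntry stands in for one CRD's declaration; the type is illustrative.
type KatalogEntry struct {
	Name      string
	Templates map[string]string
}

// buildReconciler returns a closure over exactly one entry. Structurally,
// there is no reference through which it could reach another CRD's queue,
// informer, or reconciler.
func buildReconciler(entry KatalogEntry, apply func(context.Context, KatalogEntry, string) error) func(context.Context, string) error {
	return func(ctx context.Context, key string) error {
		return apply(ctx, entry, key) // only this entry's templates and hooks
	}
}
```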
What is shared is the orchestration infrastructure: the API server connection,
the informer factory, the queue registry, the health server, the leader
election lease. This is analogous to how the Kubernetes
kube-controller-manager runs the Deployment controller, the ReplicaSet
controller, the Endpoint controller, and dozens of others in a single process —
each isolated, each maintaining the Single Responsibility Principle, all
sharing infrastructure.
No one argues that kube-controller-manager violates good design. Orkestra
applies the same principle to user-defined operators.
5.3 The Shared Failure Domain
The legitimate remaining concern is that a single process is a single failure domain. If the Orkestra process crashes, all CRD operators stop together.
This is addressed the same way Kubernetes addresses it for
kube-controller-manager: multiple replicas with leader election and warm
informer caches. Followers are not idle — they maintain synced informer
caches. When the leader fails, a follower takes over within the lease renewal
period, already warm.
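This is plain client-go leader election. The sketch below gates worker startup on leadership while informers, started on every replica, keep the caches warm; the lease name and namespace are placeholders:

```go
package sketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runElected starts workers only on the elected leader; informers run
// outside this gate on every replica.
func runElected(ctx context.Context, cs kubernetes.Interface, id string, startWorkers func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "orkestra-leader", Namespace: "orkestra-system"},
		Client:     cs.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // bounds follower takeover time
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: startWorkers,
			OnStoppedLeading: func() { /* stop workers; informers keep running */ },
		},
	})
}
```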
Within the process, Orkestra’s safeReconcile wrapper recovers from panics
in individual reconcilers. A nil pointer dereference in the Website
reconciler does not crash the Database reconciler.
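A wrapper with this behavior is a few lines of Go. The shape below is our sketch; only the name safeReconcile comes from Orkestra:

```go
package sketch

import (
	"fmt"
	"runtime/debug"
)

// safeReconcile converts a panic in one CRD's reconciler into an error
// recorded against that CRD alone; other CRDs' workers never observe it.
func safeReconcile(crd string, reconcile func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("reconcile panic in %s: %v\n%s", crd, r, debug.Stack())
		}
	}()
	return reconcile()
}
```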
6. What This Replaces
| Traditional | Orkestra |
|---|---|
| Go operator binary per CRD | Katalog YAML |
| Kubebuilder scaffolding | ork init |
| One deployment per CRD | One runtime per operator surface |
| Per-operator health endpoints | Unified /katalog/* API |
| Per-operator metrics (or none) | Unified per-CRD metrics |
| Manual dependency management | Declared dependsOn |
| Helm chart per operator | Komposer |
| Conversion webhook binary + TLS | Conversion rules in Katalog |
| Operator sprawl | One runtime |
7. Conclusion
The operator pattern is the right abstraction for Kubernetes extensibility. The requirement to implement one binary per CRD has been a constraint of implementation convention, not of the pattern itself.
Orkestra demonstrates that when a runtime provides per-CRD isolation structurally — through dedicated informers, workqueues, worker pools, and failure domains — the isolation properties the community seeks are preserved and strengthened. Each CRD becomes a complete, production-grade operator. They share only the infrastructure that makes this possible.
The consequences extend beyond simplification. When each CRD version is its own operator entry, multi-version conversion becomes a declaration rather than an infrastructure project. When operators are YAML declarations, they become composable through the same mechanisms as any other Kubernetes resource. When one runtime watches all managed CRDs, unified observability is a structural consequence, not an integration project.
Operators become data, not code. They are composed, not programmed. They are versioned, shared, and reused like any other Kubernetes resource.
Kubernetes made infrastructure declarative. Orkestra makes the operators that extend Kubernetes declarative. The same principle, applied one level up.
Orkestra — Declarative Operators for Kubernetes — March 2026 — https://github.com/orkspace/orkestra