Hand writing mathematical equations on paper, representing the precision and composability of platform engineering as a formal system

Photo: ramdlon / Pixabay

Unifying Platforms, AI, and Operators

We've written about three foundational CNCF whitepapers (and why we build on CNCF and open source in the first place):

Platforms: the why and how of Internal Developer Platforms
Cloud Native AI: running AI/ML workloads on Kubernetes
Operator Pattern: encoding operational expertise into software

Each whitepaper is excellent in isolation. But the real power emerges when you read them together, because they describe three layers of the same system.

At Scaletific, we've spent the last year building GoldenPath IDP with exactly this unified model in mind. This article explains how the three pillars fit together, where they overlap, and what the resulting architecture looks like.

The Three Layers

Think of modern cloud native infrastructure as three interconnected layers:

┌──────────────────────────────────────────────────────┐
│                    PLATFORM LAYER                    │
│  Portal · Golden Paths · Self-Service · Governance   │
├──────────────────────────────────────────────────────┤
│                  INTELLIGENCE LAYER                  │
│  RAG Pipelines · LLM Agents · Vector DBs · Graph     │
├──────────────────────────────────────────────────────┤
│                    OPERATOR LAYER                    │
│  Controllers · Reconciliation · Domain Knowledge     │
├──────────────────────────────────────────────────────┤
│                  KUBERNETES + CLOUD                  │
└──────────────────────────────────────────────────────┘

The Platform Layer (CNCF Platforms whitepaper) provides the human interface: the portals, golden paths, documentation, and governance that developers interact with daily.

The Intelligence Layer (CNCF Cloud Native AI whitepaper) provides cognitive capabilities: RAG pipelines that answer questions, LLM agents that generate code, vector databases that power semantic search, and ML models that detect anomalies.

The Operator Layer (CNCF Operator whitepaper) provides the automation backbone: controllers that continuously reconcile desired state with reality, encoding operational expertise into software that runs 24/7.

Where the Layers Intersect

The real value isn't in any single layer. It's in the connections between them.

Platforms + Operators: Self-Healing Infrastructure

The Platforms whitepaper calls for self-service delivery and security by default. The Operator whitepaper explains how to implement that: through custom controllers that encode provisioning, upgrading, backup, and auto-remediation logic.

When a developer requests a new database through the platform portal, an operator handles the actual provisioning, configuration tuning, backup scheduling, and ongoing health management. The platform provides the interface; the operator provides the automation.
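To make that split concrete, here is a minimal Python sketch of a single reconcile step. It is illustrative only and not GoldenPath IDP code: `DatabaseSpec`, `DatabaseStatus`, `reconcile`, and `apply` are hypothetical names, and `apply` merely simulates provisioning by updating an in-memory status object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseSpec:
    """Desired state, as a developer might submit it through the portal."""
    name: str
    replicas: int
    backups_enabled: bool = True


@dataclass
class DatabaseStatus:
    """Observed state of the real resource (hypothetical fields)."""
    exists: bool = False
    replicas: int = 0
    backups_enabled: bool = False


def reconcile(spec: DatabaseSpec, status: DatabaseStatus) -> List[str]:
    """Compare desired and observed state and return the actions still needed."""
    actions = []
    if not status.exists:
        actions.append(f"provision {spec.name}")
    if status.replicas != spec.replicas:
        actions.append(f"scale {spec.name} to {spec.replicas} replicas")
    if spec.backups_enabled and not status.backups_enabled:
        actions.append(f"schedule backups for {spec.name}")
    return actions


def apply(spec: DatabaseSpec, status: DatabaseStatus) -> None:
    """Stand-in for real provisioning: make the observed state match the spec."""
    status.exists = True
    status.replicas = spec.replicas
    status.backups_enabled = spec.backups_enabled
```

Calling `reconcile` on a fresh request yields the provisioning actions; calling it again after `apply` yields an empty list. That idempotence is what lets a controller run the same loop indefinitely.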

In GoldenPath IDP, this manifests as:

Backstage service catalog (platform portal) backed by Terraform modules (infrastructure operators)
CI/CD pipelines that validate governance policies (continuous reconciliation)
Certified scripts that encode operational procedures with automated verification

Platforms + AI: Intelligent Developer Experience

The Platforms whitepaper emphasises cognitive load reduction. The Cloud Native AI whitepaper provides the toolkit to make that reduction intelligent.

Instead of static golden path templates, imagine:

A RAG pipeline that answers "how do I deploy to staging?" by searching your ADRs, runbooks, and past incident reports
An LLM agent that generates Terraform modules based on natural language descriptions, pre-validated against your governance policies
Anomaly detection that identifies when a deployment pattern deviates from your golden path

In GoldenPath IDP, this manifests as:

RAG-powered documentation search across 183+ ADRs and 678+ docs
Hybrid retrieval (vector + graph) for context-aware answers
AI-assisted code generation with governance guardrails
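As a rough illustration of what hybrid retrieval means, the sketch below blends a vector-similarity score with a simple graph signal. It is a toy under stated assumptions, not our pipeline: the embeddings are hand-written three-dimensional vectors, the link graph is a plain dictionary, and `hybrid_rank` and its `alpha` weight are hypothetical names. A real system would use an embedding model, a vector store, and a graph database.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    links: Dict[str, Set[str]],
    alpha: float = 0.8,
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * similarity + (1 - alpha) * graph signal.

    The graph signal is 1.0 for documents linked from the best vector match
    and 0.0 for everything else.
    """
    sims = {doc: cosine(query_vec, vec) for doc, vec in doc_vecs.items()}
    best = max(sims, key=sims.get)
    neighbours = links.get(best, set())
    scores = {
        doc: alpha * sim + (1 - alpha) * (1.0 if doc in neighbours else 0.0)
        for doc, sim in sims.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The design point is that a runbook linked from the most relevant ADR can outrank an unrelated page that happens to sit closer in embedding space.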

AI + Operators: Autonomous Operations

The Cloud Native AI whitepaper notes that AI can enhance cloud native operations through pattern analysis, anomaly detection, and natural language interfaces. The Operator whitepaper provides the control loop that acts on those insights.

Combine them and you get autonomous operations:

ML models detect that query latency is increasing
The operator's control loop receives this signal
Domain knowledge encoded in the operator determines the correct remediation (add a read replica, not just scale pods)
The action is executed, verified, and logged
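The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a production design: a z-score test stands in for the ML model, the remediation rules and thresholds are invented for the example, and the verification step is left out. All function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional


def latency_anomaly(history_ms: List[float], latest_ms: float,
                    threshold: float = 3.0) -> bool:
    """Flag the latest sample if it is more than `threshold` standard
    deviations above the historical mean (a stand-in for an ML detector)."""
    if len(history_ms) < 2:
        return False
    sigma = stdev(history_ms)
    if sigma == 0:
        return latest_ms > mean(history_ms)
    return (latest_ms - mean(history_ms)) / sigma > threshold


def choose_remediation(read_ratio: float, cpu_utilisation: float) -> str:
    """Encoded domain knowledge (illustrative rules only)."""
    if read_ratio >= 0.8:
        return "add_read_replica"  # read-heavy load: more pods would not help
    if cpu_utilisation >= 0.9:
        return "scale_pods"
    return "page_on_call"


def control_loop_step(history_ms: List[float], latest_ms: float,
                      read_ratio: float, cpu_utilisation: float,
                      log: List[str]) -> Optional[str]:
    """One pass of the loop: detect, decide, and record the decision."""
    if not latency_anomaly(history_ms, latest_ms):
        return None
    action = choose_remediation(read_ratio, cpu_utilisation)
    log.append(f"latency {latest_ms}ms anomalous; action={action}")
    return action
```

Keeping detection and remediation in separate functions mirrors the division of labour in the text: the model produces a signal, and the operator's encoded knowledge decides what to do with it.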

This isn't science fiction; it's the logical extension of both whitepapers' recommendations.

The Unified Architecture

Here's how we think about the unified model at Scaletific.

Layer 1: Foundation

Kubernetes as the orchestration layer
Cloud provider services for managed databases, object storage, and networking
GPU scheduling for AI workloads via Volcano, Kueue, and Dynamic Resource Allocation

Layer 2: Operators

Infrastructure operators (Terraform, Helm-based controllers)
Application operators for database, cache, and message queue lifecycle
Governance operators for policy enforcement and compliance checking
AI/ML operators, including Kubeflow Training Operator and KServe

Layer 3: Intelligence

Vector databases (ChromaDB, Milvus) for semantic search
Graph databases (Neo4j) for relationship-aware retrieval
RAG pipelines for context-aware question answering
LLM integration with guardrails and observability
ML models for anomaly detection and pattern recognition

Layer 4: Platform

Backstage portal for service discovery and provisioning
Golden path templates with governance validation
Self-service workflows backed by operators
AI-powered documentation and assistance
Observability dashboards via OpenTelemetry, Prometheus, and Grafana

Cross-Cutting: Governance

30+ automated policies enforced at every layer
Architecture Decision Records capturing rationale
Certified scripts with CI-validated compliance
RBAC and security scoping per the Operator whitepaper's recommendations
Cost tracking and sustainability reporting per the CNAI whitepaper

Why a Unified Model Matters

Most organisations build these layers in silos. The platform team builds a portal. The ML team builds AI pipelines. The SRE team builds operators. The teams rarely talk to each other. This is the same problem that plagues platform engineer hiring: treating each domain as isolated rather than integrated.

The result is fragmentation: three separate systems with three separate interfaces, three separate governance models, and three separate failure modes.

A unified model means:

One governance framework: policies apply consistently across platform actions, AI pipeline outputs, and operator reconciliation
One observability stack: OpenTelemetry traces flow from the portal through the AI pipeline into the operator and back
One golden path: developers don't need to know which layer handles their request
One security model: RBAC, network policies, and audit logging applied uniformly
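A single governance framework can be pictured as one set of checks that every action passes through, whichever layer produced it. The Python sketch below assumes a hypothetical `Action` record and two invented policies; it is not the GoldenPath policy engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    layer: str               # "platform", "intelligence", or "operator"
    actor: str               # who or what initiated the change
    resource: str            # e.g. "staging/db" or "prod/module"
    labels: Dict[str, str]   # metadata attached to the change


Policy = Callable[[Action], bool]

# Hypothetical policies: each returns True when the action complies.
POLICIES: Dict[str, Policy] = {
    "owner-label-required": lambda a: "owner" in a.labels,
    "no-prod-without-approval": lambda a: (
        not a.resource.startswith("prod/") or a.labels.get("approved") == "true"
    ),
}


def evaluate(action: Action) -> List[str]:
    """Return the names of violated policies.

    The `layer` field is recorded for auditing but never changes which
    rules run, so a portal request, an LLM-generated module, and an
    operator reconciliation are all held to the same standard.
    """
    return [name for name, check in POLICIES.items() if not check(action)]
```

Because `evaluate` ignores where an action came from, adding a new layer does not require a new governance model, only new callers.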

What We're Building With GoldenPath IDP

This isn't a whiteboard exercise. GoldenPath IDP is our production implementation of the unified model: the platform and automation layers are live, and the intelligence layer is in development.

The Platform Layer (Live)

Developers interact with a Backstage service catalog that lets them discover services, provision infrastructure, and follow golden path templates, all without filing tickets. Behind the scenes, 30+ governance policies run automatically in CI, validating every change against our architecture standards. Our 183 Architecture Decision Records capture not just what we built but why, which makes onboarding faster and decisions auditable.

This is the CNCF Platforms whitepaper in practice: self-service, documentation-first, governance by default.

The Automation Layer (Live)

Every infrastructure change flows through Terraform modules that act as our operators, declaring desired state and continuously reconciling it. Our 89 certified scripts encode operational procedures that used to live in people's heads: deployment sequences, migration steps, rollback procedures. CI-driven policy reconciliation ensures that what's deployed always matches what's declared.

This is the CNCF Operator whitepaper in practice: domain knowledge codified into software that runs 24/7.

The Intelligence Layer (In Development)

We're building a RAG pipeline that combines vector search (ChromaDB) with graph-based retrieval (Neo4j) to answer questions across our entire documentation corpus. Ask "how do I deploy to staging?" and it searches ADRs, runbooks, and past incident reports to give you a context-aware answer, not a generic wiki link.

This is the CNCF Cloud Native AI whitepaper in practice: AI workloads running on cloud native infrastructure, governed by the same policies as everything else.

What Comes Next

The real unlock is when these layers talk to each other. Imagine an AI agent that detects a governance violation in a pull request, explains why it violates the policy by citing the relevant ADR, and suggests a compliant alternative, all before a human reviewer even looks at it.
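To show the shape of that review flow, here is a deliberately simple Python sketch. It uses a regular-expression rule as a stand-in for the AI agent, so no LLM is involved, and the rule, the `ADR-0000` identifier, and the wording of the comment are all placeholders rather than real GoldenPath policies or ADRs.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    pattern: str      # regex matched against lines added in the diff
    adr: str          # placeholder ADR identifier, not a real record
    reason: str       # why the change violates the policy
    suggestion: str   # a compliant alternative


RULES = [
    Rule(
        pattern=r"image:\s*\S+:latest\b",
        adr="ADR-0000 (placeholder)",
        reason="Container images must be pinned to an immutable tag.",
        suggestion="Replace ':latest' with a version tag or digest.",
    ),
]


def review_diff(diff: str) -> List[str]:
    """Return one review comment per added line that matches a rule."""
    comments = []
    for line in diff.splitlines():
        # Only inspect added lines; skip the '+++' file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for rule in RULES:
            if re.search(rule.pattern, line):
                comments.append(
                    f"{rule.adr}: {rule.reason} Suggestion: {rule.suggestion}"
                )
    return comments
```

An agent built on the intelligence layer would replace the regex with retrieval over the ADR corpus, but the output contract stays the same: a violation, the record that justifies it, and a suggested fix.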

That's the unified model. That's where we're headed.

The Invitation

We believe the future of platform engineering is intelligent, automated, and continuously reconciled.

The three CNCF whitepapers provide the theoretical foundation. GoldenPath IDP is our proof that the unified model works in practice.

If you're building an Internal Developer Platform and wondering how AI fits in, or if you're building AI infrastructure and wondering how governance scales, we've been down both roads and found they converge.

Read the whitepapers that inform this model: the CNCF Platforms, Cloud Native AI, and Operator whitepapers.

Want to explore the unified model for your organisation? Start a conversation and we'll show you what's possible.
