SLIM MVP: Enterprise AI Agent Fleet — Multicluster Secure Communications

Related issue: #1372 — Epic: SLIM multicluster autoconfig installation for server fleets

Executive Summary
Business Problem
Solution: SLIM as the Communication Backbone
Architecture
Use Case: Enterprise Cluster-Monitoring Agent Fleet
Communication Flows
Security Model
Demo Scenario
Key Value Propositions
Implementation Notes

1. Executive Summary

This MVP demonstrates how SLIM (Secure Low-Latency Interactive Messaging) enables AI agents deployed across multiple Kubernetes clusters — including clusters hidden behind corporate VPNs and firewalls — to communicate securely without exposing any agent as a network server.

Agents subscribe to named channels (topics) matching the problem they need to solve. When a cluster event fires, the relevant agents wake up, collaborate through SLIM, and escalate to a human operator if needed. All of this works across network boundaries that would otherwise require complex firewall rules, VPN tunnels, or service mesh configuration per agent.

Core value proposition: SLIM turns the hardest part of multi-cluster agent communication — the network — into a non-problem, while making the system more secure, not less.

2. Business Problem

2.1 The Enterprise Fleet Reality

Large enterprises operate AI agent fleets across tens or hundreds of Kubernetes clusters. These clusters span different environments:

Environment	Network constraint
On-prem data centers	Corporate firewall / private network
Branch offices	VPN-only access
Cloud regions	Transit VPC, private endpoints
Edge / OT networks	Strictly air-gapped or NAT-only outbound

Each cluster runs specialized AI agents:

Performance agents — track CPU, memory, latency, and SLO compliance
Security agents — detect anomalies, policy violations, CVEs in running images
Remediation agents — attempt automated fixes (node drain, pod restart, config rollback)
Escalation agents — page a human operator when automated resolution is insufficient

2.2 What Makes Agent Communication Hard

Traditional approaches require agents to expose themselves as servers:

Agent A ──► opens TCP port ──► registered in service registry ──► firewall rule
Agent B ──► DNS lookup ──► TLS handshake ──► mTLS cert rotation ──► calls Agent A

This creates a dense web of operational complexity:

Every cluster behind a VPN requires explicit firewall/NAT rules per agent pair
Every agent needs a stable DNS name and TLS certificate, even for ephemeral workloads
Human operators wanting to observe or intervene must be given network-level access to the relevant cluster
Agent-to-agent topology changes (scale-up, migration) invalidate stale service registry entries and routing rules

2.3 Requirements for the MVP

Requirement	Description
Zero exposed ports per agent	Agents must not listen on any TCP/UDP port
Cross-cluster messaging	Agents on Cluster A and Cluster B communicate transparently
Works behind VPN / NAT	Outbound-only connectivity from cluster to SLIM overlay is sufficient
Event-driven wake-up	Agents are idle until a relevant event arrives on their channel
Multi-channel membership	A single agent can participate in multiple topic channels simultaneously
Human-in-the-loop	Operators can observe, validate, and act via the same channel mechanism
End-to-end encryption	Agent messages are encrypted with MLS; SLIM nodes cannot read payloads
Workload identity	SPIRE provides SPIFFE certificates — no shared secrets, no static API keys

3. Solution: SLIM as the Communication Backbone

3.1 What SLIM Provides

SLIM is a publish-subscribe message router with a session layer. Its architecture has three components that cleanly separate concerns:

flowchart TB
  subgraph APP["Agent / Application"]
    RPC["SLIMRPC (request/response, streaming)"]
    SL["Session Layer (MLS encryption, group mgmt)"]
    DPC["Data Plane Client (routing, transport)"]
    RPC --> SL --> DPC
  end
  DPC -->|"outbound connection only"| NL["SLIM Routing Node\n(cluster-local)"]
  NL -->|"mTLS (SPIRE-issued certs)"| NR["SLIM Routing Node\n(remote cluster)"]

3.2 The Channel Model

Every agent is identified by a hierarchical name: <org>/<namespace>/<agent-type>[#id].

Agents subscribe to a named channel topic. The SLIM controller sees these subscriptions and automatically creates routes between the cluster-local SLIM node and any remote nodes that have subscribers for the same topic.

channel: acme/monitoring/security-incident
          │       │            │
          │       │            └── topic (problem domain)
          │       └── namespace (cluster or team scope)
          └── org (enterprise identifier)

Key properties:

An agent subscribes with no knowledge of the remote agent’s address — only the channel name
The SLIM router handles delivery; the agent never opens a listening port
Multiple agents can subscribe to the same channel → multicast delivery
An agent can subscribe to multiple channels simultaneously

3.3 Why SLIM Solves the Network Problem

Traditional approach	SLIM approach
Agent opens a port and registers in a service registry	Agent connects outbound to the local SLIM node — no inbound port
Firewall rules required for every agent pair	Only the SLIM node itself needs one exposed endpoint per cluster
DNS + TLS cert per agent	Workload identity via SPIRE SVID — automatic, no manual cert management
Service mesh or VPN tunnel between every cluster pair	SLIM nodes federate; agents are completely unaware of cluster topology
Agent scale-out changes routing config	Agents with the same name auto-form a group; SLIM load-balances

4. Architecture

4.1 High-Level System Architecture

---
config:
  layout: elk
---
flowchart TB

  subgraph CLOUD["Cloud / Admin Cluster (admin.example)"]
    direction TB
    CTRL["SLIM Controller\n(route management)"]
    SPIRE_ADMIN["SPIRE Server (root)\ntrust domain: acme.example\nspire-root.admin.example:8081"]
    LB_CTRL["LoadBalancer\nslim-control.admin.example:50052"]
    LB_CTRL --> CTRL
  end

  subgraph CLUSTER_A["Cluster A — On-prem / VPN"]
    direction TB
    SPIRE_A["SPIRE Server (nested)\nno external endpoint"]
    subgraph SLIM_A_NS["slim namespace"]
      SLIM_A["SLIM Node (StatefulSet)\nslim.cluster-a.example:46357"]
    end
    subgraph AGENTS_A["default namespace"]
      PERF_A["Performance\nAgent"]
      SEC_A["Security\nAgent"]
      REMED_A["Remediation\nAgent"]
    end
    PERF_A -->|"outbound only"| SLIM_A
    SEC_A  -->|"outbound only"| SLIM_A
    REMED_A-->|"outbound only"| SLIM_A
  end

  subgraph CLUSTER_B["Cluster B — Cloud region"]
    direction TB
    SPIRE_B["SPIRE Server (nested)\nno external endpoint"]
    subgraph SLIM_B_NS["slim namespace"]
      SLIM_B["SLIM Node (StatefulSet)\nslim.cluster-b.example:46357"]
    end
    subgraph AGENTS_B["default namespace"]
      PERF_B["Performance\nAgent"]
      SEC_B["Security\nAgent"]
      REMED_B["Remediation\nAgent"]
    end
    PERF_B -->|"outbound only"| SLIM_B
    SEC_B  -->|"outbound only"| SLIM_B
    REMED_B-->|"outbound only"| SLIM_B
  end

  subgraph OPERATOR["Operator Workstation / Jump Host"]
    HUMAN["👤 Human Operator\n(observer / approver)"]
    HUMAN -->|"outbound to any SLIM node"| SLIM_A
  end

  SLIM_A <-->|"mTLS (SPIRE-issued certs)\noutbound from each side"| SLIM_B
  SLIM_A -->|"gRPC controller connection"| CTRL
  SLIM_B -->|"gRPC controller connection"| CTRL
  SPIRE_A -->|"nested upstream (outbound)"| SPIRE_ADMIN
  SPIRE_B -->|"nested upstream (outbound)"| SPIRE_ADMIN

SPIRE nested deployment: Workload-cluster SPIRE servers connect outbound to the admin root SPIRE server as nested (downstream) servers. Only the admin SPIRE server requires an external endpoint — workload SPIRE servers never need to be reachable from outside their own cluster. See SPIRE nested architecture.

4.2 Network Topology — What Crosses the Firewall

The diagram below shows exactly which connections must be allowed through firewalls or VPN gateways. Only SLIM nodes and SPIRE servers open outbound connections — agents and workload SPIRE servers never listen on externally reachable ports.

flowchart LR
  subgraph FW_A["Firewall — Cluster A"]
    direction TB
    note_a["Allow outbound to:\nslim-control.admin.example:50052\nslim.cluster-b.example:46357\nspire-root.admin.example:8081"]
  end
  subgraph FW_B["Firewall — Cluster B"]
    direction TB
    note_b["Allow outbound to:\nslim-control.admin.example:50052\nslim.cluster-a.example:46357\nspire-root.admin.example:8081"]
  end

  AGENT_A["Agent (Cluster A)\nno inbound port"] -->|outbound| SLIM_NODE_A["SLIM Node\n(cluster-a)"]
  SLIM_NODE_A -->|"mTLS outbound"| FW_A
  FW_A --> SLIM_NODE_B["SLIM Node\n(cluster-b)"]
  SLIM_NODE_B --> AGENT_B["Agent (Cluster B)\nno inbound port"]

  AGENT_B -->|outbound| SLIM_NODE_B
  SLIM_NODE_B -->|"mTLS outbound"| FW_B
  FW_B --> SLIM_NODE_A

  SPIRE_A["SPIRE Server\n(nested, cluster-a)"] -->|"nested upstream\noutbound"| FW_A
  SPIRE_B["SPIRE Server\n(nested, cluster-b)"] -->|"nested upstream\noutbound"| FW_B

Firewall rule summary:

Source	Destination	Port	Protocol	Purpose
Cluster A SLIM node	`slim.cluster-b.example`	46357	TCP/mTLS	Data plane inter-cluster
Cluster B SLIM node	`slim.cluster-a.example`	46357	TCP/mTLS	Data plane inter-cluster
Cluster A SLIM node	`slim-control.admin.example`	50052	TCP/gRPC+mTLS	Controller
Cluster B SLIM node	`slim-control.admin.example`	50052	TCP/gRPC+mTLS	Controller
Cluster A SPIRE Server (nested)	`spire-root.admin.example`	8081	TCP/mTLS	SPIRE nested upstream
Cluster B SPIRE Server (nested)	`spire-root.admin.example`	8081	TCP/mTLS	SPIRE nested upstream
Agents (all clusters)	local SLIM node	46357	TCP (local)	Agent ↔ SLIM (cluster-internal)

No agent-to-agent, no agent-to-internet, and no inbound rules for workload SPIRE servers are required.

4.3 Channel Subscription Model

sequenceDiagram
  participant CA as Security Agent (Cluster A)
  participant SA as SLIM Node A
  participant CTRL as Controller (Admin)
  participant SB as SLIM Node B
  participant CB as Security Agent (Cluster B)
  participant OP as Operator

  CA->>SA: subscribe(acme/monitoring/security-incident)
  SA->>CTRL: subscription update
  CTRL->>SB: configure route to Cluster A for channel
  CB->>SB: subscribe(acme/monitoring/security-incident)
  SB->>CTRL: subscription update
  CTRL->>SA: configure route to Cluster B for channel
  OP->>SA: subscribe(acme/monitoring/security-incident)
  Note over SA,SB: All three parties now receive messages on this channel

4.4 Multi-Channel Agent Participation

A single agent can join multiple channels, each scoped to a different problem domain:

flowchart TB
  AGENT["Security Agent (Cluster A)"]
  AGENT --> CH1["acme/monitoring/security-incident\n← security topics"]
  AGENT --> CH2["acme/remediation/cve-patch\n← remediation coordination"]
  AGENT --> CH3["acme/escalation/human-review\n← when it needs a human"]

This is achieved by the agent creating multiple SLIM sessions, each with its own channel name. The SLIM session layer manages group membership independently per channel.

5. Use Case: Enterprise Cluster-Monitoring Agent Fleet

5.1 Scenario Description

Company: ACME Corp Fleet: 20 Kubernetes clusters across 3 regions (US-East, EU-West, APAC) Problem: Security incident detected on a node in EU-West Cluster 7

Clusters in EU-West sit behind a corporate VPN; no inbound ports are permitted
Cluster 7 runs a Security Monitoring Agent and a Remediation Agent
The cloud (US-East admin cluster) hosts the SLIM Controller and an Escalation Handler
A human operator is on-call via their laptop connected to corporate VPN

5.2 Agent Roster

Agent	Channel(s)	Location	Description
`acme/eu-west/security-monitor`	`security-incident`, `escalation`	Cluster 7 (EU)	Detects anomalies, CVEs
`acme/eu-west/remediation`	`security-incident`, `cve-patch`	Cluster 7 (EU)	Automated fixes
`acme/us-east/security-monitor`	`security-incident`	Cluster 1 (US)	Cross-region correlation
`acme/admin/escalation-handler`	`escalation`	Admin cluster	LLM-backed escalation agent
`acme/admin/human-operator`	`escalation`, `security-incident`	Operator laptop	Human observer / approver

5.3 Event Types

Event	Trigger	Severity
Anomalous pod CPU spike	Metrics threshold	Low
CVE detected in running image	Image scan	Medium
Node compromise indicators	Audit log analysis	High
Policy violation cascade	OPA/Gatekeeper alerts	High
Agent escalation	Agent decision	Critical

6. Communication Flows

6.1 Flow A: Automated Security Incident Resolution

sequenceDiagram
  participant K8S as K8s Events (EU-West)
  participant SEC as Security Agent (EU)
  participant SA as SLIM Node A
  participant SB as SLIM Node B
  participant CORS as Security Agent (US-East)
  participant REM as Remediation Agent

  K8S->>SEC: CVE detected in nginx:1.21 — Critical
  SEC->>SA: publish(security-incident, cve=CVE-2024-XXXX image=nginx:1.21)
  SA->>SB: route message (mTLS)
  SB->>CORS: deliver
  SA->>REM: deliver (same cluster)
  CORS->>SB: reply(also_affected=[us-east-3])
  SB->>SA: route reply
  SA->>SEC: deliver reply
  REM->>SA: publish(cve-patch, action=rolling-update target=nginx:1.25)
  Note over SEC,REM: Remediation proceeds autonomously — no human intervention needed

6.2 Flow B: Human-in-the-Loop Escalation

sequenceDiagram
  participant REM as Remediation Agent
  participant SA as SLIM Node A
  participant SC as SLIM Node C (Admin)
  participant ESC as Escalation Handler
  participant OP as Human Operator

  REM->>SA: publish(escalation, reason=cannot-auto-patch severity=CRITICAL)
  SA->>SC: route to admin cluster
  SC->>ESC: deliver to escalation handler
  SC->>OP: deliver to operator (subscribed to channel)
  ESC->>SC: publish(paging operator, preparing runbook)
  OP->>SC: publish(decision=APPROVE_DRAIN node=eu-west-7-node-3)
  SC->>SA: route approval back
  SA->>REM: deliver operator approval
  REM->>SA: publish(security-incident, status=RESOLVED action=node-drain)
  Note over OP,SA: Operator never needed direct access to Cluster 7 — only to SLIM

6.3 Flow C: Agent Joining Multiple Channels

flowchart LR
  subgraph CLUSTER_7["Cluster 7 (EU-West)"]
    SEC_7["Security Agent\nacme/eu-west/security-monitor"]
    SLIM7["SLIM Node"]
    SEC_7 -->|"subscribe × 3"| SLIM7
  end

  CH1["Channel:\nacme/monitoring/security-incident"]
  CH2["Channel:\nacme/remediation/cve-patch"]
  CH3["Channel:\nacme/escalation/human-review"]

  SLIM7 --- CH1
  SLIM7 --- CH2
  SLIM7 --- CH3

  CH1 --- SEC_US["US-East\nSecurity Agent"]
  CH2 --- REM_7["Remediation Agent\n(same cluster)"]
  CH3 --- ESC_ADMIN["Escalation Handler\n(Admin cluster)"]
  CH3 --- OP["👤 Operator"]

6.4 Message Flow Timeline (Full Incident Lifecycle)

gantt
  title Security Incident Lifecycle — SLIM-enabled
  dateFormat  mm:ss
  axisFormat  %M:%S

  section Detection
  CVE scan triggers event        : 00:00, 5s
  Security Agent wakes up        : 00:05, 3s

  section Agent Collaboration
  Publish to security-incident   : 00:08, 2s
  Cross-cluster agents respond   : 00:10, 8s
  Remediation Agent attempts fix : 00:18, 15s

  section Escalation
  Remediation fails, escalate    : 00:33, 3s
  Escalation handler notified    : 00:36, 5s
  Operator receives page         : 00:41, 10s

  section Resolution
  Operator approves action       : 00:51, 5s
  Remediation executes drain     : 00:56, 20s
  Incident closed, channels idle : 01:16, 5s

7. Security Model

7.1 Layered Security Architecture

flowchart TB
  L4["Layer 4: Application-level authorization\nAgents validate sender SPIFFE SVID in message metadata"]
  L3["Layer 3: End-to-end MLS encryption (RFC 9420)\nSLIM routers cannot read agent payloads"]
  L2["Layer 2: Transport mTLS between SLIM nodes\nSPIRE-issued SVID certificates, auto-rotated"]
  L1["Layer 1: Workload identity — SPIFFE/SPIRE\nZero-trust · no static secrets · no shared API keys"]
  L4 --> L3 --> L2 --> L1

7.2 SPIRE Nested Deployment Across Clusters

SPIFFE peer federation requires every SPIRE server to expose an endpoint reachable from the other clusters — not viable in VPN-restricted or air-gapped environments.

SPIRE nested deployment solves this: workload-cluster SPIRE servers act as nested (downstream) servers and connect outbound to the admin root SPIRE server. Only the admin SPIRE server requires an external endpoint.

flowchart TB
  subgraph ADMIN["Admin Cluster — root SPIRE server\nspire-root.admin.example:8081 (public)"]
    SPIRE_ROOT["SPIRE Server (root)\ntrust domain: acme.example"]
    CTRL_SVID["SVID: spiffe://acme.example/slim/controller"]
    SPIRE_ROOT --- CTRL_SVID
  end

  subgraph CLUSTER_A["Cluster A — nested SPIRE server\n(no external endpoint required)"]
    SPIRE_NESTED_A["SPIRE Server (nested)\ntrust domain: acme.example"]
    NODE_A_SVID["SVID: spiffe://acme.example/cluster-a/slim/node-0"]
    AGENT_A_SVID["SVID: spiffe://acme.example/cluster-a/agent/security-monitor"]
    SPIRE_NESTED_A --- NODE_A_SVID
    SPIRE_NESTED_A --- AGENT_A_SVID
  end

  subgraph CLUSTER_B["Cluster B — nested SPIRE server\n(no external endpoint required)"]
    SPIRE_NESTED_B["SPIRE Server (nested)\ntrust domain: acme.example"]
    NODE_B_SVID["SVID: spiffe://acme.example/cluster-b/slim/node-0"]
    AGENT_B_SVID["SVID: spiffe://acme.example/cluster-b/agent/security-monitor"]
    SPIRE_NESTED_B --- NODE_B_SVID
    SPIRE_NESTED_B --- AGENT_B_SVID
  end

  SPIRE_NESTED_A -->|"outbound upstream connection"| SPIRE_ROOT
  SPIRE_NESTED_B -->|"outbound upstream connection"| SPIRE_ROOT

All SVIDs are issued under the single shared trust domain (acme.example). The root SPIRE server is the chain-of-trust anchor; nested servers delegate issuance to their local workloads. SLIM nodes on different clusters can mutually authenticate because they share the same trust domain and their certificates chain to the same root.

7.3 Security Properties

Property	Mechanism	Benefit
No exposed agent ports	SLIM outbound-only model	Eliminates entire attack surface class
Workload identity	SPIRE SVID (X.509 + JWT)	No static credentials — identity is cryptographic
Inter-node transport	mTLS with SPIRE-issued certs	Auto-rotated, zero-touch cert management
Agent payload privacy	MLS group encryption	SLIM routing nodes are zero-knowledge to payloads
Operator access control	Channel subscription + SVID	Operator only subscribes; no cluster-level access needed
Audit trail	SLIM controller events + SPIRE attestation	Full provenance of who joined which channel

8. Demo Scenario

8.1 Environment Setup

The demo uses three kind clusters to simulate the production environment:

Cluster	Role	Simulates
`kind-admin.example`	SLIM Controller + SPIRE Server	Cloud management plane
`kind-cluster-a.example`	SLIM nodes + Security & Remediation Agents	On-prem cluster (VPN-restricted)
`kind-cluster-b.example`	SLIM nodes + Security Agent	Cloud cluster (remote region)

# 1. Start clusters and install SPIRE (nested deployment)
sudo task multi-cluster:up

# 2. Deploy Controller on admin cluster
task controller:deploy

# 3. Deploy SLIM on workload clusters
task slim:deploy

# 4. Deploy agents
task demo:agents:deploy

8.2 Demo Script — Step by Step

Step 1: Show baseline — agents running, channels empty

# Show agents are running but NOT listening on any port
kubectl get pods -n default  # agents running
kubectl exec -it <security-agent-pod> -- ss -tlnp  # no open ports

Step 2: Inject a simulated CVE event

# Inject a CVE detection event into Security Agent on Cluster A
kubectl exec -it <security-agent-pod> -- \
  python3 inject_event.py --type cve --severity critical --image nginx:1.21

Step 3: Watch agents collaborate on the security-incident channel

# Follow the channel in real time (operator view)
slimctl channel subscribe acme/monitoring/security-incident --cluster admin.example

Expected output:

[00:00] eu-west/security-monitor  → CVE-2024-XXXX detected in nginx:1.21
[00:02] us-east/security-monitor  → Confirmed affected: us-east-3 also running nginx:1.21
[00:04] eu-west/remediation       → Attempting rolling update to nginx:1.25
[00:19] eu-west/remediation       → Node tainted, cannot auto-patch — escalating

Step 4: Observe escalation to human operator

# Operator joins escalation channel from their laptop
slimctl channel subscribe acme/escalation/human-review

Expected output:

[00:22] eu-west/security-monitor  → Escalation requested: node-drain required
[00:24] admin/escalation-handler  → Runbook loaded, paging operator
[APPROVAL REQUIRED]

Step 5: Operator approves and watches resolution

# Operator sends approval
slimctl channel publish acme/escalation/human-review \
  '{"decision": "APPROVE_DRAIN", "node": "eu-west-7-node-3"}'

Expected output:

[00:31] eu-west/remediation       → Node drain initiated
[00:51] eu-west/remediation       → Drain complete, workloads rescheduled
[00:52] eu-west/security-monitor  → Incident resolved — CVE remediated

8.3 Demo Talking Points

“Notice that no agent opened a port” — ss -tlnp shows nothing. Yet cross-cluster messaging worked transparently.
“The only firewall rules needed” — point to the two SLIM node endpoints, not the dozens of agent-pair rules a traditional approach would require.
“The operator joined from a laptop” — connected to corporate VPN, authenticated via SPIRE, and joined the channel. No jump host, no kubectl proxy, no cluster-level access.
“MLS means SLIM nodes are zero-knowledge” — the SLIM routers forwarded every byte without being able to read a single word of the agent messages.
“Adding a new cluster is trivial” — register it with SPIRE, deploy SLIM with the correct group name, point it at the controller. No routing rule changes in other clusters.

9. Key Value Propositions

9.1 Simplified Operations

Before SLIM	With SLIM
Firewall rules: O(n²) agent pairs	Firewall rules: O(n) SLIM nodes
DNS + cert per agent	Zero per-agent network config
Service registry maintenance	Channel names are self-describing, DNS-free
VPN access required per cluster	One SLIM endpoint per cluster suffices

9.2 Enhanced Security

Zero server exposure per agent — drastically reduces attack surface
End-to-end MLS encryption — infrastructure operators cannot read agent payloads
Cryptographic workload identity — SPIRE eliminates static credential sprawl
Principle of least privilege — operators observe only what they subscribe to

9.3 Developer Experience

Agents use high-level SLIMRPC or pub/sub APIs — no networking code
Language support: Python, Go, Java, .NET (C#), Kotlin, Rust
Protocol support: A2A, MCP, custom protobuf
Session types: point-to-point, multicast (group), streaming

9.4 Operational Scalability

Dynamic topology — agents join/leave channels without reconfiguration
Auto-routing — SLIM Controller creates inter-cluster routes on first subscription
Fleet-scale — tested with large StatefulSet deployments; single control plane for all
Multi-language agents — Python ML agent ↔ Go infrastructure agent ↔ Java app agent, all on the same channel

10. Implementation Notes

10.1 Tech Stack

Component	Technology
SLIM Node (data plane)	Rust — high-performance message router
SLIM Session Layer	Rust — MLS (RFC 9420) encryption + group management
SLIMRPC	Protobuf + generated stubs (all languages)
SLIM Controller	Go (Kubernetes controller pattern)
Workload identity	SPIRE / SPIFFE
Demo agents	Python (ADK or custom SLIM Python bindings)
Deployment	Kubernetes (Helm charts)
Observability	OpenTelemetry (SLIM has native OTEL support)

10.2 Key APIs Used in Demo Agents

import slim_bindings

# Agent startup — outbound connection only, no listening port
svc = slim_bindings.Service.new(slim_node_addr)
app = svc.create_app_with_secret(agent_name, shared_secret)
conn_id = svc.connect(slim_bindings.ClientConfig.new_with_tls(slim_node_addr))

# Subscribe to a channel
app.subscribe(app.name(), conn_id)

# Join the security-incident group channel
session_cfg = slim_bindings.SessionConfig(
    session_type=slim_bindings.SessionType.MULTICAST,
    mls_enabled=True,
)
session, done = await app.create_session(session_cfg, "acme/monitoring/security-incident")
await done  # wait for group to form

# Publish an event
await app.publish("acme/monitoring/security-incident", payload_bytes)

# Receive messages (event-driven)
async for msg in app.receive():
    await handle_event(msg)

10.3 Helm Deployment Snippet

# values-cluster-a.yaml
slim:
  config:
    services:
      slim/0:
        node_id: "${env:SLIM_SVC_ID}"
        group_name: "cluster-a.example"
        dataplane:
          servers:
            - endpoint: "0.0.0.0:46357"
              metadata:
                local_endpoint: "${env:MY_POD_IP}"
                external_endpoint: "slim.cluster-a.example:46357"
              tls:
                source:
                  type: spire
    controller:
      clients:
        - endpoint: "https://slim-control.admin.example:50052"
          tls:
            source:
              type: spire

10.4 Prerequisites for Demo Environment

kind v0.20+
kubectl v1.28+
helm v3.12+
task (Taskfile runner)
spire-server / spire-agent (deployed via Helm chart in clusters)
SLIM Helm charts (charts/slim, charts/slim-control-plane)
Docker for building demo agent images

Appendix A: Glossary

Term	Definition
SLIM	Secure Low-Latency Interactive Messaging — the transport framework
SLIMRPC	SLIM’s request/response RPC layer built on top of the session layer
MLS	Messaging Layer Security (RFC 9420) — E2E encryption for groups
SPIRE	SPIFFE Runtime Environment — issues SVID certificates to workloads
SPIFFE	Secure Production Identity Framework for Everyone
SVID	SPIFFE Verifiable Identity Document (X.509 cert or JWT)
Channel	A named pub/sub topic in SLIM (hierarchical: org/ns/topic)
Group session	A SLIM multicast session with multiple subscribers
Data plane	SLIM message routing layer (pure forwarding, zero payload inspection)
Control plane	SLIM management layer (route configuration, monitoring)
Session layer	SLIM encryption + group membership layer (sits above data plane)

Table of Contents

1. Executive Summary

2. Business Problem

2.1 The Enterprise Fleet Reality

2.2 What Makes Agent Communication Hard

2.3 Requirements for the MVP

3. Solution: SLIM as the Communication Backbone

3.1 What SLIM Provides

3.2 The Channel Model

3.3 Why SLIM Solves the Network Problem

4. Architecture

4.1 High-Level System Architecture

4.2 Network Topology — What Crosses the Firewall

4.3 Channel Subscription Model

4.4 Multi-Channel Agent Participation

5. Use Case: Enterprise Cluster-Monitoring Agent Fleet

5.1 Scenario Description

5.2 Agent Roster

5.3 Event Types

6. Communication Flows

6.1 Flow A: Automated Security Incident Resolution

6.2 Flow B: Human-in-the-Loop Escalation

6.3 Flow C: Agent Joining Multiple Channels

6.4 Message Flow Timeline (Full Incident Lifecycle)

7. Security Model

7.1 Layered Security Architecture

7.2 SPIRE Nested Deployment Across Clusters

7.3 Security Properties

8. Demo Scenario

8.1 Environment Setup

8.2 Demo Script — Step by Step

Step 1: Show baseline — agents running, channels empty

Step 2: Inject a simulated CVE event

Step 3: Watch agents collaborate on the security-incident channel

Step 4: Observe escalation to human operator

Step 5: Operator approves and watches resolution

8.3 Demo Talking Points

9. Key Value Propositions

9.1 Simplified Operations

9.2 Enhanced Security

9.3 Developer Experience

9.4 Operational Scalability

10. Implementation Notes

10.1 Tech Stack

10.2 Key APIs Used in Demo Agents

10.3 Helm Deployment Snippet

10.4 Prerequisites for Demo Environment

Appendix A: Glossary

Appendix B: References