SLIM MVP: Enterprise AI Agent Fleet — Multicluster Secure Communications
Related issue: #1372 — Epic: SLIM multicluster autoconfig installation for server fleets
Table of Contents
- Executive Summary
- Business Problem
- Solution: SLIM as the Communication Backbone
- Architecture
- Use Case: Enterprise Cluster-Monitoring Agent Fleet
- Communication Flows
- Security Model
- Demo Scenario
- Key Value Propositions
- Implementation Notes
1. Executive Summary
This MVP demonstrates how SLIM (Secure Low-Latency Interactive Messaging) enables AI agents deployed across multiple Kubernetes clusters — including clusters hidden behind corporate VPNs and firewalls — to communicate securely without exposing any agent as a network server.
Agents subscribe to named channels (topics) matching the problem they need to solve. When a cluster event fires, the relevant agents wake up, collaborate through SLIM, and escalate to a human operator if needed. All of this works across network boundaries that would otherwise require complex firewall rules, VPN tunnels, or service mesh configuration per agent.
Core value proposition: SLIM turns the hardest part of multi-cluster agent communication — the network — into a non-problem, while making the system more secure, not less.
2. Business Problem
2.1 The Enterprise Fleet Reality
Large enterprises operate AI agent fleets across tens or hundreds of Kubernetes clusters. These clusters span different environments:
| Environment | Network constraint |
|---|---|
| On-prem data centers | Corporate firewall / private network |
| Branch offices | VPN-only access |
| Cloud regions | Transit VPC, private endpoints |
| Edge / OT networks | Strictly air-gapped or NAT-only outbound |
Each cluster runs specialized AI agents:
- Performance agents — track CPU, memory, latency, and SLO compliance
- Security agents — detect anomalies, policy violations, CVEs in running images
- Remediation agents — attempt automated fixes (node drain, pod restart, config rollback)
- Escalation agents — page a human operator when automated resolution is insufficient
2.2 What Makes Agent Communication Hard
Traditional approaches require agents to expose themselves as servers:
Agent A ──► opens TCP port ──► registered in service registry ──► firewall rule
Agent B ──► DNS lookup ──► TLS handshake ──► mTLS cert rotation ──► calls Agent A
This creates a dense web of operational complexity:
- Every cluster behind a VPN requires explicit firewall/NAT rules per agent pair
- Every agent needs a stable DNS name and TLS certificate, even for ephemeral workloads
- Human operators wanting to observe or intervene must be given network-level access to the relevant cluster
- Agent-to-agent topology changes (scale-up, migration) invalidate stale service registry entries and routing rules
2.3 Requirements for the MVP
| Requirement | Description |
|---|---|
| Zero exposed ports per agent | Agents must not listen on any TCP/UDP port |
| Cross-cluster messaging | Agents on Cluster A and Cluster B communicate transparently |
| Works behind VPN / NAT | Outbound-only connectivity from cluster to SLIM overlay is sufficient |
| Event-driven wake-up | Agents are idle until a relevant event arrives on their channel |
| Multi-channel membership | A single agent can participate in multiple topic channels simultaneously |
| Human-in-the-loop | Operators can observe, validate, and act via the same channel mechanism |
| End-to-end encryption | Agent messages are encrypted with MLS; SLIM nodes cannot read payloads |
| Workload identity | SPIRE provides SPIFFE certificates — no shared secrets, no static API keys |
3. Solution: SLIM as the Communication Backbone
3.1 What SLIM Provides
SLIM is a publish-subscribe message router with a session layer. Its architecture has three components that cleanly separate concerns:
flowchart TB
subgraph APP["Agent / Application"]
RPC["SLIMRPC (request/response, streaming)"]
SL["Session Layer (MLS encryption, group mgmt)"]
DPC["Data Plane Client (routing, transport)"]
RPC --> SL --> DPC
end
DPC -->|"outbound connection only"| NL["SLIM Routing Node\n(cluster-local)"]
NL -->|"mTLS (SPIRE-issued certs)"| NR["SLIM Routing Node\n(remote cluster)"]
3.2 The Channel Model
Every agent is identified by a hierarchical name: <org>/<namespace>/<agent-type>[#id].
Agents subscribe to a named channel topic. The SLIM controller sees these subscriptions and automatically creates routes between the cluster-local SLIM node and any remote nodes that have subscribers for the same topic.
channel: acme/monitoring/security-incident
│ │ │
│ │ └── topic (problem domain)
│ └── namespace (cluster or team scope)
└── org (enterprise identifier)
Key properties:
- An agent subscribes with no knowledge of the remote agent’s address — only the channel name
- The SLIM router handles delivery; the agent never opens a listening port
- Multiple agents can subscribe to the same channel → multicast delivery
- An agent can subscribe to multiple channels simultaneously
3.3 Why SLIM Solves the Network Problem
| Traditional approach | SLIM approach |
|---|---|
| Agent opens a port and registers in a service registry | Agent connects outbound to the local SLIM node — no inbound port |
| Firewall rules required for every agent pair | Only the SLIM node itself needs one exposed endpoint per cluster |
| DNS + TLS cert per agent | Workload identity via SPIRE SVID — automatic, no manual cert management |
| Service mesh or VPN tunnel between every cluster pair | SLIM nodes federate; agents are completely unaware of cluster topology |
| Agent scale-out changes routing config | Agents with the same name auto-form a group; SLIM load-balances |
4. Architecture
4.1 High-Level System Architecture
---
config:
layout: elk
---
flowchart TB
subgraph CLOUD["Cloud / Admin Cluster (admin.example)"]
direction TB
CTRL["SLIM Controller\n(route management)"]
SPIRE_ADMIN["SPIRE Server (root)\ntrust domain: acme.example\nspire-root.admin.example:8081"]
LB_CTRL["LoadBalancer\nslim-control.admin.example:50052"]
LB_CTRL --> CTRL
end
subgraph CLUSTER_A["Cluster A — On-prem / VPN"]
direction TB
SPIRE_A["SPIRE Server (nested)\nno external endpoint"]
subgraph SLIM_A_NS["slim namespace"]
SLIM_A["SLIM Node (StatefulSet)\nslim.cluster-a.example:46357"]
end
subgraph AGENTS_A["default namespace"]
PERF_A["Performance\nAgent"]
SEC_A["Security\nAgent"]
REMED_A["Remediation\nAgent"]
end
PERF_A -->|"outbound only"| SLIM_A
SEC_A -->|"outbound only"| SLIM_A
REMED_A-->|"outbound only"| SLIM_A
end
subgraph CLUSTER_B["Cluster B — Cloud region"]
direction TB
SPIRE_B["SPIRE Server (nested)\nno external endpoint"]
subgraph SLIM_B_NS["slim namespace"]
SLIM_B["SLIM Node (StatefulSet)\nslim.cluster-b.example:46357"]
end
subgraph AGENTS_B["default namespace"]
PERF_B["Performance\nAgent"]
SEC_B["Security\nAgent"]
REMED_B["Remediation\nAgent"]
end
PERF_B -->|"outbound only"| SLIM_B
SEC_B -->|"outbound only"| SLIM_B
REMED_B-->|"outbound only"| SLIM_B
end
subgraph OPERATOR["Operator Workstation / Jump Host"]
HUMAN["👤 Human Operator\n(observer / approver)"]
HUMAN -->|"outbound to any SLIM node"| SLIM_A
end
SLIM_A <-->|"mTLS (SPIRE-issued certs)\noutbound from each side"| SLIM_B
SLIM_A -->|"gRPC controller connection"| CTRL
SLIM_B -->|"gRPC controller connection"| CTRL
SPIRE_A -->|"nested upstream (outbound)"| SPIRE_ADMIN
SPIRE_B -->|"nested upstream (outbound)"| SPIRE_ADMIN
SPIRE nested deployment: Workload-cluster SPIRE servers connect outbound to the admin root SPIRE server as nested (downstream) servers. Only the admin SPIRE server requires an external endpoint — workload SPIRE servers never need to be reachable from outside their own cluster. See SPIRE nested architecture.
4.2 Network Topology — What Crosses the Firewall
The diagram below shows exactly which connections must be allowed through firewalls or VPN gateways. Only SLIM nodes and SPIRE servers open outbound connections — agents and workload SPIRE servers never listen on externally reachable ports.
flowchart LR
subgraph FW_A["Firewall — Cluster A"]
direction TB
note_a["Allow outbound to:\nslim-control.admin.example:50052\nslim.cluster-b.example:46357\nspire-root.admin.example:8081"]
end
subgraph FW_B["Firewall — Cluster B"]
direction TB
note_b["Allow outbound to:\nslim-control.admin.example:50052\nslim.cluster-a.example:46357\nspire-root.admin.example:8081"]
end
AGENT_A["Agent (Cluster A)\nno inbound port"] -->|outbound| SLIM_NODE_A["SLIM Node\n(cluster-a)"]
SLIM_NODE_A -->|"mTLS outbound"| FW_A
FW_A --> SLIM_NODE_B["SLIM Node\n(cluster-b)"]
SLIM_NODE_B --> AGENT_B["Agent (Cluster B)\nno inbound port"]
AGENT_B -->|outbound| SLIM_NODE_B
SLIM_NODE_B -->|"mTLS outbound"| FW_B
FW_B --> SLIM_NODE_A
SPIRE_A["SPIRE Server\n(nested, cluster-a)"] -->|"nested upstream\noutbound"| FW_A
SPIRE_B["SPIRE Server\n(nested, cluster-b)"] -->|"nested upstream\noutbound"| FW_B
Firewall rule summary:
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Cluster A SLIM node | slim.cluster-b.example |
46357 | TCP/mTLS | Data plane inter-cluster |
| Cluster B SLIM node | slim.cluster-a.example |
46357 | TCP/mTLS | Data plane inter-cluster |
| Cluster A SLIM node | slim-control.admin.example |
50052 | TCP/gRPC+mTLS | Controller |
| Cluster B SLIM node | slim-control.admin.example |
50052 | TCP/gRPC+mTLS | Controller |
| Cluster A SPIRE Server (nested) | spire-root.admin.example |
8081 | TCP/mTLS | SPIRE nested upstream |
| Cluster B SPIRE Server (nested) | spire-root.admin.example |
8081 | TCP/mTLS | SPIRE nested upstream |
| Agents (all clusters) | local SLIM node | 46357 | TCP (local) | Agent ↔ SLIM (cluster-internal) |
No agent-to-agent, no agent-to-internet, and no inbound rules for workload SPIRE servers are required.
4.3 Channel Subscription Model
sequenceDiagram
participant CA as Security Agent (Cluster A)
participant SA as SLIM Node A
participant CTRL as Controller (Admin)
participant SB as SLIM Node B
participant CB as Security Agent (Cluster B)
participant OP as Operator
CA->>SA: subscribe(acme/monitoring/security-incident)
SA->>CTRL: subscription update
CTRL->>SB: configure route to Cluster A for channel
CB->>SB: subscribe(acme/monitoring/security-incident)
SB->>CTRL: subscription update
CTRL->>SA: configure route to Cluster B for channel
OP->>SA: subscribe(acme/monitoring/security-incident)
Note over SA,SB: All three parties now receive messages on this channel
4.4 Multi-Channel Agent Participation
A single agent can join multiple channels, each scoped to a different problem domain:
flowchart TB
AGENT["Security Agent (Cluster A)"]
AGENT --> CH1["acme/monitoring/security-incident\n← security topics"]
AGENT --> CH2["acme/remediation/cve-patch\n← remediation coordination"]
AGENT --> CH3["acme/escalation/human-review\n← when it needs a human"]
This is achieved by the agent creating multiple SLIM sessions, each with its own channel name. The SLIM session layer manages group membership independently per channel.
5. Use Case: Enterprise Cluster-Monitoring Agent Fleet
5.1 Scenario Description
Company: ACME Corp Fleet: 20 Kubernetes clusters across 3 regions (US-East, EU-West, APAC) Problem: Security incident detected on a node in EU-West Cluster 7
- Clusters in EU-West sit behind a corporate VPN; no inbound ports are permitted
- Cluster 7 runs a Security Monitoring Agent and a Remediation Agent
- The cloud (US-East admin cluster) hosts the SLIM Controller and an Escalation Handler
- A human operator is on-call via their laptop connected to corporate VPN
5.2 Agent Roster
| Agent | Channel(s) | Location | Description |
|---|---|---|---|
acme/eu-west/security-monitor |
security-incident, escalation |
Cluster 7 (EU) | Detects anomalies, CVEs |
acme/eu-west/remediation |
security-incident, cve-patch |
Cluster 7 (EU) | Automated fixes |
acme/us-east/security-monitor |
security-incident |
Cluster 1 (US) | Cross-region correlation |
acme/admin/escalation-handler |
escalation |
Admin cluster | LLM-backed escalation agent |
acme/admin/human-operator |
escalation, security-incident |
Operator laptop | Human observer / approver |
5.3 Event Types
| Event | Trigger | Severity |
|---|---|---|
| Anomalous pod CPU spike | Metrics threshold | Low |
| CVE detected in running image | Image scan | Medium |
| Node compromise indicators | Audit log analysis | High |
| Policy violation cascade | OPA/Gatekeeper alerts | High |
| Agent escalation | Agent decision | Critical |
6. Communication Flows
6.1 Flow A: Automated Security Incident Resolution
sequenceDiagram
participant K8S as K8s Events (EU-West)
participant SEC as Security Agent (EU)
participant SA as SLIM Node A
participant SB as SLIM Node B
participant CORS as Security Agent (US-East)
participant REM as Remediation Agent
K8S->>SEC: CVE detected in nginx:1.21 — Critical
SEC->>SA: publish(security-incident, cve=CVE-2024-XXXX image=nginx:1.21)
SA->>SB: route message (mTLS)
SB->>CORS: deliver
SA->>REM: deliver (same cluster)
CORS->>SB: reply(also_affected=[us-east-3])
SB->>SA: route reply
SA->>SEC: deliver reply
REM->>SA: publish(cve-patch, action=rolling-update target=nginx:1.25)
Note over SEC,REM: Remediation proceeds autonomously — no human intervention needed
6.2 Flow B: Human-in-the-Loop Escalation
sequenceDiagram
participant REM as Remediation Agent
participant SA as SLIM Node A
participant SC as SLIM Node C (Admin)
participant ESC as Escalation Handler
participant OP as Human Operator
REM->>SA: publish(escalation, reason=cannot-auto-patch severity=CRITICAL)
SA->>SC: route to admin cluster
SC->>ESC: deliver to escalation handler
SC->>OP: deliver to operator (subscribed to channel)
ESC->>SC: publish(paging operator, preparing runbook)
OP->>SC: publish(decision=APPROVE_DRAIN node=eu-west-7-node-3)
SC->>SA: route approval back
SA->>REM: deliver operator approval
REM->>SA: publish(security-incident, status=RESOLVED action=node-drain)
Note over OP,SA: Operator never needed direct access to Cluster 7 — only to SLIM
6.3 Flow C: Agent Joining Multiple Channels
flowchart LR
subgraph CLUSTER_7["Cluster 7 (EU-West)"]
SEC_7["Security Agent\nacme/eu-west/security-monitor"]
SLIM7["SLIM Node"]
SEC_7 -->|"subscribe × 3"| SLIM7
end
CH1["Channel:\nacme/monitoring/security-incident"]
CH2["Channel:\nacme/remediation/cve-patch"]
CH3["Channel:\nacme/escalation/human-review"]
SLIM7 --- CH1
SLIM7 --- CH2
SLIM7 --- CH3
CH1 --- SEC_US["US-East\nSecurity Agent"]
CH2 --- REM_7["Remediation Agent\n(same cluster)"]
CH3 --- ESC_ADMIN["Escalation Handler\n(Admin cluster)"]
CH3 --- OP["👤 Operator"]
6.4 Message Flow Timeline (Full Incident Lifecycle)
gantt
title Security Incident Lifecycle — SLIM-enabled
dateFormat mm:ss
axisFormat %M:%S
section Detection
CVE scan triggers event : 00:00, 5s
Security Agent wakes up : 00:05, 3s
section Agent Collaboration
Publish to security-incident : 00:08, 2s
Cross-cluster agents respond : 00:10, 8s
Remediation Agent attempts fix : 00:18, 15s
section Escalation
Remediation fails, escalate : 00:33, 3s
Escalation handler notified : 00:36, 5s
Operator receives page : 00:41, 10s
section Resolution
Operator approves action : 00:51, 5s
Remediation executes drain : 00:56, 20s
Incident closed, channels idle : 01:16, 5s
7. Security Model
7.1 Layered Security Architecture
flowchart TB
L4["Layer 4: Application-level authorization\nAgents validate sender SPIFFE SVID in message metadata"]
L3["Layer 3: End-to-end MLS encryption (RFC 9420)\nSLIM routers cannot read agent payloads"]
L2["Layer 2: Transport mTLS between SLIM nodes\nSPIRE-issued SVID certificates, auto-rotated"]
L1["Layer 1: Workload identity — SPIFFE/SPIRE\nZero-trust · no static secrets · no shared API keys"]
L4 --> L3 --> L2 --> L1
7.2 SPIRE Nested Deployment Across Clusters
SPIFFE peer federation requires every SPIRE server to expose an endpoint reachable from the other clusters — not viable in VPN-restricted or air-gapped environments.
SPIRE nested deployment solves this: workload-cluster SPIRE servers act as nested (downstream) servers and connect outbound to the admin root SPIRE server. Only the admin SPIRE server requires an external endpoint.
flowchart TB
subgraph ADMIN["Admin Cluster — root SPIRE server\nspire-root.admin.example:8081 (public)"]
SPIRE_ROOT["SPIRE Server (root)\ntrust domain: acme.example"]
CTRL_SVID["SVID: spiffe://acme.example/slim/controller"]
SPIRE_ROOT --- CTRL_SVID
end
subgraph CLUSTER_A["Cluster A — nested SPIRE server\n(no external endpoint required)"]
SPIRE_NESTED_A["SPIRE Server (nested)\ntrust domain: acme.example"]
NODE_A_SVID["SVID: spiffe://acme.example/cluster-a/slim/node-0"]
AGENT_A_SVID["SVID: spiffe://acme.example/cluster-a/agent/security-monitor"]
SPIRE_NESTED_A --- NODE_A_SVID
SPIRE_NESTED_A --- AGENT_A_SVID
end
subgraph CLUSTER_B["Cluster B — nested SPIRE server\n(no external endpoint required)"]
SPIRE_NESTED_B["SPIRE Server (nested)\ntrust domain: acme.example"]
NODE_B_SVID["SVID: spiffe://acme.example/cluster-b/slim/node-0"]
AGENT_B_SVID["SVID: spiffe://acme.example/cluster-b/agent/security-monitor"]
SPIRE_NESTED_B --- NODE_B_SVID
SPIRE_NESTED_B --- AGENT_B_SVID
end
SPIRE_NESTED_A -->|"outbound upstream connection"| SPIRE_ROOT
SPIRE_NESTED_B -->|"outbound upstream connection"| SPIRE_ROOT
All SVIDs are issued under the single shared trust domain (acme.example).
The root SPIRE server is the chain-of-trust anchor; nested servers delegate issuance to
their local workloads. SLIM nodes on different clusters can mutually authenticate because
they share the same trust domain and their certificates chain to the same root.
7.3 Security Properties
| Property | Mechanism | Benefit |
|---|---|---|
| No exposed agent ports | SLIM outbound-only model | Eliminates entire attack surface class |
| Workload identity | SPIRE SVID (X.509 + JWT) | No static credentials — identity is cryptographic |
| Inter-node transport | mTLS with SPIRE-issued certs | Auto-rotated, zero-touch cert management |
| Agent payload privacy | MLS group encryption | SLIM routing nodes are zero-knowledge to payloads |
| Operator access control | Channel subscription + SVID | Operator only subscribes; no cluster-level access needed |
| Audit trail | SLIM controller events + SPIRE attestation | Full provenance of who joined which channel |
8. Demo Scenario
8.1 Environment Setup
The demo uses three kind clusters to simulate the production environment:
| Cluster | Role | Simulates |
|---|---|---|
kind-admin.example |
SLIM Controller + SPIRE Server | Cloud management plane |
kind-cluster-a.example |
SLIM nodes + Security & Remediation Agents | On-prem cluster (VPN-restricted) |
kind-cluster-b.example |
SLIM nodes + Security Agent | Cloud cluster (remote region) |
# 1. Start clusters and install SPIRE (nested deployment)
sudo task multi-cluster:up
# 2. Deploy Controller on admin cluster
task controller:deploy
# 3. Deploy SLIM on workload clusters
task slim:deploy
# 4. Deploy agents
task demo:agents:deploy
8.2 Demo Script — Step by Step
Step 1: Show baseline — agents running, channels empty
# Show agents are running but NOT listening on any port
kubectl get pods -n default # agents running
kubectl exec -it <security-agent-pod> -- ss -tlnp # no open ports
Step 2: Inject a simulated CVE event
# Inject a CVE detection event into Security Agent on Cluster A
kubectl exec -it <security-agent-pod> -- \
python3 inject_event.py --type cve --severity critical --image nginx:1.21
Step 3: Watch agents collaborate on the security-incident channel
# Follow the channel in real time (operator view)
slimctl channel subscribe acme/monitoring/security-incident --cluster admin.example
Expected output:
[00:00] eu-west/security-monitor → CVE-2024-XXXX detected in nginx:1.21
[00:02] us-east/security-monitor → Confirmed affected: us-east-3 also running nginx:1.21
[00:04] eu-west/remediation → Attempting rolling update to nginx:1.25
[00:19] eu-west/remediation → Node tainted, cannot auto-patch — escalating
Step 4: Observe escalation to human operator
# Operator joins escalation channel from their laptop
slimctl channel subscribe acme/escalation/human-review
Expected output:
[00:22] eu-west/security-monitor → Escalation requested: node-drain required
[00:24] admin/escalation-handler → Runbook loaded, paging operator
[APPROVAL REQUIRED]
Step 5: Operator approves and watches resolution
# Operator sends approval
slimctl channel publish acme/escalation/human-review \
'{"decision": "APPROVE_DRAIN", "node": "eu-west-7-node-3"}'
Expected output:
[00:31] eu-west/remediation → Node drain initiated
[00:51] eu-west/remediation → Drain complete, workloads rescheduled
[00:52] eu-west/security-monitor → Incident resolved — CVE remediated
8.3 Demo Talking Points
-
“Notice that no agent opened a port” —
ss -tlnpshows nothing. Yet cross-cluster messaging worked transparently. -
“The only firewall rules needed” — point to the two SLIM node endpoints, not the dozens of agent-pair rules a traditional approach would require.
-
“The operator joined from a laptop” — connected to corporate VPN, authenticated via SPIRE, and joined the channel. No jump host, no kubectl proxy, no cluster-level access.
-
“MLS means SLIM nodes are zero-knowledge” — the SLIM routers forwarded every byte without being able to read a single word of the agent messages.
-
“Adding a new cluster is trivial” — register it with SPIRE, deploy SLIM with the correct group name, point it at the controller. No routing rule changes in other clusters.
9. Key Value Propositions
9.1 Simplified Operations
| Before SLIM | With SLIM |
|---|---|
| Firewall rules: O(n²) agent pairs | Firewall rules: O(n) SLIM nodes |
| DNS + cert per agent | Zero per-agent network config |
| Service registry maintenance | Channel names are self-describing, DNS-free |
| VPN access required per cluster | One SLIM endpoint per cluster suffices |
9.2 Enhanced Security
- Zero server exposure per agent — drastically reduces attack surface
- End-to-end MLS encryption — infrastructure operators cannot read agent payloads
- Cryptographic workload identity — SPIRE eliminates static credential sprawl
- Principle of least privilege — operators observe only what they subscribe to
9.3 Developer Experience
- Agents use high-level SLIMRPC or pub/sub APIs — no networking code
- Language support: Python, Go, Java, .NET (C#), Kotlin, Rust
- Protocol support: A2A, MCP, custom protobuf
- Session types: point-to-point, multicast (group), streaming
9.4 Operational Scalability
- Dynamic topology — agents join/leave channels without reconfiguration
- Auto-routing — SLIM Controller creates inter-cluster routes on first subscription
- Fleet-scale — tested with large StatefulSet deployments; single control plane for all
- Multi-language agents — Python ML agent ↔ Go infrastructure agent ↔ Java app agent, all on the same channel
10. Implementation Notes
10.1 Tech Stack
| Component | Technology |
|---|---|
| SLIM Node (data plane) | Rust — high-performance message router |
| SLIM Session Layer | Rust — MLS (RFC 9420) encryption + group management |
| SLIMRPC | Protobuf + generated stubs (all languages) |
| SLIM Controller | Go (Kubernetes controller pattern) |
| Workload identity | SPIRE / SPIFFE |
| Demo agents | Python (ADK or custom SLIM Python bindings) |
| Deployment | Kubernetes (Helm charts) |
| Observability | OpenTelemetry (SLIM has native OTEL support) |
10.2 Key APIs Used in Demo Agents
import slim_bindings
# Agent startup — outbound connection only, no listening port
svc = slim_bindings.Service.new(slim_node_addr)
app = svc.create_app_with_secret(agent_name, shared_secret)
conn_id = svc.connect(slim_bindings.ClientConfig.new_with_tls(slim_node_addr))
# Subscribe to a channel
app.subscribe(app.name(), conn_id)
# Join the security-incident group channel
session_cfg = slim_bindings.SessionConfig(
session_type=slim_bindings.SessionType.MULTICAST,
mls_enabled=True,
)
session, done = await app.create_session(session_cfg, "acme/monitoring/security-incident")
await done # wait for group to form
# Publish an event
await app.publish("acme/monitoring/security-incident", payload_bytes)
# Receive messages (event-driven)
async for msg in app.receive():
await handle_event(msg)
10.3 Helm Deployment Snippet
# values-cluster-a.yaml
slim:
config:
services:
slim/0:
node_id: "${env:SLIM_SVC_ID}"
group_name: "cluster-a.example"
dataplane:
servers:
- endpoint: "0.0.0.0:46357"
metadata:
local_endpoint: "${env:MY_POD_IP}"
external_endpoint: "slim.cluster-a.example:46357"
tls:
source:
type: spire
controller:
clients:
- endpoint: "https://slim-control.admin.example:50052"
tls:
source:
type: spire
10.4 Prerequisites for Demo Environment
kindv0.20+kubectlv1.28+helmv3.12+task(Taskfile runner)spire-server/spire-agent(deployed via Helm chart in clusters)- SLIM Helm charts (
charts/slim,charts/slim-control-plane) - Docker for building demo agent images
Appendix A: Glossary
| Term | Definition |
|---|---|
| SLIM | Secure Low-Latency Interactive Messaging — the transport framework |
| SLIMRPC | SLIM’s request/response RPC layer built on top of the session layer |
| MLS | Messaging Layer Security (RFC 9420) — E2E encryption for groups |
| SPIRE | SPIFFE Runtime Environment — issues SVID certificates to workloads |
| SPIFFE | Secure Production Identity Framework for Everyone |
| SVID | SPIFFE Verifiable Identity Document (X.509 cert or JWT) |
| Channel | A named pub/sub topic in SLIM (hierarchical: org/ns/topic) |
| Group session | A SLIM multicast session with multiple subscribers |
| Data plane | SLIM message routing layer (pure forwarding, zero payload inspection) |
| Control plane | SLIM management layer (route configuration, monitoring) |
| Session layer | SLIM encryption + group membership layer (sits above data plane) |