Kubernetes deployment
All split-brain workloads run in a single namespace, split-brain.
ScalarLM is not deployed by this chart — it is a separately
deployed service (its own Helm chart, possibly its own cluster); we
only configure SCALARLM_BASE_URL and a credential to reach it. No
GPU node pool is required for split-brain itself.
Namespace and RBAC
- Namespace: split-brain.
- ServiceAccount per component: router-sa, classifier-sa, ui-sa, cloudflared-sa. No cluster-wide permissions; no Roles needed by v1.
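As a rough illustration of the namespace and one of the per-component ServiceAccounts, a minimal sketch follows. Disabling token automount is an assumption on our part (it follows from no Roles being bound, but the chart may not set it); the other three ServiceAccounts would look the same apart from the name.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: split-brain
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: router-sa
  namespace: split-brain
# Assumption: with no Roles bound, the pod has no need for an API token.
automountServiceAccountToken: false
```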
Workloads
| Component | Kind | Replicas (base / dev) | Resources |
|---|---|---|---|
| router | Deployment | 3 / 1 | 250m / 512Mi requests, 1 / 1Gi limits |
| classifier | Deployment | 2 / 1 | 500m / 1Gi requests, 2 / 2Gi limits |
| ui | Deployment | 2 / 1 | 250m / 512Mi requests, 1 / 1Gi limits |
| cloudflared | Deployment | 2 / 2 | 100m / 128Mi requests, 500m / 256Mi limits |
All four run on CPU nodes — no GPU scheduling on our side. The dev overlay
pins router/classifier/ui to 1 replica because the Civo PVC is RWO (all
PVC-mounting pods land on one node); the UI must also stay at 1 as the single
writer of its PVC files.
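The router row of the table above might translate into a Deployment along these lines. This is a sketch, not the chart's rendered output: the image reference, the `app.kubernetes.io/name` label, and the container port are assumptions; only the replica count, ServiceAccount name, and resource figures come from this document.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: router
  namespace: split-brain
spec:
  replicas: 3                                   # base; the dev overlay uses 1
  selector:
    matchLabels:
      app.kubernetes.io/name: router            # assumed label
  template:
    metadata:
      labels:
        app.kubernetes.io/name: router          # assumed label
    spec:
      serviceAccountName: router-sa
      containers:
        - name: router
          image: registry.example/split-brain/router:dev   # hypothetical image
          ports:
            - containerPort: 8080               # assumed to match the Service port
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```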
HPA
Only the router subchart ships an hpa.yaml (gated on
autoscaling.enabled, CPU-target based) — and it's disabled in dev (fixed
replicaCount: 1). classifier and ui have no HPA. There is no Prometheus /
prometheus-adapter custom-metric autoscaling in this build.
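For orientation, a CPU-target HPA of the kind described could look like the sketch below. The replica bounds and the utilization target are hypothetical placeholders; the actual values are whatever the router subchart's autoscaling.* values render.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: router
  namespace: split-brain
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: router
  minReplicas: 3                   # assumed: the base replica count
  maxReplicas: 6                   # hypothetical upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # hypothetical CPU target
```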
PodDisruptionBudgets
PDBs exist per subchart but are disabled in dev: a PDB with minAvailable: 2
can never be satisfied at replicaCount: 1, so it would block every voluntary
eviction (for example a node drain). cloudflared keeps a PDB (minAvailable: 1).
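The cloudflared budget could be written roughly as follows. The selector label is an assumption; the chart may use different labels.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cloudflared
  namespace: split-brain
spec:
  minAvailable: 1        # with 2 replicas, one pod may be evicted at a time
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudflared   # assumed label
```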
Services
| Name | Type | Port | Backed by |
|---|---|---|---|
| router | ClusterIP | 8080 | router pods |
| classifier | ClusterIP | 8080 | classifier pods |
| ui | ClusterIP | 8080 | ui pods |
No LoadBalancer and no Ingress. The router and UI are reachable
from outside the cluster only through cloudflared tunnels (see
cloudflare-tunnel.md).
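A ClusterIP Service matching the router row above might look like this sketch. The selector label, port name, and targetPort are assumptions; the classifier and ui Services would differ only in name and selector.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: router
  namespace: split-brain
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: router   # assumed label
  ports:
    - name: http                     # assumed port name
      port: 8080
      targetPort: 8080               # assumed container port
```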
Secrets
Up to four Secrets live in the namespace (two always required, two
conditional), created by the Helm chart from values by default
(global.secrets.create: true):
| Secret name | Keys | Consumed by | Required? |
|---|---|---|---|
| anthropic-api-key | ANTHROPIC_API_KEY | router | yes |
| cloudflared-credentials | TUNNEL_TOKEN | cloudflared | yes |
| scalarlm-credentials | SCALARLM_API_KEY | router | only when router.config.scalarlmBaseUrl is set |
| google-oauth | clientSecret, sessionSecret | ui | only when ui Google sign-in is enabled |
The Helm chart materializes these from global.secrets.* values
supplied at install time (typically via a gitignored
values-*.secrets.yaml file). See
helm.md § Secrets for the values schema and the
"keep secret material out of git" convention.
Operators who manage secrets out of band (sealed-secrets,
external-secrets-operator, vault-injector) set
global.secrets.create: false; the chart skips the Secret resources
and the operator creates Secrets with the names above themselves.
All Secrets are consumed via envFrom or volume mounts; we do not bake
credentials into images.
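For operators creating Secrets out of band, a Secret with the expected name and key, plus the way a container might consume it, can be sketched as below. The value is a placeholder, and the envFrom fragment is illustrative only; the chart's templates may wire it differently (for example via a volume mount).

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key
  namespace: split-brain
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "<set-at-install-time>"   # placeholder, never commit a real key
---
# Fragment of the router container spec (not a complete manifest):
# every key in the Secret becomes an environment variable.
envFrom:
  - secretRef:
      name: anthropic-api-key
```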
Router-client bearer tokens (sbk_*) are not Kubernetes Secrets
— they live as sha256(token) files on the shared PVC, issued at
runtime by the UI (see ui.md).
ConfigMaps
- router-config: non-secret env: classifier threshold + endpoint, in-flight cap, seed Anthropic model, ScalarLM URL/model, cache flags.
- classifier-config: model kind, MiniLM head path + version, keyword list path.
- ui-config: classifier base URL, Cloudflare Access team domain + audience, and Google sign-in settings (client id, redirect URL, allowed domains/emails, session length).
- cloudflared has no ConfigMap; it runs in token mode (ingress rules in the Cloudflare dashboard).
Persistent storage
One PVC.
split-brain-data — access mode from global.storage.accessMode
(default ReadWriteMany; the dev overlay uses ReadWriteOnce), sized
for traffic + audit retention. Mounted at /var/split-brain/ in every pod
that needs durable state. Holds:
- audit/: append-only per-pod request audit log written by the router (RW), read by the ui Request explorer (RO).
- tokens/: one JSON file per active router API token; written by ui (create/revoke) and router (last-used flush).
- bootstrap/: proprietary docs staged in the UI before training.
- heads/: the trained classifier head the UI writes; the classifier loads it on /reload.
- labels/: operator-curated training labels (labels.jsonl).
- limits/: per-user token limits (limits.json); ui writes, router reads.
- settings/: runtime settings (settings.json, e.g. the default Claude model); ui writes, router reads.
- usage/: per-pod daily token usage ledgers written by the router.
With ReadWriteMany (NFS, CephFS, EFS, Filestore, Azure Files) the
PVC-mounting components can scale horizontally. The current Civo cluster only
offers ReadWriteOnce block storage (civo-volume), so the dev overlay
runs them at a single replica on one node. The chart does not pick a class;
the operator sets global.storage.storageClassName. No S3/object store is
used; all durable state is on the PVC.
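A claim matching the description above could be sketched as follows. The storage class name and the requested size are hypothetical; in the chart they would come from global.storage.* values, and the dev overlay would render ReadWriteOnce with the civo-volume class instead.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: split-brain-data
  namespace: split-brain
spec:
  accessModes:
    - ReadWriteMany                 # default; the dev overlay uses ReadWriteOnce
  storageClassName: example-rwx     # hypothetical; set via global.storage.storageClassName
  resources:
    requests:
      storage: 20Gi                 # hypothetical size
```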
The base classifier model is small and ships inside the container image, so it needs no PVC of its own; only the trained head is read from heads/ on the shared PVC.
Health probes
| Component | Liveness | Readiness | Initial delay |
|---|---|---|---|
| router | GET /healthz | GET /readyz | 5s |
| classifier | GET /healthz | GET /readyz | 5s |
| ui | GET /healthz | GET /readyz | 5s |
| cloudflared | GET /ready | GET /ready | 5s |
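In a container spec, the router, classifier, and ui rows above would correspond to something like the fragment below. The probe port is an assumption (taken to be the same 8080 the Services expose); cloudflared would use /ready for both probes on whatever port its metrics endpoint listens on.

```yaml
# Fragment of a container spec (not a complete manifest).
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080          # assumed
  initialDelaySeconds: 5
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080          # assumed
  initialDelaySeconds: 5
```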
Network policies
Default deny in split-brain. Explicit allows:
- router → classifier:8080, the external ScalarLM URL, and egress to api.anthropic.com:443.
- classifier → nothing (no egress beyond DNS).
- ui → classifier:8080 (the probe view and the /reload after training).
- cloudflared → router:8080, ui:8080, and egress to Cloudflare edge (*.cloudflare.com:7844).
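The default-deny posture and one of the allows can be sketched with standard NetworkPolicy objects as below. The pod labels are assumptions. Two caveats apply to this sketch: because egress is also denied by default, the router and ui need matching egress rules toward the classifier (not shown), and the core NetworkPolicy API matches pods, namespaces, and IP blocks rather than hostnames, so the hostname-based egress allows would need IP ranges or a CNI-specific policy type.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: split-brain
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: classifier-ingress
  namespace: split-brain
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: classifier     # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: router # assumed label
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: ui     # assumed label
      ports:
        - protocol: TCP
          port: 8080
```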
Rollouts
- All deployments: RollingUpdate, maxSurge 1, maxUnavailable 0.
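In a Deployment spec this rollout policy corresponds to the fragment below: one extra pod is started and must become ready before an old pod is removed, so serving capacity does not drop during a rollout.

```yaml
# Fragment of a Deployment spec (not a complete manifest).
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```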
Layout
Workloads are deployed via a Helm umbrella chart with one subchart per component. See helm.md for the chart structure, values schema, and release process. This document specifies what each workload looks like in the cluster; the chart is how we get it there.