Helm charts
We ship the system as a single umbrella Helm chart, split-brain,
with one subchart per component. Helm is the only supported deployment
path — there are no raw manifests to apply by hand.
Why Helm
- One helm install (or Argo CD Application) brings the whole system up; one helm upgrade rolls it forward.
- Per-environment values (dev, staging, prod) live in values-<env>.yaml. The structure is identical, so diffs between environments are obvious.
- helm template gives reviewers a deterministic rendered diff in CI before any change merges.
ScalarLM is not in this chart. It is a separately deployed
service with its own upstream Helm chart; we configure the router
to talk to it via SCALARLM_BASE_URL + a credential Secret. The
operator can deploy ScalarLM in another namespace, another cluster,
or any way they like — split-brain just needs the URL.
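As a rough illustration of that wiring, the router Deployment template might render its ScalarLM settings as below. This is a sketch, not the chart's actual template: the Secret name and key come from the Secrets table in this document, while the container env var name for the key and the use of `optional: true` are assumptions.

```yaml
# Fragment of the router container spec (hypothetical rendering).
env:
  - name: SCALARLM_BASE_URL
    value: {{ .Values.config.scalarlmBaseUrl | quote }}
  - name: SCALARLM_API_KEY            # assumed env var name
    valueFrom:
      secretKeyRef:
        name: scalarlm-credentials
        key: SCALARLM_API_KEY
        optional: true                # assumption: tolerate auth-less ScalarLM
```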
Chart layout
charts/split-brain/
  Chart.yaml                  # umbrella
  values.yaml                 # baseline defaults
  values-dev.yaml             # per-env overlays (in repo)
  values-staging.yaml
  values-prod.yaml
  templates/
    namespace.yaml
    pvc.yaml                  # split-brain-data (RWX default; dev overlay uses RWO)
    secrets.yaml              # anthropic / scalarlm / cloudflared / google Secrets
    _helpers.tpl              # naming, labels, image refs
  charts/                     # first-party subcharts in this repo
    router/
      Chart.yaml
      values.yaml
      templates/
        deployment.yaml
        service.yaml
        configmap.yaml
        serviceaccount.yaml
        hpa.yaml              # gated on .Values.autoscaling.enabled
        pdb.yaml
        networkpolicy.yaml
        _helpers.tpl
    classifier/
      ...                     # mirrors router/
    ui/
      ...                     # mirrors router/
    cloudflared/
      Chart.yaml
      values.yaml
      templates/
        deployment.yaml       # token mode: TUNNEL_TOKEN env, no config file
        serviceaccount.yaml
        pdb.yaml
        networkpolicy.yaml
Only the router subchart ships an hpa.yaml (gated on
autoscaling.enabled); classifier and ui scale by fixed replicaCount.
cloudflared has no ConfigMap — it runs in token mode and pulls its
ingress rules from the Cloudflare dashboard at runtime (see
cloudflare-tunnel.md).
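For orientation, a gated hpa.yaml in the router subchart could look roughly like the sketch below. It assumes the `autoscaling.*` keys from the values structure and the `split-brain.fullname` / `split-brain.labels` helpers named in this document; the actual template in the repo may differ.

```yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "split-brain.fullname" . }}
  labels:
    {{- include "split-brain.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "split-brain.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
```

With `autoscaling.enabled: false` the file renders nothing and the Deployment keeps its fixed replicaCount.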
All four first-party subcharts live in-repo. They are not published independently — the umbrella is the only release artifact.
Values structure
The umbrella values.yaml exposes a flat top-level key per subchart.
Operators override values like this:
global:
  image:
    registry: ghcr.io/<org>/split-brain
    pullPolicy: IfNotPresent
    pullSecrets: []
  domain: router.example.com        # used by cloudflared ingress rules
  storage:
    pvcName: split-brain-data       # shared PVC for audit log, tokens, labels, usage, settings...
    accessMode: ReadWriteMany       # default; the dev overlay sets ReadWriteOnce (Civo single-node)
    storageClassName: ""            # RWX class (NFS/CephFS/EFS/...), or a block class for RWO
    size: 500Gi                     # audit log dominates; size for your traffic + retention
  secrets:
    # The chart creates the three Secrets the subcharts consume.
    # Material lives in a separate *.secrets.yaml file (gitignored);
    # see the Secrets section below for the threat model.
    create: true
    anthropic:
      apiKey: ""                    # supplied via values-*.secrets.yaml
    scalarlm:
      apiKey: ""                    # optional; only when router.config.scalarlmBaseUrl set
    cloudflared:
      tunnelToken: ""               # base64 token from Cloudflare's "Install connector" panel

router:
  replicaCount: 3
  image:
    repository: router              # joined with global.image.registry
    tag: ""                         # defaults to .Chart.AppVersion
  resources:
    requests: {cpu: 250m, memory: 512Mi}
    limits: {cpu: 1, memory: 1Gi}
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  config:
    classifierThreshold: 0.4
    maxInflight: 256
    anthropicModel: claude-sonnet-4-6   # seed default; live default is admin-tunable (settings.json)
    scalarlmBaseUrl: https://scalarlm.internal.example.com/v1
    scalarlmModel: auto             # discover the served model from ScalarLM's /v1/models; or pin an explicit id
  # No per-subchart `secrets:` block: the umbrella's global.secrets
  # (above) is the single source of truth; subcharts reference the
  # generated Secret names by their fixed names (anthropic-api-key,
  # scalarlm-credentials, cloudflared-credentials).

classifier:
  replicaCount: 2
  config:
    keywordsPath: ""                # optional; if set, mount a ConfigMap with the keyword list

ui:
  replicaCount: 2
  config:
    # Cloudflare Access (edge auth), optional
    cfTeamDomain: ""                # https://<team>.cloudflareaccess.com
    cfJwtAudience: ""
    devMode: false
    # Google sign-in (app-level auth; see docs/google-auth.md). Active once
    # googleClientId is set AND the google-oauth Secret exists.
    googleClientId: ""
    oauthRedirectUrl: "https://split-brain-ui.scalarxlm.com/auth/google/callback"
    allowedDomains: "smasint.com,relational.ai"
    allowedEmails: "[email protected]"
    sessionMaxAgeDays: "30"

cloudflared:
  replicaCount: 2
  tunnel:
    credentialsSecret: cloudflared-credentials
  # No ingress list here: token mode pulls hostnames/services from the
  # Cloudflare dashboard at runtime (see docs/cloudflare-tunnel.md).
Single-writer / RWO note. The base values show router.replicaCount: 3 and ui.replicaCount: 2, but the components that mount the PVC must run 1 replica on a ReadWriteOnce volume; the dev overlay sets replicaCount: 1 for router/classifier/ui. The UI in particular is the single writer of labels.jsonl / limits.json / settings.json, so it must stay at 1 while those live on the PVC. There are no ServiceMonitor / PodMonitor resources in the chart.
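A dev overlay consistent with that note might contain something like the following. This is only a sketch of the relevant keys, not the contents of the real values-dev.yaml; in particular, turning the router's autoscaling off is an assumption, made here so an HPA does not scale the router back above one replica.

```yaml
# Hypothetical excerpt of values-dev.yaml (single-node, RWO storage).
global:
  storage:
    accessMode: ReadWriteOnce
router:
  replicaCount: 1
  autoscaling:
    enabled: false      # assumption: otherwise minReplicas would override replicaCount
classifier:
  replicaCount: 1
ui:
  replicaCount: 1       # single writer of labels/limits/settings
```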
Conventions
- global.image.registry is joined with each component's image.repository. No component references a full image string.
- Image tags default to .Chart.AppVersion, so bumping the umbrella's appVersion rolls all four images forward as a unit. Operators can pin individual components by setting <component>.image.tag.
- Secrets are referenced by name, never inlined. The chart will refuse to render if a referenced secret name is empty.
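One way the image convention could be expressed as a named template is sketched below. The helper name `split-brain.image` appears in this document, but its real signature is not shown here; the `dict` argument shape (`component`, `context`) is an assumption, as is the idea that each subchart's appVersion is kept in step with the umbrella's.

```yaml
{{- /* Hypothetical helper: builds "<registry>/<repository>:<tag>" for one component. */ -}}
{{- define "split-brain.image" -}}
{{- $ctx := .context -}}
{{- $repo := default .component $ctx.Values.image.repository -}}
{{- $tag := default $ctx.Chart.AppVersion $ctx.Values.image.tag -}}
{{- printf "%s/%s:%s" $ctx.Values.global.image.registry $repo $tag -}}
{{- end -}}

# Usage inside a subchart's deployment.yaml (fragment):
image: {{ include "split-brain.image" (dict "component" "router" "context" $) | quote }}
```

An empty `image.tag` falls through to the chart's appVersion, which is what lets a single version bump move every component at once.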
Secrets
The chart creates the Secrets the subcharts consume — operators
put the secret material in the values file. There are two modes,
selected by global.secrets.create:
Chart-creates-Secrets mode (default, create: true)
The chart renders the following Secret resources from values:
| Secret name | Keys | Source value | Required? |
|---|---|---|---|
| anthropic-api-key | ANTHROPIC_API_KEY | global.secrets.anthropic.apiKey | yes |
| cloudflared-credentials | TUNNEL_TOKEN | global.secrets.cloudflared.tunnelToken (token mode; used verbatim as the TUNNEL_TOKEN env) | yes |
| scalarlm-credentials | SCALARLM_API_KEY | global.secrets.scalarlm.apiKey | only if router.config.scalarlmBaseUrl is set |
| google-oauth | clientSecret, sessionSecret | global.secrets.google.* | only if ui googleClientId is set |
If a required value is empty, the chart refuses to render with a clear message naming the missing field — silent broken-Secret installs are not allowed.
The scalarlm-credentials Secret is created only when its value is
non-empty (ScalarLM deployments without auth are valid).
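The fail-on-empty and create-only-when-set behaviours map naturally onto Helm's `required` function and a `with` block. The following is a minimal sketch of how the umbrella's secrets.yaml might do it, covering only two of the Secrets; the error message text is illustrative, not the chart's actual wording.

```yaml
{{- if .Values.global.secrets.create }}
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key
type: Opaque
stringData:
  # `required` aborts rendering with this message when the value is empty.
  ANTHROPIC_API_KEY: {{ required "global.secrets.anthropic.apiKey must be set when global.secrets.create is true" .Values.global.secrets.anthropic.apiKey | quote }}
{{- with .Values.global.secrets.scalarlm.apiKey }}
---
# Rendered only when a ScalarLM key is supplied.
apiVersion: v1
kind: Secret
metadata:
  name: scalarlm-credentials
type: Opaque
stringData:
  SCALARLM_API_KEY: {{ . | quote }}
{{- end }}
{{- end }}
```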
Where the secret material lives
NEVER commit a values file containing real secret material to git. The convention this repo uses:
values-prod.yaml # ← committed; structure + non-secret config
values-prod.secrets.yaml # ← gitignored (*.secrets.yaml); secret values only
Install/upgrade passes both files; Helm deep-merges them with later files winning:
helm upgrade --install split-brain charts/split-brain \
-f values-prod.yaml \
-f values-prod.secrets.yaml
*.secrets.yaml and *.secrets.yml are in the repo's .gitignore.
Operators store the production secret overlay in their team's
credential vault (1Password, AWS Secrets Manager, etc.) and check
it out only at deploy time.
A minimal values-prod.secrets.yaml:
global:
  secrets:
    anthropic:
      apiKey: sk-ant-...
    cloudflared:
      # Paste the base64 token verbatim from the Cloudflare dashboard
      # (Zero Trust → Networks → Tunnels → <your tunnel> → Install
      # connector → the string after `cloudflared service install`). The
      # chart stores it as the `TUNNEL_TOKEN` env (token mode); cloudflared
      # fetches its ingress rules from Cloudflare at runtime.
      tunnelToken: eyJh...
    scalarlm:
      apiKey: sk-...
Bring-your-own-Secret mode (create: false)
For teams that manage secrets out of band (sealed-secrets,
external-secrets-operator, vault-injector, or kubectl create secret
manually), set global.secrets.create: false. The chart skips the
Secret resources entirely and the subcharts continue to reference the
same names as in the table above — the operator is responsible for
having created Secrets with those exact names.
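In this mode, whatever tool produces the Secrets has to end up with objects shaped like the one below. The name and key are taken from the table above; the namespace is an assumption based on the local-install example (`-n split-brain`), and the value is a placeholder.

```yaml
# Hand-managed equivalent of the chart-created Secret (sketch).
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key      # must match the fixed name the subcharts reference
  namespace: split-brain       # assumption: the release namespace
type: Opaque
stringData:
  ANTHROPIC_API_KEY: sk-ant-...
```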
Router-client bearer tokens are not Helm secrets
The sbk_<…> tokens clients put in Authorization: Bearer … are
not Kubernetes Secrets. They are self-served by authenticated
users through the UI's Tokens view (see ui.md)
and persisted as sha256(token) in one file per token on the
shared PVC. This keeps human credential lifecycle (create / list /
revoke) out of the operator's path and removes a class of "the
team is sharing one key in a chat" failures.
Storage
All durable state — audit log, router tokens, bootstrap source
documents — lives on a single shared PVC mounted at
/var/split-brain/ in each pod that needs it. The chart does
not use S3 or any object store; this is a project-wide
choice (see global.storage in the values structure).
| Path | Writers | Readers |
|---|---|---|
| /var/split-brain/audit/ | router | ui (RO) |
| /var/split-brain/tokens/ | ui, router (last-used) | router |
| /var/split-brain/bootstrap/ | ui (upload) | ui (training) |
| /var/split-brain/heads/ | ui (trained head) | classifier (/reload) |
| /var/split-brain/labels/ | ui | ui (training) |
| /var/split-brain/limits/ | ui (admin) | router (quota) |
| /var/split-brain/settings/ | ui (admin) | router (default model) |
| /var/split-brain/usage/ | router (per-pod) | ui (admin, RO) |
The router mounts the whole PVC; the UI mounts the individual subPaths above (some RO) because its root filesystem is read-only.
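The per-subPath mounting on the UI side could look roughly like this fragment. It is a sketch only: the volume name `data` and the subPath names (assumed to match the directory names in the table) are not confirmed by this document, and only two of the eight paths are shown.

```yaml
# Fragment of a ui pod spec (hypothetical).
containers:
  - name: ui
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: data
        mountPath: /var/split-brain/audit
        subPath: audit           # assumed subPath name
        readOnly: true           # ui only reads the audit log
      - name: data
        mountPath: /var/split-brain/labels
        subPath: labels          # assumed subPath name; ui is the writer here
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: {{ .Values.global.storage.pvcName }}
```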
The PVC's access mode is set by global.storage.accessMode. ReadWriteMany
(NFS, CephFS, EFS, …) lets the PVC-mounting components scale horizontally.
The current Civo deployment only has ReadWriteOnce block storage, so the
dev overlay sets accessMode: ReadWriteOnce and pins router/classifier/ui
to a single replica (all scheduled to the same node via WaitForFirstConsumer).
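A pvc.yaml driven by the `global.storage` keys might be as small as the sketch below. It assumes the value names shown in the values structure; leaving `storageClassName` empty omits the field so the cluster's default class applies.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ .Values.global.storage.pvcName }}
spec:
  accessModes:
    - {{ .Values.global.storage.accessMode }}
  {{- with .Values.global.storage.storageClassName }}
  storageClassName: {{ . | quote }}
  {{- end }}
  resources:
    requests:
      storage: {{ .Values.global.storage.size }}
```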
Backup is the operator's responsibility (PV snapshots or external sync) — the chart does not implement it.
The classifier subchart must not receive anthropic-api-key.
A pre-install validation template fails rendering if it is
referenced from the classifier subchart's secret map — enforcing
the project invariant that no LLM step in classifier retraining
ever calls Claude (see
classifier.md).
A pre-install validation template likewise fails fast with a clear message if any required Secret value is missing.
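A guard for the classifier rule can be written with Helm's `fail` function. The sketch below is illustrative only: `classifier.secretRefs` is a hypothetical values key standing in for whatever "secret map" the real chart inspects, and the file name is invented.

```yaml
{{- /* Hypothetical templates/validate.yaml in the umbrella; renders nothing on success. */ -}}
{{- $refs := default (list) .Values.classifier.secretRefs -}}
{{- if has "anthropic-api-key" $refs -}}
{{- fail "classifier must not reference the anthropic-api-key Secret" -}}
{{- end -}}
```

Because `fail` aborts template rendering, the same check trips in `helm template`, `helm lint`, and `helm upgrade` alike.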
Templating discipline
- Every resource gets a name from include "split-brain.fullname" and labels from include "split-brain.labels". Both live in the umbrella _helpers.tpl and are reused by subcharts.
- No string concatenation for image refs; use the split-brain.image helper that takes a component name.
- Conditional resources (e.g. HorizontalPodAutoscaler) are guarded by a single boolean in values. No multi-level conditional nesting.
- Subcharts never template Secret resources. The only Secret templates are the umbrella's templates/secrets.yaml, rendered when global.secrets.create is true.
- helm lint --strict and helm template ... | kubeconform run in CI on every PR.
Release process
Current reality: there is no CI image build or Argo CD in this repo yet. Releases are driven by the ./split-brain CLI: ./split-brain build --remote (builds amd64 images on the build host and pushes them), then ./split-brain deploy <env> (helm upgrade + rollout). See cli.md and deploy.md. The flow below is the target.

- Bump appVersion and version in Chart.yaml. SemVer for the chart, image tag for appVersion. Bump both even if only one component changed; the umbrella is the unit of release.
- CI builds and pushes all four images at the new tag.
- CI publishes the chart to an OCI registry (oci://ghcr.io/<org>/split-brain-charts). We do not publish to the legacy index.yaml repo format.
- Argo CD (or helm upgrade) picks up the new chart version per environment, gated by whatever approval policy the environment has.
Chart.lock is committed; subchart upgrades are deliberate PRs, not
silent floats.
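For reference, the two fields bumped in the first step sit side by side in the umbrella's Chart.yaml. The version numbers below are made up for illustration, and any dependency entries are omitted.

```yaml
# charts/split-brain/Chart.yaml (sketch; values are hypothetical)
apiVersion: v2
name: split-brain
type: application
version: 0.4.0        # chart version, SemVer
appVersion: "0.4.0"   # image tag used when <component>.image.tag is empty
```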
Local development
helm dep update charts/split-brain
helm template demo charts/split-brain \
-f charts/split-brain/values-dev.yaml > /tmp/rendered.yaml
kubectl apply --dry-run=server -f /tmp/rendered.yaml
For a real local install:
helm install demo charts/split-brain \
-n split-brain --create-namespace \
-f charts/split-brain/values-dev.yaml
The dev overlay points scalarlmBaseUrl at a developer-local
ScalarLM (or a shared dev one) so the whole stack can come up on a
kind/minikube cluster without provisioning GPUs.
What lives in this repo vs upstream
| In this repo | Upstream / out of repo |
|---|---|
| Umbrella chart and four first-party subcharts (router, classifier, ui, cloudflared) | ScalarLM (separate Helm chart, separate deployment) |
| Per-env values overlays (dev/staging/prod) | Secret material |
| CI for lint, template, kubeconform | Argo CD Application manifests (separate infra repo) |
| | Cloudflare DNS records and Access policies |