split-brain

Sign in

Deploying — and the cluster targeting guard

./split-brain deploy runs helm upgrade --install for the umbrella chart, then rolls the app deployments so a same-tag image rebuild is re-pulled (the dev flow pins :v0.0.1 + pullPolicy: Always). destroy uninstalls the release and, by default, deletes the data PVC.

The usual flows:

./split-brain deploy                  # helm upgrade + roll the dev release
./split-brain deploy --build --remote # build images on $BLACKWELL_HOST, push, deploy, roll
./split-brain deploy dev --dry-run    # render + server-validate, no changes

Cluster targeting guard

helm and kubectl fall back to the ambient kubectl current-context when no context is given. If KUBECONFIG is unset/empty (or the current context was changed elsewhere), that can be a completely different cluster — silently.

To make that impossible, deploy and destroy resolve and verify a kube-context before any cluster call:

  1. Resolve the target context, in order: --context <name><ENV>_KUBE_CONTEXT (e.g. DEV_KUBE_CONTEXT) → KUBE_CONTEXT. The CLI sources a gitignored .env for these, so the guard works even in non-direnv shells (cron, CI, an agent's non-interactive shell).
  2. Verify the context exists in the active KUBECONFIG. If not, the command aborts with instructions — it never falls through to the ambient current-context.
  3. Target explicitly: every helm/kubectl call is passed --kube-context <name>, so the resolved context is authoritative.

Configure it once. Either is enough; both is belt-and-suspenders:

  • .env (sourced by the CLI, all shells) — copy .env.example:

bash DEV_KUBE_CONTEXT=split-brain # PROD_KUBE_CONTEXT=...

  • .envrc (direnv, interactive shells) — already sets KUBECONFIG; it now also exports DEV_KUBE_CONTEXT. See .envrc.example.

The context named here must be visible in your KUBECONFIG. For the Civo dev cluster, its kubeconfig holds a context literally named split-brain:

export KUBECONFIG=$HOME/keys/civo-split-brain-kubeconfig

Override per-invocation with --context:

./split-brain deploy prod --context split-brain-prod

Why the guard exists (2026-06-04 incident)

A ./split-brain deploy ran in a shell where direnv had not loaded (KUBECONFIG empty), so helm/kubectl used the ambient current-context — a different Civo cluster (solitary-surf-89820118). The full stack helm-installed there, including a cloudflared replica. Because that replica used the same tunnel token as production, Cloudflare registered it as a second origin and load-balanced *.scalarxlm.com traffic across both clusters — half of it hitting the wrong cluster and failing with connection refused. A live, intermittent production outage caused purely by targeting the wrong cluster.

Recovery was helm uninstall + namespace delete on the wrong cluster, which deregistered the rogue tunnel replica and restored all traffic to the correct cluster.

Two compounding lessons, both now mitigated:

  • Never rely on the ambient context for a mutating command. → the guard above (fail-closed: abort, don't guess).
  • A shared tunnel token means any cluster running cloudflared with it becomes a live origin. Be deliberate about where that secret is installed. See cloudflare-tunnel.md.