split-brain

Operator CLI (./split-brain)

A single script at the repo root wraps the routine operator tasks: building the three images, pushing them, installing or upgrading the Helm chart, tailing logs, and running the classifier bootstrap. It is generated by bashly from a small YAML file plus per-command shell files under cmd/. The generated ./split-brain is committed at the repo root and run directly; bashly (via Docker) is only needed to regenerate it after editing cmd/.

Goal

$ ./split-brain --help
split-brain — operator CLI for split-brain

Usage: split-brain COMMAND [OPTIONS]

Commands:
  build       Build Docker images (router, classifier, ui)
  push        Push Docker images to the registry
  deploy      Install or upgrade the Helm chart
  destroy     Uninstall the Helm release
  status      Show pod status in the namespace
  logs        Tail logs for a component
  bootstrap   Train the classifier head from a docs directory
  template    Render Helm manifests locally (helm template wrapper)
  lint        Lint everything (ruff, helm lint, dockerfile syntax)

No Python, no Node, no Make required to run the script. Operators need only bash + the tools each command shells out to (docker, kubectl, helm, uv for bootstrap).

Why bashly

We considered four alternatives:

| Tool | Why we rejected it |
| --- | --- |
| Makefile | Great at "build artifact X from Y," bad at UX. No --help per target, no flag validation, no subcommand grouping. |
| Taskfile (go-task) | Same UX problems as Make; also adds a runtime dep (the task binary). |
| Plain shell | Argparse, validation, and --help rendering all become our problem. Easy to get wrong; hard to maintain consistency across commands. |
| Python click / Go cobra | More structured than shell but require a runtime: Python venv on every operator's box, or a per-OS compile/release pipeline. Heavy for what amounts to wrapping six binaries. |

bashly hits a sweet spot: declarative YAML for the command surface, generated bash that is human-readable and set -e-safe, no runtime beyond bash itself. The build-time dependency (bashly) only matters for developing the CLI; the generated script is committed and self-contained.

Choosing the cluster (direnv)

The CLI shells out to kubectl and helm; both honor the KUBECONFIG env var. To pin which cluster this project targets without remembering --context or --kubeconfig on every command, the repo ships an .envrc.example you copy to .envrc:

cp .envrc.example .envrc
# edit .envrc to point KUBECONFIG at the right file
direnv allow

After that, every shell you cd into this directory automatically exports KUBECONFIG (and anything else you set there). cd out and the exports vanish, so a different project's kubectl is unaffected. This is the per-project safety net that tools like kubectx can't provide — kubectx writes the active context into the shared ~/.kube/config, which leaks to every other shell on the machine.

.envrc and kubeconfig.yaml are both in .gitignore. The committed .envrc.example is just a template; operators obtain their actual kubeconfig out of band (store it in a credential vault, not in git).
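A minimal .envrc might look like the sketch below. It assumes the kubeconfig sits at the repo root under the gitignored name kubeconfig.yaml; the contents of the real .envrc.example may differ.

```shell
# .envrc (illustrative; adjust the path to wherever your kubeconfig lives).
# direnv evaluates this file from the directory that contains it,
# so $PWD resolves to the repo root here.
export KUBECONFIG="$PWD/kubeconfig.yaml"
```

Using an absolute path built from $PWD means kubectl and helm keep working from any subdirectory of the repo.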

Install direnv: brew install direnv (macOS) or your package manager, then add eval "$(direnv hook zsh)" (or bash) to your shell rc. First-time entry into any new .envrc requires explicit direnv allow — a freshly-cloned repo can't run arbitrary code in your shell without your consent.

CLI surface

Each command is described below with its arguments, expected preconditions, and an example invocation.

build

Builds Docker images. The build context is the repo root (each docker/<name>/Dockerfile COPYs <name>/src etc.).

$ ./split-brain build [IMAGE] [--tag=TAG] [--registry=URL]
                      [--platform=P] [--remote] [--push]

| Arg/Flag | Default | Notes |
| --- | --- | --- |
| IMAGE (positional) | all | One of router / classifier / ui / all. |
| --tag | v0.0.1 | Image tag (the dev cluster iterates on v0.0.1). |
| --registry | env REGISTRY or docker.io/gdiamos | The dev cluster pulls from docker.io/gdiamos. |
| --platform | linux/amd64 | The cluster nodes are amd64. Building on an arm64 host without this produces images that deploy but crash-loop with exec format error. |
| --remote | off | Build on $BLACKWELL_HOST (default normal@blackwell) and push. Implies --push. |
| --push | off | If set, runs push after each successful build. |

Locally, wraps docker buildx build --platform linux/amd64 (so an arm64 laptop still produces amd64 images — via qemu, which is slow).

--remote rsyncs the working tree to a fast amd64 host (blackwell) and runs docker build + docker push there natively — much faster than local qemu cross-builds, and the recommended path for the torch-based classifier / ui images. The remote is already logged into the registry as gdiamos.

$ ./split-brain build router --tag=dev
$ ./split-brain build --tag=v0.2.0 --push       # local buildx (amd64)
$ ./split-brain build ui --remote               # native amd64 build on blackwell
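
As a rough sketch of how the flags above could map onto a docker buildx invocation, the helpers below print the command instead of running it. The function names and the registry/name:tag image naming scheme are assumptions for illustration; the real cmd/build_command.sh may be organized differently.

```shell
# Hypothetical helpers; they only print commands, nothing is built.

# Compose a full image reference from registry, image name, and tag.
image_ref() {
  local registry="$1" name="$2" tag="$3"
  printf '%s/%s:%s\n' "$registry" "$name" "$tag"
}

# Print the buildx command for one image. The build context is the repo
# root ("."), and the Dockerfile lives under docker/<name>/.
build_cmd() {
  local name="$1" tag="${2:-v0.0.1}" platform="${3:-linux/amd64}"
  local registry="${REGISTRY:-docker.io/gdiamos}"
  printf 'docker buildx build --platform %s -f docker/%s/Dockerfile -t %s .\n' \
    "$platform" "$name" "$(image_ref "$registry" "$name" "$tag")"
}

# Expand the IMAGE positional: "all" (the default) means every image.
expand_images() {
  case "${1:-all}" in
    all) printf '%s\n' router classifier ui ;;
    router|classifier|ui) printf '%s\n' "$1" ;;
    *) echo "unknown image: $1" >&2; return 1 ;;
  esac
}
```

For example, `build_cmd router dev` prints the command that would build docker.io/gdiamos/router:dev for linux/amd64 when REGISTRY is unset.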

push

Pushes already-built images. Useful when build was run without --push (e.g., on a developer laptop without registry credentials, followed by a push from CI).

$ ./split-brain push --image=all --tag=v0.2.0

deploy

Installs or upgrades the Helm release. Idempotent.

$ ./split-brain deploy [ENV] [--namespace=NS] [--release=NAME]
                       [--tag=TAG] [--values-secrets=PATH]
                       [--build] [--remote] [--no-restart]
                       [--dry-run] [--wait] [--no-secrets-check]

ENV is a positional (dev or prod, default dev). The default flags are tuned so a bare ./split-brain deploy reproduces the live dev release exactly: values-dev.yaml + values.secrets.yaml against release split-brain in namespace split-brain.

| Arg/Flag | Default | Notes |
| --- | --- | --- |
| ENV (positional) | dev | Selects charts/split-brain/values-<env>.yaml overlay. |
| --namespace / -n | split-brain | Passed with --create-namespace so helm ensures it exists. |
| --release | split-brain | Helm release name. |
| --tag | (unset) | When set, overrides global.image.tag via --set; otherwise the values file pins it (v0.0.1). |
| --values-secrets | charts/split-brain/values.secrets.yaml | The gitignored *.secrets.yaml overlay holding global.secrets.*. Forwarded to helm as an extra -f (resolved against the repo root if relative). |
| --build | off | Build + push router/classifier/ui (at --tag) before upgrading. |
| --remote | off | With --build, build on $BLACKWELL_HOST (native amd64) instead of locally. |
| --no-restart | off | Skip the post-upgrade rollout restart (see below). |
| --dry-run | off | helm upgrade --dry-run. |
| --wait | off | helm upgrade --wait --timeout 5m. |
| --no-secrets-check | off | Skip the secrets-overlay requirement (use with global.secrets.create=false and Secrets managed out of band). |

Rollout restart. The dev flow pins a fixed tag (v0.0.1) with pullPolicy: Always, so a rebuilt image keeps the same tag and the rendered pod template is unchanged. helm upgrade alone would therefore not roll the pods, and the new code would never run. To force a re-pull, after any non-dry-run upgrade deploy runs kubectl rollout restart on the router, classifier, and ui deployments (cloudflared is left alone to keep the tunnel up). Pass --no-restart for a config-only change where you don't want to bounce pods.
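
The restart step could be sketched roughly as follows. The commands are printed rather than executed, and selecting Deployments by the app.kubernetes.io/component label is an assumption (the real script may address them by name instead).

```shell
# Hypothetical sketch: print the restart commands for the three app components.
restart_cmds() {
  local ns="${1:-split-brain}" component
  # cloudflared is deliberately absent so the tunnel stays up.
  for component in router classifier ui; do
    printf 'kubectl rollout restart deployment -n %s -l app.kubernetes.io/component=%s\n' \
      "$ns" "$component"
  done
}

# Restart only after a real upgrade, and only if the operator did not opt out.
# Arguments are "1" when --dry-run / --no-restart were passed.
should_restart() {
  local dry_run="$1" no_restart="$2"
  [ "$dry_run" != "1" ] && [ "$no_restart" != "1" ]
}
```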

Preflight before invoking helm: the CLI checks that the selected values-<env>.yaml exists, and — unless --no-secrets-check — that the --values-secrets overlay exists. It does not read or modify the secrets file; it just forwards it to helm as an extra -f. (RBAC and image-presence checks are intentionally left to helm/kubelet rather than duplicated here.)
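
A minimal sketch of that preflight, assuming it runs from the repo root. The function name, argument order, and messages are illustrative, not taken from the real cmd/deploy_command.sh.

```shell
# Hypothetical preflight: existence checks only, run before helm is invoked.
# $1 = env (dev|prod), $2 = secrets overlay path, $3 = "1" for --no-secrets-check
preflight() {
  local env="$1" secrets="$2" skip_secrets_check="${3:-0}"
  local values="charts/split-brain/values-${env}.yaml"

  if [ ! -f "$values" ]; then
    echo "missing values overlay: $values" >&2
    return 1
  fi
  if [ "$skip_secrets_check" != "1" ] && [ ! -f "$secrets" ]; then
    echo "missing secrets overlay: $secrets (or pass --no-secrets-check)" >&2
    return 1
  fi
  # The secrets file is never read or modified here; helm receives it via -f.
  return 0
}
```

Failing before helm runs keeps the error message specific, for example a missing overlay path instead of a generic helm failure.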

$ ./split-brain deploy                                  # reproduce the live dev release
$ ./split-brain deploy dev --dry-run                    # render + server-validate only
$ ./split-brain deploy prod --tag=v0.2.0 --wait \
      --values-secrets=values-prod.secrets.yaml
$ ./split-brain deploy prod --no-secrets-check          # sealed-secrets / vault

destroy

$ ./split-brain destroy [--namespace=NS] [--release=NAME] [--keep-pvc]

Runs helm uninstall and, by default, deletes the split-brain-data PVC. --keep-pvc retains it, which is useful if you want to redeploy quickly without losing audit logs or tokens.

The command prompts for explicit confirmation (yes/N) before running because the audit log is destroyed with the PVC.
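
One way such a prompt could be written is sketched below. It assumes that only the literal answer "yes" proceeds and that anything else, including end of input, aborts; the real prompt wording may differ.

```shell
# Hypothetical confirmation helper for destructive commands.
confirm_destroy() {
  local release="${1:-split-brain}" answer
  # Prompt on stderr so stdout stays clean for piping.
  printf 'Uninstall release %s and delete its PVC? (yes/N) ' "$release" >&2
  # read returns non-zero at EOF; treat that as "no".
  IFS= read -r answer || answer=""
  [ "$answer" = "yes" ]
}
```

A caller would typically write `confirm_destroy "$release" || exit 1` before invoking helm uninstall.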

status

$ ./split-brain status [--namespace=NS]

Runs kubectl get pods,svc,pvc -n NS -o wide and prints a short summary: which images are running, how many replicas of each component are ready, and whether the cloudflared tunnel is connected.

logs

$ ./split-brain logs COMPONENT [-f] [--tail=N]

Positional COMPONENT is one of router / classifier / ui / cloudflared. Wraps kubectl logs -l app.kubernetes.io/component=COMPONENT so output is aggregated across pods.
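
As an illustration of how the arguments might translate into a kubectl call, the hypothetical helper below validates COMPONENT and prints the resulting command. The function name and argument order are assumptions.

```shell
# Hypothetical sketch: $1 = component, $2 = "1" for -f, $3 = value of --tail
logs_cmd() {
  local component="$1" follow="${2:-0}" tail="${3:-}"
  case "$component" in
    router|classifier|ui|cloudflared) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  local cmd="kubectl logs -l app.kubernetes.io/component=$component"
  if [ "$follow" = "1" ]; then cmd="$cmd -f"; fi
  if [ -n "$tail" ]; then cmd="$cmd --tail=$tail"; fi
  printf '%s\n' "$cmd"
}
```

Note that when a label selector is used, kubectl logs shows only the last 10 lines per pod unless --tail is given, so passing --tail=N explicitly is worthwhile when more history is needed.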

bootstrap

Runs the classifier bootstrap (chunk → generate → train → save head) from the host. The trained head is saved locally; the operator mounts or copies it into the running classifier via the Helm chart's keywords/head ConfigMap path.

$ ./split-brain bootstrap --docs=PATH [--general=PATH] [--output=PATH]
                          [--generator=heuristic|scalarlm] [--epochs=N]

| Flag | Default | Notes |
| --- | --- | --- |
| --docs | required | Directory of proprietary .md / .txt files. |
| --general | classifier/src/classifier/bootstrap/corpus/public_general.jsonl | JSONL of public general prompts. |
| --output | ./head.safetensors | Where to write the trained head. |
| --generator | heuristic | heuristic (deterministic, no LLM) or scalarlm (needs SCALARLM_BASE_URL). |
| --epochs | 50 | Training epochs. |

Internally runs uv run --directory classifier python -m classifier.bootstrap.main ... so the operator doesn't need to know about the package layout.

template

$ ./split-brain template [--env=ENV] [--output=PATH]

Wraps helm template for inspecting what deploy would produce. Default output: stdout. With --output=PATH, writes to that file instead. Useful in code review of chart changes: render before and after the change and diff the two files (for example with git diff --no-index).

lint

$ ./split-brain lint [--target=python|helm|docker|all]

| Target | Runs |
| --- | --- |
| python | uv run ruff check and uv run pytest -q in each of router/, classifier/, ui/. |
| helm | helm lint charts/split-brain and helm template demo charts/split-brain > /dev/null. |
| docker | hadolint on each Dockerfile if installed; otherwise skipped with a note. |
| all | All of the above. |

Bashly source layout

Modeled on the cmd/ + committed-script pattern used in the neighbouring orbital and scalarlm repos.

split-brain               # GENERATED CLI — committed at repo root, run directly
cmd/
  bashly.yml              # the command tree definition (source of truth)
  bashly-settings.yml     # source_dir: cmd, target_dir: . (repo root)
  bashly.sh               # `cmd/bashly.sh generate` — dockerized regenerator
  lib/
    colors.sh             # color helpers + die() / require_tool()
  build_command.sh
  push_command.sh
  deploy_command.sh
  destroy_command.sh
  status_command.sh
  logs_command.sh
  bootstrap_command.sh
  template_command.sh
  lint_command.sh

bashly.yml is the single source of truth for command names, args, flags, defaults, validation, and help text. Each command gets a matching *_command.sh whose body is inlined into the generated script. lib/colors.sh is bundled into every command via bashly's lib_dir, so helpers like green_bold, die, and require_tool are available everywhere.
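
For orientation, the helpers named above could look roughly like this. These are illustrative versions only; the real lib/colors.sh may differ in wording, colors, and exit codes.

```shell
# Illustrative helper sketches, not the actual lib/colors.sh.

# Print a message in bold green.
green_bold() { printf '\033[1;32m%s\033[0m\n' "$*"; }

# Print an error to stderr and abort the script.
die() {
  printf '\033[1;31merror:\033[0m %s\n' "$*" >&2
  exit 1
}

# Abort early with a readable message if any required binary is missing.
require_tool() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || die "required tool not found on PATH: $tool"
  done
}
```

A command body would then start with something like `require_tool kubectl helm`, so a missing binary is reported before any work begins.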

Build process

Regeneration goes through cmd/bashly.sh, which runs the dannyben/bashly Docker image so no Ruby/gem install is needed:

$ cmd/bashly.sh generate     # reads cmd/, writes ./split-brain at the repo root

The wrapper mounts the repo at /app, points bashly at cmd/bashly-settings.yml (which sets source_dir: cmd, target_dir: .), and writes ./split-brain. We commit that generated script directly. It should never be hand-edited — edit cmd/ and regenerate. (A CI check can re-run cmd/bashly.sh generate and git diff --exit-code ./split-brain to enforce this.)
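
The CI check mentioned in parentheses could be wrapped in a small function like the one below. The function name is hypothetical, and it assumes it runs from the repo root with the committed script staged or committed in git.

```shell
# Hypothetical CI step: regenerate, then fail if the result differs from
# what git has recorded for ./split-brain.
check_generated() {
  cmd/bashly.sh generate || return 1
  # A non-zero exit here means the committed script is stale or was hand-edited.
  git diff --exit-code -- ./split-brain
}
```

On failure, git diff prints the offending hunks, which usually makes it obvious whether someone edited the generated file or forgot to regenerate after changing cmd/.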

This means operators never need bashly or Docker to run the CLI — only bash plus the tools each command shells out to. Developers who change the CLI need Docker (for dannyben/bashly) to regenerate.

Distribution

| Audience | What they install |
| --- | --- |
| Operator running the CLI | bash 4+ (default on most Linux; macOS ships bash 3.2, so install a newer bash there, e.g. via Homebrew) + the tools each command calls (docker, kubectl, helm, and uv only for bootstrap). |
| Developer modifying the CLI | bashly (Ruby gem or Docker image), in addition to the operator deps. |
| CI | bashly (Docker image), plus the deps for whatever lint/test the CI runs. |

The generated ./split-brain script is committed at the repo root. cmd/bashly.sh generate is the only way to regenerate it; the file should never be hand-edited (the CI diff check described under Build process can enforce this).

Out of scope

  • Secret material. In the dev flow the chart creates the three Secrets from global.secrets.* (set in the gitignored values.secrets.yaml); deploy just checks that overlay exists and forwards it. For out-of-band secrets (sealed-secrets, external-secrets-operator, vault-injector, or kubectl create secret) run with --no-secrets-check and global.secrets.create=false.
  • Cloudflare tunnel creation. cloudflared tunnel create runs on an admin workstation; covered in cloudflare-tunnel.md.
  • ScalarLM deployment. ScalarLM ships its own Helm chart and is deployed independently; see architecture.md.
  • CI/CD orchestration. The CLI is for human operators. CI uses the same primitives (docker buildx, helm upgrade) but typically via its own workflow files, not by calling ./split-brain — primarily so CI can parallelize and cache differently than the serialized human-CLI flow.

Why not Make for everything?

Considered. Make is fine for two-thirds of what this script does (build/lint), but the runtime piece (deploy, status, logs, bootstrap) benefits from real subcommand UX: flag validation, mutually exclusive options, and --help per command. Make targets that take arguments are awkward: make deploy ENV=prod is OK, but make deploy --env=prod --wait fails because make parses the flags as its own options and rejects them. bashly gives us proper flag parsing for the cost of a small generator dependency.