Operator CLI (`./split-brain`)

A single script at the repo root that wraps the routine operator tasks: build the three images, push them, install or upgrade the Helm chart, tail logs, run the classifier bootstrap. It is generated by bashly from a small YAML file plus per-command shell files under cmd/. The generated ./split-brain is committed at the repo root and run directly; bashly (via Docker) is only needed to regenerate it after editing cmd/.

Goal

$ ./split-brain --help
split-brain — operator CLI for split-brain

Usage: split-brain COMMAND [OPTIONS]

Commands:
  build       Build Docker images (router, classifier, ui)
  push        Push Docker images to the registry
  deploy      Install or upgrade the Helm chart
  destroy     Uninstall the Helm release
  status      Show pod status in the namespace
  logs        Tail logs for a component
  bootstrap   Train the classifier head from a docs directory
  template    Render Helm manifests locally (helm template wrapper)
  lint        Lint everything (ruff, helm lint, dockerfile syntax)

No Python, no Node, no Make required to run the script. Operators need only bash + the tools each command shells out to (docker, kubectl, helm, uv for bootstrap).

Why bashly

We considered four alternatives:

Tool	Why we rejected
Makefile	Great at "build artifact X from Y," bad at UX. No `--help` per target, no flag validation, no subcommand grouping.
Taskfile (go-task)	Same UX problems as Make; also adds a runtime dep (the `task` binary).
Plain shell	Argparse, validation, and `--help` rendering all become our problem. Easy to get wrong; hard to maintain consistency across commands.
Python click / Go cobra	More structured than shell but require a runtime: Python venv on every operator's box, or a per-OS compile/release pipeline. Heavy for what amounts to wrapping six binaries.

bashly hits a sweet spot: declarative YAML for the command surface, generated bash that is human-readable and safe under set -e, no runtime beyond bash itself. The build-time dependency (bashly) only matters for developing the CLI; the generated script is committed and self-contained.

Choosing the cluster (direnv)

The CLI shells out to kubectl and helm; both honor the KUBECONFIG env var. To pin which cluster this project targets without remembering --context or --kubeconfig on every command, the repo ships an .envrc.example you copy to .envrc:

cp .envrc.example .envrc
# edit .envrc to point KUBECONFIG at the right file
direnv allow

After that, every shell that cds into this directory automatically exports KUBECONFIG (and anything else you set there). cd out and the exports vanish, so a different project's kubectl is unaffected. This is the per-project safety net that tools like kubectx can't provide: kubectx writes the active context into the shared ~/.kube/config, so the switch leaks into every other shell on the machine.

.envrc and kubeconfig.yaml are both in .gitignore. The committed .envrc.example is just a template; operators stage their actual kubeconfig out of band (store it in a credential vault, not in git).
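
For orientation, a minimal .envrc could look like the sketch below. The kubeconfig.yaml location (next to .envrc, at the repo root) is an assumption inferred from the .gitignore entry; the shipped .envrc.example may set more or point elsewhere.

```shell
# Hypothetical .envrc contents; the real .envrc.example may differ.
# direnv evaluates this file with the working directory set to the
# directory that contains it, so $PWD is the repo root here.
export KUBECONFIG="$PWD/kubeconfig.yaml"
```

Because the path is absolute once expanded, kubectl and helm keep targeting the same file from any subdirectory of the repo.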

Install direnv: brew install direnv (macOS) or your package manager, then add eval "$(direnv hook zsh)" (or bash) to your shell rc. First-time entry into any new .envrc requires explicit direnv allow — a freshly-cloned repo can't run arbitrary code in your shell without your consent.

CLI surface

Each command is described below with its arguments, expected preconditions, and an example invocation.

`build`

Builds Docker images. The build context is the repo root (each docker/<name>/Dockerfile COPYs <name>/src etc.).

$ ./split-brain build [IMAGE] [--tag=TAG] [--registry=URL]
                      [--platform=P] [--remote] [--push]

Arg/Flag	Default	Notes
`IMAGE` (positional)	`all`	One of `router` / `classifier` / `ui` / `all`.
`--tag`	`v0.0.1`	Image tag (the dev cluster iterates on `v0.0.1`).
`--registry`	env `REGISTRY` or `docker.io/gdiamos`	The dev cluster pulls from `docker.io/gdiamos`.
`--platform`	`linux/amd64`	The cluster nodes are amd64. Building on an arm64 host without this produces images that deploy but crash-loop with `exec format error`.
`--remote`	off	Build on `$BLACKWELL_HOST` (default `normal@blackwell`) and push. Implies `--push`.
`--push`	off	If set, runs `push` after each successful build.

Locally, wraps docker buildx build --platform linux/amd64 (so an arm64 laptop still produces amd64 images — via qemu, which is slow).

--remote rsyncs the working tree to a fast amd64 host (blackwell) and runs docker build + docker push there natively — much faster than local qemu cross-builds, and the recommended path for the torch-based classifier / ui images. The remote is already logged into the registry as gdiamos.

$ ./split-brain build router --tag=dev
$ ./split-brain build --tag=v0.2.0 --push       # local buildx (amd64)
$ ./split-brain build ui --remote               # native amd64 build on blackwell
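
As a rough sketch of the local path, the flags in the table map onto a docker buildx invocation roughly as follows. The function names are hypothetical and the image naming scheme (<registry>/<image>:<tag>) is an assumption; the functions only print the command so the mapping can be inspected without Docker.

```shell
# Sketch only: prints the docker buildx command for one image.
# Assumed naming: <registry>/<image>:<tag> (not confirmed by this document).
build_cmd() {
  local image="$1"
  local tag="${2:-v0.0.1}"
  local registry="${3:-${REGISTRY:-docker.io/gdiamos}}"
  local platform="${4:-linux/amd64}"
  # The build context is the repo root, hence the trailing ".".
  printf 'docker buildx build --platform %s -f docker/%s/Dockerfile -t %s/%s:%s .\n' \
    "$platform" "$image" "$registry" "$image" "$tag"
}

# Expand the positional IMAGE: "all" becomes the three image names.
expand_images() {
  if [ "$1" = "all" ]; then
    printf '%s\n' router classifier ui
  else
    printf '%s\n' "$1"
  fi
}
```

Calling build_cmd router dev prints the command that the first example above would run under these assumptions.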

`push`

Pushes already-built images. Useful when build was run without --push (e.g., on a developer laptop without registry credentials, followed by a push from CI).

$ ./split-brain push --image=all --tag=v0.2.0

`deploy`

Installs or upgrades the Helm release. Idempotent.

$ ./split-brain deploy [ENV] [--namespace=NS] [--release=NAME]
                       [--tag=TAG] [--values-secrets=PATH]
                       [--build] [--remote] [--no-restart]
                       [--dry-run] [--wait] [--no-secrets-check]

ENV is a positional (dev or prod, default dev). The default flags are tuned so a bare ./split-brain deploy reproduces the live dev release exactly: values-dev.yaml + values.secrets.yaml against release split-brain in namespace split-brain.

Arg/Flag	Default	Notes
`ENV` (positional)	`dev`	Selects `charts/split-brain/values-<env>.yaml` overlay.
`--namespace` / `-n`	`split-brain`	Passed with `--create-namespace` so helm ensures it exists.
`--release`	`split-brain`	Helm release name.
`--tag`	(unset)	When set, overrides `global.image.tag` via `--set`; otherwise the values file pins it (`v0.0.1`).
`--values-secrets`	`charts/split-brain/values.secrets.yaml`	The gitignored `.secrets.yaml` overlay holding `global.secrets.*`. Forwarded to helm as an extra `-f` (resolved against the repo root if relative).
`--build`	off	Build + push `router`/`classifier`/`ui` (at `--tag`) before upgrading.
`--remote`	off	With `--build`, build on `$BLACKWELL_HOST` (native amd64) instead of locally.
`--no-restart`	off	Skip the post-upgrade rollout restart (see below).
`--dry-run`	off	`helm upgrade --dry-run`.
`--wait`	off	`helm upgrade --wait --timeout 5m`.
`--no-secrets-check`	off	Skip the secrets-overlay requirement (use with `global.secrets.create=false` and Secrets managed out of band).

Rollout restart. The dev flow pins a fixed tag (v0.0.1) with pullPolicy: Always, so a rebuilt image keeps the same tag and the rendered pod template is unchanged. helm upgrade alone would therefore not roll the pods, and the new code would never run. After a non-dry-run upgrade, deploy runs kubectl rollout restart on the router, classifier, and ui deployments (cloudflared is left alone to keep the tunnel up), which forces a re-pull. Pass --no-restart for a config-only change where you don't want to bounce pods.
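
The restart step can be pictured as a loop over the three application components. This is a sketch, not the actual deploy_command.sh: the function name is hypothetical, and the deployment naming (<release>-<component>) is an assumption about the chart. It prints the commands instead of running them.

```shell
# Sketch only: prints the restart commands rather than running them.
# Assumption: deployments are named <release>-<component>; the chart
# may use a different naming scheme.
restart_cmds() {
  local release="${1:-split-brain}"
  local namespace="${2:-split-brain}"
  local component
  # cloudflared is deliberately absent so the tunnel stays up.
  for component in router classifier ui; do
    printf 'kubectl rollout restart deployment/%s-%s -n %s\n' \
      "$release" "$component" "$namespace"
  done
}
```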

Preflight before invoking helm: the CLI checks that the selected values-<env>.yaml exists, and — unless --no-secrets-check — that the --values-secrets overlay exists. It does not read or modify the secrets file; it just forwards it to helm as an extra -f. (RBAC and image-presence checks are intentionally left to helm/kubelet rather than duplicated here.)
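
A minimal sketch of that preflight, assuming it runs from the repo root. The function name and argument order are hypothetical; only the two existence checks come from the description above.

```shell
# Sketch only: existence checks mirroring the preflight described above.
# Usage: preflight ENV SECRETS_PATH [SKIP_SECRETS_CHECK]
preflight() {
  local env="$1" secrets="$2" skip_secrets="${3:-no}"
  local values="charts/split-brain/values-${env}.yaml"
  if [ ! -f "$values" ]; then
    echo "missing values file: $values" >&2
    return 1
  fi
  # The secrets overlay is only checked for existence, never read.
  if [ "$skip_secrets" != "yes" ] && [ ! -f "$secrets" ]; then
    echo "missing secrets overlay: $secrets (or pass --no-secrets-check)" >&2
    return 1
  fi
  return 0
}
```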

$ ./split-brain deploy                                  # reproduce the live dev release
$ ./split-brain deploy dev --dry-run                    # render + server-validate only
$ ./split-brain deploy prod --tag=v0.2.0 --wait \
      --values-secrets=values-prod.secrets.yaml
$ ./split-brain deploy prod --no-secrets-check          # sealed-secrets / vault
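
To make the defaults concrete, here is a sketch of how the helm command line could be assembled for the default release and namespace. The function is hypothetical, and using helm upgrade --install for the install-or-upgrade behaviour is an assumption; --dry-run, --wait, and a custom --values-secrets path are left out for brevity.

```shell
# Sketch only: prints the helm command a deploy might run.
# Usage: helm_cmd [ENV] [TAG]
helm_cmd() {
  local env="${1:-dev}" tag="${2:-}"
  local chart="charts/split-brain"
  # Assumption: "install or upgrade" is implemented with --install.
  local cmd="helm upgrade --install split-brain $chart"
  cmd="$cmd --namespace split-brain --create-namespace"
  cmd="$cmd -f $chart/values-$env.yaml -f $chart/values.secrets.yaml"
  # Only override the image tag when --tag was given.
  if [ -n "$tag" ]; then
    cmd="$cmd --set global.image.tag=$tag"
  fi
  printf '%s\n' "$cmd"
}
```

With no arguments this prints the command for the bare ./split-brain deploy case; helm_cmd prod v0.2.0 switches the overlay and appends the --set override.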

`destroy`

$ ./split-brain destroy [--namespace=NS] [--release=NAME] [--keep-pvc]

Runs helm uninstall and, by default, deletes the split-brain-data PVC. --keep-pvc retains it, which is useful if you want to redeploy quickly without losing audit logs or tokens.

The command prompts for explicit confirmation (yes/N) before running because the audit log is destroyed with the PVC.
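
One way such a prompt could be written is sketched below. The function name and prompt wording are hypothetical; the only behaviour taken from the text is that anything other than an explicit yes aborts.

```shell
# Sketch only: returns 0 only when the operator types exactly "yes".
confirm_destroy() {
  local answer
  # Prompt on stderr so stdout stays clean for piping.
  printf 'Delete the release and the split-brain-data PVC? (yes/N) ' >&2
  # On EOF (no input), treat the answer as empty, which means "no".
  read -r answer || answer=""
  [ "$answer" = "yes" ]
}
```

Requiring the full word rather than a single y makes an accidental Enter or a stray keystroke fall through to the safe default.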

`status`

$ ./split-brain status [--namespace=NS]

Runs kubectl get pods,svc,pvc -n NS -o wide, then prints a short summary: which images are running, how many replicas of each component are ready, whether the cloudflared tunnel is connected.

`logs`

$ ./split-brain logs COMPONENT [-f] [--tail=N]

Positional COMPONENT is one of router / classifier / ui / cloudflared. Wraps kubectl logs -l app.kubernetes.io/component=COMPONENT so output is aggregated across pods.
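
A sketch of the validation and flag mapping, with a hypothetical function that prints the kubectl command instead of running it. Namespace handling is omitted because the usage line above does not show a namespace flag.

```shell
# Sketch only: validates COMPONENT and prints the kubectl command.
# Usage: logs_cmd COMPONENT [TAIL] [FOLLOW]
logs_cmd() {
  local component="$1" tail="${2:-}" follow="${3:-no}"
  case "$component" in
    router|classifier|ui|cloudflared) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  local cmd="kubectl logs -l app.kubernetes.io/component=$component"
  if [ -n "$tail" ]; then cmd="$cmd --tail=$tail"; fi
  if [ "$follow" = "yes" ]; then cmd="$cmd -f"; fi
  printf '%s\n' "$cmd"
}
```

When a label selector is used, kubectl logs shows only the last 10 lines per pod unless --tail is given, so passing --tail explicitly is usually worthwhile.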

`bootstrap`

Runs the classifier bootstrap (chunk → generate → train → save head) from the host. The trained head is saved locally; the operator mounts or copies it into the running classifier via the Helm chart's keywords/head ConfigMap path.

$ ./split-brain bootstrap --docs=PATH [--general=PATH] [--output=PATH]
                          [--generator=heuristic|scalarlm] [--epochs=N]

Flag	Default	Notes
`--docs`	required	Directory of proprietary `.md` / `.txt` files.
`--general`	`classifier/src/classifier/bootstrap/corpus/public_general.jsonl`	JSONL of public general prompts.
`--output`	`./head.safetensors`	Where to write the trained head.
`--generator`	`heuristic`	`heuristic` (deterministic, no LLM) or `scalarlm` (needs `SCALARLM_BASE_URL`).
`--epochs`	`50`	Training epochs.

Internally runs uv run --directory classifier python -m classifier.bootstrap.main ... so the operator doesn't need to know about the package layout.
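
The --generator precondition from the table could be checked before uv is invoked, along these lines. This is a sketch with a hypothetical function name; whether the real command checks this up front or lets the Python entry point fail is not stated here.

```shell
# Sketch only: precondition check for --generator.
check_generator() {
  case "$1" in
    heuristic)
      return 0 ;;
    scalarlm)
      # scalarlm talks to a remote endpoint, so the URL must be set.
      if [ -z "${SCALARLM_BASE_URL:-}" ]; then
        echo "--generator=scalarlm needs SCALARLM_BASE_URL" >&2
        return 1
      fi
      return 0 ;;
    *)
      echo "unknown generator: $1" >&2
      return 1 ;;
  esac
}
```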

`template`

$ ./split-brain template [--env=ENV] [--output=PATH]

Wraps helm template for inspecting what deploy would produce. Default output: stdout. With --output=PATH, writes there. Useful in code review of chart changes: render before and after, then compare the two files (for example with git diff --no-index).

`lint`

$ ./split-brain lint [--target=python|helm|docker|all]

Target	Runs
`python`	`uv run ruff check` and `uv run pytest -q` in each of `router/`, `classifier/`, `ui/`.
`helm`	`helm lint charts/split-brain` and `helm template demo charts/split-brain > /dev/null`.
`docker`	`hadolint` on each Dockerfile if installed; otherwise skipped with a note.
`all`	All of the above.
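
The "if installed; otherwise skipped" behaviour of the docker target can be sketched as below. The function name and message text are hypothetical; the Dockerfile paths follow the docker/<name>/Dockerfile layout mentioned under build.

```shell
# Sketch only: run hadolint when present, otherwise skip with a note.
lint_docker() {
  if ! command -v hadolint >/dev/null 2>&1; then
    echo "hadolint not installed; skipping Dockerfile lint"
    return 0
  fi
  local name
  for name in router classifier ui; do
    # Stop at the first Dockerfile that fails the lint.
    hadolint "docker/$name/Dockerfile" || return 1
  done
}
```

Returning 0 on the skip path keeps lint --target=all usable on machines without hadolint, at the cost of a silent gap if nobody reads the note.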

Bashly source layout

Modeled on the cmd/ + committed-script pattern used in the neighbouring orbital and scalarlm repos.

split-brain               # GENERATED CLI — committed at repo root, run directly
cmd/
  bashly.yml              # the command tree definition (source of truth)
  bashly-settings.yml     # source_dir: cmd, target_dir: . (repo root)
  bashly.sh               # `cmd/bashly.sh generate` — dockerized regenerator
  lib/
    colors.sh             # color helpers + die() / require_tool()
  build_command.sh
  push_command.sh
  deploy_command.sh
  destroy_command.sh
  status_command.sh
  logs_command.sh
  bootstrap_command.sh
  template_command.sh
  lint_command.sh

bashly.yml is the single source of truth for command names, args, flags, defaults, validation, and help text. Each command gets a matching *_command.sh whose body is inlined into the generated script. lib/colors.sh is bundled into every command via bashly's lib_dir, so helpers like green_bold, die, and require_tool are available everywhere.
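
For readers who have not seen the helpers, plausible minimal versions are sketched here. These are hypothetical stand-ins; the real cmd/lib/colors.sh may differ in names of internals, colors, and message format.

```shell
# Hypothetical versions of the helpers named above; the real
# cmd/lib/colors.sh may differ.
green_bold() {
  # ANSI escape: bold green text, then reset.
  printf '\033[1;32m%s\033[0m\n' "$*"
}

die() {
  printf 'error: %s\n' "$*" >&2
  exit 1
}

require_tool() {
  # command -v succeeds when the name resolves to something runnable.
  command -v "$1" >/dev/null 2>&1 || die "'$1' is required but not on PATH"
}
```

A command body would then start with lines such as require_tool helm, so a missing dependency fails fast with a readable message instead of a mid-run "command not found".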

Build process

Regeneration goes through cmd/bashly.sh, which runs the dannyben/bashly Docker image so no Ruby/gem install is needed:

$ cmd/bashly.sh generate     # reads cmd/, writes ./split-brain at the repo root

The wrapper mounts the repo at /app, points bashly at cmd/bashly-settings.yml (which sets source_dir: cmd, target_dir: .), and writes ./split-brain. We commit that generated script directly. It should never be hand-edited — edit cmd/ and regenerate. (A CI check can re-run cmd/bashly.sh generate and git diff --exit-code ./split-brain to enforce this.)
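
The CI check mentioned above amounts to "regenerate, then fail if the committed file changed". The sketch below does that comparison with cmp rather than git diff --exit-code, purely so it also works outside a git checkout; the function name is hypothetical.

```shell
# Sketch only: fail when regeneration changes the committed script.
# Usage: check_generated SCRIPT GENERATOR_COMMAND...
check_generated() {
  local script="$1"
  shift
  local before
  before="$(mktemp)" || return 1
  # Keep a copy of the committed script, then run the generator.
  cp "$script" "$before"
  "$@" || { rm -f "$before"; return 1; }
  if cmp -s "$before" "$script"; then
    rm -f "$before"
    return 0
  fi
  rm -f "$before"
  echo "$script is stale: regenerate and commit it" >&2
  return 1
}
```

In this repo the call would be check_generated ./split-brain cmd/bashly.sh generate, run from the repo root.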

This means operators never need bashly or Docker to run the CLI — only bash plus the tools each command shells out to. Developers who change the CLI need Docker (for dannyben/bashly) to regenerate.

Distribution

Audience	What they install
Operator running the CLI	bash 4+ (default on most Linux; macOS ships bash 3.2, so install a newer bash there, e.g. via Homebrew) + the tools each command calls (`docker`, `kubectl`, `helm`, and `uv` only for `bootstrap`).
Developer modifying the CLI	Docker, to run the dannyben/bashly image through `cmd/bashly.sh`, in addition to the operator deps.
CI	bashly (Docker image), plus the deps for whatever lint/test the CI runs.

The generated ./split-brain script is committed at the repo root. cmd/bashly.sh generate is the only way to regenerate it; the file should never be hand-edited (a CI diff check can enforce this).

Out of scope

Secret material. In the dev flow the chart creates the three Secrets from global.secrets.* (set in the gitignored values.secrets.yaml); deploy just checks that overlay exists and forwards it. For out-of-band secrets (sealed-secrets, external-secrets-operator, vault-injector, or kubectl create secret) run with --no-secrets-check and global.secrets.create=false.
Cloudflare tunnel creation. cloudflared tunnel create runs on an admin workstation; covered in cloudflare-tunnel.md.
ScalarLM deployment. ScalarLM ships its own Helm chart and is deployed independently; see architecture.md.
CI/CD orchestration. The CLI is for human operators. CI uses the same primitives (docker buildx, helm upgrade) but typically via its own workflow files, not by calling ./split-brain — primarily so CI can parallelize and cache differently than the serialized human-CLI flow.

Why not Make for everything?

Considered. Make is fine for the artifact-style part of what this script does (build, lint), but the runtime piece (deploy, status, logs, bootstrap) benefits from real subcommand UX: flag validation, mutually exclusive options, --help per command. Make targets that take arguments are awkward: make deploy ENV=prod is OK, but make deploy --env=prod --wait fails because make parses those flags as its own options. bashly gives us proper flag parsing for the cost of a small generator dependency.
