Operator CLI (./split-brain)
A single script at the repo root that wraps the routine operator tasks:
build the three images, push them, install or upgrade the Helm chart,
tail logs, run the classifier bootstrap. It is generated by
bashly from a small YAML file plus per-command
shell files under cmd/. The generated ./split-brain is committed
at the repo root and run directly; bashly (via Docker) is only
needed to regenerate it after editing cmd/.
Goal
$ ./split-brain --help
split-brain — operator CLI for split-brain
Usage: split-brain COMMAND [OPTIONS]
Commands:
  build      Build Docker images (router, classifier, ui)
  push       Push Docker images to the registry
  deploy     Install or upgrade the Helm chart
  destroy    Uninstall the Helm release
  status     Show pod status in the namespace
  logs       Tail logs for a component
  bootstrap  Train the classifier head from a docs directory
  template   Render Helm manifests locally (helm template wrapper)
  lint       Lint everything (ruff, helm lint, dockerfile syntax)
No Python, no Node, no Make required to run the script. Operators
need only bash + the tools each command shells out to (docker,
kubectl, helm, uv for bootstrap).
Why bashly
We considered four alternatives:
| Tool | Why we rejected |
|---|---|
| Makefile | Great at "build artifact X from Y," bad at UX. No --help per target, no flag validation, no subcommand grouping. |
| Taskfile (go-task) | Same UX problems as Make; also adds a runtime dep (the task binary). |
| Plain shell | Argparse, validation, and --help rendering all become our problem. Easy to get wrong; hard to maintain consistency across commands. |
| Python click / Go cobra | More structured than shell but require a runtime: Python venv on every operator's box, or a per-OS compile/release pipeline. Heavy for what amounts to wrapping six binaries. |
bashly hits a sweet spot: declarative YAML for the command surface,
generated bash that is human-readable and set -e-safe, no runtime
beyond bash itself. The build-time dependency (bashly) only matters
for developing the CLI; the generated script is committed and
self-contained.
Choosing the cluster (direnv)
The CLI shells out to kubectl and helm; both honor the
KUBECONFIG env var. To pin which cluster this project targets
without remembering --context or --kubeconfig on every command,
the repo ships an .envrc.example you copy to .envrc:
cp .envrc.example .envrc
# edit .envrc to point KUBECONFIG at the right file
direnv allow
After that, any shell that enters this directory automatically
exports KUBECONFIG (and anything else you set there). cd out
and the exports vanish, so a different project's kubectl is
unaffected. This is the per-project safety net that tools like
kubectx can't provide: kubectx writes the active context into
the shared ~/.kube/config, which leaks to every other shell on
the machine.
.envrc and kubeconfig.yaml are both in .gitignore. The
committed .envrc.example is just a template; operators stage
their actual kubeconfig out of band (store it in a credential
vault, not in git).
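As a minimal sketch, the copied .envrc might contain a single export. This assumes the kubeconfig is staged as kubeconfig.yaml at the repo root; the actual contents of .envrc.example are not shown here and may differ. direnv evaluates .envrc as bash with the working directory set to the file's own directory, so $PWD resolves to the repo root.

```bash
# .envrc (sketch): pin kubectl and helm to this project's cluster.
# Assumption: the kubeconfig is staged as kubeconfig.yaml next to this file.
export KUBECONFIG="$PWD/kubeconfig.yaml"
```

Running `kubectl config current-context` inside and outside the directory is a quick way to confirm the export is scoped to this project.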
Install direnv: brew install direnv (macOS) or your package
manager, then add eval "$(direnv hook zsh)" (or bash) to your
shell rc. First-time entry into any new .envrc requires explicit
direnv allow — a freshly-cloned repo can't run arbitrary code
in your shell without your consent.
CLI surface
Each command is described below with its arguments, expected preconditions, and an example invocation.
build
Builds Docker images. The build context is the repo root (each
docker/<name>/Dockerfile COPYs <name>/src etc.).
$ ./split-brain build [IMAGE] [--tag=TAG] [--registry=URL]
[--platform=P] [--remote] [--push]
| Arg/Flag | Default | Notes |
|---|---|---|
| IMAGE (positional) | all | One of router / classifier / ui / all. |
| --tag | v0.0.1 | Image tag (the dev cluster iterates on v0.0.1). |
| --registry | env REGISTRY or docker.io/gdiamos | The dev cluster pulls from docker.io/gdiamos. |
| --platform | linux/amd64 | The cluster nodes are amd64. Building on an arm64 host without this produces images that deploy but crash-loop with exec format error. |
| --remote | off | Build on $BLACKWELL_HOST (default normal@blackwell) and push. Implies --push. |
| --push | off | If set, runs push after each successful build. |
Locally, wraps docker buildx build --platform linux/amd64 (so an
arm64 laptop still produces amd64 images — via qemu, which is slow).
--remote rsyncs the working tree to a fast amd64 host (blackwell)
and runs docker build + docker push there natively — much faster
than local qemu cross-builds, and the recommended path for the
torch-based classifier / ui images. The remote is already logged
into the registry as gdiamos.
$ ./split-brain build router --tag=dev
$ ./split-brain build --tag=v0.2.0 --push # local buildx (amd64)
$ ./split-brain build ui --remote # native amd64 build on blackwell
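The two build paths described above can be sketched roughly as follows. This is illustrative only: the helpers print the commands instead of running them, the function names are hypothetical, and two details are assumptions not confirmed by the source: that images are tagged as registry/image:tag, and that the remote working copy lives in split-brain/ under the remote user's home directory.

```bash
# Sketch only: both helpers print commands rather than running them.
# Assumption: images are tagged <registry>/<image>:<tag>; the real naming may differ.

image_ref() {
  local image="$1" tag="${2:-v0.0.1}"
  # env REGISTRY wins; otherwise fall back to the documented default.
  printf '%s/%s:%s' "${REGISTRY:-docker.io/gdiamos}" "$image" "$tag"
}

# Local path: buildx cross-build, so an arm64 laptop still emits amd64 images.
local_build_cmd() {
  local image="$1" tag="${2:-v0.0.1}"
  printf 'docker buildx build --platform linux/amd64 -f docker/%s/Dockerfile -t %s .\n' \
    "$image" "$(image_ref "$image" "$tag")"
}

# Remote path: sync the tree, then build and push natively on the amd64 host.
# Assumption: the tree lands in split-brain/ under the remote home directory.
remote_build_cmds() {
  local image="$1" tag="${2:-v0.0.1}"
  local host="${BLACKWELL_HOST:-normal@blackwell}"
  local ref
  ref="$(image_ref "$image" "$tag")"
  printf 'rsync -az --delete ./ %s:split-brain/\n' "$host"
  printf 'ssh %s "cd split-brain && docker build -f docker/%s/Dockerfile -t %s . && docker push %s"\n' \
    "$host" "$image" "$ref" "$ref"
}
```

Printing the commands first makes it easy to check the tag, platform, and host before anything touches the registry.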
push
Pushes already-built images. Useful when build was run without
--push (e.g., on a developer laptop without registry credentials,
followed by a push from CI).
$ ./split-brain push --image=all --tag=v0.2.0
deploy
Installs or upgrades the Helm release. Idempotent.
$ ./split-brain deploy [ENV] [--namespace=NS] [--release=NAME]
[--tag=TAG] [--values-secrets=PATH]
[--build] [--remote] [--no-restart]
[--dry-run] [--wait] [--no-secrets-check]
ENV is a positional (dev or prod, default dev). The default
flags are tuned so a bare ./split-brain deploy reproduces the live
dev release exactly: values-dev.yaml + values.secrets.yaml against
release split-brain in namespace split-brain.
| Arg/Flag | Default | Notes |
|---|---|---|
| ENV (positional) | dev | Selects charts/split-brain/values-<env>.yaml overlay. |
| --namespace / -n | split-brain | Passed with --create-namespace so helm ensures it exists. |
| --release | split-brain | Helm release name. |
| --tag | (unset) | When set, overrides global.image.tag via --set; otherwise the values file pins it (v0.0.1). |
| --values-secrets | charts/split-brain/values.secrets.yaml | The gitignored *.secrets.yaml overlay holding global.secrets.*. Forwarded to helm as an extra -f (resolved against the repo root if relative). |
| --build | off | Build + push router/classifier/ui (at --tag) before upgrading. |
| --remote | off | With --build, build on $BLACKWELL_HOST (native amd64) instead of locally. |
| --no-restart | off | Skip the post-upgrade rollout restart (see below). |
| --dry-run | off | helm upgrade --dry-run. |
| --wait | off | helm upgrade --wait --timeout 5m. |
| --no-secrets-check | off | Skip the secrets-overlay requirement (use with global.secrets.create=false and Secrets managed out of band). |
Rollout restart. The dev flow pins a fixed tag (v0.0.1) with
pullPolicy: Always, so a rebuilt image keeps the same tag and the
rendered pod template is unchanged. helm upgrade alone would therefore
not roll the pods, and the new code would never run. After a non-dry-run
upgrade, deploy runs kubectl rollout restart on the router,
classifier, and ui deployments (cloudflared is left alone to keep
the tunnel up), forcing a re-pull. Pass --no-restart for a
config-only change where you don't want to bounce pods.
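A rough sketch of the restart step, printing the commands rather than executing them. The function name is hypothetical, and the deployment naming (release-component) is an assumption; check the chart for the real resource names.

```bash
# Sketch: restart the three app deployments after an upgrade, leaving
# cloudflared alone so the tunnel stays up.
# Assumption: deployments are named <release>-<component>.
rollout_restart_cmds() {
  local release="$1" namespace="$2" component
  for component in router classifier ui; do
    printf 'kubectl rollout restart deployment/%s-%s -n %s\n' \
      "$release" "$component" "$namespace"
  done
}
```

Because the tag never changes, the restart is what actually triggers the re-pull under pullPolicy: Always.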
Preflight before invoking helm: the CLI checks that the selected
values-<env>.yaml exists, and — unless --no-secrets-check — that the
--values-secrets overlay exists. It does not read or modify the
secrets file; it just forwards it to helm as an extra -f. (RBAC and
image-presence checks are intentionally left to helm/kubelet rather than
duplicated here.)
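The preflight logic might look something like the sketch below. The helper names and argument order are hypothetical; only the two file checks and the relative-path resolution come from the description above.

```bash
# Hypothetical preflight helpers; names and argument order are illustrative.

# Resolve a possibly-relative path against the repo root.
resolve_path() {
  local root="$1" path="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '%s/%s\n' "$root" "$path" ;;
  esac
}

# Return 0 if deploy may proceed; print the reason and return 1 otherwise.
# A non-empty fourth argument stands in for --no-secrets-check.
preflight() {
  local root="$1" env="$2" secrets="$3" skip_secrets="${4:-}"
  local values="$root/charts/split-brain/values-$env.yaml"
  if [ ! -f "$values" ]; then
    echo "missing values file: $values" >&2
    return 1
  fi
  if [ -z "$skip_secrets" ]; then
    secrets="$(resolve_path "$root" "$secrets")"
    if [ ! -f "$secrets" ]; then
      echo "missing secrets overlay: $secrets" >&2
      return 1
    fi
  fi
  return 0
}
```

The sketch only tests for existence; it never opens the secrets file, matching the behavior described above.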
$ ./split-brain deploy # reproduce the live dev release
$ ./split-brain deploy dev --dry-run # render + server-validate only
$ ./split-brain deploy prod --tag=v0.2.0 --wait \
--values-secrets=values-prod.secrets.yaml
$ ./split-brain deploy prod --no-secrets-check # sealed-secrets / vault
destroy
$ ./split-brain destroy [--namespace=NS] [--release=NAME] [--keep-pvc]
Runs helm uninstall and, by default, deletes the split-brain-data PVC.
--keep-pvc retains it, which is useful if you want to redeploy quickly
without losing audit logs or tokens.
The command prompts for explicit confirmation (yes/N) before
running because the audit log is destroyed with the PVC.
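A confirmation gate of this kind can be sketched as below. The function name and prompt wording are hypothetical; the point is that only the literal answer "yes" proceeds, and an empty answer or closed stdin counts as "no".

```bash
# Sketch of a yes/N prompt: anything other than the literal "yes" aborts.
confirm_destroy() {
  local answer=""
  printf 'This deletes the split-brain-data PVC and its audit log. Continue? (yes/N) ' >&2
  # EOF (for example, no TTY) leaves answer empty, which is treated as "no".
  read -r answer || true
  [ "$answer" = "yes" ]
}
```

A caller would use it as `confirm_destroy || exit 1` before any destructive step.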
status
$ ./split-brain status [--namespace=NS]
Runs kubectl get pods,svc,pvc -n NS -o wide and prints a short summary:
which images are running, how many replicas of each component are
ready, and whether the cloudflared tunnel is connected.
logs
$ ./split-brain logs COMPONENT [-f] [--tail=N]
Positional COMPONENT is one of router / classifier / ui /
cloudflared. Wraps kubectl logs -l app.kubernetes.io/component=COMPONENT
so output is aggregated across pods.
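For illustration, the kubectl invocation could be assembled along these lines. The function is hypothetical and prints the command instead of running it; namespace handling is omitted, and the --prefix flag (which labels each line with its pod and container) is an addition of this sketch, not something the source confirms the CLI passes.

```bash
# Sketch: build the kubectl logs command for one component.
# Assumption: --prefix is added here for readability of aggregated output.
logs_cmd() {
  local component="$1" tail="${2:-}" follow="${3:-}"
  case "$component" in
    router|classifier|ui|cloudflared) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  local cmd="kubectl logs -l app.kubernetes.io/component=$component --prefix"
  if [ -n "$tail" ]; then cmd="$cmd --tail=$tail"; fi
  if [ -n "$follow" ]; then cmd="$cmd -f"; fi
  printf '%s\n' "$cmd"
}
```

Note that when a label selector is given, kubectl logs shows only the last 10 lines per pod unless --tail is set, so passing --tail explicitly is usually worthwhile.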
bootstrap
Runs the classifier bootstrap (chunk → generate → train → save head) from the host. The trained head is saved locally; the operator mounts or copies it into the running classifier via the Helm chart's keywords/head ConfigMap path.
$ ./split-brain bootstrap --docs=PATH [--general=PATH] [--output=PATH]
[--generator=heuristic|scalarlm] [--epochs=N]
| Flag | Default | Notes |
|---|---|---|
| --docs | required | Directory of proprietary .md / .txt files. |
| --general | classifier/src/classifier/bootstrap/corpus/public_general.jsonl | JSONL of public general prompts. |
| --output | ./head.safetensors | Where to write the trained head. |
| --generator | heuristic | heuristic (deterministic, no LLM) or scalarlm (needs SCALARLM_BASE_URL). |
| --epochs | 50 | Training epochs. |
Internally runs uv run --directory classifier python -m classifier.bootstrap.main ...
so the operator doesn't need to know about the package layout.
template
$ ./split-brain template [--env=ENV] [--output=PATH]
Wraps helm template for inspecting what deploy would produce.
Default output: stdout. With --output=PATH, writes to that file instead.
Useful in code review of chart changes (git diff two renders).
lint
$ ./split-brain lint [--target=python|helm|docker|all]
| Target | Runs |
|---|---|
| python | uv run ruff check and uv run pytest -q in each of router/, classifier/, ui/. |
| helm | helm lint charts/split-brain and helm template demo charts/split-brain > /dev/null. |
| docker | hadolint on each Dockerfile if installed; otherwise skipped with a note. |
| all | All of the above. |
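The "run if installed, otherwise skip with a note" behavior of the docker target could be sketched like this. Both function names are hypothetical, and the Dockerfile paths assume the docker/<name>/Dockerfile layout mentioned in the build section.

```bash
# Run a tool only when it is on PATH; otherwise print a note and succeed.
run_if_installed() {
  local tool="$1"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "skip: $tool not installed"
    return 0
  fi
  "$@"
}

# Assumption: Dockerfiles live at docker/<name>/Dockerfile.
lint_docker() {
  local name
  for name in router classifier ui; do
    run_if_installed hadolint "docker/$name/Dockerfile" || return 1
  done
}
```

Treating a missing linter as a skip rather than a failure keeps the lint command usable on machines that only have the core operator tools.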
Bashly source layout
Modeled on the cmd/ + committed-script pattern used in the
neighbouring orbital and scalarlm repos.
split-brain                 # GENERATED CLI: committed at repo root, run directly
cmd/
  bashly.yml                # the command tree definition (source of truth)
  bashly-settings.yml       # source_dir: cmd, target_dir: . (repo root)
  bashly.sh                 # `cmd/bashly.sh generate`: dockerized regenerator
  lib/
    colors.sh               # color helpers + die() / require_tool()
  build_command.sh
  push_command.sh
  deploy_command.sh
  destroy_command.sh
  status_command.sh
  logs_command.sh
  bootstrap_command.sh
  template_command.sh
  lint_command.sh
bashly.yml is the single source of truth for command names, args,
flags, defaults, validation, and help text. Each command gets a
matching *_command.sh whose body is inlined into the generated
script. lib/colors.sh is bundled into every command via bashly's
lib_dir, so helpers like green_bold, die, and require_tool
are available everywhere.
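To make the shared helpers concrete, here is one plausible shape for die and require_tool. These are hypothetical implementations; the real lib/colors.sh is not shown in this document and may differ in wording and exit codes.

```bash
# Hypothetical versions of the helpers named above; the real lib/colors.sh may differ.

# Print an error to stderr and exit non-zero.
die() {
  printf 'error: %s\n' "$*" >&2
  exit 1
}

# Abort with a clear message when a required binary is missing from PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || die "$1 is required but was not found in PATH"
}
```

A command file such as deploy_command.sh might then open with `require_tool helm` and `require_tool kubectl`, and read parsed values from the `args` associative array that bashly populates (for example `${args[--tag]}`).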
Build process
Regeneration goes through cmd/bashly.sh, which runs the
dannyben/bashly Docker image so no Ruby/gem install is needed:
$ cmd/bashly.sh generate # reads cmd/, writes ./split-brain at the repo root
The wrapper mounts the repo at /app, points bashly at
cmd/bashly-settings.yml (which sets source_dir: cmd,
target_dir: .), and writes ./split-brain. We commit that generated
script directly. It should never be hand-edited — edit cmd/ and
regenerate. (A CI check can re-run cmd/bashly.sh generate and
git diff --exit-code ./split-brain to enforce this.)
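Such a CI guard might be sketched as follows. The function name and message are hypothetical, and it assumes the job runs from the repo root with Docker available for the regenerator.

```bash
# Sketch of a CI guard: regenerate, then fail if the committed script differs.
# Assumption: invoked from the repo root, with Docker available.
check_generated() {
  cmd/bashly.sh generate || return 1
  if ! git diff --exit-code -- ./split-brain; then
    echo "split-brain is stale: run cmd/bashly.sh generate and commit the result" >&2
    return 1
  fi
}
```

A non-zero exit here means either someone hand-edited ./split-brain or edited cmd/ without regenerating.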
This means operators never need bashly or Docker to run the CLI —
only bash plus the tools each command shells out to. Developers who
change the CLI need Docker (for dannyben/bashly) to regenerate.
Distribution
| Audience | What they install |
|---|---|
| Operator running the CLI | bash 4+ (the default on most Linux distributions; macOS ships bash 3.2, so install a newer bash there, e.g. via Homebrew) + the tools each command calls (docker, kubectl, helm, and uv only for bootstrap). |
| Developer modifying the CLI | bashly (Ruby gem or Docker image), in addition to the operator deps. |
| CI | bashly (Docker image), plus the deps for whatever lint/test the CI runs. |
The generated ./split-brain script is committed at the repo root.
cmd/bashly.sh generate is the only way to regenerate it; the file
should never be hand-edited (CI enforces this via diff).
Out of scope
- Secret material. In the dev flow the chart creates the three Secrets from global.secrets.* (set in the gitignored values.secrets.yaml); deploy just checks that overlay exists and forwards it. For out-of-band secrets (sealed-secrets, external-secrets-operator, vault-injector, or kubectl create secret) run with --no-secrets-check and global.secrets.create=false.
- Cloudflare tunnel creation. cloudflared tunnel create runs on an admin workstation; covered in cloudflare-tunnel.md.
- ScalarLM deployment. ScalarLM ships its own Helm chart and is deployed independently; see architecture.md.
- CI/CD orchestration. The CLI is for human operators. CI uses the same primitives (docker buildx, helm upgrade) but typically via its own workflow files, not by calling ./split-brain, primarily so CI can parallelize and cache differently than the serialized human-CLI flow.
Why not Make for everything?
Considered. Make is fine for two-thirds of what this script does
(build/lint), but the runtime piece — deploy, status, logs,
bootstrap — benefits from real subcommand UX (flag validation,
mutually exclusive options, --help per command). Make targets
that take arguments are awkward (make deploy ENV=prod is OK; make
deploy --env=prod --wait fails because make tries to parse the flags
as its own options). bashly gives us
proper flag parsing for the cost of a small generator dependency.