UI
split-brain ships a small internal web UI for operators and ML engineers. It is not the way end users send prompts — clients use the OpenAI-compatible API directly. The UI is a window into the audit log and the routing system's live behavior, and a tool for the people who maintain the classifier.
Implementation note. This doc was written ahead of the build and parts describe a different design than what shipped. What exists today: a Request explorer with an audit-log-derived metrics strip (no Prometheus), a request detail page with per-span labeling, a classifier probe (single + batch), a Labels view, a Bootstrap (train) view, Tokens, an Admin page that does edit per-user token limits and the default Claude model, and rendered Docs. Auth is app-level Google sign-in (hand-rolled OIDC; see google-auth.md), not Cloudflare Access. The sections below are corrected where they diverge; clearly-unbuilt pieces are marked Planned.
Goals
- Make routing decisions visible. Operators should be able to see, for any given request, exactly why it went to the backend it did.
- Make the audit log explorable. The router writes every request to durable storage; the UI is how a human reads it back.
- Make the classifier inspectable. ML engineers should be able to paste a prompt and see what the live classifier (and prior versions) would do with it — without going through the router.
Explicit non-goals:
- Not a chat client. We do not run a hosted playground. Existing OpenAI-compatible clients work against the router unmodified; we do not duplicate them.
- Limited runtime tuning. τ (the classifier band) still changes via Helm deploy. But the Admin page (admins only) does edit two runtime knobs that are safe and frequently needed: per-user daily token limits and the default Claude model. Both are persisted to the PVC and read live by the router (see token-limits.md, default-model.md).
- Not customer-facing. The UI is internal-only; there is no per-customer view or self-serve account page.
Users and what they need
| User | Primary task | Frequency |
|---|---|---|
| SRE / on-call | Is the system healthy? Why is X failing? | Live, incidents |
| Operator | Did anything route to Claude that shouldn't have? | Daily audit |
| ML engineer | Why did the classifier label this prompt that way? | Investigations |
| Domain reviewer | Label these uncertain cases for the anchor set | Weekly batch |
| Compliance reviewer | Show me everything client X sent last quarter | On request |
The v1 UI serves the first three. The fourth and fifth are sketched in v2.
Views
1. Metrics strip (on the Request explorer)
There is no separate Prometheus dashboard. Instead the Request explorer
carries a summary strip computed from the audit log for the active filter
and time window: totals, the backend split (claude / scalarlm), decision mix,
and error/latency rollups, with each card linking to the filtered request
list. Because scanning the whole log is too slow to block the page, the strip
loads asynchronously (htmx fragment, /requests-metrics → _metrics.html)
while the paginated table renders immediately.
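As a rough illustration of the strip described above, the following sketch rolls a batch of audit records into summary numbers and caches the result for a short time. The record fields (`backend`, `status` as an HTTP-style integer, `total_ms`) and the `rollup` / `TTLCache` names are assumptions made for this example, not the shipped schema or code.

```python
import time
from collections import Counter
from statistics import median


def rollup(records):
    """Summarize audit records into the numbers a metrics strip would show."""
    records = list(records)
    latencies = [r["total_ms"] for r in records if "total_ms" in r]
    return {
        "total": len(records),
        "by_backend": dict(Counter(r.get("backend", "unknown") for r in records)),
        # Assumption: status is an HTTP-style code, so >= 400 counts as an error.
        "errors": sum(1 for r in records if r.get("status", 200) >= 400),
        "median_ms": median(latencies) if latencies else None,
    }


class TTLCache:
    """Cache one computed value per key for `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive scan
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value
```

Keying the cache on the filter plus time window would let repeated loads of the same view reuse one scan.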
2. Request explorer
Searchable, filterable view of the audit log.
Filter bar: time range, client key (fingerprint, not raw key), classifier label, backend, status, free-text substring on the prompt.
Result list: one row per request, columns:
received_at | client | model_requested | classifier_label (confidence) | backend | status | total_ms
Sortable. Page size 50. Cursor-based pagination over the audit log's date/hour directory layout on the PVC (no full scans).
Detail view — click into a row:
- Request metadata: owner, token id, requested model, streaming, extended-thinking (requested / in response), status.
- Classifier decision: label, p_novel, model version, classifier ms.
- Backend: routing decision (general / novel / uncertain / forced), chosen backend, the model actually used, token counts (incl. cached), latency.
- The prompt (messages) and response, collapsed by default.
- A "Label spans for finetuning" section; see § Per-span labeling.
There is currently no per-field view_prompts permission or redaction
gate: any authenticated (allowlisted) user can see prompt/response content.
The allowlist is the access control. (A finer prompt-content tier is
Planned — see Privacy below.)
Per-span labeling
The router classifies a request over every user text block and
tool_result span and routes on the max p_novel (see
classifier.md), so labels are created at the same
granularity. The detail page lists each span with its current label and a
"novel?" checkbox, plus two buttons:
- general (all spans): asserts the whole request is benign; labels every span general.
- novel (checked spans): labels only the spans you check novel, so benign boilerplate isn't dragged into the novel class.
Labels are keyed by span text (not request id), so identical spans across requests share one label and re-labeling overrides. They feed the next classifier bootstrap (oversampled — see classifier.md).
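The two buttons and the text-keyed store can be sketched as follows. This is a minimal illustration, assuming an in-memory dict as the store; the `apply_labels` name, its parameters, and the stored record shape are hypothetical.

```python
def apply_labels(store, spans, action, checked=(), labeler="unknown"):
    """Update a text-keyed label store from one detail-page submission.

    store   : dict mapping span text -> {"label": ..., "by": ...}
    spans   : list of span texts shown on the detail page
    action  : "general" (label every span) or "novel" (label checked spans only)
    checked : indexes of spans whose "novel?" box was ticked
    """
    if action == "general":
        targets = spans
        label = "general"
    elif action == "novel":
        targets = [spans[i] for i in checked]
        label = "novel"
    else:
        raise ValueError(f"unknown action: {action!r}")
    for text in targets:
        # Keyed by text, so an identical span in another request shares this
        # entry, and labeling it again simply overwrites the previous label.
        store[text] = {"label": label, "by": labeler}
    return store
```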
3. Classifier probe
A text area + Classify. Sends the entered text to the live classifier and
shows the label, p_novel, and model version, plus a router-decision
preview by τ: a bar showing which backend the band rule would pick at
several candidate τ values, so a borderline prompt's threshold sensitivity is
visible at a glance.
Batch probe — a collapsible panel scores many prompts at once (one per
line, capped at 100), returning p_novel, label, and routing for each. This
is the tool to mirror the router's per-span max: paste each span of a request
on its own line to see exactly which one trips routing.
The probe does not route to any backend — it is a classifier-only call, safe with arbitrary text, and produces no router audit entry. (Comparing against pinned historical model versions and token-level saliency are Planned, not built.)
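To make the per-span max and the τ preview concrete, here is a small sketch. The band rule itself is defined in classifier.md and is not restated in this doc, so `band_decision` below assumes a symmetric band of half-width τ around 0.5; treat that rule, and all function names, as illustrative assumptions.

```python
def tripping_span(scores):
    """Return (index, p_novel) of the span with the highest p_novel."""
    if not scores:
        raise ValueError("no spans to score")
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]


def band_decision(p_novel, tau):
    """ASSUMED band rule: a symmetric band of half-width tau around 0.5."""
    if p_novel >= 0.5 + tau:
        return "novel"
    if p_novel <= 0.5 - tau:
        return "general"
    return "uncertain"


def preview(scores, taus=(0.05, 0.1, 0.2)):
    """Decision at each candidate tau, driven by the max-scoring span."""
    _, p_max = tripping_span(scores)
    return {tau: band_decision(p_max, tau) for tau in taus}
```

With batch-probe scores of `[0.12, 0.62, 0.30]`, the second span is the one that drives routing, and under the assumed rule it flips from novel to uncertain once τ reaches 0.2.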
4. Labels
The curated training set behind the data flywheel. Lists every label
(general/novel) created from the Request explorer, with counts, who
labeled it and when, a remove action, and a Download training data
(.jsonl) button that emits drop-in {"text","label"} rows for the next
bootstrap.
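The export format can be sketched in a few lines. This assumes the label store reduces to a mapping of span text to label; the `to_jsonl` helper is hypothetical.

```python
import json


def to_jsonl(labels):
    """Serialize a text -> label mapping as {"text", "label"} JSON lines."""
    lines = []
    for text, label in labels.items():
        if label not in ("general", "novel"):
            raise ValueError(f"unexpected label: {label!r}")
        # json.dumps escapes embedded newlines, so each row stays on one line.
        lines.append(json.dumps({"text": text, "label": label}, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")
```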
5. Bootstrap (train)
Where the classifier head is (re)trained. The operator uploads/stages docs,
clicks Train, and the UI runs the bootstrap pipeline in-process (chunk →
generate → assemble dataset with oversampled human labels → train the head →
save → POST /reload on the classifier), streaming live progress. After a
run it auto-evaluates the new head against a probe set so the operator sees
whether routing improved.
6. Tokens
Per-user self-service for router API tokens. A token issued
here is the bearer credential clients put in
Authorization: Bearer sbk_<…> when calling the router's
/v1/chat/completions endpoint — i.e., the token authorizes LLM
calls through split-brain (classifier + Claude/ScalarLM routing
behind it). It is the only way for an internal user to get a
working router credential; there is no admin-distribution path.
The view is scoped to the logged-in user; you cannot see or revoke another user's tokens.
- Create token. Click "Create token", optionally provide a name ("laptop", "ci-eval", "macbook"). The plaintext token is displayed once in the form sbk_<40 base62 chars> with a copy-to-clipboard button and an explicit notice that it cannot be shown again.
- List my tokens. Table with name, created_at, last_used_at, status (active / revoked). Active tokens have a "Revoke" button.
- Revoke. Marks the token revoked. Takes effect at the router on next refresh (≤ 30 seconds).
The plaintext is never stored — only sha256(token). Users who
lose a token revoke and create a new one. There is no admin
"reissue" path; the user logs back in and clicks Create.
The token is scoped to the router. It does not grant access to the UI itself: UI access requires a signed-in, allowlisted identity (see Auth and access control). So a leaked token cannot be used to read the audit log or other users' prompts; the worst case is that an attacker can spend the user's LLM quota until the token is revoked.
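The create flow and hash-only storage described in this section can be sketched as below. The token shape (`sbk_` plus 40 base62 characters) and the `sha256:` prefix come from this doc; hashing the full plaintext including the prefix, and the `mint_token` / `verify` helpers, are assumptions for illustration.

```python
import hashlib
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 62 symbols


def mint_token():
    """Return (plaintext, stored_hash); only stored_hash should be persisted."""
    body = "".join(secrets.choice(BASE62) for _ in range(40))
    plaintext = "sbk_" + body
    # Assumption: the hash covers the whole token, prefix included.
    stored_hash = "sha256:" + hashlib.sha256(plaintext.encode("ascii")).hexdigest()
    return plaintext, stored_hash


def verify(presented, stored_hash):
    """Check a presented bearer token against a stored hash in constant time."""
    candidate = "sha256:" + hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```

Because only the digest is kept, a lost plaintext cannot be recovered, which is why the recovery path is revoke and create.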
7. Admin (admins only)
Gated by require_admin (email domain ∈ UI_ADMIN_DOMAINS, default
smasint.com); non-admins get 403 and the nav link is hidden. Two editable
controls plus a user table:
- Default Claude model: the live fallback model the router uses when a request doesn't pin its own claude-* model. Validated (claude-*) and written to settings.json; the router picks it up within seconds. See default-model.md.
- Per-user token limits: set the global default and per-user overrides (0 = unlimited), written to limits.json and enforced by the router. See token-limits.md.
- Users table: every known user (token owners + anyone with usage or an override), paginated, with today's tokens, effective limit, and % used; regex/substring search.
8. Docs
The docs/ design docs (this set) rendered as HTML and served publicly
(unauthenticated) so the login page can link to them. Intra-doc links are
rewritten to /docs/<slug>.
Views — Planned (not built)
- Annotation queue — a queue of items needing human label (shadow-routing disagreements, uncertain-band entries lacking an outcome label, random calibration samples). Depends on the outcome-label/shadow-routing machinery that isn't built (see classifier.md).
- Release management — list classifier/ScalarLM image tags with eval metrics and promotion history. Depends on the release-coupling pipeline.
Tech stack
- Backend: FastAPI, separate process and image from the router. Reasons it is not co-located: (a) the UI is allowed to be slower than the router and we don't want it competing for router resources; (b) the router image stays minimal (no templating library, no HTML asset bundle).
- Frontend: server-rendered HTML via Jinja2 + HTMX for interactivity. No SPA, no build step, no Node toolchain. The internal-tool feature set does not need React, and a static HTML render is the smallest possible attack surface. Static bundle is app.css, htmx.min.js, and bootstrap.js (the upload/progress helper for the Bootstrap view). Asset URLs are content-hashed for cache-busting behind Cloudflare. There is no charting library (no uPlot) and no SSE; the metrics strip is a server-rendered htmx fragment.
- Metrics: computed directly from the audit log (with a short TTL cache), not Prometheus. The UI does not talk to Prometheus.
- Audit log access: read-only mount of the shared PVC (/var/split-brain/audit/) that the router writes to. The UI's mount is readOnly: true at the volume level, so there is no write path back to the audit log from the UI even if the UI process is compromised. The UI does have RW subPaths for the state it owns (tokens, bootstrap, heads, labels, limits, settings).
Auth and access control
The UI uses app-level Google sign-in — hand-rolled OIDC against Google, with a signed session cookie — as the primary auth. (Cloudflare Access JWT verification is still supported as an alternative edge mode, and a dev-mode bypass exists for local work.) Full design and setup: google-auth.md.
In brief: /auth/google/start → Google consent → /auth/google/callback
verifies the ID token and issues a long-lived (sliding 30-day) signed
sb_session cookie. Every request re-checks the allowlist — allowed
domains (smasint.com, relational.ai) plus specific allowed emails. The
docs pages are public; everything else requires a signed-in, allowlisted
identity. A direct hit on the pod still requires a valid session, so
port-forward / lateral movement doesn't bypass auth.
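The per-request allowlist check amounts to an exact match on domain or email. A minimal sketch, assuming the email has already been verified from the ID token; the `is_allowed` function and its parameters are hypothetical.

```python
def is_allowed(email, allowed_domains, allowed_emails=()):
    """Return True if a verified email passes the domain/email allowlist."""
    email = email.strip().lower()
    if email.count("@") != 1:
        return False
    local, domain = email.split("@")
    if not local or not domain:
        return False
    if email in {e.lower() for e in allowed_emails}:
        return True
    # Exact domain match: a suffix check would admit look-alike domains.
    return domain in {d.lower() for d in allowed_domains}
```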
Permissions
Two tiers:
- Allowlisted user — token issuance (/tokens), the classifier probe (/probe), and docs. Lands on /tokens.
- Admin (require_admin: email domain ∈ UI_ADMIN_DOMAINS, default smasint.com): additionally the Request explorer (list / detail / metrics / requests.jsonl export; it exposes every user's prompts and responses), Labels, Bootstrap, and the Admin page (token limits, default model, tiers). Lands on /requests.
All of /requests*, /labels*, /bootstrap*, and /admin* use
require_admin; their nav links are hidden for non-admins. There is no
per-field view_prompts content tier — admin gating is the control (a finer
tier is Planned).
Service / non-human callers
Not supported as interactive UI access. Programmatic audit access goes
through the router's token-scoped GET /v1/audit/export
(orbital-traces.md) or by reading the audit JSONL on the
PVC directly — neither requires going through the UI.
Local development
Cloudflare Access does not exist on a laptop. Two options:
- cloudflared access login <ui-url> issues a local token that signs requests to the real tunnel. Preferred, because it exercises the real auth code path.
- UI_DEV_MODE=1 env flag bypasses JWT verification and injects a configured fake identity. The process refuses to start if UI_DEV_MODE=1 and KUBERNETES_SERVICE_HOST are both set, so it cannot deploy by accident.
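The startup guard for dev mode is small enough to sketch directly. The two environment variable names come from this doc; the `check_dev_mode` function and its error message are illustrative assumptions.

```python
import os


def check_dev_mode(env=None):
    """Return True if dev mode is on; refuse to run it inside a cluster."""
    env = os.environ if env is None else env
    dev = env.get("UI_DEV_MODE") == "1"
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod, so its
    # presence is a cheap signal that this is not a laptop.
    if dev and "KUBERNETES_SERVICE_HOST" in env:
        raise RuntimeError("UI_DEV_MODE=1 is not allowed inside Kubernetes")
    return dev
```

Calling this before the app object is built means a misconfigured pod fails at startup rather than serving with auth bypassed.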
Router token storage
Tokens created in the Tokens view are persisted as one file
per token under /var/split-brain/tokens/ on the shared
RWX PVC:
/var/split-brain/tokens/
  tok_abc123.json
  tok_def456.json
  ...
Each file:
{
  "id": "tok_abc123",
  "hash": "sha256:e3b0c44...",
  "owner_email": "user@example.com",
  "name": "laptop",
  "created_at": "2026-05-20T12:00:00Z",
  "last_used_at": "2026-05-20T15:32:11Z",
  "revoked_at": null
}
One file per token sidesteps the read-modify-write contention
that a single shared index file would create on a POSIX
filesystem. Concurrent token creates simply write different
files; concurrent revocations of different tokens never touch
the same path. Updates to an existing token (revocation,
last-used) write to tok_abc123.json.tmp, fsync, then
rename(2) over the original — atomic on POSIX, including the
RWX filesystems we target. Concurrent updates to the same token
serialize via flock(2).
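The update sequence above (lock, write a temp file, fsync, rename) can be sketched as follows. This is POSIX-only and illustrative: the doc does not say which file the flock is taken on, so the `.lock` sidecar file is an assumption, as is the `update_token` helper itself.

```python
import fcntl
import json
import os


def update_token(path, **changes):
    """Apply field changes to one token file atomically (POSIX only)."""
    # Hypothetical sidecar lock: the target's inode changes on every rename,
    # so locking the target file itself would not serialize writers reliably.
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize writers of this token
        with open(path, "r", encoding="utf-8") as f:
            record = json.load(f)
        record.update(changes)

        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as tmp:
            json.dump(record, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the data durable before it is visible
        os.rename(tmp_path, path)  # atomic replace: readers see old or new
        # The lock is released when the lock file is closed.
    return record
```

A revocation would then be a call such as `update_token(path, revoked_at="2026-05-21T00:00:00Z")`. Readers that list the directory should skip anything not ending in `.json` so they never pick up temp or lock files.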
- Writes come from the UI service (create / revoke) and the router (batched last-used flushes).
- Reads come from the router (every 30 s on a background loop; on startup it blocks until the first successful read) and from the UI (on every render of the Tokens view).
If the directory becomes unreadable in steady state, the router keeps serving with its last-known set (fail-static, not fail-open). If the directory is unreadable at startup, the router refuses to become ready — an empty allow-list would silently lock out every client.
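The fail-static behavior and the startup readiness rule can be sketched as a small reader class. This is an illustration under assumptions, not the router's code: the `TokenSet` class is hypothetical, and it treats a token as active when `revoked_at` is null.

```python
import json
import os


class TokenSet:
    """In-memory set of active token hashes, refreshed from a directory."""

    def __init__(self, directory):
        self.directory = directory
        self.hashes = frozenset()
        self.ready = False  # flips to True after the first successful read

    def _load(self):
        hashes = set()
        for name in os.listdir(self.directory):
            if not name.endswith(".json"):
                continue  # skip temp files and anything else
            with open(os.path.join(self.directory, name), encoding="utf-8") as f:
                record = json.load(f)
            if record.get("revoked_at") is None:
                hashes.add(record["hash"])
        return frozenset(hashes)

    def refresh(self):
        """Reload; on failure keep the last-known set. Returns True on success."""
        try:
            fresh = self._load()
        except (OSError, ValueError, KeyError):
            return False  # fail-static: self.hashes is left untouched
        self.hashes = fresh
        self.ready = True
        return True
```

A background loop would call `refresh()` every 30 seconds, and a readiness probe would report `ready`, so a router that has never read the directory does not accept traffic with an empty set.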
The directory remains small (hundreds to a few thousand files
under any realistic internal usage). If we ever cross ~10k
tokens — unlikely — we shard by the first byte of the id:
tokens/ab/tok_abc123.json.
Offboarding
A reconciliation job compares the token index against Google
Workspace membership and revokes tokens whose owner_email is
no longer in the org. Revocations are written to the UI audit
log for review.
For v1 this runs manually (operator invokes it via
kubectl create job --from=cronjob/...). Automating it requires
Workspace Admin API credentials, which is a separate
permissioning conversation; documented in the operator runbook
and revisited if the operator burden becomes real.
The lag between "user leaves Workspace" and "tokens revoked" is the operator's reconciliation cadence: a departed user can no longer complete a fresh Google sign-in to the UI, but their bearer tokens keep working until reconciliation runs. Document this in the offboarding runbook.
Privacy and the IP invariant
The audit log contains the same prompts the router classified and routed. Anything that reaches the UI inherits the same IP constraint as the audit log itself.
- Allowlist is the boundary. Access to prompt/response content is gated by the Google sign-in allowlist (allowed domains + emails). Per-field view_prompts redaction and reveal-logging are Planned, not built; today any allowlisted user sees content.
- No copy-to-Claude path. The UI has no "ask Claude to summarize this" button or similar, since that would route audit content to Claude, violating the IP invariant. Any future "summarize this conversation" feature uses ScalarLM.
Deployment
UI is its own subchart in the umbrella Helm chart
(charts/split-brain/charts/ui/). One Deployment, Service, ConfigMap,
NetworkPolicy. One replica in dev (it is the single writer of its PVC
files on a RWO volume; the base chart's replicaCount: 2 must not be used
while those files live on the PVC). CPU-only, 250m / 512Mi requests.
The UI Deployment mounts the shared PVC via subPaths (its root filesystem is read-only):
- RW: tokens/, bootstrap/, heads/, labels/, limits/, settings/
- RO: audit/, usage/
Network egress is to the classifier Service in-cluster (probe + /reload)
and to Google (OIDC token + JWKS endpoints, for sign-in). It does not
have egress to the router, ScalarLM, Claude, or any object store — keeping the
IP invariant cleanly auditable from a NetworkPolicy dump. (There is no
Prometheus to reach.)
Code layout
ui/
pyproject.toml
src/ui/
__init__.py
app.py # FastAPI app + all routes
auth.py # Google OIDC, session codec, allowlist, require_admin
audit.py # PVC audit reader: search, spans, metrics
classify.py # classifier probe client
labels.py # label store (text-keyed; flywheel)
settings.py # runtime settings writer (default model)
quota.py # usage read + limits read/write (admin)
tokens.py # router-token issuer
bootstrap.py # bootstrap orchestration (train + reload)
bootstrap_pipeline/ # chunk / generate / dataset / train (mirror of classifier)
docs.py # markdown doc rendering
config.py
templates/ # base, requests, request_detail, probe, labels,
# bootstrap, tokens, admin, docs, login, denied, _metrics, ...
static/ # app.css, htmx.min.js, bootstrap.js
tests/