UI

split-brain ships a small internal web UI for operators and ML engineers. It is not the way end users send prompts — clients use the OpenAI-compatible API directly. The UI is a window into the audit log and the routing system's live behavior, and a tool for the people who maintain the classifier.

Implementation note. This doc was written ahead of the build and parts describe a different design than what shipped. What exists today: a Request explorer with an audit-log-derived metrics strip (no Prometheus), a request detail page with per-span labeling, a classifier probe (single + batch), a Labels view, a Bootstrap (train) view, Tokens, an Admin page that does edit per-user token limits and the default Claude model, and rendered Docs. Auth is app-level Google sign-in (hand-rolled OIDC; see google-auth.md), not Cloudflare Access. The sections below are corrected where they diverge; clearly-unbuilt pieces are marked Planned.

Goals

Make routing decisions visible. Operators should be able to see, for any given request, exactly why it went to the backend it did.
Make the audit log explorable. The router writes every request to durable storage; the UI is how a human reads it back.
Make the classifier inspectable. ML engineers should be able to paste a prompt and see what the live classifier (and prior versions) would do with it — without going through the router.

Explicit non-goals:

Not a chat client. We do not run a hosted playground. Existing OpenAI-compatible clients work against the router unmodified; we do not duplicate them.
Limited runtime tuning. τ (the classifier band) still changes via Helm deploy. But the Admin page (admins only) does edit two runtime knobs that are safe and frequently needed: per-user daily token limits and the default Claude model. Both are persisted to the PVC and read live by the router (see token-limits.md, default-model.md).
Not customer-facing. The UI is internal-only; there is no per-customer view or self-serve account page.

Users and what they need

User	Primary task	Frequency
SRE / on-call	Is the system healthy? Why is X failing?	Live, incidents
Operator	Did anything route to Claude that shouldn't have?	Daily audit
ML engineer	Why did the classifier label this prompt that way?	Investigations
Domain reviewer	Label these uncertain cases for the anchor set	Weekly batch
Compliance reviewer	Show me everything client X sent last quarter	On request

The v1 UI serves the first three. The fourth and fifth are sketched in v2.

Views

1. Metrics strip (on the Request explorer)

There is no separate Prometheus dashboard. Instead the Request explorer carries a summary strip computed from the audit log for the active filter + time window: totals, the backend split (claude / scalarlm), decision mix, and error/latency rollups, with each card linking to the filtered request list. Because scanning the whole log is too slow to block the page, the strip loads asynchronously (htmx fragment, /requests-metrics → _metrics.html) while the paginated table renders immediately.
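As a rough illustration of the rollup behind the strip, the sketch below aggregates already-filtered audit records into the cards described above. The record field names (backend, decision, status, total_ms), the summarize helper, and the sample rows are assumptions made for illustration, not the shipped audit schema.

```python
from collections import Counter
from statistics import median


def summarize(records):
    """Roll up already-filtered audit records into metrics-strip cards.

    Field names are hypothetical; the real audit schema may differ.
    """
    latencies = sorted(r["total_ms"] for r in records if "total_ms" in r)
    errors = sum(1 for r in records if r.get("status", 200) >= 400)
    return {
        "total": len(records),
        "backend_split": dict(Counter(r["backend"] for r in records)),
        "decision_mix": dict(Counter(r["decision"] for r in records)),
        "errors": errors,
        "median_ms": median(latencies) if latencies else None,
        "max_ms": latencies[-1] if latencies else None,
    }


# Made-up rows, only to exercise the rollup.
sample = [
    {"backend": "claude", "decision": "general", "status": 200, "total_ms": 120},
    {"backend": "scalarlm", "decision": "novel", "status": 200, "total_ms": 300},
    {"backend": "scalarlm", "decision": "uncertain", "status": 502, "total_ms": 900},
]
strip = summarize(sample)
```

Because the rollup only needs one pass over the filtered records, it can run in the asynchronous fragment request without delaying the table.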

2. Request explorer

Searchable, filterable view of the audit log.

Filter bar: time range, client key (fingerprint, not raw key), classifier label, backend, status, free-text substring on the prompt.

Result list: one row per request, columns:

received_at | client | model_requested | classifier_label (confidence) | backend | status | total_ms

Sortable. Page size 50. Cursor-based pagination over the audit log's date/hour directory layout on the PVC (no full scans).
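One way to picture cursor-based pagination over an hourly layout is sketched below. The directory shape (`<YYYY-MM-DD>/<HH>/*.jsonl`), the `(bucket, offset)` cursor, and the function names are assumptions for illustration; the real reader in audit.py may work differently. The point is that a page only opens the hour buckets it needs.

```python
import json
import tempfile
from pathlib import Path


def hour_buckets_newest_first(root):
    """List 'YYYY-MM-DD/HH' buckets; the names sort lexicographically by time."""
    buckets = [
        f"{day.name}/{hour.name}"
        for day in Path(root).iterdir() if day.is_dir()
        for hour in day.iterdir() if hour.is_dir()
    ]
    return sorted(buckets, reverse=True)


def read_bucket(root, bucket):
    """Load one hour's records, newest first."""
    rows = []
    for path in sorted((Path(root) / bucket).glob("*.jsonl")):
        with path.open() as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return sorted(rows, key=lambda r: r["received_at"], reverse=True)


def page(root, cursor=None, size=50):
    """Return (rows, next_cursor); a cursor is (bucket, offset into that bucket)."""
    start_bucket, offset = cursor if cursor else (None, 0)
    rows = []
    for bucket in hour_buckets_newest_first(root):
        if start_bucket and bucket > start_bucket:
            continue  # newer than the cursor: served on an earlier page
        records = read_bucket(root, bucket)
        skip = offset if bucket == start_bucket else 0
        for i in range(skip, len(records)):
            rows.append(records[i])
            if len(rows) == size:
                return rows, (bucket, i + 1)
    return rows, None  # log exhausted


# Tiny demo with a throwaway directory and a page size of 2.
with tempfile.TemporaryDirectory() as root:
    demo = {"2026-05-20/14": ["14:01", "14:30"], "2026-05-20/15": ["15:10"]}
    for bucket, stamps in demo.items():
        folder = Path(root) / bucket
        folder.mkdir(parents=True)
        lines = [json.dumps({"received_at": f"2026-05-20T{s}:00Z"}) for s in stamps]
        (folder / "part.jsonl").write_text("\n".join(lines) + "\n")
    first, cur = page(root, size=2)
    second, end = page(root, cursor=cur, size=2)
```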

Detail view — click into a row:

Request metadata: owner, token id, requested model, streaming, extended-thinking (requested / in response), status.
Classifier decision: label, p_novel, model version, classifier ms.
Backend: routing decision (general / novel / uncertain / forced), chosen backend, the model actually used, token counts (incl. cached), latency.
The prompt (messages) and response, collapsed by default.
A "Label spans for finetuning" section — see § Per-span labeling.

There is currently no per-field view_prompts permission or redaction gate: anyone who can open the Request explorer sees prompt/response content. Access to the explorer is admin-gated (require_admin; see Permissions), and that gate is the access control. (A finer prompt-content tier is Planned; see Privacy below.)

Per-span labeling

The router classifies a request over every user text block and tool_result span and routes on the max p_novel (see classifier.md), so labels are created at the same granularity. The detail page lists each span with its current label and a "novel?" checkbox, plus two buttons:

general (all spans) — asserts the whole request is benign; labels every span general.
novel (checked spans) — labels only the spans you check novel, so benign boilerplate isn't dragged into the novel class.

Labels are keyed by span text (not request id), so identical spans across requests share one label, and re-labeling a span overwrites its previous label. They feed the next classifier bootstrap, where human labels are oversampled (see classifier.md).
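The two buttons and the text-keyed behavior can be sketched as follows. This is an in-memory stand-in with a hypothetical API (SpanLabels and its method names are not taken from labels.py); it only shows why keying on the span text makes identical spans share a label and makes a later label replace an earlier one.

```python
from datetime import datetime, timezone


class SpanLabels:
    """In-memory stand-in for a label store keyed by span text (hypothetical API)."""

    def __init__(self):
        self._by_text = {}

    def set(self, text, label, who):
        # Keying on the text means a later label for the same span replaces the old one.
        self._by_text[text] = {
            "label": label,
            "labeled_by": who,
            "labeled_at": datetime.now(timezone.utc).isoformat(),
        }

    def get(self, text):
        entry = self._by_text.get(text)
        return entry["label"] if entry else None

    def label_general(self, spans, who):
        """'general (all spans)': every span of the request becomes general."""
        for text in spans:
            self.set(text, "general", who)

    def label_novel(self, spans, checked, who):
        """'novel (checked spans)': only the checked indexes become novel."""
        for i in checked:
            self.set(spans[i], "novel", who)


store = SpanLabels()
spans = ["You are a helpful assistant.", "Tune our in-house scheduler heuristics."]
store.label_novel(spans, checked=[1], who="reviewer@example.com")
after_novel = (store.get(spans[0]), store.get(spans[1]))
store.label_general(spans, who="reviewer@example.com")  # a later pass overwrites
after_general = (store.get(spans[0]), store.get(spans[1]))
```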

3. Classifier probe

A text area + Classify. Sends the entered text to the live classifier and shows the label, p_novel, and model version, plus a router-decision preview by τ: a bar showing which backend the band rule would pick at several candidate τ values, so a borderline prompt's threshold sensitivity is visible at a glance.

Batch probe — a collapsible panel scores many prompts at once (one per line, capped at 100), returning p_novel, label, and routing for each. This is the tool to mirror the router's per-span max: paste each span of a request on its own line to see exactly which one trips routing.

The probe does not route to any backend — it is a classifier-only call, safe with arbitrary text, and produces no router audit entry. (Comparing against pinned historical model versions and token-level saliency are Planned, not built.)
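The τ preview and the per-span max can be illustrated with a small sketch. The band rule here is an assumption (a symmetric band of half-width τ around 0.5, with uncertain requests sent to ScalarLM), as are the candidate τ values and the function names; the authoritative rule is the one in classifier.md.

```python
def band_decision(p_novel, tau, center=0.5):
    """Assumed band rule: confident outside center +/- tau, uncertain inside."""
    if p_novel >= center + tau:
        return "novel"
    if p_novel <= center - tau:
        return "general"
    return "uncertain"


# Assumed mapping; uncertain goes to the in-house backend to stay conservative.
BACKEND = {"general": "claude", "novel": "scalarlm", "uncertain": "scalarlm"}


def tau_preview(p_novel, taus=(0.05, 0.1, 0.2, 0.3)):
    """Return (tau, decision, backend) for each candidate tau."""
    preview = []
    for tau in taus:
        decision = band_decision(p_novel, tau)
        preview.append((tau, decision, BACKEND[decision]))
    return preview


def tripping_span(scores):
    """Mirror the per-span max: (index, p_novel) of the highest-scoring span."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


preview = tau_preview(0.35)
worst = tripping_span([0.10, 0.82, 0.40])
```

Under these assumptions a prompt scored at 0.35 flips from general to uncertain as τ widens, which is the kind of threshold sensitivity the preview bar is meant to expose.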

4. Labels

The curated training set behind the data flywheel. Lists every label (general/novel) created from the Request explorer, with counts, who labeled it and when, a remove action, and a Download training data (.jsonl) button that emits drop-in {"text","label"} rows for the next bootstrap.
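The download format is simple enough to sketch. The helper below serializes a text-to-label mapping as one `{"text", "label"}` object per line; the function name and the sorted ordering are assumptions, not a description of the shipped export.

```python
import json


def to_training_jsonl(labels):
    """Serialize {span_text: label} as one {"text", "label"} JSON object per line."""
    lines = [
        json.dumps({"text": text, "label": label}, ensure_ascii=False)
        for text, label in sorted(labels.items())
    ]
    return "\n".join(lines) + ("\n" if lines else "")


exported = to_training_jsonl({"b span": "novel", "a span": "general"})
rows = [json.loads(line) for line in exported.splitlines()]
```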

5. Bootstrap (train)

Where the classifier head is (re)trained. The operator uploads/stages docs, clicks Train, and the UI runs the bootstrap pipeline in-process (chunk → generate → assemble dataset with oversampled human labels → train the head → save → POST /reload on the classifier), streaming live progress. After a run it auto-evaluates the new head against a probe set so the operator sees whether routing improved.

6. Tokens

Per-user self-service for router API tokens. A token issued here is the bearer credential clients put in Authorization: Bearer sbk_<…> when calling the router's /v1/chat/completions endpoint — i.e., the token authorizes LLM calls through split-brain (classifier + Claude/ScalarLM routing behind it). It is the only way for an internal user to get a working router credential; there is no admin-distribution path.

The view is scoped to the logged-in user; you cannot see or revoke another user's tokens.

Create token. Click "Create token", optionally provide a name ("laptop", "ci-eval", "macbook"). The plaintext token is displayed once in the form sbk_<40 base62 chars> with a copy-to-clipboard button and an explicit notice that it cannot be shown again.
List my tokens. Table with name, created_at, last_used_at, status (active / revoked). Active tokens have a "Revoke" button.
Revoke. Marks the token revoked. Takes effect at the router on next refresh (≤ 30 seconds).

The plaintext is never stored — only sha256(token). Users who lose a token revoke and create a new one. There is no admin "reissue" path; the user logs back in and clicks Create.
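A minimal sketch of issuing and checking such a token, assuming the `sha256:` prefix seen in the token files and a constant-time comparison on the router side. The function names are hypothetical and the real issuer in tokens.py may differ.

```python
import hashlib
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 62 symbols


def hash_token(plaintext):
    """Hash in the 'sha256:<hex>' form used by the token files."""
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def mint_token():
    """Return (plaintext, stored_hash); only the hash should be persisted."""
    body = "".join(secrets.choice(BASE62) for _ in range(40))
    plaintext = f"sbk_{body}"
    return plaintext, hash_token(plaintext)


def verify(presented, stored_hash):
    """Constant-time check of a presented bearer token against the stored hash."""
    return secrets.compare_digest(hash_token(presented), stored_hash)


token, stored = mint_token()
```

Since only the hash is kept, a lost token cannot be recovered from storage, which is why the flow is revoke and create rather than reissue.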

The token is scoped to the router. It does not grant access to the UI itself: UI access requires a signed-in, allowlisted Google identity (see Auth and access control). So a leaked token cannot be used to browse the audit log or other users' prompts in the UI; the worst case is that an attacker can spend the user's LLM quota until the token is revoked.

7. Admin (admins only)

Gated by require_admin (email domain ∈ UI_ADMIN_DOMAINS, default smasint.com); non-admins get 403 and the nav link is hidden. Two editable controls plus a user table:

Default Claude model — the live fallback model the router uses when a request doesn't pin its own claude-* model. Validated (claude-*) and written to settings.json; the router picks it up within seconds. See default-model.md.
Per-user token limits — set the global default and per-user overrides (0 = unlimited), written to limits.json and enforced by the router. See token-limits.md.
Users table — every known user (token owners + anyone with usage or an override), paginated, with today's tokens, effective limit, and % used; regex/substring search.
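The limit semantics above (a per-user override beats the global default, and 0 means unlimited) can be sketched as below. The validation regex is an assumption, since the source only says the value must look like claude-*, and the helper names and the example model string are hypothetical.

```python
import re

# Assumed shape check; the source only states that the value must match claude-*.
CLAUDE_MODEL = re.compile(r"claude-[A-Za-z0-9.\-]+")


def is_valid_default_model(name):
    return CLAUDE_MODEL.fullmatch(name) is not None


def effective_limit(email, default_limit, overrides):
    """Override wins over the default; 0 means unlimited (returned as None)."""
    limit = overrides.get(email, default_limit)
    return None if limit == 0 else limit


def percent_used(tokens_today, limit):
    """Percentage for the users table; None when the user is unlimited."""
    if limit is None:
        return None
    return round(100 * tokens_today / limit, 1)


overrides = {"heavy@example.com": 0, "ci@example.com": 5000}
row = percent_used(250, effective_limit("someone@example.com", 1000, overrides))
```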

8. Docs

The docs/ design docs (this set) rendered as HTML and served publicly (unauthenticated) so the login page can link to them. Intra-doc links are rewritten to /docs/<slug>.
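The link rewriting might look roughly like the sketch below, which assumes the slug is the file name without its .md extension and that intra-doc links are plain relative markdown links. Both are assumptions; docs.py may handle more cases.

```python
import re

# Matches "](name.md)" and "](name.md#anchor)"; absolute URLs do not match
# because ':' and '/' are outside the allowed character class.
INTRA_DOC_LINK = re.compile(r"\]\(([A-Za-z0-9_\-]+)\.md(#[^)]*)?\)")


def rewrite_links(markdown):
    """Point relative .md links at /docs/<slug>, keeping any #anchor."""
    return INTRA_DOC_LINK.sub(
        lambda m: f"](/docs/{m.group(1)}{m.group(2) or ''})", markdown
    )


rewritten = rewrite_links(
    "See [auth](google-auth.md#setup), [limits](token-limits.md), "
    "and [external](https://example.com/page.md)."
)
```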

Views — Planned (not built)

Annotation queue — a queue of items needing human label (shadow-routing disagreements, uncertain-band entries lacking an outcome label, random calibration samples). Depends on the outcome-label/shadow-routing machinery that isn't built (see classifier.md).
Release management — list classifier/ScalarLM image tags with eval metrics and promotion history. Depends on the release-coupling pipeline.

Tech stack

Backend: FastAPI, separate process and image from the router. Reasons it is not co-located: (a) the UI is allowed to be slower than the router and we don't want it competing for router resources; (b) the router image stays minimal (no templating library, no HTML asset bundle).
Frontend: server-rendered HTML via Jinja2 + HTMX for interactivity. No SPA, no build step, no Node toolchain. The internal-tool feature set does not need React, and server-rendered HTML with very little client-side JavaScript keeps the attack surface small. The static bundle is app.css, htmx.min.js, and bootstrap.js (the upload/progress helper for the Bootstrap view). Asset URLs are content-hashed for cache-busting behind Cloudflare. There is no charting library (no uPlot) and no SSE; the metrics strip is a server-rendered htmx fragment.
Metrics: computed directly from the audit log (with a short TTL cache), not Prometheus. The UI does not talk to Prometheus.
Audit log access: read-only mount of the shared PVC (/var/split-brain/audit/) that the router writes to. The UI's mount is readOnly: true at the volume level — there is no write path back to the audit log from the UI even if the UI process is compromised. The UI does have RW subPaths for the state it owns (tokens, bootstrap, heads, labels, limits, settings).

Auth and access control

The UI uses app-level Google sign-in — hand-rolled OIDC against Google, with a signed session cookie — as the primary auth. (Cloudflare Access JWT verification is still supported as an alternative edge mode, and a dev-mode bypass exists for local work.) Full design and setup: google-auth.md.

In brief: /auth/google/start → Google consent → /auth/google/callback verifies the ID token and issues a long-lived (sliding 30-day) signed sb_session cookie. Every request re-checks the allowlist — allowed domains (smasint.com, relational.ai) plus specific allowed emails. The docs pages are public; everything else requires a signed-in, allowlisted identity. A direct hit on the pod still requires a valid session, so port-forward / lateral movement doesn't bypass auth.
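The per-request check can be pictured with the sketch below: an HMAC-signed cookie carrying an email and an expiry, validated together with the allowlist on every read. The cookie encoding, the function names, and the empty ALLOWED_EMAILS set are assumptions for illustration; the actual codec is described in google-auth.md.

```python
import base64
import hashlib
import hmac
import json
import time

SESSION_TTL = 30 * 24 * 3600  # 30 days, in seconds
ALLOWED_DOMAINS = {"smasint.com", "relational.ai"}
ALLOWED_EMAILS = set()  # specific extra addresses; left empty in this sketch


def is_allowlisted(email):
    email = email.strip().lower()
    return email.rpartition("@")[2] in ALLOWED_DOMAINS or email in ALLOWED_EMAILS


def sign_session(email, key, now=None):
    """Build '<base64 payload>.<hex signature>' with a fresh 30-day expiry."""
    now = int(time.time()) if now is None else now
    raw = json.dumps({"email": email, "exp": now + SESSION_TTL}).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def read_session(cookie, key, now=None):
    """Return the email if signature, expiry, and allowlist all pass, else None."""
    now = int(time.time()) if now is None else now
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    data = json.loads(base64.urlsafe_b64decode(payload))
    if data["exp"] < now or not is_allowlisted(data["email"]):
        return None
    return data["email"]


key = b"demo-signing-key"
cookie = sign_session("alice@smasint.com", key, now=0)
outsider = sign_session("mallory@example.org", key, now=0)
```

In this sketch, a sliding window would be implemented by calling sign_session again on each authenticated request, so the expiry keeps moving forward while the user stays active.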

Permissions

Two tiers:

Allowlisted user — token issuance (/tokens), the classifier probe (/probe), and docs. Lands on /tokens.
Admin (require_admin: email domain ∈ UI_ADMIN_DOMAINS, default smasint.com) — additionally the Request explorer (list / detail / metrics / requests.jsonl export — it exposes every user's prompts and responses), Labels, Bootstrap, and the Admin page (token limits, default model, tiers). Lands on /requests.

All of /requests*, /labels*, /bootstrap*, and /admin* use require_admin; their nav links are hidden for non-admins. There is no per-field view_prompts content tier — admin gating is the control (a finer tier is Planned).
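The two tiers reduce to a domain check plus a path-prefix check, sketched below for an already allowlisted user. ADMIN_DOMAINS stands in for UI_ADMIN_DOMAINS, and the helper names are hypothetical rather than the real require_admin dependency.

```python
ADMIN_DOMAINS = {"smasint.com"}  # stand-in for UI_ADMIN_DOMAINS
ADMIN_PREFIXES = ("/requests", "/labels", "/bootstrap", "/admin")


def is_admin(email):
    return email.strip().lower().rpartition("@")[2] in ADMIN_DOMAINS


def landing_page(email):
    """Admins land on the Request explorer, everyone else on Tokens."""
    return "/requests" if is_admin(email) else "/tokens"


def status_for(email, path):
    """403 for a non-admin on an admin-only path, otherwise 200."""
    if path.startswith(ADMIN_PREFIXES) and not is_admin(email):
        return 403
    return 200
```

Matching on prefixes means fragment routes such as /requests-metrics fall under the same gate as /requests itself.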

Service / non-human callers

Not supported as interactive UI access. Programmatic audit access goes through the router's token-scoped GET /v1/audit/export (orbital-traces.md) or by reading the audit JSONL on the PVC directly — neither requires going through the UI.

Local development

Cloudflare Access does not exist on a laptop. Two options:

cloudflared access login <ui-url> issues a local token that signs requests to the real tunnel. Preferred — exercises the real auth code path.
UI_DEV_MODE=1 env flag bypasses JWT verification and injects a configured fake identity. The process refuses to start if UI_DEV_MODE=1 and KUBERNETES_SERVICE_HOST are both set, so it cannot deploy by accident.
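The startup guard for the dev bypass is small enough to sketch. The function name and the exception type are assumptions; only the two environment variables come from the text above.

```python
def check_dev_mode(env):
    """Return True when the dev bypass is active; refuse to run inside Kubernetes."""
    dev = env.get("UI_DEV_MODE") == "1"
    if dev and "KUBERNETES_SERVICE_HOST" in env:
        raise RuntimeError("UI_DEV_MODE=1 is not allowed inside a cluster")
    return dev


# In a real process this would be called once at startup with os.environ.
try:
    check_dev_mode({"UI_DEV_MODE": "1", "KUBERNETES_SERVICE_HOST": "10.0.0.1"})
    refused = False
except RuntimeError:
    refused = True
```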

Router token storage

Tokens created in the Tokens view are persisted as one file per token under /var/split-brain/tokens/ on the shared RWX PVC:

/var/split-brain/tokens/
  tok_abc123.json
  tok_def456.json
  ...

Each file:

{
  "id": "tok_abc123",
  "hash": "sha256:e3b0c44...",
  "owner_email": "user@example.com",
  "name": "laptop",
  "created_at": "2026-05-20T12:00:00Z",
  "last_used_at": "2026-05-20T15:32:11Z",
  "revoked_at": null
}

One file per token sidesteps the read-modify-write contention that a single shared index file would create on a POSIX filesystem. Concurrent token creates simply write different files; concurrent revocations of different tokens never touch the same path. Updates to an existing token (revocation, last-used) write to tok_abc123.json.tmp, fsync, then rename(2) over the original — atomic on POSIX, including the RWX filesystems we target. Concurrent updates to the same token serialize via flock(2).
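The update sequence (lock, write a temp file, fsync, rename) can be sketched as follows on a POSIX system. The update_token helper and the sidecar `.lock` file are assumptions of this sketch: locking a separate file avoids holding a lock on an inode that the rename is about to replace.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_token(path, **changes):
    """Read-modify-write one token file under flock, then atomically replace it."""
    path = Path(path)
    lock_path = path.with_name(path.name + ".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize writers of this token
        record = json.loads(path.read_text())
        record.update(changes)
        tmp = path.with_name(path.name + ".tmp")
        with open(tmp, "w") as fh:
            json.dump(record, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # data is on disk before the rename
        os.replace(tmp, path)  # rename(2): readers see old or new, never partial
    return record  # the lock is released when the lock file is closed


with tempfile.TemporaryDirectory() as d:
    token_file = Path(d) / "tok_abc123.json"
    token_file.write_text(json.dumps({"id": "tok_abc123", "revoked_at": None}))
    update_token(token_file, revoked_at="2026-05-21T09:00:00Z")
    on_disk = json.loads(token_file.read_text())
    leftovers = sorted(p.name for p in Path(d).iterdir())
```

Neither the `.tmp` nor the `.lock` name ends in `.json`, so a reader that only picks up `tok_*.json` files would ignore both.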

Writes come from the UI service (create / revoke) and the router (batched last-used flushes).
Reads come from the router (every 30 s on a background loop; on startup it blocks until the first successful read) and from the UI (on every render of the Tokens view).

If the directory becomes unreadable in steady state, the router keeps serving with its last-known set (fail-static, not fail-open). If the directory is unreadable at startup, the router refuses to become ready — an empty allow-list would silently lock out every client.
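The fail-static behavior can be sketched as a cache whose refresh never replaces a good set with a failed read. The TokenCache class and its attribute names are hypothetical; the router's actual loader is not shown in this doc.

```python
import json
import os
import tempfile
from pathlib import Path


class TokenCache:
    """Keeps the last successfully loaded set of active token hashes."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.active = None  # None until the first successful load

    def _load(self):
        hashes = set()
        for name in os.listdir(self.directory):  # raises OSError if unreadable
            if not (name.startswith("tok_") and name.endswith(".json")):
                continue
            record = json.loads((self.directory / name).read_text())
            if record.get("revoked_at") is None:
                hashes.add(record["hash"])
        return hashes

    def refresh(self):
        """Reload from disk; on failure keep the previous set. True on success."""
        try:
            fresh = self._load()
        except (OSError, ValueError, KeyError):
            return False  # fail-static: self.active is left untouched
        self.active = fresh
        return True

    @property
    def ready(self):
        """Readiness requires at least one successful read."""
        return self.active is not None


with tempfile.TemporaryDirectory() as d:
    live = {"hash": "sha256:aa", "revoked_at": None}
    dead = {"hash": "sha256:bb", "revoked_at": "2026-05-20T15:00:00Z"}
    (Path(d) / "tok_a.json").write_text(json.dumps(live))
    (Path(d) / "tok_b.json").write_text(json.dumps(dead))
    cache = TokenCache(d)
    started = cache.refresh()

# The temporary directory is gone now, which simulates an unreadable mount.
degraded = cache.refresh()
cold = TokenCache(d)
cold.refresh()
```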

The directory remains small (hundreds to a few thousand files under any realistic internal usage). If we ever cross ~10k tokens — unlikely — we shard by the first byte of the id: tokens/ab/tok_abc123.json.

Offboarding

A reconciliation job compares the owner_email of every token file against Google Workspace membership and revokes tokens whose owner is no longer in the org. Revocations are written to the UI audit log for review.
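The core of the reconciliation is a set difference, sketched below. How the membership list is obtained is out of scope here, so the members argument is simply assumed to be a collection of current Workspace emails, and the function name is hypothetical.

```python
def tokens_to_revoke(tokens, members):
    """Return ids of active tokens whose owner is not a current member."""
    current = {m.lower() for m in members}
    return [
        t["id"]
        for t in tokens
        if t.get("revoked_at") is None and t["owner_email"].lower() not in current
    ]


tokens = [
    {"id": "tok_1", "owner_email": "stays@example.com", "revoked_at": None},
    {"id": "tok_2", "owner_email": "left@example.com", "revoked_at": None},
    {"id": "tok_3", "owner_email": "left@example.com", "revoked_at": "2026-05-01T00:00:00Z"},
]
stale = tokens_to_revoke(tokens, members=["Stays@example.com"])
```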

For v1 this runs manually (the operator invokes it via kubectl create job --from=cronjob/...). Automating it requires Workspace Admin API credentials, which is a separate permissioning conversation. The manual procedure is documented in the operator runbook, and automation will be revisited if the operator burden becomes real.

The lag between "user leaves Workspace" and "tokens revoked" is the operator's reconciliation cadence — Cloudflare Access blocks them from the UI immediately, but their bearer tokens keep working until reconciliation runs. Document this in the offboarding runbook.

Privacy and the IP invariant

The audit log contains the same prompts the router classified and routed. Anything that reaches the UI inherits the same IP constraint as the audit log itself.

Access gating is the boundary. Access to prompt/response content is gated by Google sign-in (allowed domains + emails) plus the admin check on the Request explorer (see Permissions). Per-field view_prompts redaction and reveal-logging are Planned, not built; today anyone who can open the explorer sees content.
No copy-to-Claude path. The UI has no "ask Claude to summarize this" button or similar — that would route audit content to Claude, violating the IP invariant. Any future "summarize this conversation" feature uses ScalarLM.

Deployment

The UI is its own subchart in the umbrella Helm chart (charts/split-brain/charts/ui/): one Deployment, Service, ConfigMap, and NetworkPolicy. It runs one replica in dev (it is the single writer of its PVC files on an RWO volume; the base chart's replicaCount: 2 must not be used while those files live on the PVC). CPU-only, with 250m / 512Mi requests.

The UI Deployment mounts the shared PVC via subPaths (its root filesystem is read-only):

RW: tokens/, bootstrap/, heads/, labels/, limits/, settings/
RO: audit/, usage/

Network egress is to the classifier Service in-cluster (probe + /reload) and to Google (OIDC token + JWKS endpoints, for sign-in). It does not have egress to the router, ScalarLM, Claude, or any object store — keeping the IP invariant cleanly auditable from a NetworkPolicy dump. (There is no Prometheus to reach.)

Code layout

ui/
  pyproject.toml
  src/ui/
    __init__.py
    app.py              # FastAPI app + all routes
    auth.py             # Google OIDC, session codec, allowlist, require_admin
    audit.py            # PVC audit reader: search, spans, metrics
    classify.py         # classifier probe client
    labels.py           # label store (text-keyed; flywheel)
    settings.py         # runtime settings writer (default model)
    quota.py            # usage read + limits read/write (admin)
    tokens.py           # router-token issuer
    bootstrap.py        # bootstrap orchestration (train + reload)
    bootstrap_pipeline/ # chunk / generate / dataset / train (mirror of classifier)
    docs.py             # markdown doc rendering
    config.py
    templates/          # base, requests, request_detail, probe, labels,
                        # bootstrap, tokens, admin, docs, login, denied, _metrics, ...
    static/             # app.css, htmx.min.js, bootstrap.js
  tests/