Google sign-in for the UI — design

Status: Implemented (app + chart), pending Google OAuth credentials to turn on. Replaces the UI's current auth (Cloudflare Access SSO in front of a dev-mode app that trusts a fixed email) with an app-level Google OpenID Connect login, a signed long-lived session cookie, an email allowlist, and a public docs section. See "Turning it on" at the end.

Goals

  • Who gets in: only Google accounts in @smasint.com or @relational.ai, plus [email protected], may reach the operator views — Requests, Labels, Tokens, Bootstrap, Probe.
  • Public docs: anyone may read /docs with no login.
  • Landing page: an unauthenticated visitor lands on a sign-in page with a "Sign in with Google" button and a link to the public docs.
  • Per-account tokens: router API tokens are scoped to the signed-in Google account (the email is the owner key).
  • Remember me: the session persists in a cookie for a long time (~1 month) so users aren't re-prompted constantly.

Current state (what we're changing)

Piece | Today | After
Who authenticates the user | Cloudflare Access (Google Workspace SSO) at the edge | The UI itself, via Google OIDC
App auth | dev mode: trusts a fixed UI_DEV_EMAIL (assert_safe_dev_mode + UI_DEV_MODE_KUBERNETES_OK=1) | Real per-user identity from a verified Google ID token
Authorization (allowlist) | Cloudflare Access policy (out of band, in the CF dashboard) | In-app allowlist (domains + emails), enforced every request
Docs | behind the same SSO gate | public
Identity shown / token owner | the fixed dev email for everyone | each user's own Google email

The Today column means that every signed-in user currently shares one identity and one token namespace. The point of this change is real per-user identity and a code-owned allowlist.

Auth flow (OAuth 2.0 / OIDC authorization-code)

Browser                        UI (router-side)                Google
  │  GET /requests (no cookie)        │                           │
  │ ─────────────────────────────────►│  302 → /login?next=/requests
  │  GET /login                        │                           │
  │ ◄───────────────────────────────── │  landing page + "Sign in" │
  │  GET /auth/google/start?next=…     │                           │
  │ ─────────────────────────────────►│  set signed `state` cookie │
  │ ◄───────────────────────────────── │  302 → Google authorize URL
  │  ── consent ──────────────────────────────────────────────────►│
  │ ◄──────────────────────────────────────────────────────────────│ 302 → /auth/google/callback?code&state
  │  GET /auth/google/callback         │                           │
  │ ─────────────────────────────────►│  verify state cookie       │
  │                                    │  POST token endpoint ─────►│  (TLS)
  │                                    │ ◄───────────────────────── │  id_token (JWT)
  │                                    │  verify id_token (JWKS,    │
  │                                    │   aud/iss/exp/email_verified)
  │                                    │  check allowlist           │
  │ ◄───────────────────────────────── │  set `sb_session` cookie,  │
  │                                    │  302 → next (/requests)    │
  │  GET /requests (with cookie)       │  decode cookie → Identity  │
  │ ◄───────────────────────────────── │  200                       │

ID-token verification

The id_token returned from Google's token endpoint is a JWT. We:

  1. Fetch + cache Google's JWKS (https://www.googleapis.com/oauth2/v3/certs), reusing the cached-JWKS pattern already in CloudflareJWTVerifier.
  2. Verify the signature, aud == client_id, iss ∈ {accounts.google.com, https://accounts.google.com}, and exp.
  3. Require email_verified == true and a syntactically valid email.

(We verify the signature even though the token arrives directly over TLS — cheap, and it keeps the verifier identical in shape to the CF one.)
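
The claim checks in steps 2 and 3 can be sketched as a pure function that runs after the signature has been verified against the JWKS. This is a minimal illustration, not the code in auth.py: the name check_google_claims, the ValueError convention, and the loose email regex are assumptions made for the example, and signature verification itself (PyJWT in the design) is deliberately left out.

```python
import re
import time
from typing import Optional

# Both issuer spellings listed in step 2.
GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}
# Deliberately loose syntactic check: one "@", non-empty local part, dotted domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_google_claims(claims: dict, client_id: str, now: Optional[float] = None) -> str:
    """Validate already signature-verified ID-token claims; return the lowercased email."""
    now = time.time() if now is None else now
    if claims.get("aud") != client_id:
        raise ValueError("aud does not match our client_id")
    if claims.get("iss") not in GOOGLE_ISSUERS:
        raise ValueError("unexpected issuer")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token expired or exp missing")
    if claims.get("email_verified") is not True:
        raise ValueError("email not verified by Google")
    email = claims.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        raise ValueError("missing or malformed email")
    return email.lower()
```

Keeping these checks separate from the signature step makes them easy to unit-test with plain dictionaries.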

Session cookie

On successful login we set sb_session: an HS256 JWT signed with UI_SESSION_SECRET, claims {sub: email, via: "google", iat, exp} with exp = now + UI_SESSION_MAX_AGE_DAYS (default 30). Cookie attributes:

  • HttpOnly (no JS access), Secure (HTTPS only), SameSite=Lax (the cookie is still sent on top-level navigations, so the post-Google redirect works), Path=/, Max-Age = UI_SESSION_MAX_AGE_DAYS in seconds (30 days by default).
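
As a rough illustration of those attributes, the Set-Cookie value can be built with the standard library alone. The helper name session_set_cookie is hypothetical; the real app would most likely use its web framework's cookie API instead.

```python
from http.cookies import SimpleCookie


def session_set_cookie(token: str, max_age_days: int = 30) -> str:
    """Return the value of a Set-Cookie header carrying the session token."""
    cookie = SimpleCookie()
    cookie["sb_session"] = token
    morsel = cookie["sb_session"]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Lax"     # still sent on top-level navigations
    morsel["path"] = "/"
    morsel["max-age"] = max_age_days * 24 * 60 * 60  # seconds
    return morsel.OutputString()
```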

require_identity decodes and verifies this cookie on every request and re-checks the allowlist from the cookie's email — so removing someone from the allowlist takes effect on their next request, without waiting for the 30-day expiry. A bumped UI_SESSION_SECRET invalidates all sessions (global logout).
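
The sign/verify round trip can be sketched as follows. The design uses PyJWT; this version uses only the standard library so that it runs without dependencies, and its encode/decode method names and return-None-on-failure convention are assumptions rather than the actual SessionCodec interface.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


class SessionCodec:
    """Sign and verify an HS256 JWT with claims {sub, via, iat, exp}."""

    def __init__(self, secret: str, max_age_days: int = 30) -> None:
        self._key = secret.encode("utf-8")
        self._max_age = max_age_days * 24 * 60 * 60

    def _sign(self, signing_input: str) -> str:
        mac = hmac.new(self._key, signing_input.encode("ascii"), hashlib.sha256)
        return _b64url(mac.digest())

    def encode(self, email: str, now: Optional[int] = None) -> str:
        now = int(time.time()) if now is None else now
        header = {"alg": "HS256", "typ": "JWT"}
        claims = {"sub": email, "via": "google", "iat": now, "exp": now + self._max_age}
        parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
        signing_input = ".".join(parts)
        return f"{signing_input}.{self._sign(signing_input)}"

    def decode(self, token: str, now: Optional[int] = None) -> Optional[str]:
        """Return the email, or None if the token is malformed, forged, or expired."""
        now = int(time.time()) if now is None else now
        try:
            header, payload, signature = token.split(".")
            # Constant-time comparison; check the signature before trusting any content.
            if not hmac.compare_digest(signature, self._sign(f"{header}.{payload}")):
                return None
            if json.loads(_b64url_decode(header)).get("alg") != "HS256":
                return None
            claims = json.loads(_b64url_decode(payload))
            if claims["exp"] <= now:
                return None
            return claims["sub"]
        except (ValueError, KeyError, TypeError, AttributeError):
            return None
```

Because the key is derived from UI_SESSION_SECRET, a codec built with a new secret rejects every cookie signed with the old one, which is the global-logout behavior described above. The caller still has to re-check the allowlist against the returned email.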

CSRF / state

/auth/google/start generates a random state, stores it in a short-lived signed sb_oauth cookie (10 min) along with the sanitized next path, and passes state to Google. The callback rejects any request whose state doesn't match the cookie — standard OAuth CSRF protection. next is confined to local paths (open-redirect guard, like _safe_redirect).
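
A minimal sketch of the two guards, assuming a fallback of /requests for rejected next values. The names sanitize_next, new_state, and state_matches are hypothetical; the document's own helper is _safe_redirect, whose exact rules are not shown here.

```python
import hmac
import secrets
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_NEXT = "/requests"  # assumed fallback when `next` is missing or unsafe


def sanitize_next(raw: Optional[str]) -> str:
    """Confine `next` to a local path; anything else falls back to the default."""
    if not raw or not raw.startswith("/") or raw.startswith("//"):
        return DEFAULT_NEXT
    # Backslashes and control characters can be normalized by browsers into "//host".
    if "\\" in raw or any(ord(ch) < 0x20 for ch in raw):
        return DEFAULT_NEXT
    parts = urlsplit(raw)
    if parts.scheme or parts.netloc:
        return DEFAULT_NEXT
    return raw


def new_state() -> str:
    """Unguessable value stored in the sb_oauth cookie and echoed back by Google."""
    return secrets.token_urlsafe(32)


def state_matches(cookie_state: Optional[str], query_state: Optional[str]) -> bool:
    """Constant-time comparison; a missing value on either side never matches."""
    if not cookie_state or not query_state:
        return False
    return hmac.compare_digest(cookie_state.encode(), query_state.encode())
```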

Authorization (the allowlist)

A request is authorized iff Google reported the email as verified (email_verified) and either:

  • its domain ∈ UI_ALLOWED_DOMAINS (default smasint.com,relational.ai), or
  • the email ∈ UI_ALLOWED_EMAILS (default [email protected]).

Comparison is case-insensitive. A logged-in but not-allowed user sees an "access denied" page (HTTP 403), rather than a generic error, that still links to the public docs and offers a sign-out link.
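
The rule can be sketched as a small pure function. The name is_allowed and the example.com addresses in the usage below are illustrative only; the sketch also assumes exact domain matches, so subdomains of an allowed domain are not admitted.

```python
from typing import Iterable


def is_allowed(email: str, allowed_domains: Iterable[str], allowed_emails: Iterable[str]) -> bool:
    """Case-insensitive allowlist check on an email that Google already verified."""
    normalized = email.strip().lower()
    local, sep, domain = normalized.rpartition("@")
    if not sep or not local or not domain:
        return False
    domains = {d.strip().lower() for d in allowed_domains}
    emails = {e.strip().lower() for e in allowed_emails}
    # Exact domain match only: "sub.example.com" does not match "example.com".
    return domain in domains or normalized in emails
```

For example, is_allowed("Ops@Example.com", ["example.com"], []) is true, while is_allowed("ops@evil-example.com", ["example.com"], []) is false.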

Public vs. protected routes

Route(s) | Access
/docs, /docs/* | public
/static/*, /healthz, /readyz, /favicon.ico | public
/login, /auth/google/start, /auth/google/callback, /logout | public (the login machinery)
/, /requests*, /labels*, /tokens*, /probe*, /bootstrap* | allowlisted Google account
  • / redirects to /requests when authed, else to /login.
  • Protected HTML routes, when unauthed, 302 → /login?next=<path> (so the user lands back where they were). Protected non-GET / API-ish routes return 401.
  • The nav/header renders a Sign in link (→ /login) for anonymous visitors on docs, and email · Sign out when authed. Docs routes use an optional identity so they render for everyone.
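
One way to express the routing rules above is sketched below. It treats "GET" as the only signal for an HTML route, which is a simplification of "non-GET / API-ish"; the function names and the (status, location) return shape are assumptions made for the example.

```python
from typing import Optional, Tuple
from urllib.parse import quote

PUBLIC_EXACT = {
    "/docs", "/healthz", "/readyz", "/favicon.ico",
    "/login", "/auth/google/start", "/auth/google/callback", "/logout",
}
PUBLIC_PREFIXES = ("/docs/", "/static/")


def is_public(path: str) -> bool:
    """True for routes that are served without a session."""
    return path in PUBLIC_EXACT or path.startswith(PUBLIC_PREFIXES)


def unauthenticated_response(method: str, path: str) -> Tuple[int, Optional[str]]:
    """Status and Location for a protected route hit without a valid session."""
    if method.upper() == "GET":
        # Send the browser to the landing page and remember where it was going.
        return 302, "/login?next=" + quote(path, safe="/")
    return 401, None
```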

New surface

Endpoint | Purpose
GET /login | Landing page: branding, "Sign in with Google", who-can-access note, Docs link. Carries ?next=.
GET /auth/google/start | Build Google authorize URL (scope openid email profile), set state/next cookie, 302 to Google.
GET /auth/google/callback | Verify state, exchange code, verify id_token, check allowlist, set sb_session, 302 to next. Denied → 403 page.
GET /logout | Clear sb_session, 302 → /login.

auth.py gains a GoogleOIDCVerifier (mirrors CloudflareJWTVerifier: cached JWKS + verify) and SessionCodec (sign/verify the sb_session JWT). require_identity reads the cookie instead of the CF header; optional_identity returns Identity | None for public pages.
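
The URL that /auth/google/start redirects to can be sketched with urllib alone. The function name build_authorize_url is hypothetical; the endpoint is Google's standard OAuth 2.0 authorization endpoint, and the parameters are the usual authorization-code set with the scope named above.

```python
from urllib.parse import urlencode

GOOGLE_AUTHORIZE_URL = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorize_url(client_id: str, redirect_url: str, state: str) -> str:
    """Authorization-code request for the scopes `openid email profile`."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_url,   # must match the URI registered in the console
        "response_type": "code",
        "scope": "openid email profile",
        "state": state,                 # echoed back to /auth/google/callback
    }
    return f"{GOOGLE_AUTHORIZE_URL}?{urlencode(params)}"
```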

Token scoping

Tokens are already stored per owner (TokenIssuer.list_by_owner(email) / created with identity.email). With the real Google email as the identity, tokens are naturally scoped to the individual account — no storage change.

Migration note: tokens previously created under the shared dev email are owned by that email string. If [email protected] signs in with Google, any tokens created while the dev email was also [email protected] carry over; tokens under a different dev email keep working at the router (the router validates the secret, not the owner) but won't appear under the new account. Those tokens can be left as they are or reassigned once; the recommendation is simply to re-mint them.

Configuration & secrets

New UI settings (env, surfaced via the ConfigMap unless secret):

Setting | Where | Default
UI_GOOGLE_CLIENT_ID | ConfigMap | none (required in prod)
UI_GOOGLE_CLIENT_SECRET | Secret google-oauth | none
UI_SESSION_SECRET | Secret google-oauth | none (random 32+ bytes)
UI_OAUTH_REDIRECT_URL | ConfigMap | https://split-brain-ui.scalarxlm.com/auth/google/callback
UI_ALLOWED_DOMAINS | ConfigMap | smasint.com,relational.ai
UI_ALLOWED_EMAILS | ConfigMap | [email protected]
UI_SESSION_MAX_AGE_DAYS | ConfigMap | 30
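
A sketch of how these settings might be read from the environment follows. The GoogleAuthSettings dataclass and parse_csv helper are hypothetical names, not the app's actual settings module; the UI_ALLOWED_EMAILS default is left empty here because the documented default address is not reproduced in this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

DEFAULT_REDIRECT_URL = "https://split-brain-ui.scalarxlm.com/auth/google/callback"
DEFAULT_DOMAINS = "smasint.com,relational.ai"


def parse_csv(value: str) -> FrozenSet[str]:
    """Split a comma-separated setting into a lowercased set, dropping blanks."""
    return frozenset(item.strip().lower() for item in value.split(",") if item.strip())


@dataclass(frozen=True)
class GoogleAuthSettings:
    client_id: Optional[str]
    client_secret: Optional[str]
    session_secret: Optional[str]
    redirect_url: str
    allowed_domains: FrozenSet[str]
    allowed_emails: FrozenSet[str]
    session_max_age_days: int

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "GoogleAuthSettings":
        return cls(
            client_id=env.get("UI_GOOGLE_CLIENT_ID") or None,
            client_secret=env.get("UI_GOOGLE_CLIENT_SECRET") or None,
            session_secret=env.get("UI_SESSION_SECRET") or None,
            redirect_url=env.get("UI_OAUTH_REDIRECT_URL", DEFAULT_REDIRECT_URL),
            allowed_domains=parse_csv(env.get("UI_ALLOWED_DOMAINS", DEFAULT_DOMAINS)),
            # Assumption: empty default in this sketch (see the lead-in).
            allowed_emails=parse_csv(env.get("UI_ALLOWED_EMAILS", "")),
            session_max_age_days=int(env.get("UI_SESSION_MAX_AGE_DAYS", "30")),
        )
```

In the app this would typically be called once at startup as GoogleAuthSettings.from_env(os.environ).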

Chart: extend global.secrets with a google block (clientSecret, sessionSecret) → a google-oauth Secret, mounted into the UI deployment via secretKeyRef (optional, like the router's SCALARLM_API_KEY). Flip the UI prod values off dev mode (UI_DEV_MODE unset) and set the Google config. Dev mode stays for local work.

Google Cloud Console setup (operator, one-time): create an OAuth 2.0 Client ID (type Web application); Authorized redirect URI = https://split-brain-ui.scalarxlm.com/auth/google/callback; copy the client ID + secret into the values/secret. An Internal consent screen would suffice for Workspace-only access, but [email protected] is a personal account, so the consent screen must be External (Testing or Published). The allowlist, not Google, is what actually restricts access.

Relationship to Cloudflare Access

Today a CF Access policy gates the UI hostname. With app-level auth we remove the Access policy from the UI host (the tunnel still routes traffic; the app now authenticates). This is required for /docs to be public and for the Google redirect to work without a second login. The router host is unaffected (already programmatic). cloudflared is unchanged.

Trade-off: the UI becomes internet-reachable and relies on app auth rather than edge auth. There's no password to brute-force (auth is delegated to Google); the allowlist is the gate; /docs is intentionally public. If we want defense-in-depth later, a CF Access service-token bypass or WAF rule can sit in front without changing the app.

Dev mode

Unchanged for local development: UI_DEV_MODE=1 short-circuits to a fixed identity (UI_DEV_EMAIL), skipping Google entirely, and assert_safe_dev_mode still refuses to start dev mode in Kubernetes without the explicit override. Tests drive the allowlist + session codec directly and stub the token exchange (httpx MockTransport), mirroring tests/test_* patterns.

Security checklist

  • ID token: signature (JWKS) + aud + iss + exp + email_verified.
  • Allowlist enforced on every request (from the cookie), not just at login.
  • state cookie prevents login CSRF; next is open-redirect-guarded.
  • Session cookie: HttpOnly + Secure + SameSite=Lax, signed; rotating UI_SESSION_SECRET is a global logout.
  • Client secret + session secret live in a Kubernetes Secret, never the ConfigMap or the image.
  • Docs are the only public app surface; everything operator-facing is gated.

Testing

  • SessionCodec round-trip; expired/forged cookie rejected.
  • Allowlist: allowed domains, allowed email, denied domain, unverified email.
  • require_identity: no cookie → 302 /login?next=; valid cookie → Identity; not-allowed cookie → 403.
  • /docs reachable with no cookie; /requests not.
  • Callback: state mismatch → 400; happy path sets cookie + redirects to next (token exchange + JWKS stubbed).
  • Dev mode still yields a fixed identity.

Decisions

  1. Edge auth: app auth only — remove the Cloudflare Access policy from the UI host; no CF bypass layer.
  2. Library: hand-rolled OIDC (httpx + PyJWT, no new dependency).
  3. Public header: minimal — anonymous docs visitors see just the brand, a Docs link, and Sign in.
  4. Session: sliding 30 days — re-issue the cookie when past halfway to expiry, so active users stay logged in; 30 days idle ends it.
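
The sliding-session rule in decision 4 reduces to a midpoint test on the cookie's own iat and exp claims. The function name should_reissue is an assumption; the actual re-issue hook is not described in this document.

```python
def should_reissue(iat: int, exp: int, now: int) -> bool:
    """True once a still-valid session is past the halfway point of its lifetime."""
    midpoint = iat + (exp - iat) / 2
    return midpoint < now < exp
```

With a 30-day lifetime, a request on day 10 leaves the cookie alone, a request on day 16 gets a fresh 30-day cookie, and a request after day 30 finds the session already expired.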

Safe rollout

Google auth activates only when UI_GOOGLE_CLIENT_ID + UI_GOOGLE_CLIENT_SECRET + UI_SESSION_SECRET are all set. Until the operator creates the Google OAuth client and populates the google-oauth Secret, the UI keeps its current behavior — so the code can ship before the credentials exist, and the cutover is a values change (turn off dev mode, set the Google config).
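
The activation rule can be sketched as an all-or-nothing check over the three settings. The function name google_auth_enabled is hypothetical, and treating whitespace-only values as unset is an assumption of this sketch.

```python
from typing import Mapping

REQUIRED_FOR_GOOGLE = (
    "UI_GOOGLE_CLIENT_ID",
    "UI_GOOGLE_CLIENT_SECRET",
    "UI_SESSION_SECRET",
)


def google_auth_enabled(env: Mapping[str, str]) -> bool:
    """Google login turns on only when every required setting is non-empty."""
    return all(env.get(name, "").strip() for name in REQUIRED_FOR_GOOGLE)
```

A partially populated Secret therefore leaves the UI on its current behavior instead of half-enabling Google login.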

Turning it on

  1. Google Cloud Console → APIs & Services → Credentials → Create OAuth client ID → Web application. Authorized redirect URI: https://split-brain-ui.scalarxlm.com/auth/google/callback. On the OAuth consent screen choose External (so the Gmail address works) and add the sign-in accounts as test users (or publish). Copy the client ID + secret.
  2. Secret (values.secrets.yaml): set global.secrets.google.clientSecret to the OAuth secret and global.secrets.google.sessionSecret to a fresh random 32+ byte string (openssl rand -base64 48).
  3. Values (values-dev.yaml): set ui.config.googleClientId to the client ID and ui.config.devMode: false (the cutover). Adjust allowedDomains / allowedEmails if needed.
  4. Cloudflare: remove the Access SSO policy on the UI hostname so /docs is public and the Google redirect isn't double-gated. (Router unaffected.)
  5. ./split-brain deploy --build --remote (or build ui + deploy), then sign in at https://split-brain-ui.scalarxlm.com/.