Google sign-in for the UI — design
Status: Implemented (app + chart), pending Google OAuth credentials to turn on. Replaces the UI's current auth (Cloudflare Access SSO in front of a dev-mode app that trusts a fixed email) with an app-level Google OpenID Connect login, a signed long-lived session cookie, an email allowlist, and a public docs section. See "Turning it on" at the end.
Goals
- Who gets in: only Google accounts in @smasint.com or @relational.ai, plus [email protected], may reach the operator views: Requests, Labels, Tokens, Bootstrap, Probe.
- Public docs: anyone may read /docs with no login.
- Landing page: an unauthenticated visitor lands on a sign-in page with a "Sign in with Google" button and a link to the public docs.
- Per-account tokens: router API tokens are scoped to the signed-in Google account (the email is the owner key).
- Remember me: the session persists in a cookie for a long time (~1 month) so users aren't re-prompted constantly.
Current state (what we're changing)
| Piece | Today | After |
|---|---|---|
| Who authenticates the user | Cloudflare Access (Google Workspace SSO) at the edge | The UI itself, via Google OIDC |
| App auth | dev mode: trusts a fixed UI_DEV_EMAIL (assert_safe_dev_mode + UI_DEV_MODE_KUBERNETES_OK=1) | Real per-user identity from a verified Google ID token |
| Authorization (allowlist) | Cloudflare Access policy (out of band, in the CF dashboard) | In-app allowlist (domains + emails), enforced every request |
| Docs | behind the same SSO gate | public |
| Identity shown / token owner | the fixed dev email for everyone | each user's own Google email |
As the Today column shows, every signed-in user currently shares one identity and one token namespace. The point of this change is real per-user identity and a code-owned allowlist.
Auth flow (OAuth 2.0 / OIDC authorization-code)
Browser UI (router-side) Google
│ GET /requests (no cookie) │ │
│ ─────────────────────────────────►│ 302 → /login?next=/requests
│ GET /login │ │
│ ◄───────────────────────────────── │ landing page + "Sign in" │
│ GET /auth/google/start?next=… │ │
│ ─────────────────────────────────►│ set signed `state` cookie │
│ ◄───────────────────────────────── │ 302 → Google authorize URL
│ ── consent ──────────────────────────────────────────────────►│
│ ◄──────────────────────────────────────────────────────────────│ 302 → /auth/google/callback?code&state
│ GET /auth/google/callback │ │
│ ─────────────────────────────────►│ verify state cookie │
│ │ POST token endpoint ─────►│ (TLS)
│ │ ◄───────────────────────── │ id_token (JWT)
│ │ verify id_token (JWKS, │
│ │ aud/iss/exp/email_verified)
│ │ check allowlist │
│ ◄───────────────────────────────── │ set `sb_session` cookie, │
│ │ 302 → next (/requests) │
│ GET /requests (with cookie) │ decode cookie → Identity │
│ ◄───────────────────────────────── │ 200 │
ID-token verification
The id_token returned from Google's token endpoint is a JWT. We:
- Fetch + cache Google's JWKS (https://www.googleapis.com/oauth2/v3/certs), reusing the cached-JWKS pattern already in CloudflareJWTVerifier.
- Verify the signature, aud == client_id, iss ∈ {accounts.google.com, https://accounts.google.com}, and exp.
- Require email_verified == true and a syntactically valid email.
(We verify the signature even though the token arrives directly over TLS — cheap, and it keeps the verifier identical in shape to the CF one.)
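The claim checks listed above can be sketched as follows. This is a minimal illustration, assuming the signature has already been verified against the cached JWKS (for example with PyJWT) and the payload is available as a dict. The names `check_id_token_claims` and `IdTokenError` are hypothetical, not the verifier's actual API, and the email regex is a deliberately loose stand-in for "syntactically valid".

```python
import re
import time
from typing import Optional

# Both issuer spellings are accepted, as listed in the design.
GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}
# Loose syntactic check (assumption): one "@", no whitespace, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class IdTokenError(Exception):
    """Raised when a decoded ID token fails a claim check (hypothetical name)."""


def check_id_token_claims(claims: dict, client_id: str,
                          now: Optional[float] = None) -> str:
    """Validate the claims of an already signature-verified Google ID token.

    Returns the lower-cased email on success, raises IdTokenError otherwise.
    """
    now = time.time() if now is None else now
    if claims.get("aud") != client_id:
        raise IdTokenError("aud does not match our client ID")
    if claims.get("iss") not in GOOGLE_ISSUERS:
        raise IdTokenError("unexpected issuer")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise IdTokenError("token expired or exp missing")
    if claims.get("email_verified") is not True:
        raise IdTokenError("email not verified")
    email = claims.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        raise IdTokenError("missing or malformed email")
    return email.lower()
```

Keeping the claim checks separate from signature verification makes them easy to unit-test with plain dicts, without stubbing the JWKS fetch.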
Session cookie (the "remember me")
On successful login we set sb_session: an HS256 JWT signed with
UI_SESSION_SECRET, claims {sub: email, via: "google", iat, exp} with
exp = now + UI_SESSION_MAX_AGE_DAYS (default 30). Cookie attributes:
HttpOnly (no JS access), Secure (HTTPS only), SameSite=Lax (sends on top-level navigations so the post-Google redirect works), Path=/, Max-Age = 30 days.
require_identity decodes and verifies this cookie on every request and
re-checks the allowlist from the cookie's email — so removing someone
from the allowlist takes effect on their next request, without waiting for
the 30-day expiry. A bumped UI_SESSION_SECRET invalidates all sessions
(global logout).
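A minimal sketch of signing and verifying the sb_session claims described above. It uses only the standard library so it runs as-is; the real SessionCodec may well be built on PyJWT instead, and the `encode`/`decode` method names and constructor arguments here are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


class SessionCodec:
    """Illustrative HS256 codec for the sb_session cookie (not the real class)."""

    def __init__(self, secret: str, max_age_days: int = 30) -> None:
        self._key = secret.encode("utf-8")
        self._max_age = max_age_days * 86400

    def _sign(self, signing_input: str) -> str:
        mac = hmac.new(self._key, signing_input.encode("ascii"), hashlib.sha256)
        return _b64url(mac.digest())

    def encode(self, email: str, now: Optional[float] = None) -> str:
        iat = int(time.time() if now is None else now)
        header = {"alg": "HS256", "typ": "JWT"}
        claims = {"sub": email, "via": "google",
                  "iat": iat, "exp": iat + self._max_age}
        signing_input = ".".join(
            _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
            for part in (header, claims)
        )
        return f"{signing_input}.{self._sign(signing_input)}"

    def decode(self, token: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the claims, or None if malformed, forged, or expired."""
        now = time.time() if now is None else now
        try:
            header_b64, claims_b64, signature = token.split(".")
            expected = self._sign(f"{header_b64}.{claims_b64}")
            # Constant-time comparison, done before parsing any payload.
            if not hmac.compare_digest(signature.encode("utf-8"),
                                       expected.encode("ascii")):
                return None
            header = json.loads(_b64url_decode(header_b64))
            claims = json.loads(_b64url_decode(claims_b64))
        except (ValueError, UnicodeError):
            return None
        if header.get("alg") != "HS256" or claims.get("exp", 0) <= now:
            return None
        return claims
```

Because the key is derived from the secret alone, constructing the codec with a different secret rejects every previously issued cookie, which is the global-logout behavior described above.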
CSRF / state
/auth/google/start generates a random state, stores it in a short-lived
signed sb_oauth cookie (10 min) along with the sanitized next path, and
passes state to Google. The callback rejects any request whose state
doesn't match the cookie — standard OAuth CSRF protection. next is
confined to local paths (open-redirect guard, like _safe_redirect).
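The state check and the next guard can be sketched as below. This is an illustration under stated assumptions: `new_state`, `state_matches`, and `safe_next` are hypothetical helper names (the existing `_safe_redirect` may differ), and the `/requests` fallback is assumed from the landing behavior described elsewhere in this document.

```python
import hmac
import secrets
from typing import Optional
from urllib.parse import urlsplit


def new_state() -> str:
    """Unguessable value sent to Google and stored in the sb_oauth cookie."""
    return secrets.token_urlsafe(32)


def state_matches(cookie_state: Optional[str], query_state: Optional[str]) -> bool:
    """Constant-time comparison; a missing value on either side never matches."""
    if not cookie_state or not query_state:
        return False
    return hmac.compare_digest(cookie_state.encode(), query_state.encode())


def safe_next(raw: Optional[str], default: str = "/requests") -> str:
    """Confine the post-login redirect to a local path."""
    if not raw or not raw.startswith("/") or raw.startswith("//"):
        return default
    # Backslashes and control characters are normalized by some browsers
    # into a scheme-relative URL, so reject them outright.
    if "\\" in raw or any(ord(ch) < 0x20 for ch in raw):
        return default
    parts = urlsplit(raw)
    if parts.scheme or parts.netloc:
        return default
    return raw
```

Sanitizing next before it goes into the signed cookie means the callback never has to trust a redirect target taken from the query string.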
Authorization (the allowlist)
A request is authorized iff the email came from an ID token with email_verified == true and:
- its domain ∈ UI_ALLOWED_DOMAINS (default smasint.com, relational.ai), or
- the email ∈ UI_ALLOWED_EMAILS (default [email protected]).
Comparison is case-insensitive. A logged-in but not-allowed user is shown an "access denied" page (HTTP 403) that still links to the public docs and a logout link — not a generic error.
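The rule above reduces to a small pure function. The sketch below is illustrative only: `is_allowed` and its parameters are hypothetical names, and the addresses in the usage note are placeholders.

```python
from typing import Iterable


def is_allowed(email: str, email_verified: bool,
               allowed_domains: Iterable[str],
               allowed_emails: Iterable[str]) -> bool:
    """Case-insensitive allowlist check on a verified email."""
    if not email_verified:
        return False
    email = email.strip().lower()
    # Split on the last "@" so the domain is compared exactly,
    # never by substring or suffix.
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False
    domains = {d.strip().lower() for d in allowed_domains}
    emails = {e.strip().lower() for e in allowed_emails}
    return domain in domains or email in emails
```

With domains `["smasint.com", "relational.ai"]`, an address such as `mallory@smasint.com.evil.example` is rejected because the whole domain must match an entry exactly.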
Public vs. protected routes
| Route(s) | Access |
|---|---|
| /docs, /docs/* | public |
| /static/*, /healthz, /readyz, /favicon.ico | public |
| /login, /auth/google/start, /auth/google/callback, /logout | public (the login machinery) |
| /, /requests*, /labels*, /tokens*, /probe*, /bootstrap* | allowlisted Google account |
- / redirects to /requests when authed, else to /login.
- Protected HTML routes, when unauthed, 302 → /login?next=<path> (so the user lands back where they were). Protected non-GET / API-ish routes return 401.
- The nav/header renders a Sign in link (→ /login) for anonymous visitors on docs, and email · Sign out when authed. Docs routes use an optional identity so they render for everyone.
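The routing rules above can be sketched as two small helpers. This is a simplification under stated assumptions: the names are hypothetical, every GET is treated as an HTML route, and the special handling of / (redirect by auth state) is left out.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Exact public paths and public prefixes, taken from the route table.
PUBLIC_EXACT = {
    "/docs", "/healthz", "/readyz", "/favicon.ico",
    "/login", "/auth/google/start", "/auth/google/callback", "/logout",
}
PUBLIC_PREFIXES = ("/docs/", "/static/")


def is_public(path: str) -> bool:
    """True for routes that must render without a session cookie."""
    return path in PUBLIC_EXACT or path.startswith(PUBLIC_PREFIXES)


def unauthenticated_response(method: str, path: str) -> Tuple[int, Optional[str]]:
    """(status, Location) for an anonymous request to a protected route."""
    if method.upper() == "GET":
        # Percent-encode the target so its own query string survives
        # as a single next= value.
        return 302, "/login?next=" + quote(path, safe="/")
    return 401, None
```

Matching `/docs` exactly and `/docs/` as a prefix keeps a path like `/docsx` out of the public set.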
New surface
| Endpoint | Purpose |
|---|---|
| GET /login | Landing page: branding, "Sign in with Google", who-can-access note, Docs link. Carries ?next=. |
| GET /auth/google/start | Build Google authorize URL (scope openid email profile), set state/next cookie, 302 to Google. |
| GET /auth/google/callback | Verify state, exchange code, verify id_token, check allowlist, set sb_session, 302 to next. Denied → 403 page. |
| GET /logout | Clear sb_session, 302 → /login. |
auth.py gains a GoogleOIDCVerifier (mirrors CloudflareJWTVerifier:
cached JWKS + verify) and SessionCodec (sign/verify the sb_session JWT).
require_identity reads the cookie instead of the CF header;
optional_identity returns Identity | None for public pages.
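As an illustration of what /auth/google/start has to produce, the sketch below builds the authorize URL with the standard library. `build_authorize_url` is a hypothetical helper name; the endpoint is Google's documented OAuth 2.0 authorization endpoint, and only the parameters the design mentions are included.

```python
from urllib.parse import urlencode

GOOGLE_AUTHORIZE_URL = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Authorization-code request URL for the Google sign-in redirect."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match the registered URI exactly
        "response_type": "code",       # authorization-code flow
        "scope": "openid email profile",
        "state": state,                # echoed back to the callback
    }
    return f"{GOOGLE_AUTHORIZE_URL}?{urlencode(params)}"
```

The handler would set the signed sb_oauth cookie carrying the same state value before returning the 302 to this URL.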
Token scoping
Tokens are already stored per owner (TokenIssuer.list_by_owner(email) /
created with identity.email). With the real Google email as the identity,
tokens are naturally scoped to the individual account — no storage change.
Migration note: tokens previously created under the shared dev email are
owned by that email string. If [email protected] signs in with
Google, any tokens created while the dev email was also
[email protected] carry over; tokens under a different dev email
keep working at the router (the router validates the secret, not the owner)
but won't appear under the new account. We can leave them or one-time
re-assign; recommend just re-minting.
Configuration & secrets
New UI settings (env, surfaced via the ConfigMap unless secret):
| Setting | Where | Default |
|---|---|---|
| UI_GOOGLE_CLIENT_ID | ConfigMap | none (required in prod) |
| UI_GOOGLE_CLIENT_SECRET | Secret google-oauth | none |
| UI_SESSION_SECRET | Secret google-oauth | none (random 32+ bytes) |
| UI_OAUTH_REDIRECT_URL | ConfigMap | https://split-brain-ui.scalarxlm.com/auth/google/callback |
| UI_ALLOWED_DOMAINS | ConfigMap | smasint.com,relational.ai |
| UI_ALLOWED_EMAILS | ConfigMap | [email protected] |
| UI_SESSION_MAX_AGE_DAYS | ConfigMap | 30 |
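A minimal sketch of how the allowlist and session settings in this table might be read from the environment. `AuthSettings` and `load_auth_settings` are hypothetical names, the comma-separated format for UI_ALLOWED_EMAILS is assumed by analogy with UI_ALLOWED_DOMAINS, and the email default is left empty here rather than guessed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping


def _csv(value: str) -> FrozenSet[str]:
    """Split a comma-separated setting, trimming, lower-casing, dropping blanks."""
    return frozenset(item.strip().lower() for item in value.split(",") if item.strip())


@dataclass(frozen=True)
class AuthSettings:
    allowed_domains: FrozenSet[str]
    allowed_emails: FrozenSet[str]
    session_max_age_days: int


def load_auth_settings(env: Mapping[str, str]) -> AuthSettings:
    """Build settings from an environment-like mapping (e.g. os.environ)."""
    return AuthSettings(
        allowed_domains=_csv(env.get("UI_ALLOWED_DOMAINS", "smasint.com,relational.ai")),
        # Default intentionally empty in this sketch.
        allowed_emails=_csv(env.get("UI_ALLOWED_EMAILS", "")),
        session_max_age_days=int(env.get("UI_SESSION_MAX_AGE_DAYS", "30")),
    )
```

Normalizing to lower case at load time keeps the per-request allowlist comparison a plain set lookup.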
Chart: extend global.secrets with a google block (clientSecret,
sessionSecret) → a google-oauth Secret, mounted into the UI deployment
via secretKeyRef (optional, like the router's SCALARLM_API_KEY). Flip
the UI prod values off dev mode (UI_DEV_MODE unset) and set the Google
config. Dev mode stays for local work.
Google Cloud Console setup (operator, one-time): create an OAuth 2.0
Client ID (type Web application); Authorized redirect URI =
https://split-brain-ui.scalarxlm.com/auth/google/callback; copy the client
ID + secret into the values/secret. An Internal OAuth consent screen would
suffice for Workspace-only access, but [email protected] is a personal
account, so the consent screen must be External (Testing or Published);
the allowlist, not Google, is what actually restricts access.
Relationship to Cloudflare Access
Today a CF Access policy gates the UI hostname. With app-level auth we
remove the Access policy from the UI host (the tunnel still routes
traffic; the app now authenticates). This is required for /docs to be
public and for the Google redirect to work without a second login. The
router host is unaffected (already programmatic). cloudflared is unchanged.
Trade-off: the UI becomes internet-reachable and relies on app auth rather
than edge auth. There's no password to brute-force (auth is delegated to
Google); the allowlist is the gate; /docs is intentionally public. If we
want defense-in-depth later, a CF Access service-token bypass or WAF rule
can sit in front without changing the app.
Dev mode
Unchanged for local development: UI_DEV_MODE=1 short-circuits to a fixed
identity (UI_DEV_EMAIL), skipping Google entirely, and
assert_safe_dev_mode still refuses to start dev mode in Kubernetes without
the explicit override. Tests drive the allowlist + session codec directly
and stub the token exchange (httpx MockTransport), mirroring
tests/test_* patterns.
Security checklist
- ID token: signature (JWKS) + aud + iss + exp + email_verified.
- Allowlist enforced on every request (from the cookie), not just at login.
- state cookie prevents login CSRF; next is open-redirect-guarded.
- Session cookie: HttpOnly + Secure + SameSite=Lax, signed; rotating UI_SESSION_SECRET is a global logout.
- Client secret + session secret live in a Kubernetes Secret, never the ConfigMap or the image.
- Docs are the only public app surface; everything operator-facing is gated.
Testing
- SessionCodec round-trip; expired/forged cookie rejected.
- Allowlist: allowed domains, allowed email, denied domain, unverified email.
- require_identity: no cookie → 302 /login?next=; valid cookie → Identity; not-allowed cookie → 403.
- /docs reachable with no cookie; /requests not.
- Callback: state mismatch → 400; happy path sets cookie + redirects to next (token exchange + JWKS stubbed).
- Dev mode still yields a fixed identity.
Decisions
- Edge auth: app auth only — remove the Cloudflare Access policy from the UI host; no CF bypass layer.
- Library: hand-rolled OIDC (httpx + PyJWT, no new dependency).
- Public header: minimal — anonymous docs visitors see just the brand, a Docs link, and Sign in.
- Session: sliding 30 days — re-issue the cookie when past halfway to expiry, so active users stay logged in; 30 days idle ends it.
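The sliding-session decision comes down to one comparison on the cookie's own iat and exp claims. A minimal sketch, with `should_reissue` as a hypothetical helper name and the midpoint itself counted as "past halfway":

```python
def should_reissue(iat: int, exp: int, now: float) -> bool:
    """True when a still-valid session has reached the midpoint of its lifetime."""
    if now >= exp:
        return False  # expired: the user has to sign in again
    return now >= iat + (exp - iat) / 2
```

With a 30-day lifetime, a user who returns on day 16 gets a fresh 30-day cookie, while one who stays away for the full 30 days is signed out.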
Safe rollout
Google auth activates only when UI_GOOGLE_CLIENT_ID + UI_GOOGLE_CLIENT_SECRET
+ UI_SESSION_SECRET are all set. Until the operator creates the Google
OAuth client and populates the google-oauth Secret, the UI keeps its
current behavior — so the code can ship before the credentials exist, and
the cutover is a values change (turn off dev mode, set the Google config).
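The activation rule can be expressed as a single predicate over the environment. A minimal sketch, assuming blank values count as unset; `google_auth_enabled` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

REQUIRED_FOR_GOOGLE_AUTH = (
    "UI_GOOGLE_CLIENT_ID",
    "UI_GOOGLE_CLIENT_SECRET",
    "UI_SESSION_SECRET",
)


def google_auth_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Google sign-in is active only when all three settings are non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name, "").strip() for name in REQUIRED_FOR_GOOGLE_AUTH)
```

Requiring all three together avoids a half-configured state, such as a client ID without a session secret, silently enabling the login routes.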
Turning it on
- Google Cloud Console → APIs & Services → Credentials → Create OAuth client ID → Web application. Authorized redirect URI: https://split-brain-ui.scalarxlm.com/auth/google/callback. On the OAuth consent screen choose External (so the gmail address works) and add the sign-in accounts as test users (or publish). Copy the client ID + secret.
- Secret (values.secrets.yaml): set global.secrets.google.clientSecret to the OAuth secret and global.secrets.google.sessionSecret to a fresh random 32+ byte string (openssl rand -base64 48).
- Values (values-dev.yaml): set ui.config.googleClientId to the client ID and ui.config.devMode: false (the cutover). Adjust allowedDomains / allowedEmails if needed.
- Cloudflare: remove the Access SSO policy on the UI hostname so /docs is public and the Google redirect isn't double-gated. (Router unaffected.)
- ./split-brain deploy --build --remote (or build ui + deploy), then sign in at https://split-brain-ui.scalarxlm.com/.