# OrgState: OIDC SSO Setup
Enterprise sign-on via OIDC (Stage 97). Works with any compliant IdP: Okta, Google Workspace, Microsoft Entra (Azure AD), Auth0, Authentik, etc. SAML is NOT supported in V1.1; document a customer request if you hit one.
## TL;DR for the customer success engineer
```bash
# 1. install the optional SSO dep (once per deployment)
pip install -r requirements-sso.txt
# 2. register the IdP (operator side)
infra sso provider create acme \
--name "Okta Prod" \
--issuer-url https://acme.okta.com \
--client-id 0oa...id \
--client-secret "iKx...secret" \
--allowed-email-domains acme.com
# 3. point the customer at https://api.orgstate.example/sso/acme/login
# → 302 to Okta → Okta login → 302 to callback → session set
```
## How the flow works
1. User clicks "Sign in with SSO" in the dashboard → `GET /sso/{tid}/login`.
2. We mint state + nonce + PKCE verifier, stash them in a short-lived signed cookie, and redirect to the IdP's `authorization_endpoint`.
3. IdP authenticates the user, redirects to `GET /sso/{tid}/callback?code=...&state=...`.
4. We verify the state matches the cookie, exchange the code for tokens at the IdP's `token_endpoint`, validate the ID token (signature via JWKS + iss + aud + exp + nonce).
5. Verified email → domain gate check (`allowed_email_domains`) → mint a session row in `sso_sessions`, set it as a cookie (`orgstate_sso_session`), redirect to `/`.
6. Subsequent requests carry the session cookie (browser) or `Authorization: Bearer <session_token>` (SDK).
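
Step 2 of the flow can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the OrgState implementation: the function names (`mint_flow_params`, `pkce_challenge`, `build_authorize_url`), the token lengths, and the requested scope string are assumptions made for the example. The S256 challenge derivation itself follows RFC 7636.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def mint_flow_params() -> dict:
    """Generate fresh state, nonce, and PKCE verifier for one login attempt."""
    return {
        "state": secrets.token_urlsafe(32),
        "nonce": secrets.token_urlsafe(32),
        # 64 random bytes -> 86 URL-safe chars, inside RFC 7636's 43-128 range.
        "code_verifier": secrets.token_urlsafe(64),
    }


def pkce_challenge(code_verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) with '=' padding removed."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorize_url(authorization_endpoint: str, client_id: str,
                        redirect_uri: str, flow: dict) -> str:
    """Build the redirect target for the IdP (hypothetical helper)."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",  # assumed scope set
        "state": flow["state"],
        "nonce": flow["nonce"],
        "code_challenge": pkce_challenge(flow["code_verifier"]),
        "code_challenge_method": "S256",
    })
    return f"{authorization_endpoint}?{query}"
```

The verifier never leaves the server until the token exchange in step 4; only its hash travels through the browser, which is what makes an intercepted `code` useless on its own.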
## Env vars
| Var | Required? | Default | Why |
|---|---|---|---|
| `ORGSTATE_SSO_COOKIE_SECRET` | yes (multi-replica) | per-process random | HMAC key for the flow cookie. **Must be the same across replicas**, or callbacks land on a node that didn't mint the flow → 400 `sso_flow_expired`. |
| `ORGSTATE_REQUIRE_HTTPS_WEBHOOKS` | recommended | unset | When true, blocks `http://` IdP issuers (Stage 88). |
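
To see why the cookie secret has to match across replicas, here is a small sketch of HMAC-signing a flow payload. The cookie layout (`base64url(json).hexdigest`) and the helper names `sign_flow` / `verify_flow` are assumptions for illustration; the real cookie format may differ, and expiry handling is omitted.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_flow(payload: dict, secret: bytes) -> str:
    """Serialize the flow payload and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    )
    tag = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode('ascii')}.{tag}"


def verify_flow(cookie: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the tag matches this node's secret, else None."""
    body, _, tag = cookie.rpartition(".")
    expected = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking tag prefixes via timing.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


cookie = sign_flow({"state": "abc"}, b"shared-secret")
same_node = verify_flow(cookie, b"shared-secret")          # payload comes back
other_node = verify_flow(cookie, b"per-process-random-2")  # None: tag mismatch
```

A replica that fell back to its own per-process random key behaves like `other_node` here: the cookie is intact, but the tag cannot be reproduced, so the callback is rejected.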
## Per-IdP setup
### Okta
1. Okta admin → Applications → Create App Integration → OIDC → Web Application.
2. Sign-in redirect URI: `https://api.orgstate.example/sso/{tid}/callback`. Add one per tenant.
3. Sign-out redirect URI: `https://dashboard.example/` (whatever post-logout landing you want).
4. Grant type: Authorization Code + PKCE.
5. Copy the Client ID + Client Secret + the Okta org URL (`https://acme.okta.com`).
6. Run:
```bash
infra sso provider create acme \
--name "Okta Prod" \
--issuer-url https://acme.okta.com \
--client-id <CLIENT_ID> \
--client-secret <CLIENT_SECRET> \
--allowed-email-domains acme.com
```
### Google Workspace
1. Google Cloud Console → APIs & Services → Credentials → Create OAuth 2.0 Client ID.
2. Application type: Web application.
3. Authorized redirect URI: `https://api.orgstate.example/sso/{tid}/callback`.
4. Copy Client ID + Client Secret.
5. Issuer URL: `https://accounts.google.com`.
6. Run:
```bash
infra sso provider create acme \
--name "Google Workspace" \
--issuer-url https://accounts.google.com \
--client-id <ID>.apps.googleusercontent.com \
--client-secret <SECRET> \
--allowed-email-domains acme.com
```
### Microsoft Entra (Azure AD)
1. Entra portal → App registrations → New registration.
2. Redirect URI (Web): `https://api.orgstate.example/sso/{tid}/callback`.
3. Certificates & secrets → New client secret → copy value.
4. Issuer URL: `https://login.microsoftonline.com/<ENTRA_TID>/v2.0`. `<ENTRA_TID>` is Entra's tenant ID, NOT your OrgState tenant_id.
5. API permissions → grant `openid`, `email`, `profile`.
6. Run:
```bash
infra sso provider create acme \
--name "Microsoft Entra" \
--issuer-url https://login.microsoftonline.com/<ENTRA_TID>/v2.0 \
--client-id <APP_ID> \
--client-secret <SECRET> \
--allowed-email-domains acme.com
```
### Auth0
1. Auth0 dashboard → Applications → Create → Regular Web App.
2. Allowed Callback URLs: `https://api.orgstate.example/sso/{tid}/callback`.
3. Issuer URL: `https://<your-auth0-tenant>.auth0.com`.
4. Run `infra sso provider create` with the Client ID + Secret.
## Multiple IdPs per tenant
A tenant can register multiple SSO providers (e.g. one Okta + one Google for partners). When that happens, the login URL must specify which one:
```
https://api.orgstate.example/sso/acme/login?provider_id=sso_abc123
```
Without `?provider_id=` and with multiple providers, we return 400 `sso_provider_required` rather than guessing.
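
The selection rule can be sketched like this. It is an illustration under assumptions, not the actual handler: `SsoError` and `select_provider` are hypothetical names, and the error returned for an unknown `provider_id` (`sso_provider_not_found`) is invented for the example because this document does not specify it.

```python
from typing import Optional, Sequence


class SsoError(Exception):
    """Hypothetical error type carrying an HTTP status and an error code."""

    def __init__(self, status: int, code: str):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def select_provider(provider_ids: Sequence[str],
                    requested: Optional[str] = None) -> str:
    """Pick the provider for a login request, refusing to guess."""
    if not provider_ids:
        raise SsoError(404, "sso_not_configured")
    if requested is not None:
        if requested not in provider_ids:
            # Assumed behavior; the real error code is not documented here.
            raise SsoError(404, "sso_provider_not_found")
        return requested
    if len(provider_ids) > 1:
        raise SsoError(400, "sso_provider_required")
    # Exactly one provider registered: no query parameter needed.
    return provider_ids[0]
```

Failing closed on ambiguity keeps a partner from being silently routed to the employee IdP (or the reverse) when a second provider is added later.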
## Domain gate (`allowed_email_domains`)
CSV list of email-domain suffixes the verified email must match. Empty string = allow any verified email (useful for personal Google accounts in dev, dangerous in prod).
Examples:
- `acme.com` → only @acme.com
- `acme.com,partner.io` → either
- `""` (empty) → any verified email (NOT recommended for prod)
The check is case-insensitive and anchored at the `@`: `acme.com` does NOT match `evilacme.com`.
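
A minimal sketch of a gate with these properties, assuming the domain part after the last `@` must equal one of the listed entries exactly. The function name `email_domain_allowed` is hypothetical, and whether the real check also accepts subdomains such as `eu.acme.com` is not stated here; this sketch rejects them.

```python
def email_domain_allowed(email: str, allowed_email_domains: str) -> bool:
    """Check a verified email against a CSV list of allowed domains."""
    allowed = [
        d.strip().lower()
        for d in allowed_email_domains.split(",")
        if d.strip()
    ]
    if not allowed:
        # Empty list means "any verified email" (dev only).
        return True
    local, at, domain = email.rpartition("@")
    if not at or not local or not domain:
        return False  # not a usable address
    # Exact match on the part after '@', so "evilacme.com" != "acme.com".
    return domain.lower() in allowed


print(email_domain_allowed("Alice@ACME.com", "acme.com"))      # True
print(email_domain_allowed("bob@evilacme.com", "acme.com"))    # False
```

Comparing the whole domain rather than using a plain `endswith` is what blocks the `evilacme.com` look-alike.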
## Session lifetime
12 hours (`DEFAULT_SESSION_TTL_HOURS` in `infra/sso/sessions.py`). Aligns with a typical workday. To change, edit the constant and redeploy.
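
As a rough illustration of how a TTL constant like this typically turns into an expiry check: only the constant name and its 12-hour value come from this document; the helper names and the use of timezone-aware UTC timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_SESSION_TTL_HOURS = 12  # value documented above


def session_expires_at(created_at: datetime,
                       ttl_hours: int = DEFAULT_SESSION_TTL_HOURS) -> datetime:
    """Compute the expiry timestamp for a session created at `created_at`."""
    return created_at + timedelta(hours=ttl_hours)


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """A session is expired once `now` reaches its expiry timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
expires = session_expires_at(created)  # 2024-01-01 21:00 UTC
```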
## Session operations (Stage 98)
```bash
# who's logged in right now?
infra sso session list acme
infra sso session list acme --user-email alice@acme.com
# forensics: include past sessions
infra sso session list acme --include-revoked --include-expired
# revoke ONE session (operator pastes token from list)
infra sso session revoke --token <full_session_token>
# OFFBOARDING: kill EVERY session for a user across all devices
infra sso session revoke --tenant acme --user-email alice@acme.com
# writes audit row `sso_revoke_user` with the count
# housekeeping: drop sessions past their expires_at
# (audit row only when count > 0, so it is cron-friendly)
infra sso session purge
# typical nightly cron entry:
# 0 3 * * * python -m infra sso session purge --actor cron_nightly
```
Every operation lands in the `audit_logs` audit trail:
| Action | When |
|---|---|
| `sso_login` | session created via /sso/.../callback |
| `sso_logout` | single-token revoke (`infra sso session revoke --token`) |
| `sso_revoke_user` | bulk revoke (`--tenant + --user-email`) |
| `sso_purge_expired` | purge with count > 0 |
## Troubleshooting
| Symptom | Likely cause |
|---|---|
| 503 `sso_unavailable` | `pip install -r requirements-sso.txt` not run on the deployed instance. |
| 404 `sso_not_configured` | No providers registered for this tenant. Run `infra sso provider list <tid>`. |
| 400 `sso_state_mismatch` on callback | Browser dropped the flow cookie OR you're behind a TLS terminator that strips cookies. Check SameSite / Secure flags + `ORGSTATE_REQUIRE_HTTPS=true`. |
| 400 `sso_flow_expired` after multi-replica deploy | `ORGSTATE_SSO_COOKIE_SECRET` not set or differs between replicas. Set it to the same value on every node. |
| 401 `sso_email_unverified` | IdP says the email isn't verified. User confirms email at the IdP and retries. |
| 403 `sso_domain_not_allowed` | User's email domain isn't in `allowed_email_domains`. Either add the domain or have the user use a permitted account. |
| 502 `sso_discovery_failed` | IdP's `/.well-known/openid-configuration` unreachable from the OrgState process. Network ACL? |
Full troubleshooting log: every successful login writes `sso_login`, every single-token revocation writes `sso_logout`, and every bulk revocation writes `sso_revoke_user`. Failed callbacks land in the regular API error logs with the request_id (Stage 84); `grep request_id=<...>` to see the chain.
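
For the 502 `sso_discovery_failed` case, a quick way to test reachability is to fetch the discovery document from the same host or container that runs OrgState. The sketch below uses only the standard library; the helper names are hypothetical, and the URL construction follows the OpenID Connect Discovery convention of appending `/.well-known/openid-configuration` to the issuer.

```python
import json
import urllib.request


def discovery_url(issuer_url: str) -> str:
    """Derive the OIDC discovery document URL from an issuer URL."""
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"


def fetch_discovery(issuer_url: str, timeout: float = 5.0) -> dict:
    """Fetch and parse the discovery document; raises on network errors."""
    with urllib.request.urlopen(discovery_url(issuer_url), timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a real network call, so run it from the OrgState host):
# meta = fetch_discovery("https://accounts.google.com")
# print(meta["issuer"], meta["jwks_uri"])
```

If this times out or fails from the OrgState host but works from a laptop, the problem is most likely an egress rule or proxy rather than the provider registration.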
## See also
- [`ENCRYPTION.md`](ENCRYPTION.md): encryption-at-rest posture, including client_secret handling roadmap
- [`RUNBOOK.md`](RUNBOOK.md): operator runbook (incident triage, key rotation)
- Stage 88 (`infra/api/tls.py`): TLS enforcement that pairs with SSO (you want both on in prod)
- Stage 82: GDPR erasure also wipes sso_providers + sso_sessions