---
title: Codex As API
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Codex-as-API

An **OpenAI-compatible HTTP API** backed by the
[OpenAI Codex CLI](https://github.com/openai/codex), authenticated with your
**ChatGPT login** (no API key). Runs on a Hugging Face **Docker Space**; auth and
sessions persist in the mounted `/data` bucket so they survive restarts and rebuilds.

> ⚠️ Personal use only. `auth.json` contains your ChatGPT access tokens; treat it
> like a password. The API is protected by a bearer token; keep your Space's
> `API_TOKEN` secret.

## How it works

```
client (OpenAI SDK)
   │  Authorization: Bearer $API_TOKEN   (stream=true -> live SSE tokens)
   ▼
FastAPI  /v1/chat/completions
   │  JSON-RPC over stdio (one short-lived process per turn):
   ▼
codex app-server
     initialize -> thread/start | thread/resume -> turn/start
     <- item/agentMessage/delta {delta}     ← streamed token-by-token
     <- item/completed / turn/completed / thread/tokenUsage/updated
     (cwd = /data/sessions//workspace, sandbox = workspace-write, approvals never)
   │
   ▼
/data  (bucket)
   ├─ .codex/auth.json    ← your ChatGPT login (you upload this once)
   ├─ .codex/AGENTS.md    ← global safety rules (no delete, etc.)
   └─ sessions//          ← per-session workspace + Codex thread id
```

> **Streaming is real**, not simulated. The App Server emits `item/agentMessage/delta`
> events as the model generates, which the API forwards as OpenAI SSE chunks.
> (`codex exec` cannot do this; it only returns the whole message at once.)

## One-time setup

### 1. Mount the bucket at `/data`

Already done in your Space settings (`sarveshpatel/cli-storage` → `/data`, Read & Write).

### 2. Set the Space secret

In **Settings → Variables and secrets**, add a **secret**:

| Name | Value |
|---|---|
| `API_TOKEN` | a long random string (your API key for this service) |

Optional **variables**:

| Name | Default | Meaning |
|---|---|---|
| `CODEX_SANDBOX` | `workspace-write` | `read-only` for chat-only, `workspace-write` to let Codex edit files |
| `CODEX_MODEL` | (unset) | pin a Codex model, e.g. `gpt-5-codex` |
| `CODEX_TIMEOUT` | `180` | max seconds between Codex output events |
| `CODEX_MAX_CONCURRENCY` | `4` | max Codex turns running at once (resource cap) |
| `CODEX_QUEUE_TIMEOUT` | `90` | seconds a request waits in queue before `429` |

### Concurrency

- Requests for **different** sessions run in parallel, up to `CODEX_MAX_CONCURRENCY`.
- Requests for the **same** session are **serialized**: two calls never resume the
  same Codex thread or write the same workspace at once (prevents corruption).
- When all slots are busy and the queue wait exceeds `CODEX_QUEUE_TIMEOUT`, the API
  returns **HTTP 429** so clients can back off and retry.

### 3. Upload your login (`auth.json`)

On your **local machine** (with a browser):

```bash
npm install -g @openai/codex
codex login                 # completes the ChatGPT OAuth in a browser
cat ~/.codex/auth.json      # confirm it exists
```

Then upload `~/.codex/auth.json` into the bucket at **`/data/.codex/auth.json`**
(via the HF bucket UI or the CLI). The Space auto-refreshes the tokens from there
on, so you only do this once (until you explicitly log out).

`GET /health` reports `"logged_in": true` once it's in place.

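
To confirm the upload worked without opening a browser, the `/health` check can be scripted. The following is a minimal sketch using only the Python standard library. It assumes that `/health` returns a JSON object with a boolean `logged_in` field (as described above); the helper names `is_logged_in` and `check_health` are hypothetical, and whether `/health` requires the bearer token is not stated here, so the token is optional.

```python
import json
import urllib.request
from typing import Optional


def is_logged_in(health_body: str) -> bool:
    """Return True only if the /health JSON payload contains "logged_in": true."""
    try:
        payload = json.loads(health_body)
    except json.JSONDecodeError:
        # A non-JSON body (e.g. an HTML error page) counts as "not logged in".
        return False
    return isinstance(payload, dict) and payload.get("logged_in") is True


def check_health(base_url: str, token: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Fetch GET {base_url}/health and report the login status.

    `token` is sent as a bearer token only if provided (assumption: /health
    may or may not be protected by API_TOKEN).
    """
    request = urllib.request.Request(f"{base_url.rstrip('/')}/health")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_logged_in(response.read().decode("utf-8"))
```

Call `check_health(...)` with your Space's base URL (without the `/v1` suffix); a `False` result most likely means `auth.json` is not yet at `/data/.codex/auth.json`.
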
## Usage

```bash
curl https://.hf.space/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: my-project-1" \
  -d '{
    "model": "codex",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
  }'
```

With the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://.hf.space/v1",
    api_key="",
)

resp = client.chat.completions.create(
    model="codex",
    messages=[{"role": "user", "content": "Refactor app.py for readability."}],
    extra_headers={"X-Session-Id": "my-project-1"},  # persistent session
)
print(resp.choices[0].message.content)
```

- **Sessions**: pass `X-Session-Id` (or the OpenAI `user` field) to keep a persistent
  workspace and resume the Codex thread across calls. Omit it for a clean one-shot.
- **Streaming**: `stream=true` gives real token-by-token SSE (set
  `stream_options={"include_usage": true}` to get a final usage chunk).

## Endpoints

- `GET /health`: liveness + login status
- `GET /v1/models`
- `POST /v1/chat/completions`

## Custom domain (Nginx reverse proxy)

`ai.antaram.org` fronts the Space via Nginx (config in
[`deploy/nginx/ai.antaram.org.conf`](deploy/nginx/ai.antaram.org.conf)):

1. DNS: point an **A record** `ai.antaram.org` → your server's IP.
2. Install the config, then get TLS: `sudo certbot --nginx -d ai.antaram.org`.
3. `sudo nginx -t && sudo systemctl reload nginx`.

The config sets the upstream `Host`/SNI to `sarveshpatel-codex.hf.space` (required
for HF routing) and turns **buffering off** so SSE streaming stays live. Clients
then use `base_url=https://ai.antaram.org/v1`.

## Safety

A global `AGENTS.md` (installed into `CODEX_HOME` on boot) forbids file deletion,
destructive git, escaping the working directory, and printing credentials. Codex
also runs sandboxed (`workspace-write`) and confined to the session's workspace.