---
title: Codex As API
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Codex-as-API

An **OpenAI-compatible HTTP API** backed by the [OpenAI Codex CLI](https://github.com/openai/codex),
authenticated with your **ChatGPT login** (no API key). Runs on a Hugging Face
**Docker Space**; auth and sessions persist in the mounted `/data` bucket so they
survive restarts and rebuilds.

> ⚠️ Personal use only. `auth.json` contains your ChatGPT access tokens — treat it
> like a password. The API is protected by a bearer token; keep your Space's
> `API_TOKEN` secret.

## How it works

```
client (OpenAI SDK)
   │  Authorization: Bearer $API_TOKEN   (stream=true -> live SSE tokens)
   ▼
FastAPI  /v1/chat/completions
   │  JSON-RPC over stdio (one short-lived process per turn):
   ▼
codex app-server
   initialize -> thread/start | thread/resume -> turn/start
   <- item/agentMessage/delta {delta}   ← streamed token-by-token
   <- item/completed / turn/completed / thread/tokenUsage/updated
   (cwd = /data/sessions/<id>/workspace, sandbox = workspace-write, approvals never)
   │
   ▼
/data  (bucket)
  ├─ .codex/auth.json      ← your ChatGPT login (you upload this once)
  ├─ .codex/AGENTS.md      ← global safety rules (no delete, etc.)
  └─ sessions/<id>/        ← per-session workspace + Codex thread id
```

> **Streaming is real**, not simulated. The App Server emits `item/agentMessage/delta`
> events as the model generates, which the API forwards as OpenAI SSE chunks.
> (`codex exec` cannot do this — it only returns the whole message at once.)
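
The forwarding step described above can be sketched as a small translation function. This is an illustrative sketch, not the Space's actual code: it assumes each App Server event arrives as a JSON-RPC notification with a `method` field and the text under `params.delta`, and the names `delta_to_sse` and `DONE_FRAME` are hypothetical.

```python
import json
import time
from typing import Optional


def delta_to_sse(notification: dict, completion_id: str, model: str,
                 created: Optional[int] = None) -> Optional[str]:
    """Turn one App Server notification into an OpenAI-style SSE frame.

    Returns None for notifications that carry no text delta.
    Assumption: the notification looks like
    {"method": "item/agentMessage/delta", "params": {"delta": "..."}}.
    """
    if notification.get("method") != "item/agentMessage/delta":
        return None
    text = notification.get("params", {}).get("delta")
    if not text:
        return None
    chunk = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created if created is not None else int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": text}, "finish_reason": None}
        ],
    }
    # One SSE event: a "data:" line followed by a blank line.
    return f"data: {json.dumps(chunk)}\n\n"


# OpenAI-compatible streams end with this sentinel frame.
DONE_FRAME = "data: [DONE]\n\n"
```

Each delta becomes one `chat.completion.chunk` event, so an OpenAI client sees text as soon as Codex emits it.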

## One-time setup

### 1. Mount the bucket at `/data`
In the Space settings, mount the storage bucket at `/data` with Read & Write access.
For this Space that is already done (`sarveshpatel/cli-storage` → `/data`).

### 2. Set the Space secret
In **Settings → Variables and secrets**, add a **secret**:

| Name | Value |
|---|---|
| `API_TOKEN` | a long random string (your API key for this service) |

Optional **variables**:

| Name | Default | Meaning |
|---|---|---|
| `CODEX_SANDBOX` | `workspace-write` | `read-only` for chat-only, `workspace-write` to let Codex edit files |
| `CODEX_MODEL` | (unset) | pin a Codex model, e.g. `gpt-5-codex` |
| `CODEX_TIMEOUT` | `180` | max seconds between Codex output events |
| `CODEX_MAX_CONCURRENCY` | `4` | max Codex turns running at once (resource cap) |
| `CODEX_QUEUE_TIMEOUT` | `90` | seconds a request waits in queue before `429` |
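
As a rough illustration of how the variables in the table map to typed settings with the documented defaults, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not the Space's actual code, and the sketch accepts only the two sandbox values listed in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Only the two values documented in the table are accepted in this sketch.
ALLOWED_SANDBOXES = {"read-only", "workspace-write"}


@dataclass(frozen=True)
class Settings:
    sandbox: str
    model: Optional[str]
    timeout: int          # max seconds between Codex output events
    max_concurrency: int  # max Codex turns running at once
    queue_timeout: int    # seconds a request may wait before 429


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional variables, falling back to the documented defaults."""
    sandbox = env.get("CODEX_SANDBOX", "workspace-write")
    if sandbox not in ALLOWED_SANDBOXES:
        raise ValueError(f"unsupported CODEX_SANDBOX: {sandbox!r}")
    return Settings(
        sandbox=sandbox,
        model=env.get("CODEX_MODEL") or None,  # unset or empty means no pin
        timeout=int(env.get("CODEX_TIMEOUT", "180")),
        max_concurrency=int(env.get("CODEX_MAX_CONCURRENCY", "4")),
        queue_timeout=int(env.get("CODEX_QUEUE_TIMEOUT", "90")),
    )
```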

### Concurrency

- Requests for **different** sessions run in parallel, up to `CODEX_MAX_CONCURRENCY`.
- Requests for the **same** session are **serialized** — two calls never resume the
  same Codex thread or write the same workspace at once (prevents corruption).
- When all slots are busy and the queue wait exceeds `CODEX_QUEUE_TIMEOUT`, the API
  returns **HTTP 429** so clients can back off and retry.
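
On the client side, backing off after a `429` can be sketched as below. This is a minimal example under stated assumptions: `call_with_backoff` and its parameters are hypothetical, and `send` stands for any callable that performs one request and returns `(status_code, body)`.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns something other than HTTP 429.

    Waits base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay)
    plus random jitter between attempts, and gives up after max_attempts
    calls, returning the last response.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Jitter keeps many clients from retrying at the same moment.
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body
```

The OpenAI Python SDK already retries `429` responses a small number of times by default (see its `max_retries` client option), so a wrapper like this is mainly useful for raw HTTP clients.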

### 3. Upload your login (`auth.json`)
On your **local machine** (with a browser):

```bash
npm install -g @openai/codex
codex login                 # completes the ChatGPT OAuth in a browser
ls -l ~/.codex/auth.json    # confirm it exists (avoid printing it: it holds your tokens)
```

Then upload `~/.codex/auth.json` into the bucket at **`/data/.codex/auth.json`**
(via the HF bucket UI or the CLI). From then on the Space refreshes the tokens
automatically, so you only need to do this once, unless you explicitly log out.

`GET /health` reports `"logged_in": true` once it's in place.
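
A quick way to script that check is sketched below. It assumes `/health` returns a JSON object containing the `logged_in` field and that the endpoint is reachable without the bearer token (add an `Authorization` header if your deployment requires one). `fetch_json` and `is_logged_in` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_logged_in(base_url: str, fetch: Callable[[str], dict] = fetch_json) -> bool:
    """Return True only if /health explicitly reports "logged_in": true."""
    health = fetch(base_url.rstrip("/") + "/health")
    return health.get("logged_in") is True
```

Call it as `is_logged_in("https://<your-space>.hf.space")` after the upload; a `False` result means the Space has not picked up `auth.json` yet.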

## Usage

```bash
curl https://<your-space>.hf.space/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: my-project-1" \
  -d '{
        "model": "codex",
        "messages": [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
      }'
```

With the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-space>.hf.space/v1",
    api_key="<your API_TOKEN>",
)
resp = client.chat.completions.create(
    model="codex",
    messages=[{"role": "user", "content": "Refactor app.py for readability."}],
    extra_headers={"X-Session-Id": "my-project-1"},  # persistent session
)
print(resp.choices[0].message.content)
```

- **Sessions**: pass `X-Session-Id` (or the OpenAI `user` field) to keep a
  persistent workspace and resume the Codex thread across calls. Omit it for a
  clean one-shot.
- **Streaming**: `stream=True` in the Python SDK (`"stream": true` in raw JSON)
  gives real token-by-token SSE. Set `stream_options={"include_usage": True}` to
  get a final usage chunk.
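
Consuming a streamed response with the OpenAI Python SDK might look like the sketch below. `collect_stream` and `stream_from_space` are hypothetical helper names; the sketch assumes the final usage chunk arrives with an empty `choices` list, as it does with the OpenAI API when `include_usage` is set.

```python
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Join streamed text deltas and capture the usage object, if any."""
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        for choice in chunk.choices:  # empty on the final usage chunk
            text = choice.delta.content
            if text:
                parts.append(text)
    return "".join(parts), usage


def stream_from_space(base_url: str, api_token: str, prompt: str, session_id: str):
    """Send one streamed request to the Space and return (text, usage)."""
    from openai import OpenAI  # imported lazily so collect_stream has no dependency

    client = OpenAI(base_url=base_url, api_key=api_token)
    stream = client.chat.completions.create(
        model="codex",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={"X-Session-Id": session_id},  # persistent session
    )
    return collect_stream(stream)
```

To show tokens as they arrive instead of collecting them, print each `choice.delta.content` inside the loop.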

## Endpoints
- `GET /health` — liveness + login status
- `GET /v1/models`
- `POST /v1/chat/completions`

## Custom domain (Nginx reverse proxy)

`ai.antaram.org` fronts the Space via Nginx (config in
[`deploy/nginx/ai.antaram.org.conf`](deploy/nginx/ai.antaram.org.conf)):

1. DNS: point an **A record** `ai.antaram.org` → your server's IP.
2. Install the config, then get TLS: `sudo certbot --nginx -d ai.antaram.org`.
3. `sudo nginx -t && sudo systemctl reload nginx`.

The config sets the upstream `Host`/SNI to `sarveshpatel-codex.hf.space` (required
for HF routing) and turns **buffering off** so SSE streaming stays live. Clients
then use `base_url=https://ai.antaram.org/v1`.

## Safety
A global `AGENTS.md` (installed into `CODEX_HOME` on boot) forbids deleting files,
running destructive git commands, escaping the working directory, and printing
credentials. Codex also runs sandboxed (`workspace-write` by default) and is
confined to the session's workspace.