metadata

title: Codex As API
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

Codex-as-API

An OpenAI-compatible HTTP API backed by the OpenAI Codex CLI, authenticated with your ChatGPT login (no API key). Runs on a Hugging Face Docker Space; auth and sessions persist in the mounted /data bucket so they survive restarts and rebuilds.

⚠️ Personal use only. auth.json contains your ChatGPT access tokens — treat it like a password. The API is protected by a bearer token; keep your Space's API_TOKEN secret.

How it works

client (OpenAI SDK)
   │  Authorization: Bearer $API_TOKEN   (stream=true -> live SSE tokens)
   ▼
FastAPI  /v1/chat/completions
   │  JSON-RPC over stdio (one short-lived process per turn):
   ▼
codex app-server
   initialize -> thread/start | thread/resume -> turn/start
   <- item/agentMessage/delta {delta}   ← streamed token-by-token
   <- item/completed / turn/completed / thread/tokenUsage/updated
   (cwd = /data/sessions/<id>/workspace, sandbox = workspace-write, approvals never)
   │
   ▼
/data  (bucket)
  ├─ .codex/auth.json      ← your ChatGPT login (you upload this once)
  ├─ .codex/AGENTS.md      ← global safety rules (no delete, etc.)
  └─ sessions/<id>/        ← per-session workspace + Codex thread id
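The request order and event names in the diagram can be modelled in a few lines of Python. This is only an illustration: the method and event names come from the diagram above, but the helper names `turn_methods` and `collect_reply`, and the message shape (one JSON object per line with `method` and `params` keys), are assumptions rather than the service's actual code.

```python
import json


def turn_methods(thread_id=None):
    """Request order for one turn: start a new thread or resume a stored one."""
    opener = "thread/start" if thread_id is None else "thread/resume"
    return ["initialize", opener, "turn/start"]


def collect_reply(lines):
    """Join streamed deltas until the turn completes.

    `lines` is an iterable of JSON strings, one message per line
    (assumed framing for the stdio channel).
    """
    parts = []
    for line in lines:
        msg = json.loads(line)
        method = msg.get("method")
        if method == "item/agentMessage/delta":
            parts.append(msg["params"]["delta"])
        elif method == "turn/completed":
            break  # ignore anything that arrives after the turn ends
    return "".join(parts)
```

A session with no stored thread id takes the `thread/start` branch; a session that already has one takes `thread/resume`.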

Streaming is real, not simulated. The App Server emits item/agentMessage/delta events as the model generates, which the API forwards as OpenAI SSE chunks. (codex exec cannot do this — it only returns the whole message at once.)
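To make the forwarding step concrete, here is a minimal sketch of wrapping each delta in an OpenAI-style `chat.completion.chunk` SSE frame. The function names `sse_chunk` and `sse_stream` are hypothetical, and the real server may set additional fields; only the general chunk layout and the closing `data: [DONE]` frame follow the OpenAI streaming format.

```python
import json
import time
import uuid


def sse_chunk(completion_id, model, delta=None, finish_reason=None):
    """Build one SSE frame carrying a chat.completion.chunk payload."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {} if delta is None else {"content": delta},
            "finish_reason": finish_reason,
        }],
    }
    return f"data: {json.dumps(payload)}\n\n"


def sse_stream(deltas, model="codex"):
    """Yield one frame per delta, then a final 'stop' frame and [DONE]."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    for delta in deltas:
        yield sse_chunk(completion_id, model, delta=delta)
    yield sse_chunk(completion_id, model, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

Because `sse_stream` is a generator, each frame can be written to the client as soon as the corresponding delta arrives instead of after the whole message is known.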

One-time setup

1. Mount the bucket at `/data`

Already done in your Space settings (sarveshpatel/cli-storage → /data, Read & Write).

2. Set the Space secret

In Settings → Variables and secrets, add a secret:

| Name | Value |
|------|-------|
| `API_TOKEN` | a long random string (your API key for this service) |

Optional variables:

| Name | Default | Meaning |
|------|---------|---------|
| `CODEX_SANDBOX` | `workspace-write` | `read-only` for chat-only, `workspace-write` to let Codex edit files |
| `CODEX_MODEL` | (unset) | pin a Codex model, e.g. `gpt-5-codex` |
| `CODEX_TIMEOUT` | `180` | max seconds between Codex output events |
| `CODEX_MAX_CONCURRENCY` | `4` | max Codex turns running at once (resource cap) |
| `CODEX_QUEUE_TIMEOUT` | `90` | seconds a request waits in queue before `429` |
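As a rough illustration of how these variables and their defaults fit together, the sketch below loads them into a small settings object. The `CodexSettings` class and its field names are hypothetical; only the variable names and default values are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CodexSettings:
    sandbox: str
    model: Optional[str]
    timeout: int
    max_concurrency: int
    queue_timeout: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CodexSettings":
        """Read the optional variables, falling back to the documented defaults."""
        return cls(
            sandbox=env.get("CODEX_SANDBOX", "workspace-write"),
            model=env.get("CODEX_MODEL") or None,  # unset or empty means no pin
            timeout=int(env.get("CODEX_TIMEOUT", "180")),
            max_concurrency=int(env.get("CODEX_MAX_CONCURRENCY", "4")),
            queue_timeout=int(env.get("CODEX_QUEUE_TIMEOUT", "90")),
        )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check, for example `CodexSettings.from_env({"CODEX_SANDBOX": "read-only"})`.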

Concurrency

Requests for different sessions run in parallel, up to CODEX_MAX_CONCURRENCY.
Requests for the same session are serialized — two calls never resume the same Codex thread or write the same workspace at once (prevents corruption).
When all slots are busy and the queue wait exceeds CODEX_QUEUE_TIMEOUT, the API returns HTTP 429 so clients can back off and retry.
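One way to get this behaviour with `asyncio` is a global semaphore for the slots plus one lock per session, with the whole wait bounded by a timeout. This is a sketch under stated assumptions, not the Space's actual implementation: `TurnLimiter`, `QueueTimeout`, and `demo` are hypothetical names, and a real server would translate `QueueTimeout` into the HTTP 429 response.

```python
import asyncio
from collections import defaultdict


class QueueTimeout(Exception):
    """No slot became free in time; a server would answer with HTTP 429."""


class TurnLimiter:
    def __init__(self, max_concurrency=4, queue_timeout=90.0):
        self._slots = asyncio.Semaphore(max_concurrency)
        self._session_locks = defaultdict(asyncio.Lock)
        self._queue_timeout = queue_timeout

    async def _acquire(self, lock):
        await lock.acquire()            # same-session calls queue here
        try:
            await self._slots.acquire() # then wait for a free global slot
        except BaseException:
            lock.release()              # do not keep the session locked on timeout
            raise

    async def run(self, session_id, turn):
        lock = self._session_locks[session_id]
        try:
            await asyncio.wait_for(self._acquire(lock), self._queue_timeout)
        except asyncio.TimeoutError:
            raise QueueTimeout(session_id) from None
        try:
            return await turn()
        finally:
            self._slots.release()
            lock.release()


async def demo():
    # One slot and a short queue timeout so the second session gives up.
    limiter = TurnLimiter(max_concurrency=1, queue_timeout=0.05)

    async def slow_turn():
        await asyncio.sleep(0.2)
        return "done"

    return await asyncio.gather(
        limiter.run("a", slow_turn),
        limiter.run("b", slow_turn),
        return_exceptions=True,
    )
```

In `demo`, session `a` takes the only slot, so session `b` waits past the queue timeout and ends with `QueueTimeout`, which is the case where a client should back off and retry.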

3. Upload your login (`auth.json`)

On your local machine (with a browser):

npm install -g @openai/codex
codex login                 # completes the ChatGPT OAuth in a browser
ls -l ~/.codex/auth.json    # confirm it exists (without printing the tokens)

Then upload ~/.codex/auth.json to the bucket at /data/.codex/auth.json (via the HF bucket UI or the CLI). From then on the Space refreshes the tokens automatically, so you only do this once, unless you explicitly log out.

GET /health reports "logged_in": true once it's in place.
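A small script can confirm the upload worked. The sketch below assumes the response body is a JSON object with a boolean `logged_in` field, as described above; the function names are hypothetical, and whether `/health` needs the bearer token is not stated here, so the token is optional.

```python
import json
import urllib.request


def logged_in_from_body(body: bytes) -> bool:
    """Return True only if the JSON body contains "logged_in": true."""
    payload = json.loads(body)
    return isinstance(payload, dict) and payload.get("logged_in") is True


def is_logged_in(base_url: str, api_token: str = "", timeout: float = 10.0) -> bool:
    """GET {base_url}/health and report the login status."""
    headers = {"Authorization": f"Bearer {api_token}"} if api_token else {}
    request = urllib.request.Request(f"{base_url.rstrip('/')}/health", headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return logged_in_from_body(response.read())
```

Call it as `is_logged_in("https://<your-space>.hf.space")` after uploading auth.json.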

Usage

curl https://<your-space>.hf.space/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: my-project-1" \
  -d '{
        "model": "codex",
        "messages": [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
      }'

With the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-space>.hf.space/v1",
    api_key="<your API_TOKEN>",
)
resp = client.chat.completions.create(
    model="codex",
    messages=[{"role": "user", "content": "Refactor app.py for readability."}],
    extra_headers={"X-Session-Id": "my-project-1"},  # persistent session
)
print(resp.choices[0].message.content)

Sessions: pass X-Session-Id (or the OpenAI user field) to keep a persistent workspace and resume the Codex thread across calls. Omit it for a clean one-shot.
Streaming: stream=true gives real token-by-token SSE. To get a final usage chunk, set "stream_options": {"include_usage": true} in the JSON body (stream_options={"include_usage": True} in the Python SDK).
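A streaming call with the OpenAI Python SDK might look like the sketch below. The helper names `collect_stream` and `stream_from_space` are hypothetical; the model name, header, and options mirror the examples above. `collect_stream` only relies on the chunk attributes the SDK exposes (`choices`, `delta.content`, `usage`), and it tolerates the final usage chunk, which arrives with an empty `choices` list.

```python
def collect_stream(chunks, echo=False):
    """Concatenate streamed deltas; return (text, usage or None)."""
    parts, usage = [], None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage          # only present on the final usage chunk
        for choice in chunk.choices:     # empty on the usage chunk
            piece = choice.delta.content
            if piece:
                parts.append(piece)
                if echo:
                    print(piece, end="", flush=True)
    return "".join(parts), usage


def stream_from_space(base_url, api_token, prompt, session_id):
    """Send one streamed turn to the Space and collect the reply."""
    from openai import OpenAI  # imported lazily so collect_stream works without the SDK

    client = OpenAI(base_url=base_url, api_key=api_token)
    stream = client.chat.completions.create(
        model="codex",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={"X-Session-Id": session_id},
    )
    return collect_stream(stream, echo=True)
```

For example, `stream_from_space("https://<your-space>.hf.space/v1", "<your API_TOKEN>", "Explain app.py", "my-project-1")` prints tokens as they arrive and returns the full text plus the usage object.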

Endpoints

GET /health — liveness + login status
GET /v1/models
POST /v1/chat/completions

Custom domain (Nginx reverse proxy)

ai.antaram.org fronts the Space via Nginx (config in deploy/nginx/ai.antaram.org.conf):

1. DNS: point an A record for ai.antaram.org at your server's IP.
2. Install the config, then get TLS: sudo certbot --nginx -d ai.antaram.org
3. Test the config and reload: sudo nginx -t && sudo systemctl reload nginx

The config sets the upstream Host/SNI to sarveshpatel-codex.hf.space (required for HF routing) and turns buffering off so SSE streaming stays live. Clients then use base_url=https://ai.antaram.org/v1.

Safety

A global AGENTS.md (installed into CODEX_HOME on boot) forbids file deletion, destructive git commands, escaping the working directory, and printing credentials. Codex also runs sandboxed (workspace-write) and is confined to the session's workspace.