# deploy_env_space.md: DriftCall Env HF Space Deployment

**Owner:** Person D (Deploy & Story)
**Implements:** DESIGN.md §3.3 (Deployed Env Topology), §11.1 (Env Space files), §13 (Deliverables)
**Depends on:** `docs/modules/env.md` (FastAPI surface contract), `docs/modules/models.md` (dataclass wire format), `docs/modules/audio.md` (Kokoro + Whisper runtime)
**Status:** DRAFT, pending ≥ 2 fresh critic rounds

---

## 1. Purpose

`driftcall-env` is the production hosting target for the DriftCall OpenEnv RL environment. It runs on a **free-tier Hugging Face Space (Docker SDK, CPU basic, 2 vCPU / 16 GB RAM)** and is the artifact the hackathon judges exercise via `openenv validate`. The Space exposes a FastAPI application implementing the OpenEnv REST contract (`/reset`, `/step`, `/state`, `/close`) plus a lightweight session cache so concurrent training / evaluation runs can share one deployment without state bleed.

The Space is **intentionally CPU-only**. Kokoro TTS (82 M params) and `faster-whisper-small` int8 (~244 M params) both run at roughly real-time on a single modern CPU core; the training topology (DESIGN.md §3.2, §9.4) never loads TTS/ASR because GRPO operates text-in / text-out.

This module owns:

1. The Dockerfile (multi-stage build, <2 GB final image, pre-pulled audio weights).
2. `openenv.yaml` metadata (required for `openenv validate`).
3. `requirements.txt` pin set (fastapi, uvicorn, openenv, kokoro, faster-whisper, plus transitive deps).
4. The Space README (Space card), which must satisfy HF Space schema + hackathon submission rules.
5. The session cache implementation sketch delegated to `app.py` (full code in `docs/modules/env.md`; this doc specifies the cache's **deployment constraints** only).
6. The deployment command set (build, push, validate).

This doc is a design spec, not an executable.
It must contain every decision needed so a single operator can ship the env Space in one 30-minute sitting on Apr 25 morning (DESIGN.md §12.2 pre-onsite hour 16 gate).

---

## 2. Interface

### 2.1 External HTTP surface (served by the Space)

The Space exposes the OpenEnv REST surface on **port 7860** (HF Spaces Docker SDK convention; any other port is unreachable). All endpoints accept and return `application/json`. Session identity is carried as a request header so the cache can dispatch to the right env instance.

```
POST /reset    → 200 application/json   # create or recycle a session, return initial observation
POST /step     → 200 application/json   # advance one turn; returns observation + reward + done
GET  /state    → 200 application/json   # read the current DriftCallState (debug / judge inspection)
POST /close    → 200 application/json   # explicitly evict a session
GET  /healthz  → 200 text/plain "ok"    # Space healthcheck (HF pings this to mark the Space "running")
GET  /         → 200 text/html          # minimal landing page (see §4.4); NOT the agent surface
```

**Headers (all mutating endpoints):**

| Header | Required | Notes |
|---|---|---|
| `Authorization: Bearer ` | yes (see §3.5) | Space secret; judge receives this via submission form |
| `X-Session-Id: ` | yes | Opaque string, max 64 chars, `[A-Za-z0-9_-]` only |
| `Content-Type: application/json` | yes | UTF-8 |

The endpoint contracts (request / response shapes) are owned by `docs/modules/env.md` and serialize the `DriftCallObservation` / `DriftCallState` / `DriftCallAction` dataclasses defined in `docs/modules/models.md`. This doc only pins the **deployment-visible** aspects: port, headers, auth, status codes.

> **Cross-doc sync note (2026-04-24):** DESIGN.md §3.3 was updated to match this doc's choice of carrying session identity via the `X-Session-Id` HTTP header (previously documented there as a `session_id` query param). Both docs now agree.
No behavior change in this spec; the note is recorded so reviewers don't perceive divergence.

### 2.1.1 Success body shapes (top-level only)

Top-level JSON shapes for each success response. Inner dataclass fields (`DriftCallObservation`, `DriftCallAction`, `DriftCallState`) are owned by `docs/modules/env.md` and `docs/modules/models.md`; this section pins only the envelope each endpoint returns.

**`POST /reset`**

Request:

```json
{
  "config": {
    "curriculum_stage": 1,
    "language_weights": { "hi": 0.4, "ta": 0.2, "kn": 0.2, "hinglish": 0.2 },
    "audio_boundary_enabled": true
  },
  "seed": 42
}
```

- `config.curriculum_stage`: `1 | 2 | 3`
- `config.language_weights`: object, keys are language codes, values sum to 1.0
- `config.audio_boundary_enabled`: bool
- `seed`: `int | null`

Response:

```json
{
  "observation": { "...DriftCallObservation..." },
  "episode_id": "uuid4-string",
  "max_turns": 12
}
```

**`POST /step`**

Request:

```json
{ "action": { "...DriftCallAction..." } }
```

Response:

```json
{
  "observation": { "...DriftCallObservation..." },
  "reward": 0.0,
  "done": false,
  "info": { "...opaque..." }
}
```

- `reward`: `float | null` (null when reward is deferred to episode end)

**`GET /state`**

Response:

```json
{
  "state": { "...DriftCallState..." },
  "turn": 3
}
```

**`POST /close`**

Response:

```json
{
  "closed": true,
  "final_state": { "...DriftCallState... | null" }
}
```

- `final_state`: `object | null` (null if session was already evicted)

Deeper field-level detail for `DriftCallObservation`, `DriftCallAction`, and `DriftCallState` lives in `docs/modules/env.md` and `docs/modules/models.md`; do not duplicate it here.
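
The request-side constraints stated so far (session id charset and length from §2.1, the `curriculum_stage` range and the `language_weights` sum from this section) can be checked on the client before a `/reset` is sent. The following is a minimal sketch, not part of the Space itself: the helper names `validate_session_id` and `build_reset_payload` are hypothetical, and rejecting an empty session id is an assumption (the spec only states the 64-char maximum).

```python
import math
import re

# Charset and 64-char cap come from the header table in §2.1.
# The lower bound of 1 character is an assumption of this sketch.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_session_id(session_id: str) -> str:
    """Return the id unchanged if it is a legal X-Session-Id value."""
    if not SESSION_ID_RE.fullmatch(session_id):
        raise ValueError(f"invalid X-Session-Id: {session_id!r}")
    return session_id


def build_reset_payload(curriculum_stage, language_weights,
                        audio_boundary_enabled=True, seed=None):
    """Build the /reset request envelope shown in §2.1.1 (hypothetical helper)."""
    if curriculum_stage not in (1, 2, 3):
        raise ValueError("curriculum_stage must be 1, 2, or 3")
    total = sum(language_weights.values())
    # Compare with a tolerance: 0.4 + 0.2 + 0.2 + 0.2 is not exactly 1.0 in floats.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"language_weights must sum to 1.0, got {total}")
    return {
        "config": {
            "curriculum_stage": curriculum_stage,
            "language_weights": dict(language_weights),
            "audio_boundary_enabled": bool(audio_boundary_enabled),
        },
        "seed": seed,  # int or None; None serializes to JSON null
    }
```

Catching these mistakes locally avoids spending a round trip on a `400` from the Space; the server-side validation described in `docs/modules/env.md` remains authoritative.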
### 2.2 Status code map

| Code | Meaning | Triggered by |
|---|---|---|
| 200 | Success | Normal return |
| 400 | Malformed JSON / missing header / invalid action shape | Parsing or dataclass validation failure |
| 401 | Missing or bad bearer | §3.5 auth check |
| 404 | `X-Session-Id` not in cache (for `/step` / `/state` / `/close`) | Session expired, evicted, or never created |
| 409 | Concurrent `/reset` on same session id (see §7, case 1) | Cache key collision during init |
| 413 | Request body exceeds 1 MiB | §5, mode M11 |
| 429 | Max concurrent sessions reached | §3.2 cap hit |
| 500 | Unhandled exception inside env step | Bug; logged, stack trace NOT returned in body |
| 503 | Model weights not yet loaded on cold-start | §7, case 3 |

All error bodies are `{"error": {"code": "", "message": ""}}`. Internal stack traces never cross the wire.

### 2.3 Outbound network

The Space makes **zero outbound HTTP calls at runtime**. Kokoro and Whisper weights are baked into the image (§4.2); no HF Hub fetches, no telemetry, no phone-home. This is load-bearing because HF Spaces free CPU tier often has slow / rate-limited egress, and because reproducibility demands an offline image.

### 2.4 Container entrypoint

```dockerfile
CMD ["uvicorn", "app:app", \
     "--host", "0.0.0.0", \
     "--port", "7860", \
     "--workers", "2", \
     "--timeout-keep-alive", "30", \
     "--log-level", "info"]
```

Two uvicorn workers (not four): the CPU basic tier has 2 vCPUs, and Kokoro/Whisper synthesis and transcription are CPU-bound, so more workers would just contend for the same cores.

---

## 3. Behavior Spec

### 3.1 Session lifecycle

A session is an instance of `DriftCallEnvironment` (the class whose full behavior lives in `docs/modules/env.md`). The deployment layer treats each session as an opaque object with `reset()`, `step()`, `state()`, `close()` methods and does not introspect it.
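
One way to make that opacity explicit is a structural type that names only the four methods the deployment layer calls. This is a sketch under assumptions: `SessionEnv` and `StubEnv` are hypothetical names, and the parameter and return types are placeholders, since the real signatures are owned by `docs/modules/env.md`.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SessionEnv(Protocol):
    """The only surface the cache relies on (hypothetical name)."""

    def reset(self, *args: Any, **kwargs: Any) -> Any: ...
    def step(self, *args: Any, **kwargs: Any) -> Any: ...
    def state(self) -> Any: ...
    def close(self) -> None: ...


class StubEnv:
    """Throwaway stand-in, useful for testing the cache without audio models."""

    def __init__(self) -> None:
        self.closed = False

    def reset(self, *args: Any, **kwargs: Any) -> dict:
        return {"turn": 0}

    def step(self, *args: Any, **kwargs: Any) -> dict:
        return {"turn": 1}

    def state(self) -> dict:
        return {"closed": self.closed}

    def close(self) -> None:
        self.closed = True
```

Because the check is structural, `isinstance(StubEnv(), SessionEnv)` holds without inheritance, which lets cache tests run against a stub instead of a full environment. Note that `runtime_checkable` only verifies that the methods exist, not their signatures.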

```
client                        Space (app.py)                      cache
  │  POST /reset {seed, config}      │                              │
  │  X-Session-Id: S1                │                              │
  ├─────────────────────────────────▶│  look up S1                  │
  │                                  ├─────────────────────────────▶│
  │                                  │◀───── miss ──────────────────┤
  │                                  │  construct env, bind seed    │
  │                                  │  store (env, last_touched)   │
  │                                  ├─────────────────────────────▶│
  │                                  │  env.reset(...) → obs        │
  │◀──────────── 200 obs ────────────┤                              │
  │                                  │                              │
  │  POST /step                      │                              │
  ├─────────────────────────────────▶│  lookup S1 → hit             │
  │                                  │  touch last_touched = now    │
  │                                  │  env.step(...) → obs,r,done  │
  │◀─────────── 200 obs,r ───────────┤                              │
```

### 3.2 Cache policy (deployment-level invariants)

The cache is an in-process dict, keyed by `X-Session-Id`. The implementation lives in `app.py` (`docs/modules/env.md` §3 "session cache"), but this doc locks the policy:

| Invariant | Value | Source |
|---|---|---|
| Max concurrent sessions | **10** | DESIGN.md §3.3 |
| TTL (time since `last_touched`) | **3600 s = 1 hr** | DESIGN.md §3.3 |
| Storage | In-memory only (no Redis, no disk) | Free tier has no persistent disk writable at runtime; container state resets on Space rebuild |
| Eviction policy | LRU when cap reached; stale-TTL sweep every 60 s | §3.3 |
| Cross-process sharing | None; each uvicorn worker has its own cache | Acceptable because cache is advisory; clients that get routed to a different worker on re-connect re-issue `/reset` |

**Consequence of the "per-worker cache" choice:** a client's session id may land on worker W1 for `/reset` and W2 for `/step` (the workers share one listening socket, and the OS hands each new connection to whichever worker accepts it, roughly round-robin). In that case `/step` returns 404 and the client must re-`/reset`. This is acceptable for the hackathon because:

1. Training / eval runs keep a persistent HTTP connection via `requests.Session`, which typically pins to one worker for the life of the socket.
2. Judges use one session end-to-end; they hit `/reset` and then replay steps over the same connection.
3. Two-worker degradation is documented in the Space README so judges don't get silently surprised.

A future hardening path (not in-scope for this hackathon) is to run `--workers 1` with thread pool, or share the cache via `multiprocessing.Manager`. Both are listed in §9.

### 3.3 Eviction sweep

A background asyncio task (started in `app.py` `lifespan`) runs every 60 s:

```
for sid, entry in list(cache.items()):
    if now() - entry.last_touched > TTL:
        env = cache.pop(sid).env
        env.close()  # frees whatever audio buffers the env holds
```

LRU eviction on `/reset` when `len(cache) >= 10` drops the oldest `last_touched` entry first; the new session replaces it.

### 3.4 Streaming / keep-alive

All endpoint responses are single JSON bodies: **no SSE, no websockets, no chunked streaming**. OpenEnv's client library (`openenv.HTTPEnvClient`) uses blocking `POST` + `json()` and a shared `requests.Session`; anything exotic risks failing `openenv validate`.

A `/step` call may take up to ~5 s when an audio pass is involved (Kokoro synth + Whisper transcribe on CPU), comfortably below the 60 s HF Spaces proxy timeout. Separately, `--timeout-keep-alive 30` keeps an idle keep-alive connection open for 30 s between requests, so a client that reuses one `requests.Session` is not forced to reconnect (and possibly land on the other worker, §3.2) after every slow step.

### 3.5 Authentication

A single shared-secret bearer guards all mutating endpoints. The token is injected as a HF Space **Secret** named `DRIFTCALL_ENV_TOKEN` and read by `app.py` at import time. `/healthz` is **unauthenticated** (HF Space probes have no bearer).

- Token format: 32+ byte URL-safe random (`secrets.token_urlsafe(32)`).
- Token rotation: delete the Space secret and push a new one; all in-flight sessions 401 on the next request.
- Missing secret at Space boot → container exits 1 (fail-fast).
- The token is bundled with the hackathon submission package so judges can exercise `openenv validate` against the live Space.

### 3.6 Determinism

The deployment does not itself introduce nondeterminism. `env.py` owns seed handling; the cache is a pass-through.
However, **two CPU-bound sources of wall-clock variance** can change observable latency (`tool_results[i].latency_ms` is wall-clock, not simulated):

1. Kokoro synth time on the first call after cold start can be 2–3× steady-state due to JIT / lazy graph compile.
2. Whisper VAD + decode time varies with input length.

Neither perturbs reward math: `latency_ms` is informational, never scored.

### 3.7 Logging

Structured JSON logs to stdout (HF Spaces captures stdout into the Logs tab). One log line per request, fields: `ts`, `level`, `session_id`, `endpoint`, `status`, `latency_ms`, `turn`, `err_code` (nullable). No PII, no audio bytes, no bearer token. The full `DriftCallAction` body is logged at DEBUG only, disabled by default.

---

## 4. Data structures

### 4.1 `SessionEntry`

```python
from __future__ import annotations  # lets the annotation below name a class imported elsewhere

from dataclasses import dataclass


@dataclass(frozen=True)
class SessionEntry:
    env: DriftCallEnvironment   # opaque; see docs/modules/env.md
    created_at: float           # time.monotonic() at /reset
    last_touched: float         # time.monotonic() at every /step|/state
    reset_count: int            # incremented on in-place /reset (§7, case 1)
```

Frozen per project rule (CLAUDE.md §7). `last_touched` updates produce a new `SessionEntry`; the cache dict replaces the old entry.

### 4.2 Dockerfile layout

Multi-stage build. Stage 1 installs wheels into a throwaway image; stage 2 copies only the site-packages dir and the app code. Target final image < 2 GB (DESIGN.md Risk 10).

```dockerfile
# -------- Stage 1: builder --------
FROM python:3.11-slim AS builder

ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /build

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git libsndfile1 ffmpeg && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt

# Pre-pull model weights so first /reset is fast
RUN pip install --prefix=/install huggingface_hub
RUN PYTHONPATH=/install/lib/python3.11/site-packages \
    python -c "from huggingface_hub import snapshot_download; \
    snapshot_download('hexgrad/Kokoro-82M', cache_dir='/weights'); \
    snapshot_download('Systran/faster-whisper-small', cache_dir='/weights')"

# -------- Stage 2: runtime --------
FROM python:3.11-slim

ENV PYTHONUNBUFFERED=1 \
    HF_HOME=/root/.cache/huggingface \
    TRANSFORMERS_OFFLINE=1 \
    HF_HUB_OFFLINE=1

RUN apt-get update && apt-get install -y --no-install-recommends \
        libsndfile1 ffmpeg ca-certificates && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /install /usr/local
# The Hub cache defaults to $HF_HOME/hub, so the snapshot directories must land there
# for offline lookups to find them.
COPY --from=builder /weights /root/.cache/huggingface/hub

WORKDIR /app
COPY app.py openenv.yaml ./
COPY driftcall/ ./driftcall/
COPY data/ ./data/

EXPOSE 7860

HEALTHCHECK --interval=30s --timeout=5s --start-period=45s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:7860/healthz', timeout=4).read()" || exit 1

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2", "--timeout-keep-alive", "30", "--log-level", "info"]
```

Key decisions:

- `python:3.11-slim` base: smallest stable Python base with glibc (alpine would force musl-incompatible wheels for `faster-whisper` / `ctranslate2`).
- `ffmpeg` installed because Whisper's audio loader shells out to it for anything non-WAV.
- `HF_HUB_OFFLINE=1` + `TRANSFORMERS_OFFLINE=1` are hard guarantees: if a download is attempted at runtime it raises, never silently fetches and hangs (§5, mode M6).
- Weights land under `HF_HOME` (`/root/.cache/huggingface`); Kokoro and faster-whisper resolve models from the Hub cache, which by default is the `hub/` subdirectory of `HF_HOME`.

### 4.3 `openenv.yaml`

```yaml
# openenv.yaml: consumed by `openenv validate`
# Schema source: https://github.com/meta-pytorch/OpenEnv
schema_version: "1.0"
env:
  id: driftcall
  version: "0.1.0"
  display_name: "DriftCall — Indic Voice Concierge under Schema Drift"
  description: >
    OpenEnv-compliant RL environment where a voice-first agent must complete
    Indic consumer concierge tasks while the vendor APIs undergo mid-episode
    schema, policy, T&C, pricing, and auth drift. Five independent reward
    components; deterministic seeded drift; Hindi/Tamil/Kannada/Hinglish
    briefs via Kokoro TTS + faster-whisper ASR.
  license: apache-2.0
  tags:
    - openenv
    - rl
    - voice
    - indic
    - schema-drift
entrypoint:
  type: http
  base_url: "https://-driftcall-env.hf.space"
  endpoints:
    reset: "/reset"
    step: "/step"
    state: "/state"
    close: "/close"
    health: "/healthz"
  auth:
    type: bearer
    secret_env: DRIFTCALL_ENV_TOKEN
action_space:
  ref: "docs/modules/models.md#DriftCallAction"
observation_space:
  ref: "docs/modules/models.md#DriftCallObservation"
episode:
  max_turns: 16  # worst case, stage-3 curriculum (DESIGN.md §4.5)
  reset_config:
    seed: { type: int, required: false }
    curriculum_stage: { type: int, range: [1, 3], required: false }
    language_weights: { type: object, required: false }
reward:
  shape: scalar
  range: [-1.0, 1.0]
  components:
    ref: "docs/modules/rewards.md"
```

Field names match the OpenEnv v1.0 schema (`entrypoint.type`, `action_space.ref`, etc.). The `ref` pointers resolve to paths inside the repo; `openenv validate` reads them to assert the env is self-describing.

### 4.4 `README.md` (Space card)

```
---
title: DriftCall Env
emoji: 🧭
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: OpenEnv — Indic voice concierge under schema drift.
---
```

Below the YAML header: one-paragraph description, `openenv validate` command, auth note, link to GitHub, link to the demo Space, link to the HF Hub model + dataset. The README is also rendered as the root `/` route's fallback (Docker Spaces serve nothing at `/` otherwise).

### 4.5 `requirements.txt`

```
fastapi==0.115.*
uvicorn[standard]==0.32.*
pydantic==2.*
openenv==0.2.*            # or whatever is current at build time; version-pin in PR
kokoro==0.9.*
faster-whisper==1.1.*
ctranslate2==4.5.*        # pinned to match faster-whisper's wheel
soundfile==0.12.*
numpy<2.0
huggingface_hub==0.26.*   # only used at build time (snapshot_download)
```

The version set matches `docs/modules/audio.md` §6.1 (upstream consumer) exactly. Pinning is deliberate: the env Space is a reproducibility artifact; judges may rebuild it months from now.

---

## 5. Error modes

Every failure path that can cross the HTTP boundary:

| ID | When | HTTP | Body `error.code` | Recovery |
|---|---|---|---|---|
| M1 | No `Authorization` header, or bad bearer | 401 | `unauthorized` | Client fixes token |
| M2 | No `X-Session-Id` on `/reset`/`/step`/`/state`/`/close` | 400 | `missing_session_id` | Client adds header |
| M3 | `/step`/`/state`/`/close` with unknown session id | 404 | `session_not_found` | Client re-issues `/reset` |
| M4 | Session was in cache but TTL expired between request and handler | 404 | `session_expired` | Client re-issues `/reset` |
| M5 | `/reset` when cache is full and LRU victim cannot be evicted (all 10 slots freshly `last_touched`) | 429 | `max_sessions` | Client backs off and retries; `Retry-After: 30` header set |
| M6 | Kokoro or Whisper model weights missing at startup (image build was broken) | 503 | `model_not_ready` | **Operator** fixes image; client cannot recover |
| M7 | Malformed JSON in request body | 400 | `bad_json` | Client fixes payload |
| M8 | Action fails pydantic / dataclass validation (wrong `ActionType`, missing `tool_name` for `TOOL_CALL`) | 400 | `invalid_action` | Client fixes action |
| M9 | Unhandled exception in `env.step` | 500 | `internal_error` | Logged with request id; client SHOULD NOT retry same action |
| M10 | Disk full writing tmp WAV in audio pipeline | 500 | `io_error` | Very rare on HF Spaces (no writable persistent disk, but /tmp is tmpfs and can fill); operator action |
| M11 | Request body exceeds 1 MiB | 413 | `payload_too_large` | Client trims (should never happen; actions are small) |
| M12 | Concurrent `/reset` on same session id (two requests race) | 409 | `reset_in_progress` | Client serializes resets on its side |

Rules:

- No stack traces in response bodies. `request_id` (uvicorn's ASGI scope id) is included so operators can grep logs.
- All error responses include `Cache-Control: no-store`.
- M5 (`429`) is the **only** code that includes `Retry-After`. Others are terminal for the request.

---

## 6. Dependencies

### 6.1 Upstream (consumed by the deployment artifact)

- **`docs/modules/env.md`**: defines `DriftCallEnvironment.__init__/reset/step/state/close` and the FastAPI route handlers. This doc references but does not duplicate env behavior.
- **`docs/modules/models.md`**: every dataclass crossing the HTTP boundary.
- **`docs/modules/audio.md`**: Kokoro + Whisper integration; tells this doc which weights to pre-pull and what CPU footprint to budget.
- **`docs/modules/rewards.md`**: cited from `openenv.yaml` `reward.components.ref`.
- **DESIGN.md §3.3, §9.1, §9.2, §11.1, §13, Risk 10**: authoritative.

### 6.2 External runtime dependencies (pinned in §4.5)

`fastapi`, `uvicorn[standard]`, `openenv`, `kokoro`, `faster-whisper`, `ctranslate2`, `soundfile`, `pydantic`, `numpy<2.0`, `huggingface_hub` (build-time only).

### 6.3 Hugging Face platform dependencies

- **Space SDK:** `docker` (NOT `gradio`/`static`). The Docker SDK is the only path that lets us bake weights into the image and pin `uvicorn` workers.
- **Space hardware:** `cpu-basic` (free).
  2 vCPU, 16 GB RAM, 50 GB ephemeral disk, **no persistent storage**, no GPU.
- **Space secrets:** `DRIFTCALL_ENV_TOKEN` (required).
- **Space env vars:** none (all config is baked in or via `X-Session-Id`).
- **Space region:** default (us-east-1); we do not need region pinning for CPU-basic.

### 6.4 Downstream consumers (who pings this Space)

- `training/eval_baseline.py` and `training/eval_final.py` (DESIGN.md §12): the training-side `HTTPEnvClient`.
- `demo/app_gradio.py`: the demo Space (documented in `docs/modules/deploy_demo_space.md`) uses this env over HTTP for live runs.
- `openenv validate .`: run against the Space URL as part of the hackathon submission gate.
- Hackathon judges: direct HTTP exercise via curl / the `openenv` CLI.

### 6.5 Explicit non-dependencies

- **No GPU** at runtime (load-bearing; DESIGN.md §3.3).
- **No LLM weights** on the env Space (Gemma 4 lives on the demo Space or on the trainer's local V100).
- **No training code** (`training/` is NOT copied into the image; see §4.2 `COPY` list).
- **No HF Hub network** at runtime (§2.3, §4.2 offline envs).

---

## 7. Edge cases

Six cases the deployment plan must handle correctly. Each is load-bearing for either the 30-min deploy window or the judge's `openenv validate` run.

### 7.1 Concurrent `/reset` on the same session id

Client A and client B both POST `/reset` with `X-Session-Id: S1` within the same ~100 ms window. The cache uses a per-session asyncio lock; the second request observes the session mid-construction.

**Handling:**

- If the first request is still inside `env.__init__`, the second request gets `409 reset_in_progress`. Client is expected to serialize on its side.
- If the first request has completed, the second request performs an in-place reset: the old env is `.close()`'d, a new env replaces it, `reset_count += 1`. This matches `gym`'s idempotent reset semantics.
- `seed` is honored on the winning reset; the losing (409'd) request's seed is discarded.
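
The two branches above can be sketched with one `asyncio.Lock` per session id. This is an illustration under assumptions, not the `app.py` implementation: `SessionCache`, `ResetInProgress`, and `FakeEnv` are hypothetical names, the env factory is assumed to be an async callable, and keeping `created_at` from the first `/reset` across in-place resets is a choice the spec does not pin.

```python
import asyncio
import time
from dataclasses import dataclass, replace


class ResetInProgress(Exception):
    """The route layer would translate this into 409 reset_in_progress."""


class FakeEnv:
    """Stand-in for DriftCallEnvironment (hypothetical, for illustration only)."""

    def __init__(self, seed):
        self.seed = seed
        self.closed = False

    def close(self):
        self.closed = True


@dataclass(frozen=True)
class SessionEntry:
    env: FakeEnv
    created_at: float
    last_touched: float
    reset_count: int


class SessionCache:
    def __init__(self, env_factory):
        self._env_factory = env_factory  # async callable: seed -> env
        self._entries = {}
        self._locks = {}

    async def reset(self, sid, seed):
        lock = self._locks.setdefault(sid, asyncio.Lock())
        # No await sits between this check and the acquire below, so on a
        # single event loop the check-then-lock pair cannot be interleaved.
        if lock.locked():
            raise ResetInProgress(sid)
        async with lock:
            env = await self._env_factory(seed)  # slow part: env construction
            now = time.monotonic()
            old = self._entries.get(sid)
            if old is None:
                entry = SessionEntry(env, now, now, 0)
            else:
                old.env.close()  # in-place reset: release the previous env
                entry = replace(old, env=env, last_touched=now,
                                reset_count=old.reset_count + 1)
            self._entries[sid] = entry  # frozen entries are replaced, never mutated
            return entry
```

When two coroutines call `reset("S1", ...)` concurrently, the first holds the lock while the env is being built and the second fails fast instead of queueing, which matches the "losing request's seed is discarded" rule.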

### 7.2 `/step` on an evicted session

A client idles for 65 minutes between `/step` calls. The sweep task evicts the session at minute 60. The client's next `/step` returns `404 session_expired`.

**Handling:**

- The client MUST re-issue `/reset` with the same or new seed; it cannot resume mid-episode. This is explicit in the Space README.
- No attempt is made to persist episode state across evictions. The free tier has no writable persistent disk, and replaying a seeded episode is cheap (< 1 s on the CPU basic tier).
- `env.close()` is called on eviction to release the Kokoro audio buffer (saves ~80 MB resident per lingering session).

### 7.3 Cold-start model-weight load race

The Space boots. Uvicorn workers start and each lazily triggers a Kokoro + Whisper load on the first audio-involving `/step`. Whisper's CTranslate2 model load takes ~3–5 s; Kokoro takes ~2 s. A `/step` arriving before load completes can block up to ~8 s.

**Handling:**

- `app.py`'s `lifespan` startup hook performs an **eager** load of both models during container boot. This turns cold-start latency into Space "Starting…" time (which HF surfaces via the spinner) instead of a hung client request.
- If eager load fails (bad weights, disk corruption), the container exits 1 and HF's Space restart loop catches it, so the operator sees the Space status as "Error" instead of a silent hang.
- The first `/healthz` probe is expected at +30 s (`--start-period=45s` on the HEALTHCHECK gives us a comfortable margin).

### 7.4 Kokoro voice pack missing for a language

Kokoro is loaded at startup but an individual voice pack for `language="kn"` (Kannada) is missing from the snapshot cache due to a partial download.

**Handling:**

- `audio/tts_kokoro.py` (per `docs/modules/audio.md` §5) raises `VoicePackMissingError`. The env treats this as a SPEAK-action failure and returns a `tool_results` entry with `status="schema_error"` and `response={"error": "voice_unavailable"}`.
  The episode continues; reward R4 (format compliance) may drop but R1/R2 are unaffected.
- The image build in §4.2 pre-pulls the **full** Kokoro snapshot (`snapshot_download('hexgrad/Kokoro-82M')`), which includes all voice packs. If a voice pack is missing at runtime, the image is broken; the operator fixes the Dockerfile and rebuilds.

### 7.5 HTTP timeout mid-`/step`

A `/step` takes 35 s because Whisper is processing a long utterance and the Space is also handling three concurrent episodes. The HF Space edge proxy has a 60 s idle timeout; we stay under it, but without much margin.

**Handling:**

- `--timeout-keep-alive 30` only governs idle keep-alive connections between requests; uvicorn does not cut off an in-flight `/step`. The HTTP client's read timeout should be ≥ 60 s (`requests` applies no timeout by default, which is safe here).
- Inside `env.step`, audio ops have **hard caps** owned by `audio/*.py`: Whisper `max_duration_s=30`, Kokoro synth implicitly bounded by text length. The env cannot produce a `/step` longer than ~40 s at p99.
- If a `/step` does exceed 60 s (e.g., 10 concurrent sessions all doing audio at once on 2 vCPU), the proxy closes the socket and the client sees `ConnectionError`. Client re-issues; the session is still in the cache and the step was effectively a no-op on the server side because responses are atomic-on-return (state is only mutated after all work succeeds; see `docs/modules/env.md` §3 transactional step semantics).

### 7.6 Out-of-memory during concurrent audio

Five sessions simultaneously run audio-heavy `/step`s. Each Whisper int8 model takes ~250 MB RAM; Kokoro takes ~350 MB. Naive loading would hit `5 × 600 MB = 3 GB` plus Python overhead, which is well within the 16 GB tier budget, but the Space can still OOM if the image unexpectedly loads fp32 weights.

**Handling:**

- Whisper is forced to `compute_type="int8"` and Kokoro to fp32 (its default is already smallest viable). `audio/*.py` asserts these at load time.
- The models are **singletons** shared across sessions (they are stateless w.r.t.
  concurrent calls; CTranslate2 releases the GIL during decode). Memory budget is therefore `~600 MB` per worker process, not per session (about 1.2 GB across the two uvicorn workers, since each process loads its own copy).
- If an OOM happens, the container is killed by the HF Space OOM-killer and auto-restarts. We lose all in-flight sessions; clients re-`/reset`. The eviction sweep and TTL ensure no permanently-dead sessions pile up.

---

## 8. Examples

### 8.1 End-to-end `/reset` → `/step` flow via curl

```bash
# Assume DRIFTCALL_ENV_TOKEN is set locally for scripting convenience.
TOKEN="${DRIFTCALL_ENV_TOKEN:?export DRIFTCALL_ENV_TOKEN first}"
BASE="https://-driftcall-env.hf.space"

# 1. Reset with seed 42, stage 2 curriculum.
curl -sS -X POST "$BASE/reset" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001" \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "config": {"curriculum_stage": 2}}'
# → 200 {"observation": {"turn": 0, "goal": {...}, "last_transcript": "Bhai Friday ko...", ...}, "episode_id": "...", "max_turns": 12}

# 2. Step: call airline.search.
curl -sS -X POST "$BASE/step" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001" \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "action_type": "tool_call",
      "tool_name": "airline.search",
      "tool_args": {"origin": "DEL", "destination": "BLR", "date": "2026-04-26"}
    }
  }'
# → 200 {"observation": {...}, "reward": 0.0, "done": false, "info": {"drift_fired": []}}

# 3. Inspect state (judge-only, optional).
curl -sS "$BASE/state" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001"
# → 200 {"state": {"episode_id": "...", "max_turns": 12, "drift_schedule": [...], ...}, "turn": 1}

# 4. Close.
curl -sS -X POST "$BASE/close" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001"
# → 200 {"closed": true, "final_state": {...}}
```

### 8.2 Container build + smoke + push

```bash
# Local build (from DRIFTCALL/ repo root)
docker build -t driftcall-env:local .

# Local smoke (bind a dummy secret)
docker run --rm -p 7860:7860 \
  -e DRIFTCALL_ENV_TOKEN=dev-local-token \
  driftcall-env:local

# In another shell:
curl -sS http://localhost:7860/healthz
# → "ok"

curl -sS -X POST http://localhost:7860/reset \
  -H "Authorization: Bearer dev-local-token" \
  -H "X-Session-Id: smoke" \
  -H "Content-Type: application/json" -d '{}'
# → 200 with initial observation

# Push to HF Space via the new `hf` CLI.
# The team-lead brief flags that `huggingface-cli` is deprecated; we migrate
# DriftCall/CLAUDE.md §6 row "HF push env" to `hf upload` in a follow-up PR.
hf upload /driftcall-env . --repo-type=space
# (Requires `pip install "huggingface_hub>=0.25"` and `hf auth login` completed.)
```

### 8.3 `openenv validate` against the live Space

```bash
# Against local container:
openenv validate http://localhost:7860 \
  --auth-bearer dev-local-token

# Against deployed Space:
openenv validate https://-driftcall-env.hf.space \
  --auth-bearer "$DRIFTCALL_ENV_TOKEN"

# Expected output:
#   ✓ openenv.yaml parses, schema v1.0
#   ✓ GET /healthz → 200 ok
#   ✓ POST /reset → 200, observation matches observation_space.ref
#   ✓ POST /step → 200, observation + reward + done
#   ✓ GET /state → 200, DriftCallState matches schema
#   ✓ POST /close → 200
#   ✓ 6 endpoints validated, 0 errors
```

Running this before submission is the DESIGN.md §12.2 hour-16 gate. If it fails, we fix before moving to training.

---

## 9. Open questions

1. **OpenEnv schema version pin:** `openenv==0.2.*` in §4.5 is a placeholder. Confirm the exact current release on the hackathon kickoff morning (Apr 25) and tighten the pin; `openenv validate` schema fields may have shifted between 0.1 and 0.2.
2. **Per-worker cache divergence:** documented in §3.2 as acceptable. Re-evaluate after local load-testing: if even training hits the cross-worker 404 path > 1% of the time, switch to `--workers 1` with a bigger thread pool.
3. **HF Space CPU cold-start time:** the free CPU basic tier can sleep on idle and take 60–120 s to wake. This doc assumes Space is "always-on" because we exercise it during development; if the judge hits a cold Space, the first `/reset` may appear hung. Risk-register coverage owned by `docs/modules/risk_book.md`.
4. **`DRIFTCALL_ENV_TOKEN` rotation during the hackathon:** if the token leaks mid-judging, rotating it 401s the judge mid-run. Do we need a two-token grace period? Likely no (hackathon is 48 h and we trust submission channels), but flag for Person D's risk book.
5. **CLAUDE.md §6 `hf upload` migration:** the hackathon briefing flags `huggingface-cli` as deprecated. Update `DRIFTCALL/CLAUDE.md` §6 rows ("HF push env", "HF push dataset") to `hf upload ... --repo-type=...` in a separate small PR so this design doc doesn't diverge from the command catalogue. Own: Person D.
6. **Image-size margin vs §1.1 Whisper upgrade path:** if `docs/modules/audio.md` §1.1's WER bail-out triggers and we swap to `faster-whisper-medium`, final image grows from ~1.2 GB to ~1.8 GB. Still under the 2 GB Risk-10 bound but with less slack. Re-check image size after any audio-weights change.
7. **`/state` access control:** should `/state` require the same bearer as mutating endpoints, or should we expose a narrower "episode summary" for judges without the full vendor-states dump? Current design keeps full state behind the bearer; revisit if leaderboard ops ask for a public read-only pane.
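
For open question 2, the 1% threshold can be measured directly from the structured request logs described in §3.7. The sketch below is one possible way to do that, under assumptions: the function name `cross_worker_404_rate` is hypothetical, the `endpoint` field is assumed to hold the route path (`"/step"`), and a cross-worker miss is assumed to surface as status 404 with `err_code` `session_not_found` (mode M3).

```python
import json


def cross_worker_404_rate(log_lines, threshold=0.01):
    """Return (rate, exceeds_threshold) for /step requests that hit a 404.

    Each element of log_lines is one JSON log line with at least the
    `endpoint`, `status`, and `err_code` fields from §3.7.
    """
    steps = 0
    misses = 0
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as uvicorn banner lines
        if not isinstance(record, dict) or record.get("endpoint") != "/step":
            continue
        steps += 1
        if record.get("status") == 404 and record.get("err_code") == "session_not_found":
            misses += 1
    rate = misses / steps if steps else 0.0
    return rate, rate > threshold
```

Feeding it the Space's captured stdout after a load test gives a single number to compare against the `--workers 1` decision point; a genuinely expired session would also count, so the load test should stay well inside the one-hour TTL.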