
deploy_env_space.md: DriftCall Env HF Space Deployment

Owner: Person D (Deploy & Story)
Implements: DESIGN.md §3.3 (Deployed Env Topology), §11.1 (Env Space files), §13 (Deliverables)
Depends on: docs/modules/env.md (FastAPI surface contract), docs/modules/models.md (dataclass wire format), docs/modules/audio.md (Kokoro + Whisper runtime)
Status: DRAFT, pending ≥ 2 fresh critic rounds


1. Purpose

driftcall-env is the production hosting target for the DriftCall OpenEnv RL environment. It runs on a free-tier Hugging Face Space (Docker SDK, CPU basic, 2 vCPU / 16 GB RAM) and is the artifact the hackathon judges exercise via openenv validate. The Space exposes a FastAPI application implementing the OpenEnv REST contract (/reset, /step, /state, /close) plus a lightweight session cache so concurrent training / evaluation runs can share one deployment without state bleed.

The Space is intentionally CPU-only. Kokoro TTS (82 M params) and faster-whisper-small int8 (~244 M params) both run at roughly real-time on a single modern CPU core; the training topology (DESIGN.md §3.2, §9.4) never loads TTS/ASR because GRPO operates text-in / text-out. This module owns:

  1. The Dockerfile (multi-stage build, <2 GB final image, pre-pulled audio weights).
  2. openenv.yaml metadata (required for openenv validate).
  3. requirements.txt pin set (fastapi, uvicorn, openenv, kokoro, faster-whisper, plus transitive deps).
  4. The Space README (Space card): must satisfy the HF Space schema and the hackathon submission rules.
  5. The session cache implementation sketch delegated to app.py (full code in docs/modules/env.md; this doc specifies the cache's deployment constraints only).
  6. The deployment command set (build, push, validate).

This doc is a design spec, not an executable. It must contain every decision needed so a single operator can ship the env Space in one 30-minute sitting on Apr 25 morning (DESIGN.md §12.2 pre-onsite hour 16 gate).


2. Interface

2.1 External HTTP surface (served by the Space)

The Space exposes the OpenEnv REST surface on port 7860 (HF Spaces Docker SDK convention; any other port is unreachable). All endpoints accept and return application/json. Session identity is carried as a request header so the cache can dispatch to the right env instance.

POST   /reset           → 200 application/json   # create or recycle a session, return initial observation
POST   /step            → 200 application/json   # advance one turn; returns observation + reward + done
GET    /state           → 200 application/json   # read the current DriftCallState (debug / judge inspection)
POST   /close           → 200 application/json   # explicitly evict a session
GET    /healthz         → 200 text/plain "ok"    # Space healthcheck (HF pings this to mark the Space "running")
GET    /                → 200 text/html          # minimal landing page (see §4.4); NOT the agent surface

Headers (all mutating endpoints):

| Header | Required | Notes |
| --- | --- | --- |
| Authorization: Bearer <DRIFTCALL_ENV_TOKEN> | yes (see §3.5) | Space secret; judge receives this via submission form |
| X-Session-Id: <uuid4-or-caller-chosen> | yes | Opaque string, max 64 chars, [A-Za-z0-9_-] only |
| Content-Type: application/json | yes | UTF-8 |
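The X-Session-Id constraints in the table (at most 64 characters, drawn from [A-Za-z0-9_-]) fit in a single regular expression. The following is a minimal sketch, not the app.py implementation: the helper name is_valid_session_id is hypothetical, and the minimum length of 1 is an assumption, since the table only states a maximum.

```python
import re

# Hypothetical helper; the name is illustrative and does not come from app.py.
# Assumption: an empty id is rejected (the table only states the 64-char maximum).
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value: str) -> bool:
    """Return True if value is an acceptable X-Session-Id."""
    # fullmatch anchors both ends, so a trailing newline or space is rejected.
    return _SESSION_ID_RE.fullmatch(value) is not None
```

A handler could call this before touching the cache and answer 400 missing_session_id (or a similar slug) when it returns False.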

The endpoint contracts (request / response shapes) are owned by docs/modules/env.md and serialize the DriftCallObservation / DriftCallState / DriftCallAction dataclasses defined in docs/modules/models.md. This doc only pins the deployment-visible aspects: port, headers, auth, status codes.

Cross-doc sync note (2026-04-24): DESIGN.md §3.3 was updated to match this doc's choice of carrying session identity via the X-Session-Id HTTP header (previously documented there as a session_id query param). Both docs now agree. No behavior change in this spec; the note is recorded so reviewers don't perceive divergence.

2.1.1 Success body shapes (top-level only)

Top-level JSON shapes for each success response. Inner dataclass fields (DriftCallObservation, DriftCallAction, DriftCallState) are owned by docs/modules/env.md and docs/modules/models.md; this section pins only the envelope each endpoint returns.

POST /reset

Request:

{
  "config": {
    "curriculum_stage": 1,
    "language_weights": { "hi": 0.4, "ta": 0.2, "kn": 0.2, "hinglish": 0.2 },
    "audio_boundary_enabled": true
  },
  "seed": 42
}
  • config.curriculum_stage: 1 | 2 | 3
  • config.language_weights: object, keys are language codes, values sum to 1.0
  • config.audio_boundary_enabled: bool
  • seed: int | null
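The field constraints listed above can be checked before an env is constructed. This is a hedged sketch only: validate_reset_config is a hypothetical name, every key is treated as optional (the curl example in §8.1 sends only curriculum_stage), and the 1e-9 tolerance on the weight sum is an assumption.

```python
import math


def validate_reset_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config is acceptable."""
    problems = []
    stage = config.get("curriculum_stage")
    if stage is not None and stage not in (1, 2, 3):
        problems.append("curriculum_stage must be 1, 2, or 3")
    weights = config.get("language_weights")
    if weights is not None:
        total = sum(weights.values())
        # Compare with a tolerance: float sums may not land exactly on 1.0.
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append(f"language_weights sum to {total}, expected 1.0")
    flag = config.get("audio_boundary_enabled")
    if flag is not None and not isinstance(flag, bool):
        problems.append("audio_boundary_enabled must be a bool")
    return problems
```

A non-empty result would map naturally to a 400 response, with the problems joined into the user-safe message.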

Response:

{
  "observation": { "...DriftCallObservation..." },
  "episode_id": "uuid4-string",
  "max_turns": 12
}

POST /step

Request:

{ "action": { "...DriftCallAction..." } }

Response:

{
  "observation": { "...DriftCallObservation..." },
  "reward": 0.0,
  "done": false,
  "info": { "...opaque..." }
}
  • reward: float | null (null when reward is deferred to episode end)

GET /state

Response:

{
  "state": { "...DriftCallState..." },
  "turn": 3
}

POST /close

Response:

{
  "closed": true,
  "final_state": { "...DriftCallState... | null" }
}
  • final_state: object | null (null if session was already evicted)

Deeper field-level detail for DriftCallObservation, DriftCallAction, and DriftCallState lives in docs/modules/env.md and docs/modules/models.md; do not duplicate it here.

2.2 Status code map

| Code | Meaning | Triggered by |
| --- | --- | --- |
| 200 | Success | Normal return |
| 400 | Malformed JSON / missing header / invalid action shape | Parsing or dataclass validation failure |
| 401 | Missing or bad bearer | §3.5 auth check |
| 404 | X-Session-Id not in cache (for /step / /state / /close) | Session expired, evicted, or never created |
| 409 | Concurrent /reset on same session id (see §7, case 1) | Cache key collision during init |
| 429 | Max concurrent sessions reached | §3.2 cap hit |
| 500 | Unhandled exception inside env step | Bug; logged, stack trace NOT returned in body |
| 503 | Model weights not yet loaded on cold-start | §7, case 3 |

All error bodies are {"error": {"code": "<slug>", "message": "<user-safe string>"}}. Internal stack traces never cross the wire.
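The error envelope can be produced by one small helper so that no handler builds bodies by hand. This is an illustrative sketch: error_response and STATUS_BY_SLUG are hypothetical names, and only a subset of the slugs from §5 is shown.

```python
import json

# Slugs and status codes are taken from the status code map and the §5 error modes;
# the helper and the table name are hypothetical.
STATUS_BY_SLUG = {
    "unauthorized": 401,
    "missing_session_id": 400,
    "session_not_found": 404,
    "max_sessions": 429,
    "internal_error": 500,
}


def error_response(slug: str, message: str) -> tuple:
    """Build (status, JSON body) for an error; the body never carries a traceback."""
    body = {"error": {"code": slug, "message": message}}
    return STATUS_BY_SLUG[slug], json.dumps(body)
```

Keeping the slug-to-status mapping in one table makes it harder for the status code map and the error-mode list to drift apart.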

2.3 Outbound network

The Space makes zero outbound HTTP calls at runtime. Kokoro and Whisper weights are baked into the image (§4.2); no HF Hub fetches, no telemetry, no phone-home. This is load-bearing because HF Spaces free CPU tier often has slow / rate-limited egress, and because reproducibility demands an offline image.

2.4 Container entrypoint

CMD ["uvicorn", "app:app", \
     "--host", "0.0.0.0", \
     "--port", "7860", \
     "--workers", "2", \
     "--timeout-keep-alive", "30", \
     "--log-level", "info"]

Two uvicorn workers (not four): the CPU basic tier has 2 vCPUs, and Kokoro synthesis and Whisper transcription are CPU-bound, so more workers would just contend for the same cores.


3. Behavior Spec

3.1 Session lifecycle

A session is an instance of DriftCallEnvironment (the class whose full behavior lives in docs/modules/env.md). The deployment layer treats each session as an opaque object with reset(), step(), state(), close() methods and does not introspect it.

client                              Space (app.py)                cache
   │  POST /reset {seed, config}      │                              │
   │  X-Session-Id: S1                │                              │
   ├─────────────────────────────────▶│  look up S1                  │
   │                                  ├─────────────────────────────▶│
   │                                  │◀───── miss ──────────────────┤
   │                                  │  construct env, bind seed    │
   │                                  │  store (env, last_touched)   │
   │                                  ├─────────────────────────────▶│
   │                                  │  env.reset(...) → obs        │
   │◀────────────  200 obs ───────────┤                              │
   │                                  │                              │
   │  POST /step                      │                              │
   ├─────────────────────────────────▶│  lookup S1 → hit             │
   │                                  │  touch last_touched = now    │
   │                                  │  env.step(...) → obs,r,done  │
   │◀────────── 200 obs,r ────────────┤                              │

3.2 Cache policy (deployment-level invariants)

The cache is an in-process dict, keyed by X-Session-Id. The implementation lives in app.py (docs/modules/env.md §3 "session cache"), but this doc locks the policy:

| Invariant | Value | Source |
| --- | --- | --- |
| Max concurrent sessions | 10 | DESIGN.md §3.3 |
| TTL (time since last_touched) | 3600 s = 1 hr | DESIGN.md §3.3 |
| Storage | In-memory only (no Redis, no disk) | Free tier has no persistent disk writable at runtime; container state resets on Space rebuild |
| Eviction policy | LRU when cap reached; stale-TTL sweep every 60 s | §3.3 |
| Cross-process sharing | None; each uvicorn worker has its own cache | Acceptable because cache is advisory; clients that get routed to a different worker on re-connect re-issue /reset |

Consequence of the "per-worker cache" choice: a client's session id may land on worker W1 for /reset and W2 for /step (the workers share one listening socket, and the operating system decides which worker accepts each new connection). In that case /step returns 404 and the client must re-/reset. This is acceptable for the hackathon because:

  1. Training / eval runs keep a persistent HTTP connection via requests.Session, which typically pins to one worker for the life of the socket.
  2. Judges use one session end-to-end; they hit /reset and then replay steps over the same connection.
  3. Two-worker degradation is documented in the Space README so judges don't get silently surprised.

A future hardening path (not in scope for this hackathon) is to run --workers 1 with a thread pool, or to share the cache via multiprocessing.Manager. Both are listed in §9.

3.3 Eviction sweep

A background asyncio task (started in app.py lifespan) runs every 60 s:

for sid, entry in list(cache.items()):
    if now() - entry.last_touched > TTL:
        env = cache.pop(sid).env
        env.close()  # frees whatever audio buffers the env holds

LRU eviction on /reset when len(cache) >= 10 drops the oldest last_touched entry first; the new session replaces it.
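The two eviction paths (TTL sweep and LRU at the cap) can be sketched as plain functions over the cache dict. This is an assumption-laden illustration rather than the app.py code: Entry is a stand-in for SessionEntry, the function names are hypothetical, and callers are assumed to pass time.monotonic() as now.

```python
from dataclasses import dataclass
from typing import Optional

TTL_S = 3600.0      # 1 hr, per the cache policy
MAX_SESSIONS = 10   # cap applied to each worker's own cache


@dataclass(frozen=True)
class Entry:
    """Stand-in for SessionEntry; env only needs a close() method here."""
    env: object
    last_touched: float


def sweep_expired(cache: dict, now: float) -> list:
    """Close and drop every entry idle for longer than TTL_S; return evicted ids."""
    evicted = []
    for sid, entry in list(cache.items()):  # copy, because we mutate while iterating
        if now - entry.last_touched > TTL_S:
            cache.pop(sid).env.close()
            evicted.append(sid)
    return evicted


def evict_lru_if_full(cache: dict) -> Optional[str]:
    """Before inserting a new session: at the cap, drop the least recently touched."""
    if len(cache) < MAX_SESSIONS:
        return None
    victim = min(cache, key=lambda sid: cache[sid].last_touched)
    cache.pop(victim).env.close()
    return victim
```

In the deployed app the sweep would run from the 60 s background task, and evict_lru_if_full would run inside the /reset handler just before the new entry is stored.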

3.4 Streaming / keep-alive

All endpoint responses are single JSON bodies: no SSE, no websockets, no chunked streaming. OpenEnv's client library (openenv.HTTPEnvClient) uses blocking POST + json() and a shared requests.Session; anything exotic risks failing openenv validate. A /step call may take up to ~5 s when an audio pass is involved (Kokoro synth + Whisper transcribe on CPU). We set --timeout-keep-alive 30 so that an idle keep-alive connection between slow steps is not closed after uvicorn's default 5 s, while staying comfortably below the 60 s HF Spaces proxy timeout.

3.5 Authentication

A single shared-secret bearer guards all mutating endpoints. The token is injected as a HF Space Secret named DRIFTCALL_ENV_TOKEN and read by app.py at import time. /healthz is unauthenticated (HF Space probes have no bearer).

  • Token format: 32+ byte URL-safe random (secrets.token_urlsafe(32)).
  • Token rotation: delete the Space secret and push a new one; all in-flight sessions 401 on the next request.
  • Missing secret at Space boot → container exits 1 (fail-fast).
  • The token is bundled with the hackathon submission package so judges can exercise openenv validate against the live Space.
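Token generation and the bearer check can be sketched with the standard library alone. The function names below are hypothetical, and the use of hmac.compare_digest for a constant-time comparison is an assumption about how app.py should compare secrets, not something this spec mandates.

```python
import hmac
import secrets
from typing import Optional


def new_env_token() -> str:
    """Generate a URL-safe token from 32 random bytes."""
    return secrets.token_urlsafe(32)


def bearer_matches(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check an Authorization header value against the expected token."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest avoids leaking, via timing, where the two strings differ.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

A False result corresponds to the 401 unauthorized path; the presented token should never be written to logs.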

3.6 Determinism

The deployment does not itself introduce nondeterminism. env.py owns seed handling; the cache is a pass-through. However, two CPU-bound sources of wall-clock variance can change observable latency (tool_results[i].latency_ms is wall-clock, not simulated):

  1. Kokoro synth time on the first call after cold start can be 2–3× steady-state due to JIT / lazy graph compile.
  2. Whisper VAD + decode time varies with input length.

Neither perturbs reward math: latency_ms is informational, never scored.

3.7 Logging

Structured JSON logs to stdout (HF Spaces captures stdout into the Logs tab). One log line per request, fields: ts, level, session_id, endpoint, status, latency_ms, turn, err_code (nullable). No PII, no audio bytes, no bearer token. The full DriftCallAction body is logged at DEBUG only, disabled by default.
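One way to emit the per-request line is a helper that serializes exactly the listed fields. This is only a sketch: log_line is a hypothetical name, and representing ts as epoch seconds is an assumption, since the spec does not fix a timestamp format.

```python
import json
import time


def log_line(session_id, endpoint, status, latency_ms, turn, err_code=None, level="info"):
    """Serialize one request record with the fields the logging spec lists."""
    record = {
        "ts": time.time(),  # assumption: epoch seconds
        "level": level,
        "session_id": session_id,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": latency_ms,
        "turn": turn,
        "err_code": err_code,  # serialized as null when there is no error
    }
    return json.dumps(record)
```

Calling print(log_line("demo-001", "/step", 200, 412.5, 3)) writes one line to stdout, which is where HF Spaces collects logs.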


4. Data structures

4.1 SessionEntry

from dataclasses import dataclass

@dataclass(frozen=True)
class SessionEntry:
    env: DriftCallEnvironment        # opaque; see docs/modules/env.md
    created_at: float                # time.monotonic() at /reset
    last_touched: float              # time.monotonic() at every /step|/state
    reset_count: int                 # incremented on in-place /reset (§7, case 1)

Frozen per project rule (CLAUDE.md §7). Updating last_touched produces a new SessionEntry, and the cache dict replaces the old entry.
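Because the dataclass is frozen, a "touch" is a copy-and-replace rather than an attribute write. A minimal sketch using dataclasses.replace follows; touch is a hypothetical helper name, and env is typed as object only so the example runs without the real DriftCallEnvironment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SessionEntry:
    env: object          # stand-in for DriftCallEnvironment
    created_at: float
    last_touched: float
    reset_count: int


def touch(cache: dict, sid: str, now: float) -> SessionEntry:
    """Swap in a copy of the entry with last_touched updated; the old entry is unchanged."""
    updated = replace(cache[sid], last_touched=now)
    cache[sid] = updated
    return updated
```

The same pattern covers reset_count: an in-place /reset would store replace(entry, reset_count=entry.reset_count + 1) together with the new env.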

4.2 Dockerfile layout

Multi-stage build. Stage 1 installs wheels into a throwaway image; stage 2 copies only the site-packages dir and the app code. Target final image < 2 GB (DESIGN.md Risk 10).

# -------- Stage 1: builder --------
FROM python:3.11-slim AS builder
ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git libsndfile1 ffmpeg && \
    rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt

# Pre-pull model weights so first /reset is fast
RUN pip install --prefix=/install huggingface_hub
RUN PYTHONPATH=/install/lib/python3.11/site-packages \
    python -c "from huggingface_hub import snapshot_download; \
               snapshot_download('hexgrad/Kokoro-82M', cache_dir='/weights'); \
               snapshot_download('Systran/faster-whisper-small', cache_dir='/weights')"

# -------- Stage 2: runtime --------
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 \
    HF_HOME=/root/.cache/huggingface \
    TRANSFORMERS_OFFLINE=1 \
    HF_HUB_OFFLINE=1
RUN apt-get update && apt-get install -y --no-install-recommends \
        libsndfile1 ffmpeg ca-certificates && \
    rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
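# snapshot_download(cache_dir=...) writes the Hub cache layout, which is read from $HF_HOME/hub.
COPY --from=builder /weights /root/.cache/huggingface/hub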
WORKDIR /app
COPY app.py openenv.yaml ./
COPY driftcall/ ./driftcall/
COPY data/ ./data/
EXPOSE 7860
HEALTHCHECK --interval=30s --timeout=5s --start-period=45s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:7860/healthz', timeout=4).read()" || exit 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2", "--timeout-keep-alive", "30", "--log-level", "info"]

Key decisions:

  • python:3.11-slim base: smallest stable Python base with glibc (alpine would force musl-incompatible wheels for faster-whisper / ctranslate2).
  • ffmpeg installed because Whisper's audio loader shells out to it for anything non-WAV.
  • HF_HUB_OFFLINE=1 + TRANSFORMERS_OFFLINE=1 are hard guarantees: if a download is attempted at runtime it raises, rather than silently fetching and hanging (§5, mode M6).
  • Weights land under /root/.cache/huggingface (HF_HOME); Kokoro and faster-whisper both resolve models through the Hugging Face Hub cache beneath that directory by default.

4.3 openenv.yaml

# openenv.yaml, consumed by `openenv validate`
# Schema source: https://github.com/meta-pytorch/OpenEnv
schema_version: "1.0"
env:
  id: driftcall
  version: "0.1.0"
  display_name: "DriftCall - Indic Voice Concierge under Schema Drift"
  description: >
    OpenEnv-compliant RL environment where a voice-first agent must complete
    Indic consumer concierge tasks while the vendor APIs undergo mid-episode
    schema, policy, T&C, pricing, and auth drift. Five independent reward
    components; deterministic seeded drift; Hindi/Tamil/Kannada/Hinglish
    briefs via Kokoro TTS + faster-whisper ASR.
  license: apache-2.0
  tags:
    - openenv
    - rl
    - voice
    - indic
    - schema-drift
  entrypoint:
    type: http
    base_url: "https://<team>-driftcall-env.hf.space"
    endpoints:
      reset: "/reset"
      step: "/step"
      state: "/state"
      close: "/close"
      health: "/healthz"
    auth:
      type: bearer
      secret_env: DRIFTCALL_ENV_TOKEN
  action_space:
    ref: "docs/modules/models.md#DriftCallAction"
  observation_space:
    ref: "docs/modules/models.md#DriftCallObservation"
  episode:
    max_turns: 16        # worst case, stage-3 curriculum (DESIGN.md §4.5)
    reset_config:
      seed: { type: int, required: false }
      curriculum_stage: { type: int, range: [1, 3], required: false }
      language_weights: { type: object, required: false }
  reward:
    shape: scalar
    range: [-1.0, 1.0]
    components:
      ref: "docs/modules/rewards.md"

Field names match the OpenEnv v1.0 schema (entrypoint.type, action_space.ref, etc.). The ref pointers resolve to paths inside the repo; openenv validate reads them to assert the env is self-describing.

4.4 README.md (Space card)

---
title: DriftCall Env
emoji: 🧭
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: OpenEnv - Indic voice concierge under schema drift.
---

Below the YAML header: one-paragraph description, openenv validate command, auth note, link to GitHub, link to the demo Space, link to the HF Hub model + dataset. The README is also rendered as the root / route's fallback (Docker Spaces serve nothing at / otherwise).

4.5 requirements.txt

fastapi==0.115.*
uvicorn[standard]==0.32.*
pydantic==2.*
openenv==0.2.*            # or whatever is current at build time; version-pin in PR
kokoro==0.9.*
faster-whisper==1.1.*
ctranslate2==4.5.*        # pinned to match faster-whisper's wheel
soundfile==0.12.*
numpy<2.0
huggingface_hub==0.26.*   # only used at build time (snapshot_download)

The version set matches docs/modules/audio.md §6.1 (upstream consumer) exactly. Pinning is deliberate: the env Space is a reproducibility artifact; judges may rebuild it months from now.


5. Error modes

Every failure path that can cross the HTTP boundary:

| ID | When | HTTP | Body error.code | Recovery |
| --- | --- | --- | --- | --- |
| M1 | No Authorization header, or bad bearer | 401 | unauthorized | Client fixes token |
| M2 | No X-Session-Id on /reset, /step, /state, /close | 400 | missing_session_id | Client adds header |
| M3 | /step, /state, /close with unknown session id | 404 | session_not_found | Client re-issues /reset |
| M4 | Session was in cache but TTL expired between request and handler | 404 | session_expired | Client re-issues /reset |
| M5 | /reset when cache is full and LRU victim cannot be evicted (all 10 slots freshly last_touched) | 429 | max_sessions | Client backs off and retries; Retry-After: 30 header set |
| M6 | Kokoro or Whisper model weights missing at startup (image build was broken) | 503 | model_not_ready | Operator fixes image; client cannot recover |
| M7 | Malformed JSON in request body | 400 | bad_json | Client fixes payload |
| M8 | Action fails pydantic / dataclass validation (wrong ActionType, missing tool_name for TOOL_CALL) | 400 | invalid_action | Client fixes action |
| M9 | Unhandled exception in env.step | 500 | internal_error | Logged with request id; client SHOULD NOT retry same action |
| M10 | Disk full writing tmp WAV in audio pipeline | 500 | io_error | Very rare on HF Spaces (no writable persistent disk, but /tmp is tmpfs and can fill); operator action |
| M11 | Request body exceeds 1 MiB | 413 | payload_too_large | Client trims (should never happen; actions are small) |
| M12 | Concurrent /reset on same session id (two requests race) | 409 | reset_in_progress | Client serializes resets on its side |

Rules:

  • No stack traces in response bodies. request_id (uvicorn's ASGI scope id) is included so operators can grep logs.
  • All error responses include Cache-Control: no-store.
  • M5 (429) is the only code that includes Retry-After. Others are terminal for the request.

6. Dependencies

6.1 Upstream (consumed by the deployment artifact)

  • docs/modules/env.md: defines DriftCallEnvironment.__init__/reset/step/state/close and the FastAPI route handlers. This doc references but does not duplicate env behavior.
  • docs/modules/models.md: every dataclass crossing the HTTP boundary.
  • docs/modules/audio.md: Kokoro + Whisper integration; tells this doc which weights to pre-pull and what CPU footprint to budget.
  • docs/modules/rewards.md: cited from openenv.yaml reward.components.ref.
  • DESIGN.md §3.3, §9.1, §9.2, §11.1, §13, Risk 10: authoritative.

6.2 External runtime dependencies (pinned in §4.5)

fastapi, uvicorn[standard], openenv, kokoro, faster-whisper, ctranslate2, soundfile, pydantic, numpy<2.0, huggingface_hub (build-time only).

6.3 Hugging Face platform dependencies

  • Space SDK: docker (NOT gradio/static). The Docker SDK is the only path that lets us bake weights into the image and pin uvicorn workers.
  • Space hardware: cpu-basic (free). 2 vCPU, 16 GB RAM, 50 GB ephemeral disk, no persistent storage, no GPU.
  • Space secrets: DRIFTCALL_ENV_TOKEN (required).
  • Space env vars: none (all config is baked in or via X-Session-Id).
  • Space region: default (us-east-1); we do not need region pinning for CPU-basic.

6.4 Downstream consumers (who pings this Space)

  • training/eval_baseline.py and training/eval_final.py (DESIGN.md §12): the training-side HTTPEnvClient.
  • demo/app_gradio.py: the demo Space (documented in docs/modules/deploy_demo_space.md) uses this env over HTTP for live runs.
  • openenv validate . (run against the Space URL as part of the hackathon submission gate).
  • Hackathon judges: direct HTTP exercise via curl / the openenv CLI.

6.5 Explicit non-dependencies

  • No GPU at runtime (load-bearing; DESIGN.md §3.3).
  • No LLM weights on the env Space (Gemma 4 lives on the demo Space or on the trainer's local V100).
  • No training code (training/ is NOT copied into the image; see §4.2 COPY list).
  • No HF Hub network at runtime (§2.3, §4.2 offline envs).

7. Edge cases

Six cases the deployment plan must handle correctly. Each is load-bearing for either the 30-min deploy window or the judge's openenv validate run.

7.1 Concurrent /reset on the same session id

Client A and client B both POST /reset with X-Session-Id: S1 within the same ~100 ms window. The cache uses a per-session asyncio lock; the second request observes the session mid-construction.

Handling:

  • If the first request is still inside env.__init__, the second request gets 409 reset_in_progress. Client is expected to serialize on its side.
  • If the first request has completed, the second request performs an in-place reset: the old env is .close()'d, a new env replaces it, reset_count += 1. This matches gym's idempotent reset semantics.
  • seed is honored on the winning reset; the losing (409'd) request's seed is discarded.
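The first bullet (reject a second /reset while the first is still constructing the env) can be sketched with a per-session asyncio.Lock. This is a hypothetical illustration: guarded_reset, ResetInProgress, and the module-level _locks dict are invented names, and the in-place reset of the second bullet is left to the do_reset callable.

```python
import asyncio


class ResetInProgress(Exception):
    """Hypothetical exception; a handler would map it to 409 reset_in_progress."""


_locks = {}  # session id -> asyncio.Lock, one dict per worker process


async def guarded_reset(sid: str, do_reset):
    """Run do_reset() for sid, rejecting a reset that arrives mid-construction."""
    lock = _locks.setdefault(sid, asyncio.Lock())
    if lock.locked():
        # The first request still holds the lock: reject instead of queueing.
        raise ResetInProgress(sid)
    # No await separates the check from the acquire, so no other task can slip in.
    async with lock:
        return await do_reset()
```

Rejecting instead of waiting keeps the losing request's seed from being applied later, in line with the rule that only the winning reset's seed is honored.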

7.2 /step on an evicted session

A client idles for 65 minutes between /step calls. The sweep task evicts the session at minute 60. The client's next /step returns 404 session_expired.

Handling:

  • The client MUST re-issue /reset with the same or new seed; it cannot resume mid-episode. This is explicit in the Space README.
  • No attempt is made to persist episode state across evictions. The free tier has no writable persistent disk, and replaying a seeded episode is cheap (< 1 s on the CPU basic tier).
  • env.close() is called on eviction to release the Kokoro audio buffer (saves ~80 MB resident per lingering session).
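On the client side, the recovery rule amounts to "a 404 on /step means start over". Below is a transport-agnostic sketch; step_with_recovery is a hypothetical helper, and post(path, payload) is an assumed callable returning (status, body), which a real client would implement on top of requests.Session.

```python
def step_with_recovery(post, action: dict, reset_payload: dict) -> dict:
    """Call /step; on a 404, re-issue /reset and report that the episode restarted.

    `post(path, payload)` is a hypothetical transport returning (status, body).
    """
    status, body = post("/step", {"action": action})
    if status != 404:
        return {"restarted": False, "status": status, "body": body}
    # Session expired or evicted: the episode cannot be resumed, only restarted.
    status, body = post("/reset", reset_payload)
    return {"restarted": True, "status": status, "body": body}
```

The caller should treat restarted=True as the start of a new episode and discard any partial trajectory collected before the eviction.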

7.3 Cold-start model-weight load race

The Space boots. Uvicorn workers start and each lazily triggers a Kokoro + Whisper load on the first audio-involving /step. Whisper's CTranslate2 model load takes ~3–5 s; Kokoro takes ~2 s. A /step arriving before load completes can block up to ~8 s.

Handling:

  • app.py's lifespan startup hook performs an eager load of both models during container boot. This turns cold-start latency into Space "Starting…" time (which HF surfaces via the spinner) instead of a hung client request.
  • If eager load fails (bad weights, disk corruption), the container exits 1 and HF's Space restart loop catches it; the operator sees the Space status as "Error" instead of a silent hang.
  • The first /healthz probe is expected at +30 s (--start-period=45s on the HEALTHCHECK gives us a comfortable margin).

7.4 Kokoro voice pack missing for a language

Kokoro is loaded at startup but an individual voice pack for language="kn" (Kannada) is missing from the snapshot cache due to a partial download.

Handling:

  • audio/tts_kokoro.py (per docs/modules/audio.md §5) raises VoicePackMissingError. The env treats this as a SPEAK-action failure and returns a tool_results entry with status="schema_error" and response={"error": "voice_unavailable"}. The episode continues; reward R4 (format compliance) may drop but R1/R2 are unaffected.
  • The image build in §4.2 pre-pulls the full Kokoro snapshot (snapshot_download('hexgrad/Kokoro-82M')), which includes all voice packs. If a voice pack is missing at runtime, the image is broken: the operator fixes the Dockerfile and rebuilds.

7.5 HTTP timeout mid-/step

A /step takes 35 s because Whisper is processing a long utterance and the Space is also handling three concurrent episodes. The HF Space edge proxy has a 60 s idle timeout; we stay under it, but without much margin.

Handling:

  • --timeout-keep-alive 30 only governs idle keep-alive connections between requests; uvicorn does not cut off a request that is still being processed. The HTTP client's read timeout should be ≥ 60 s (requests applies no timeout by default, which is safe here).
  • Inside env.step, audio ops have hard caps owned by audio/*.py: Whisper max_duration_s=30, Kokoro synth implicitly bounded by text length. The env cannot produce a /step longer than ~40 s at p99.
  • If a /step does exceed 60 s (e.g., 10 concurrent sessions all doing audio at once on 2 vCPU), the proxy closes the socket and the client sees ConnectionError. The client re-issues the step; the session is still in the cache and the timed-out step was effectively a no-op on the server side because responses are atomic-on-return (state is only mutated after all work succeeds; see docs/modules/env.md §3 transactional step semantics).

7.6 Out-of-memory during concurrent audio

Five sessions simultaneously run audio-heavy /steps. Each Whisper int8 model takes ~250 MB RAM; Kokoro takes ~350 MB. Naive loading would hit 5 × 600 MB = 3 GB plus Python overhead, which is well within the 16 GB tier budget, but the Space can still OOM if the image unexpectedly loads fp32 weights.

Handling:

  • Whisper is forced to compute_type="int8" and Kokoro stays at fp32 (its default, which is already the smallest viable precision). audio/*.py asserts these at load time.
  • The models are singletons shared across sessions (they are stateless w.r.t. concurrent calls; CTranslate2 releases the GIL during decode). Memory budget is therefore ~600 MB total, not per-session.
  • If an OOM happens, the container is killed by the HF Space OOM-killer and auto-restarts. We lose all in-flight sessions; clients re-/reset. The eviction sweep and TTL ensure no permanently-dead sessions pile up.

8. Examples

8.1 End-to-end /reset → /step flow via curl

# Assume DRIFTCALL_ENV_TOKEN is set locally for scripting convenience.
TOKEN="${DRIFTCALL_ENV_TOKEN:?export DRIFTCALL_ENV_TOKEN first}"
BASE="https://<team>-driftcall-env.hf.space"

# 1. Reset with seed 42, stage 2 curriculum.
curl -sS -X POST "$BASE/reset" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001" \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "config": {"curriculum_stage": 2}}'
# → 200 {"observation": {"turn": 0, "goal": {...}, "last_transcript": "Bhai Friday ko...", ...}, "episode_id": "...", "max_turns": 12}

# 2. Step: call airline.search.
curl -sS -X POST "$BASE/step" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001" \
  -H "Content-Type: application/json" \
  -d '{
    "action": {
      "action_type": "tool_call",
      "tool_name": "airline.search",
      "tool_args": {"origin": "DEL", "destination": "BLR", "date": "2026-04-26"}
    }
  }'
# → 200 {"observation": {...}, "reward": 0.0, "done": false, "info": {"drift_fired": []}}

# 3. Inspect state (judge-only, optional).
curl -sS "$BASE/state" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001"
# → 200 {"state": {"episode_id": "...", "max_turns": 12, "drift_schedule": [...], ...}, "turn": 1}

# 4. Close.
curl -sS -X POST "$BASE/close" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Session-Id: demo-001"
# → 200 {"closed": true, "final_state": {...}}

8.2 Container build + smoke + push

# Local build (from DRIFTCALL/ repo root)
docker build -t driftcall-env:local .

# Local smoke (bind a dummy secret)
docker run --rm -p 7860:7860 \
  -e DRIFTCALL_ENV_TOKEN=dev-local-token \
  driftcall-env:local

# In another shell:
curl -sS http://localhost:7860/healthz            # → "ok"
curl -sS -X POST http://localhost:7860/reset \
  -H "Authorization: Bearer dev-local-token" \
  -H "X-Session-Id: smoke" \
  -H "Content-Type: application/json" -d '{}'
# → 200 with initial observation

# Push to HF Space via the new `hf` CLI.
# The team-lead brief flags that `huggingface-cli` is deprecated; we migrate
# DriftCall/CLAUDE.md §6 row "HF push env" to `hf upload` in a follow-up PR.
hf upload <team>/driftcall-env . --repo-type=space
# (Requires a huggingface_hub release that ships the `hf` CLI, 0.34 or later, and a completed `hf auth login`.)

8.3 openenv validate against the live Space

# Against local container:
openenv validate http://localhost:7860 \
  --auth-bearer dev-local-token

# Against deployed Space:
openenv validate https://<team>-driftcall-env.hf.space \
  --auth-bearer "$DRIFTCALL_ENV_TOKEN"

# Expected output:
#   ✓ openenv.yaml parses, schema v1.0
#   ✓ GET  /healthz → 200 ok
#   ✓ POST /reset   → 200, observation matches observation_space.ref
#   ✓ POST /step    → 200, observation + reward + done
#   ✓ GET  /state   → 200, DriftCallState matches schema
#   ✓ POST /close   → 200
#   ✓ 6 endpoints validated, 0 errors

Running this before submission is the DESIGN.md §12.2 hour-16 gate. If it fails, we fix before moving to training.


9. Open questions

  1. OpenEnv schema version pin: openenv==0.2.* in §4.5 is a placeholder. Confirm the exact current release on the hackathon kickoff morning (Apr 25) and tighten the pin; openenv validate schema fields may have shifted between 0.1 and 0.2.
  2. Per-worker cache divergence: documented in §3.2 as acceptable. Re-evaluate after local load-testing; if even training hits the cross-worker 404 path > 1% of the time, switch to --workers 1 with a bigger thread pool.
  3. HF Space CPU cold-start time: the free CPU basic tier can sleep on idle and take 60–120 s to wake. This doc assumes the Space is "always-on" because we exercise it during development; if the judge hits a cold Space, the first /reset may appear hung. Risk-register coverage owned by docs/modules/risk_book.md.
  4. DRIFTCALL_ENV_TOKEN rotation during the hackathon: if the token leaks mid-judging, rotating it 401s the judge mid-run. Do we need a two-token grace period? Likely no (hackathon is 48 h and we trust submission channels), but flag for Person D's risk book.
  5. CLAUDE.md §6 hf upload migration: the hackathon briefing flags huggingface-cli as deprecated. Update DRIFTCALL/CLAUDE.md §6 rows ("HF push env", "HF push dataset") to hf upload ... --repo-type=... in a separate small PR so this design doc doesn't diverge from the command catalogue. Own: Person D.
  6. Image-size margin vs §1.1 Whisper upgrade path: if docs/modules/audio.md §1.1's WER bail-out triggers and we swap to faster-whisper-medium, final image grows from ~1.2 GB to ~1.8 GB. Still under the 2 GB Risk-10 bound but with less slack. Re-check image size after any audio-weights change.
  7. /state access control: should /state require the same bearer as mutating endpoints, or should we expose a narrower "episode summary" for judges without the full vendor-states dump? Current design keeps full state behind the bearer; revisit if leaderboard ops ask for a public read-only pane.