deploy_env_space.md: DriftCall Env HF Space Deployment
Owner: Person D (Deploy & Story)
Implements: DESIGN.md §3.3 (Deployed Env Topology), §11.1 (Env Space files), §13 (Deliverables)
Depends on: docs/modules/env.md (FastAPI surface contract), docs/modules/models.md (dataclass wire format), docs/modules/audio.md (Kokoro + Whisper runtime)
Status: DRAFT, pending ≥ 2 fresh critic rounds
1. Purpose
driftcall-env is the production hosting target for the DriftCall OpenEnv RL environment. It runs on a free-tier Hugging Face Space (Docker SDK, CPU basic, 2 vCPU / 16 GB RAM) and is the artifact the hackathon judges exercise via openenv validate. The Space exposes a FastAPI application implementing the OpenEnv REST contract (/reset, /step, /state, /close) plus a lightweight session cache so concurrent training / evaluation runs can share one deployment without state bleed.
The Space is intentionally CPU-only. Kokoro TTS (82 M params) and faster-whisper-small int8 (~244 M params) both run at roughly real-time on a single modern CPU core; the training topology (DESIGN.md §3.2, §9.4) never loads TTS/ASR because GRPO operates text-in / text-out. This module owns:
- The Dockerfile (multi-stage build, <2 GB final image, pre-pulled audio weights).
- `openenv.yaml` metadata (required for `openenv validate`).
- `requirements.txt` pin set (fastapi, uvicorn, openenv, kokoro, faster-whisper, plus transitive deps).
- The Space README (Space card): must satisfy HF Space schema + hackathon submission rules.
- The session cache implementation sketch delegated to `app.py` (full code in `docs/modules/env.md`; this doc specifies the cache's deployment constraints only).
- The deployment command set (build, push, validate).
This doc is a design spec, not an executable. It must contain every decision needed so a single operator can ship the env Space in one 30-minute sitting on Apr 25 morning (DESIGN.md §12.2 pre-onsite hour 16 gate).
2. Interface
2.1 External HTTP surface (served by the Space)
The Space exposes the OpenEnv REST surface on port 7860 (the HF Spaces Docker SDK default, matching `app_port` in §4.4; the Space proxy forwards only to that declared port). All endpoints accept and return application/json. Session identity is carried as a request header so the cache can dispatch to the right env instance.
POST /reset → 200 application/json # create or recycle a session, return initial observation
POST /step → 200 application/json # advance one turn; returns observation + reward + done
GET /state → 200 application/json # read the current DriftCallState (debug / judge inspection)
POST /close → 200 application/json # explicitly evict a session
GET /healthz → 200 text/plain "ok" # Space healthcheck (HF pings this to mark the Space "running")
GET / → 200 text/html # minimal landing page (see §4.4); NOT the agent surface
Headers (all session endpoints: /reset, /step, /state, /close):
| Header | Required | Notes |
|---|---|---|
| `Authorization: Bearer <DRIFTCALL_ENV_TOKEN>` | yes (see §3.5) | Space secret; judge receives this via submission form |
| `X-Session-Id: <uuid4-or-caller-chosen>` | yes | Opaque string, max 64 chars, `[A-Za-z0-9_-]` only |
| `Content-Type: application/json` | yes | UTF-8 |
The endpoint contracts (request / response shapes) are owned by docs/modules/env.md and serialize the DriftCallObservation / DriftCallState / DriftCallAction dataclasses defined in docs/modules/models.md. This doc only pins the deployment-visible aspects: port, headers, auth, status codes.
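The `X-Session-Id` constraint in the header table can be checked before the cache is touched. The following is a minimal sketch, not the code in `app.py`: the function name `is_valid_session_id` is hypothetical, and the minimum length of one character is an assumption (the table only states the 64-character maximum).

```python
import re
from typing import Optional

# Mirrors the header table: characters from [A-Za-z0-9_-], at most 64 of them.
# The lower bound of 1 is an assumption; the spec only gives the maximum.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value: Optional[str]) -> bool:
    """Return True when value is a well-formed X-Session-Id header."""
    # fullmatch rejects partial matches and trailing newlines.
    return value is not None and SESSION_ID_RE.fullmatch(value) is not None
```

A handler would presumably map a missing header or a `False` result to a 400 response before doing any cache lookup.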
Cross-doc sync note (2026-04-24): DESIGN.md §3.3 was updated to match this doc's choice of carrying session identity via the `X-Session-Id` HTTP header (previously documented there as a `session_id` query param). Both docs now agree. No behavior change in this spec; the note is recorded so reviewers don't perceive divergence.
2.1.1 Success body shapes (top-level only)
Top-level JSON shapes for each success response. Inner dataclass fields (DriftCallObservation, DriftCallAction, DriftCallState) are owned by docs/modules/env.md and docs/modules/models.md; this section pins only the envelope each endpoint returns.
POST /reset
Request:
{
"config": {
"curriculum_stage": 1,
"language_weights": { "hi": 0.4, "ta": 0.2, "kn": 0.2, "hinglish": 0.2 },
"audio_boundary_enabled": true
},
"seed": 42
}
- `config.curriculum_stage`: `1 | 2 | 3`
- `config.language_weights`: object, keys are language codes, values sum to 1.0
- `config.audio_boundary_enabled`: bool
- `seed`: `int | null`
Response:
{
"observation": { "...DriftCallObservation..." },
"episode_id": "uuid4-string",
"max_turns": 12
}
POST /step
Request:
{ "action": { "...DriftCallAction..." } }
Response:
{
"observation": { "...DriftCallObservation..." },
"reward": 0.0,
"done": false,
"info": { "...opaque..." }
}
- `reward`: `float | null` (null when reward is deferred to episode end)
GET /state
Response:
{
"state": { "...DriftCallState..." },
"turn": 3
}
POST /close
Response:
{
"closed": true,
"final_state": { "...DriftCallState... | null" }
}
- `final_state`: `object | null` (null if session was already evicted)
Deeper field-level detail for DriftCallObservation, DriftCallAction, and DriftCallState lives in docs/modules/env.md and docs/modules/models.md; do not duplicate it here.
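The `/reset` config constraints listed above (stage in 1 to 3, weights summing to 1.0, boolean flag) can be expressed as a small validator. This is a sketch under stated assumptions: `validate_reset_config` and its messages are hypothetical, every field is treated as optional (as `openenv.yaml` declares them), and the real validation lives in `docs/modules/env.md`.

```python
import math
from typing import List

ALLOWED_STAGES = (1, 2, 3)


def validate_reset_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the config is acceptable."""
    problems = []
    stage = config.get("curriculum_stage")
    # type(...) is int excludes bools, which would otherwise pass as 1.
    if stage is not None and (type(stage) is not int or stage not in ALLOWED_STAGES):
        problems.append("curriculum_stage must be 1, 2, or 3")
    weights = config.get("language_weights")
    if weights is not None:
        # Float sums can miss 1.0 by a rounding error, so compare with a tolerance.
        if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):
            problems.append("language_weights must sum to 1.0")
    flag = config.get("audio_boundary_enabled")
    if flag is not None and not isinstance(flag, bool):
        problems.append("audio_boundary_enabled must be a bool")
    return problems
```

Returning a list rather than raising on the first problem lets a handler report every issue in one 400 response; whether the real handler does so is not specified here.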
2.2 Status code map
| Code | Meaning | Triggered by |
|---|---|---|
| 200 | Success | Normal return |
| 400 | Malformed JSON / missing header / invalid action shape | Parsing or dataclass validation failure |
| 401 | Missing or bad bearer | §3.5 auth check |
| 404 | `X-Session-Id` not in cache (for `/step` / `/state` / `/close`) | Session expired, evicted, or never created |
| 409 | Concurrent `/reset` on same session id (see §7, case 1) | Cache key collision during init |
| 429 | Max concurrent sessions reached | §3.2 cap hit |
| 500 | Unhandled exception inside env step | Bug; logged, stack trace NOT returned in body |
| 503 | Model weights not yet loaded on cold-start | §7, case 3 |
All error bodies are {"error": {"code": "<slug>", "message": "<user-safe string>"}}. Internal stack traces never cross the wire.
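A minimal sketch of the error envelope described above. The helper names and the fixed message text are hypothetical. The point it illustrates is that exception details stay on the server while the client receives only a slug and a user-safe string.

```python
import json


def error_body(code: str, message: str) -> str:
    """Serialize the error envelope: a slug plus a user-safe message."""
    return json.dumps({"error": {"code": code, "message": message}})


def body_for_unhandled(exc: Exception) -> str:
    """Build the 500 body; the exception text is deliberately not included."""
    # exc would be logged server-side; nothing from it crosses the wire.
    return error_body("internal_error", "Internal error; see server logs.")
```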
2.3 Outbound network
The Space makes zero outbound HTTP calls at runtime. Kokoro and Whisper weights are baked into the image (§4.2); no HF Hub fetches, no telemetry, no phone-home. This is load-bearing because HF Spaces free CPU tier often has slow / rate-limited egress, and because reproducibility demands an offline image.
2.4 Container entrypoint
CMD ["uvicorn", "app:app", \
"--host", "0.0.0.0", \
"--port", "7860", \
"--workers", "2", \
"--timeout-keep-alive", "30", \
"--log-level", "info"]
Two uvicorn workers (not four): the CPU basic tier has 2 vCPUs, and Kokoro synthesis and Whisper transcription are CPU-bound, so additional worker processes would only contend for the same two cores.
3. Behavior Spec
3.1 Session lifecycle
A session is an instance of DriftCallEnvironment (the class whose full behavior lives in docs/modules/env.md). The deployment layer treats each session as an opaque object with reset(), step(), state(), close() methods and does not introspect it.
client                              Space (app.py)                  cache
  |  POST /reset {seed, config}       |                               |
  |  X-Session-Id: S1                 |                               |
  |---------------------------------->|  look up S1                   |
  |                                   |------------------------------>|
  |                                   |<------------ miss ------------|
  |                                   |  construct env, bind seed     |
  |                                   |  store (env, last_touched)    |
  |                                   |------------------------------>|
  |                                   |  env.reset(...) -> obs        |
  |<------------- 200 obs ------------|                               |
  |                                   |                               |
  |  POST /step                       |                               |
  |---------------------------------->|  lookup S1 -> hit             |
  |                                   |  touch last_touched = now     |
  |                                   |  env.step(...) -> obs,r,done  |
  |<----------- 200 obs,r ------------|                               |
3.2 Cache policy (deployment-level invariants)
The cache is an in-process dict, keyed by X-Session-Id. The implementation lives in app.py (docs/modules/env.md §3 "session cache"), but this doc locks the policy:
| Invariant | Value | Source |
|---|---|---|
| Max concurrent sessions | 10 | DESIGN.md §3.3 |
| TTL (time since `last_touched`) | 3600 s = 1 hr | DESIGN.md §3.3 |
| Storage | In-memory only (no Redis, no disk) | Free tier has no persistent disk writable at runtime; container state resets on Space rebuild |
| Eviction policy | LRU when cap reached; stale-TTL sweep every 60 s | §3.3 |
| Cross-process sharing | None; each uvicorn worker has its own cache | Acceptable because cache is advisory; clients that get routed to a different worker on re-connect re-issue `/reset` |
Consequence of the "per-worker cache" choice: a client's session id may land on worker W1 for /reset and W2 for /step (uvicorn uses round-robin-ish scheduling on the OS socket). In that case /step returns 404 and the client must re-/reset. This is acceptable for the hackathon because:
- Training / eval runs keep a persistent HTTP connection via `requests.Session`, which typically pins to one worker for the life of the socket.
- Judges use one session end-to-end; they hit `/reset` and then replay steps over the same connection.
- Two-worker degradation is documented in the Space README so judges don't get silently surprised.
A future hardening path (not in-scope for this hackathon) is to run --workers 1 with thread pool, or share the cache via multiprocessing.Manager. Both are listed in §9.
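On the client side, the cross-worker 404 path described in this section reduces to "try `/step`, and on 404 start the episode over". The sketch below assumes a hypothetical transport callable `post(path, body) -> (status, json)` standing in for whatever HTTP client is actually used; `step_or_restart` is not part of any OpenEnv client API.

```python
from typing import Callable, Tuple

# (path, json_body) -> (status_code, parsed_json); a stand-in for the real HTTP call.
Post = Callable[[str, dict], Tuple[int, dict]]


def step_or_restart(post: Post, action: dict, reset_body: dict) -> Tuple[str, dict]:
    """Try /step; on 404 re-issue /reset and report that the episode restarted."""
    status, body = post("/step", {"action": action})
    if status == 404:
        # The session is unknown to this worker (or was evicted): start over.
        status, body = post("/reset", reset_body)
        if status != 200:
            raise RuntimeError(f"/reset failed with status {status}")
        return "restarted", body
    if status != 200:
        raise RuntimeError(f"/step failed with status {status}")
    return "stepped", body
```

The caller has to treat `"restarted"` as a fresh episode: the returned body is an initial observation, not the result of the attempted action.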
3.3 Eviction sweep
A background asyncio task (started in app.py lifespan) runs every 60 s:
for sid, entry in list(cache.items()):
    if now() - entry.last_touched > TTL:
        env = cache.pop(sid).env
        env.close()  # frees whatever audio buffers the env holds
LRU eviction on /reset when len(cache) >= 10 drops the oldest last_touched entry first; the new session replaces it.
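The two eviction paths (TTL sweep and LRU on a full cache) can be sketched as plain functions over the cache dict. This is an illustration under assumptions: `Entry`, `sweep_expired`, and `evict_lru_if_full` are hypothetical names, the env is any object with a `close()` method, and the surrounding asyncio task and locking are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

TTL_S = 3600.0      # one hour since last_touched (cache policy table)
MAX_SESSIONS = 10   # per-worker cap (cache policy table)


@dataclass(frozen=True)
class Entry:
    env: object          # stand-in for DriftCallEnvironment
    last_touched: float  # time.monotonic() timestamp


def _close(entry: Entry) -> None:
    close = getattr(entry.env, "close", None)
    if callable(close):
        close()


def sweep_expired(cache: Dict[str, Entry], now: float) -> List[str]:
    """Pop every entry idle longer than TTL_S and return the evicted ids."""
    expired = [sid for sid, e in cache.items() if now - e.last_touched > TTL_S]
    for sid in expired:
        _close(cache.pop(sid))
    return expired


def evict_lru_if_full(cache: Dict[str, Entry]) -> Optional[str]:
    """When the cap is reached, drop the entry with the oldest last_touched."""
    if len(cache) < MAX_SESSIONS:
        return None
    victim = min(cache, key=lambda sid: cache[sid].last_touched)
    _close(cache.pop(victim))
    return victim
```

Collecting the expired ids before popping avoids mutating the dict while iterating over it, which is the same reason the sweep loop above wraps `cache.items()` in `list(...)`.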
3.4 Streaming / keep-alive
All endpoint responses are single JSON bodies: no SSE, no websockets, no chunked streaming. OpenEnv's client library (openenv.HTTPEnvClient) uses blocking POST + json() and a shared requests.Session; anything exotic risks failing openenv validate. A /step call may take up to ~5 s when an audio pass is involved (Kokoro synth + Whisper transcribe on CPU). We set --timeout-keep-alive 30 so that an idle keep-alive connection stays open for 30 s between requests, comfortably below the 60 s HF Spaces proxy timeout. The flag governs idle time between requests only; it does not limit how long an in-flight /step may run.
3.5 Authentication
A single shared-secret bearer guards all session endpoints (/reset, /step, /state, /close). The token is injected as a HF Space Secret named DRIFTCALL_ENV_TOKEN and read by app.py at import time. /healthz is unauthenticated (HF Space probes have no bearer).
- Token format: 32+ byte URL-safe random (`secrets.token_urlsafe(32)`).
- Token rotation: delete the Space secret and push a new one; all in-flight sessions 401 on the next request.
- Missing secret at Space boot → container exits 1 (fail-fast).
- The token is bundled with the hackathon submission package so judges can exercise `openenv validate` against the live Space.
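The bearer check can be sketched with the standard library alone. The function names below are hypothetical; the constant-time comparison via `hmac.compare_digest` is a common hardening choice and an assumption here, not something this spec mandates.

```python
import hmac
import secrets
from typing import Optional


def new_token() -> str:
    """Generate a token in the format listed above."""
    return secrets.token_urlsafe(32)


def bearer_ok(authorization: Optional[str], expected_token: str) -> bool:
    """Check an Authorization header value against the configured token."""
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    presented = authorization[len(prefix):]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

In the deployment, `expected_token` would come from the `DRIFTCALL_ENV_TOKEN` environment variable, and a `False` result maps to the 401 response.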
3.6 Determinism
The deployment does not itself introduce nondeterminism. env.py owns seed handling; the cache is a pass-through. However, two CPU-bound sources of wall-clock variance can change observable latency (tool_results[i].latency_ms is wall-clock, not simulated):
- Kokoro synth time on the first call after cold start can be 2-3× steady-state due to JIT / lazy graph compile.
- Whisper VAD + decode time varies with input length.
Neither perturbs reward math: latency_ms is informational, never scored.
3.7 Logging
Structured JSON logs to stdout (HF Spaces captures stdout into the Logs tab). One log line per request, fields: ts, level, session_id, endpoint, status, latency_ms, turn, err_code (nullable). No PII, no audio bytes, no bearer token. The full DriftCallAction body is logged at DEBUG only, disabled by default.
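One way to produce the per-request log line with exactly the fields listed above. The helper name `log_line` is hypothetical, and the `ts` format (Unix epoch seconds) is an assumption, since the spec does not pin a timestamp format.

```python
import json
import time
from typing import Optional


def log_line(session_id: str, endpoint: str, status: int, latency_ms: float,
             turn: Optional[int] = None, err_code: Optional[str] = None,
             level: str = "INFO") -> str:
    """Build one JSON log line; no request body, audio, or token is included."""
    record = {
        "ts": time.time(),  # epoch seconds (assumed format)
        "level": level,
        "session_id": session_id,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "turn": turn,
        "err_code": err_code,  # null on success
    }
    return json.dumps(record)


# Writing to stdout is enough; HF Spaces captures it into the Logs tab.
print(log_line("demo-001", "/step", 200, 12.34, turn=1))
```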
4. Data structures
4.1 SessionEntry
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionEntry:
    env: DriftCallEnvironment  # opaque; see docs/modules/env.md
    created_at: float          # time.monotonic() at /reset
    last_touched: float        # time.monotonic() at every /step|/state
    reset_count: int           # incremented on in-place /reset (§7, case 1)
Frozen per project rule (CLAUDE.md §7). last_touched updates produce a new SessionEntry; the cache dict replaces the old entry.
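Because the dataclass is frozen, "touching" a session means building a copy and swapping it into the dict. A minimal sketch using `dataclasses.replace`; the `touch` helper is hypothetical and `env` is typed as `object` only to keep the example self-contained.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class SessionEntry:
    env: object          # stand-in for DriftCallEnvironment
    created_at: float
    last_touched: float
    reset_count: int


def touch(cache: Dict[str, SessionEntry], session_id: str, now: float) -> SessionEntry:
    """Swap in a copy of the entry with last_touched advanced to now."""
    updated = replace(cache[session_id], last_touched=now)
    cache[session_id] = updated
    return updated
```

The old entry object is left untouched, so any code still holding a reference to it sees a consistent (if stale) snapshot.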
4.2 Dockerfile layout
Multi-stage build. Stage 1 installs wheels into a throwaway image; stage 2 copies only the site-packages dir and the app code. Target final image < 2 GB (DESIGN.md Risk 10).
# -------- Stage 1: builder --------
FROM python:3.11-slim AS builder
ENV PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential git libsndfile1 ffmpeg && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt
# Pre-pull model weights so first /reset is fast
RUN pip install --prefix=/install huggingface_hub
RUN PYTHONPATH=/install/lib/python3.11/site-packages \
python -c "from huggingface_hub import snapshot_download; \
snapshot_download('hexgrad/Kokoro-82M', cache_dir='/weights'); \
snapshot_download('Systran/faster-whisper-small', cache_dir='/weights')"
# -------- Stage 2: runtime --------
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 \
HF_HOME=/root/.cache/huggingface \
TRANSFORMERS_OFFLINE=1 \
HF_HUB_OFFLINE=1
RUN apt-get update && apt-get install -y --no-install-recommends \
libsndfile1 ffmpeg ca-certificates && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
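# snapshot_download(cache_dir=...) writes the hub cache layout, which HF libraries look up under $HF_HOME/hub
COPY --from=builder /weights /root/.cache/huggingface/hub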
WORKDIR /app
COPY app.py openenv.yaml ./
COPY driftcall/ ./driftcall/
COPY data/ ./data/
EXPOSE 7860
HEALTHCHECK --interval=30s --timeout=5s --start-period=45s \
CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:7860/healthz', timeout=4).read()" || exit 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2", "--timeout-keep-alive", "30", "--log-level", "info"]
Key decisions:
- `python:3.11-slim` base: smallest stable Python base with glibc (alpine would force musl-incompatible wheels for `faster-whisper` / `ctranslate2`).
- `ffmpeg` is installed for non-WAV audio handling. Note that faster-whisper itself decodes audio through PyAV, which bundles the FFmpeg libraries, so the system package matters only if `docs/modules/audio.md` calls the `ffmpeg` binary directly.
- `HF_HUB_OFFLINE=1` + `TRANSFORMERS_OFFLINE=1` are hard guarantees: if a download is attempted at runtime it raises, never silently fetches and hangs (§5, mode M6).
- Weights land under `/root/.cache/huggingface` (`HF_HOME`); Kokoro and faster-whisper both resolve models through the hub cache beneath that directory by default.
4.3 openenv.yaml
# openenv.yaml, consumed by `openenv validate`
# Schema source: https://github.com/meta-pytorch/OpenEnv
schema_version: "1.0"
env:
  id: driftcall
  version: "0.1.0"
  display_name: "DriftCall - Indic Voice Concierge under Schema Drift"
  description: >
    OpenEnv-compliant RL environment where a voice-first agent must complete
    Indic consumer concierge tasks while the vendor APIs undergo mid-episode
    schema, policy, T&C, pricing, and auth drift. Five independent reward
    components; deterministic seeded drift; Hindi/Tamil/Kannada/Hinglish
    briefs via Kokoro TTS + faster-whisper ASR.
  license: apache-2.0
  tags:
    - openenv
    - rl
    - voice
    - indic
    - schema-drift
entrypoint:
  type: http
  base_url: "https://<team>-driftcall-env.hf.space"
  endpoints:
    reset: "/reset"
    step: "/step"
    state: "/state"
    close: "/close"
    health: "/healthz"
  auth:
    type: bearer
    secret_env: DRIFTCALL_ENV_TOKEN
action_space:
  ref: "docs/modules/models.md#DriftCallAction"
observation_space:
  ref: "docs/modules/models.md#DriftCallObservation"
episode:
  max_turns: 16  # worst case, stage-3 curriculum (DESIGN.md §4.5)
  reset_config:
    seed: { type: int, required: false }
    curriculum_stage: { type: int, range: [1, 3], required: false }
    language_weights: { type: object, required: false }
reward:
  shape: scalar
  range: [-1.0, 1.0]
  components:
    ref: "docs/modules/rewards.md"
Field names match the OpenEnv v1.0 schema (entrypoint.type, action_space.ref, etc.). The ref pointers resolve to paths inside the repo; openenv validate reads them to assert the env is self-describing.
4.4 README.md (Space card)
---
title: DriftCall Env
emoji: 🎧
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: OpenEnv - Indic voice concierge under schema drift.
---
Below the YAML header: one-paragraph description, openenv validate command, auth note, link to GitHub, link to the demo Space, link to the HF Hub model + dataset. The README is also rendered as the root / route's fallback (Docker Spaces serve nothing at / otherwise).
4.5 requirements.txt
fastapi==0.115.*
uvicorn[standard]==0.32.*
pydantic==2.*
openenv==0.2.* # or whatever is current at build time; version-pin in PR
kokoro==0.9.*
faster-whisper==1.1.*
ctranslate2==4.5.* # pinned to match faster-whisper's wheel
soundfile==0.12.*
numpy<2.0
huggingface_hub==0.26.* # only used at build time (snapshot_download)
The version set matches docs/modules/audio.md §6.1 (upstream consumer) exactly. Pinning is deliberate: the env Space is a reproducibility artifact; judges may rebuild it months from now.
5. Error modes
Every failure path that can cross the HTTP boundary:
| ID | When | HTTP | Body `error.code` | Recovery |
|---|---|---|---|---|
| M1 | No `Authorization` header, or bad bearer | 401 | `unauthorized` | Client fixes token |
| M2 | No `X-Session-Id` on `/reset`, `/step`, `/state`, or `/close` | 400 | `missing_session_id` | Client adds header |
| M3 | `/step`, `/state`, or `/close` with unknown session id | 404 | `session_not_found` | Client re-issues `/reset` |
| M4 | Session was in cache but TTL expired between request and handler | 404 | `session_expired` | Client re-issues `/reset` |
| M5 | `/reset` when cache is full and LRU victim cannot be evicted (all 10 slots freshly `last_touched`) | 429 | `max_sessions` | Client backs off and retries; `Retry-After: 30` header set |
| M6 | Kokoro or Whisper model weights missing at startup (image build was broken) | 503 | `model_not_ready` | Operator fixes image; client cannot recover |
| M7 | Malformed JSON in request body | 400 | `bad_json` | Client fixes payload |
| M8 | Action fails pydantic / dataclass validation (wrong `ActionType`, missing `tool_name` for `TOOL_CALL`) | 400 | `invalid_action` | Client fixes action |
| M9 | Unhandled exception in `env.step` | 500 | `internal_error` | Logged with request id; client SHOULD NOT retry same action |
| M10 | Disk full writing tmp WAV in audio pipeline | 500 | `io_error` | Very rare on HF Spaces (no writable persistent disk, but `/tmp` is tmpfs and can fill); operator action |
| M11 | Request body exceeds 1 MiB | 413 | `payload_too_large` | Client trims (should never happen; actions are small) |
| M12 | Concurrent `/reset` on same session id (two requests race) | 409 | `reset_in_progress` | Client serializes resets on its side |
Rules:
- No stack traces in response bodies. `request_id` (uvicorn's ASGI scope id) is included so operators can grep logs.
- All error responses include `Cache-Control: no-store`.
- M5 (`429`) is the only code that includes `Retry-After`. Others are terminal for the request.
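The header rules above can be captured in a small lookup. The mapping below is copied from the error-mode table; the names `STATUS_BY_CODE` and `error_headers` are hypothetical, and this is a sketch rather than the handler code in `app.py`.

```python
from typing import Dict

# error.code -> HTTP status, transcribed from the error-mode table (M1 to M12).
STATUS_BY_CODE: Dict[str, int] = {
    "unauthorized": 401,
    "missing_session_id": 400,
    "session_not_found": 404,
    "session_expired": 404,
    "max_sessions": 429,
    "model_not_ready": 503,
    "bad_json": 400,
    "invalid_action": 400,
    "internal_error": 500,
    "io_error": 500,
    "payload_too_large": 413,
    "reset_in_progress": 409,
}


def error_headers(code: str) -> Dict[str, str]:
    """Response headers for an error: always no-store, Retry-After only on 429."""
    headers = {"Cache-Control": "no-store"}
    if STATUS_BY_CODE[code] == 429:
        headers["Retry-After"] = "30"
    return headers
```

Keeping the slug-to-status mapping in one table makes it harder for a handler to return a slug with the wrong status code.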
6. Dependencies
6.1 Upstream (consumed by the deployment artifact)
- `docs/modules/env.md`: defines `DriftCallEnvironment.__init__/reset/step/state/close` and the FastAPI route handlers. This doc references but does not duplicate env behavior.
- `docs/modules/models.md`: every dataclass crossing the HTTP boundary.
- `docs/modules/audio.md`: Kokoro + Whisper integration; tells this doc which weights to pre-pull and what CPU footprint to budget.
- `docs/modules/rewards.md`: cited from `openenv.yaml` `reward.components.ref`.
- DESIGN.md §3.3, §9.1, §9.2, §11.1, §13, Risk 10: authoritative.
6.2 External runtime dependencies (pinned in §4.5)
fastapi, uvicorn[standard], openenv, kokoro, faster-whisper, ctranslate2, soundfile, pydantic, numpy<2.0, huggingface_hub (build-time only).
6.3 Hugging Face platform dependencies
- Space SDK: `docker` (NOT `gradio` / `static`). The Docker SDK is the only path that lets us bake weights into the image and pin `uvicorn` workers.
- Space hardware: `cpu-basic` (free). 2 vCPU, 16 GB RAM, 50 GB ephemeral disk, no persistent storage, no GPU.
- Space secrets: `DRIFTCALL_ENV_TOKEN` (required).
- Space env vars: none (all config is baked in or via `X-Session-Id`).
- Space region: default (us-east-1); we do not need region pinning for CPU-basic.
6.4 Downstream consumers (who pings this Space)
- `training/eval_baseline.py` and `training/eval_final.py` (DESIGN.md §12): the training-side `HTTPEnvClient`.
- `demo/app_gradio.py`: the demo Space (documented in `docs/modules/deploy_demo_space.md`) uses this env over HTTP for live runs.
- `openenv validate .`: run against the Space URL as part of the hackathon submission gate.
- Hackathon judges: direct HTTP exercise via curl / the `openenv` CLI.
6.5 Explicit non-dependencies
- No GPU at runtime (load-bearing; DESIGN.md §3.3).
- No LLM weights on the env Space (Gemma 4 lives on the demo Space or on the trainer's local V100).
- No training code (`training/` is NOT copied into the image; see §4.2 `COPY` list).
- No HF Hub network at runtime (§2.3, §4.2 offline envs).
7. Edge cases
Six cases the deployment plan must handle correctly. Each is load-bearing for either the 30-min deploy window or the judge's openenv validate run.
7.1 Concurrent /reset on the same session id
Client A and client B both POST /reset with X-Session-Id: S1 within the same ~100 ms window. The cache uses a per-session asyncio lock; the second request observes the session mid-construction.
Handling:
- If the first request is still inside `env.__init__`, the second request gets `409 reset_in_progress`. Client is expected to serialize on its side.
- If the first request has completed, the second request performs an in-place reset: the old env is `.close()`'d, a new env replaces it, `reset_count += 1`. This matches `gym`'s idempotent reset semantics.
- `seed` is honored on the winning reset; the losing (409'd) request's seed is discarded.
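The first handling rule (a second `/reset` arriving while the first is still constructing the env gets a 409) can be sketched with a per-session `asyncio.Lock`. The names `reset_session`, `build_env`, and `demo` are hypothetical, and the in-place reset path from the second rule is left out.

```python
import asyncio
from typing import Awaitable, Callable, Dict

_locks: Dict[str, asyncio.Lock] = {}


async def reset_session(session_id: str,
                        build_env: Callable[[], Awaitable[None]]) -> int:
    """Return 200 after building the env, or 409 if another reset holds the lock."""
    lock = _locks.setdefault(session_id, asyncio.Lock())
    if lock.locked():
        return 409  # reset_in_progress: the caller is expected to serialize
    # No await happens between the check above and acquiring an unlocked
    # lock, so two coroutines on one event loop cannot both pass the check.
    async with lock:
        await build_env()
        return 200


async def demo() -> list:
    async def slow_build() -> None:
        await asyncio.sleep(0.05)  # stands in for env construction

    # Two resets for the same session id, started back to back.
    return await asyncio.gather(
        reset_session("S1", slow_build),
        reset_session("S1", slow_build),
    )
```

Running `asyncio.run(demo())` returns `[200, 409]`: the first coroutine takes the lock and suspends inside `slow_build`, so the second one finds the lock held. Since each uvicorn worker has its own event loop and cache, this only serializes resets that land on the same worker.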
7.2 /step on an evicted session
A client idles for 65 minutes between /step calls. The sweep task evicts the session at minute 60. The client's next /step returns 404 session_expired.
Handling:
- The client MUST re-issue `/reset` with the same or new seed; it cannot resume mid-episode. This is explicit in the Space README.
- No attempt is made to persist episode state across evictions. The free tier has no writable persistent disk, and replaying a seeded episode is cheap (< 1 s on the CPU basic tier).
- `env.close()` is called on eviction to release the Kokoro audio buffer (saves ~80 MB resident per lingering session).
7.3 Cold-start model-weight load race
The Space boots. Uvicorn workers start and each lazily triggers a Kokoro + Whisper load on the first audio-involving /step. Whisper's CTranslate2 model load takes ~3-5 s; Kokoro takes ~2 s. A /step arriving before load completes can block up to ~8 s.
Handling:
- `app.py`'s `lifespan` startup hook performs an eager load of both models during container boot. This turns cold-start latency into Space "Starting…" time (which HF surfaces via the spinner) instead of a hung client request.
- If eager load fails (bad weights, disk corruption), the container exits 1 and HF's Space restart loop catches it; the operator sees the Space status as "Error" instead of silently hanging.
- The first `/healthz` probe is expected at +30 s (`--start-period=45s` on the HEALTHCHECK gives us a comfortable margin).
7.4 Kokoro voice pack missing for a language
Kokoro is loaded at startup but an individual voice pack for language="kn" (Kannada) is missing from the snapshot cache due to a partial download.
Handling:
- `audio/tts_kokoro.py` (per `docs/modules/audio.md` §5) raises `VoicePackMissingError`. The env treats this as a SPEAK-action failure and returns a `tool_results` entry with `status="schema_error"` and `response={"error": "voice_unavailable"}`. The episode continues; reward R4 (format compliance) may drop but R1/R2 are unaffected.
- The image build in §4.2 pre-pulls the full Kokoro snapshot (`snapshot_download('hexgrad/Kokoro-82M')`), which includes all voice packs. If a voice pack is missing at runtime, the image is broken; the operator fixes the Dockerfile and rebuilds.
7.5 HTTP timeout mid-/step
A /step takes 35 s because Whisper is processing a long utterance and the Space is also handling three concurrent episodes. The HF Space edge proxy has a 60 s idle timeout; this case stays under it with about 25 s of headroom, which shrinks quickly as concurrency grows.
Handling:
- `--timeout-keep-alive 30` only controls how long uvicorn keeps an idle connection open between requests; it does not cut off an in-flight `/step`. The HTTP client's read timeout should be ≥ 60 s (the default `requests.Session` timeout is none, i.e. it waits indefinitely, so it is safe).
- Inside `env.step`, audio ops have hard caps owned by `audio/*.py`: Whisper `max_duration_s=30`, Kokoro synth implicitly bounded by text length. The env cannot produce a `/step` longer than ~40 s at p99.
- If a `/step` does exceed 60 s (e.g., 10 concurrent sessions all doing audio at once on 2 vCPU), the proxy closes the socket and the client sees `ConnectionError`. Client re-issues; the session is still in the cache and the step was effectively a no-op on the server side because responses are atomic-on-return (state is only mutated after all work succeeds; see `docs/modules/env.md` §3 transactional step semantics).
7.6 Out-of-memory during concurrent audio
Five sessions simultaneously run audio-heavy /steps. Each Whisper int8 model takes ~250 MB RAM; Kokoro takes ~350 MB. Naive loading would hit 5 × 600 MB = 3 GB plus Python overhead, well within the 16 GB tier budget, but the Space can still OOM if the image unexpectedly loads fp32 weights.
Handling:
- Whisper is forced to `compute_type="int8"` and Kokoro to fp32 (its default is already smallest viable). `audio/*.py` asserts these at load time.
- The models are singletons shared across sessions (they are stateless w.r.t. concurrent calls; CTranslate2 releases the GIL during decode). Memory budget is therefore `~600 MB` total per uvicorn worker process, not per-session.
- If an OOM happens, the container is killed by the HF Space OOM-killer and auto-restarts. We lose all in-flight sessions; clients re-`/reset`. The eviction sweep and TTL ensure no permanently-dead sessions pile up.
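The memory arithmetic in this case is simple enough to check directly. The sketch below uses only the figures quoted above (~250 MB for Whisper int8, ~350 MB for Kokoro, 16 GB tier); the function name is hypothetical and Python interpreter overhead is ignored.

```python
WHISPER_INT8_MB = 250      # figure quoted in this section
KOKORO_MB = 350            # figure quoted in this section
TIER_RAM_MB = 16 * 1024    # cpu-basic tier, 16 GB


def model_memory_mb(sessions: int, shared: bool) -> int:
    """Model RAM for N sessions, with one shared copy or one copy per session."""
    per_copy = WHISPER_INT8_MB + KOKORO_MB
    return per_copy if shared else per_copy * sessions


naive = model_memory_mb(5, shared=False)     # 5 x 600 MB
singleton = model_memory_mb(5, shared=True)  # one shared copy
print(naive, singleton, TIER_RAM_MB)
```

With singletons the footprint stays flat as sessions are added, so the session cap of 10 is bounded by per-session env state rather than by model weights.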
8. Examples
8.1 End-to-end /reset → /step flow via curl
# Assume DRIFTCALL_ENV_TOKEN is set locally for scripting convenience.
TOKEN="${DRIFTCALL_ENV_TOKEN:?export DRIFTCALL_ENV_TOKEN first}"
BASE="https://<team>-driftcall-env.hf.space"
# 1. Reset with seed 42, stage 2 curriculum.
curl -sS -X POST "$BASE/reset" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001" \
-H "Content-Type: application/json" \
-d '{"seed": 42, "config": {"curriculum_stage": 2}}'
# → 200 {"observation": {"turn": 0, "goal": {...}, "last_transcript": "Bhai Friday ko...", ...}}
# 2. Step: call airline.search.
curl -sS -X POST "$BASE/step" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001" \
-H "Content-Type: application/json" \
-d '{
"action": {
"action_type": "tool_call",
"tool_name": "airline.search",
"tool_args": {"origin": "DEL", "destination": "BLR", "date": "2026-04-26"}
}
}'
# → 200 {"observation": {...}, "reward": 0.0, "done": false, "info": {"drift_fired": []}}
# 3. Inspect state (judge-only, optional).
curl -sS "$BASE/state" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001"
# → 200 {"state": {"episode_id": "...", "max_turns": 12, "drift_schedule": [...], ...}, "turn": 1}
# 4. Close.
curl -sS -X POST "$BASE/close" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001"
# → 200 {"closed": true, "final_state": {...}}
8.2 Container build + smoke + push
# Local build (from DRIFTCALL/ repo root)
docker build -t driftcall-env:local .
# Local smoke (bind a dummy secret)
docker run --rm -p 7860:7860 \
-e DRIFTCALL_ENV_TOKEN=dev-local-token \
driftcall-env:local
# In another shell:
curl -sS http://localhost:7860/healthz # → "ok"
curl -sS -X POST http://localhost:7860/reset \
-H "Authorization: Bearer dev-local-token" \
-H "X-Session-Id: smoke" \
-H "Content-Type: application/json" -d '{}'
# → 200 with initial observation
# Push to HF Space via the new `hf` CLI.
# The team-lead brief flags that `huggingface-cli` is deprecated; we migrate
# DriftCall/CLAUDE.md §6 row "HF push env" to `hf upload` in a follow-up PR.
hf upload <team>/driftcall-env . --repo-type=space
# (Requires `pip install "huggingface_hub>=0.34"`, the release line that ships the `hf` CLI, and `hf auth login` completed.)
8.3 openenv validate against the live Space
# Against local container:
openenv validate http://localhost:7860 \
--auth-bearer dev-local-token
# Against deployed Space:
openenv validate https://<team>-driftcall-env.hf.space \
--auth-bearer "$DRIFTCALL_ENV_TOKEN"
# Expected output:
# ✓ openenv.yaml parses, schema v1.0
# ✓ GET /healthz → 200 ok
# ✓ POST /reset → 200, observation matches observation_space.ref
# ✓ POST /step → 200, observation + reward + done
# ✓ GET /state → 200, DriftCallState matches schema
# ✓ POST /close → 200
# ✓ 6 endpoints validated, 0 errors
Running this before submission is the DESIGN.md §12.2 hour-16 gate. If it fails, we fix before moving to training.
9. Open questions
- OpenEnv schema version pin: `openenv==0.2.*` in §4.5 is a placeholder. Confirm the exact current release on the hackathon kickoff morning (Apr 25) and tighten the pin; `openenv validate` schema fields may have shifted between 0.1 and 0.2.
- Per-worker cache divergence: documented in §3.2 as acceptable. Re-evaluate after local load-testing; if even training hits the cross-worker 404 path > 1% of the time, switch to `--workers 1` with a bigger thread pool.
- HF Space CPU cold-start time: the free CPU basic tier can sleep on idle and take 60-120 s to wake. This doc assumes Space is "always-on" because we exercise it during development; if the judge hits a cold Space, the first `/reset` may appear hung. Risk-register coverage owned by `docs/modules/risk_book.md`.
- `DRIFTCALL_ENV_TOKEN` rotation during the hackathon: if the token leaks mid-judging, rotating it 401s the judge mid-run. Do we need a two-token grace period? Likely no (hackathon is 48 h and we trust submission channels), but flag for Person D's risk book.
- CLAUDE.md §6 `hf upload` migration: the hackathon briefing flags `huggingface-cli` as deprecated. Update `DRIFTCALL/CLAUDE.md` §6 rows ("HF push env", "HF push dataset") to `hf upload ... --repo-type=...` in a separate small PR so this design doc doesn't diverge from the command catalogue. Own: Person D.
- Image-size margin vs §1.1 Whisper upgrade path: if `docs/modules/audio.md` §1.1's WER bail-out triggers and we swap to `faster-whisper-medium`, final image grows from ~1.2 GB to ~1.8 GB. Still under the 2 GB Risk-10 bound but with less slack. Re-check image size after any audio-weights change.
- `/state` access control: should `/state` require the same bearer as mutating endpoints, or should we expose a narrower "episode summary" for judges without the full vendor-states dump? Current design keeps full state behind the bearer; revisit if leaderboard ops ask for a public read-only pane.