# deploy_env_space.md: DriftCall Env HF Space Deployment
**Owner:** Person D (Deploy & Story)
**Implements:** DESIGN.md §3.3 (Deployed Env Topology), §11.1 (Env Space files), §13 (Deliverables)
**Depends on:** `docs/modules/env.md` (FastAPI surface contract), `docs/modules/models.md` (dataclass wire format), `docs/modules/audio.md` (Kokoro + Whisper runtime)
**Status:** DRAFT, pending ≥ 2 fresh critic rounds
---
## 1. Purpose
`driftcall-env` is the production hosting target for the DriftCall OpenEnv RL environment. It runs on a **free-tier Hugging Face Space (Docker SDK, CPU basic, 2 vCPU / 16 GB RAM)** and is the artifact the hackathon judges exercise via `openenv validate`. The Space exposes a FastAPI application implementing the OpenEnv REST contract (`/reset`, `/step`, `/state`, `/close`) plus a lightweight session cache so concurrent training / evaluation runs can share one deployment without state bleed.
The Space is **intentionally CPU-only**. Kokoro TTS (82 M params) and `faster-whisper-small` int8 (~244 M params) both run at roughly real-time on a single modern CPU core; the training topology (DESIGN.md §3.2, §9.4) never loads TTS/ASR because GRPO operates text-in / text-out. This module owns:
1. The Dockerfile (multi-stage build, <2 GB final image, pre-pulled audio weights).
2. `openenv.yaml` metadata (required for `openenv validate`).
3. `requirements.txt` pin set (fastapi, uvicorn, openenv, kokoro, faster-whisper, plus transitive deps).
4. The Space README (Space card), which must satisfy the HF Space schema and the hackathon submission rules.
5. The session cache implementation sketch delegated to `app.py` (full code in `docs/modules/env.md`; this doc specifies the cache's **deployment constraints** only).
6. The deployment command set (build, push, validate).
This doc is a design spec, not an executable. It must contain every decision needed so a single operator can ship the env Space in one 30-minute sitting on the morning of Apr 25 (DESIGN.md §12.2 pre-onsite hour-16 gate).
---
## 2. Interface
### 2.1 External HTTP surface (served by the Space)
The Space exposes the OpenEnv REST surface on **port 7860** (the HF Spaces Docker SDK default, declared as `app_port: 7860` in the Space card; only the declared port is routed). The agent endpoints accept and return `application/json`. Session identity is carried as a request header so the cache can dispatch to the right env instance.
```
POST /reset    → 200 application/json   # create or recycle a session, return initial observation
POST /step     → 200 application/json   # advance one turn; returns observation + reward + done
GET  /state    → 200 application/json   # read the current DriftCallState (debug / judge inspection)
POST /close    → 200 application/json   # explicitly evict a session
GET  /healthz  → 200 text/plain "ok"    # Space healthcheck (HF pings this to mark the Space "running")
GET  /         → 200 text/html          # minimal landing page (see §4.4); NOT the agent surface
```
**Headers (every endpoint except `/healthz` and `/`):**
| Header | Required | Notes |
|---|---|---|
| `Authorization: Bearer <DRIFTCALL_ENV_TOKEN>` | yes (see §3.5) | Space secret; judges receive it via the submission form |
| `X-Session-Id: <uuid4-or-caller-chosen>` | yes | Opaque string, max 64 chars, `[A-Za-z0-9_-]` only |
| `Content-Type: application/json` | yes | UTF-8 |
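The header rules in the table can be checked before any handler logic runs. What follows is a minimal sketch, assuming the caller has already lower-cased the header names into a plain dict. `check_headers` is a hypothetical name. The `invalid_session_id` slug for a malformed id is also an assumption, because §5 only lists `missing_session_id`. `hmac.compare_digest` keeps the bearer comparison constant-time.

```python
from __future__ import annotations

import hmac
import re

# Allowed session ids: 1 to 64 characters from [A-Za-z0-9_-].
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_headers(headers: dict[str, str], expected_token: str) -> tuple[int, str] | None:
    """Return (status, error_code) for the first failing rule, or None if all pass.

    Assumes header names were lower-cased by the caller.
    """
    scheme, _, token = headers.get("authorization", "").partition(" ")
    # compare_digest does not reveal token contents through timing.
    if scheme != "Bearer" or not hmac.compare_digest(token.encode(), expected_token.encode()):
        return (401, "unauthorized")
    session_id = headers.get("x-session-id")
    if session_id is None:
        return (400, "missing_session_id")
    if SESSION_ID_RE.fullmatch(session_id) is None:
        return (400, "invalid_session_id")  # hypothetical slug, not in the §5 table
    return None
```

The bearer check runs first, so a request that is both unauthenticated and missing its session id gets `401`, not `400`.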
The endpoint contracts (request / response shapes) are owned by `docs/modules/env.md` and serialize the `DriftCallObservation` / `DriftCallState` / `DriftCallAction` dataclasses defined in `docs/modules/models.md`. This doc only pins the **deployment-visible** aspects: port, headers, auth, status codes.
> **Cross-doc sync note (2026-04-24):** DESIGN.md §3.3 was updated to match this doc's choice of carrying session identity via the `X-Session-Id` HTTP header (previously documented there as a `session_id` query param). Both docs now agree. No behavior change in this spec; the note is recorded so reviewers don't perceive divergence.
### 2.1.1 Success body shapes (top-level only)
Top-level JSON shapes for each success response. Inner dataclass fields (`DriftCallObservation`, `DriftCallAction`, `DriftCallState`) are owned by `docs/modules/env.md` and `docs/modules/models.md`; this section pins only the envelope each endpoint returns.
**`POST /reset`**
Request:
```json
{
"config": {
"curriculum_stage": 1,
"language_weights": { "hi": 0.4, "ta": 0.2, "kn": 0.2, "hinglish": 0.2 },
"audio_boundary_enabled": true
},
"seed": 42
}
```
- `config.curriculum_stage`: `1 | 2 | 3`
- `config.language_weights`: object, keys are language codes, values sum to 1.0
- `config.audio_boundary_enabled`: bool
- `seed`: `int | null`
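Here is a sketch of validating the `/reset` body against the field rules above. Several pieces are assumptions: the defaults used when a field is omitted (stage 1, the example language weights, audio on) were chosen so the empty-body smoke test in §8.2 succeeds, and the `1e-6` sum tolerance is likewise assumed. `ResetRequest` and `parse_reset` are hypothetical names. A `ValueError` is meant to map to `400`.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Assumed defaults for omitted fields (not specified by this doc).
DEFAULT_WEIGHTS = {"hi": 0.4, "ta": 0.2, "kn": 0.2, "hinglish": 0.2}


@dataclass(frozen=True)
class ResetRequest:
    curriculum_stage: int
    language_weights: dict[str, float]
    audio_boundary_enabled: bool
    seed: int | None


def _is_int(x: object) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(x, int) and not isinstance(x, bool)


def parse_reset(body: dict) -> ResetRequest:
    """Validate a /reset body; any ValueError maps to HTTP 400."""
    cfg = body.get("config", {})
    if not isinstance(cfg, dict):
        raise ValueError("config must be an object")
    stage = cfg.get("curriculum_stage", 1)
    if not _is_int(stage) or stage not in (1, 2, 3):
        raise ValueError("curriculum_stage must be 1, 2 or 3")
    weights = cfg.get("language_weights", DEFAULT_WEIGHTS)
    if not isinstance(weights, dict) or not weights:
        raise ValueError("language_weights must be a non-empty object")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) or v < 0 for v in weights.values()):
        raise ValueError("language_weights values must be non-negative numbers")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):  # tolerance is an assumption
        raise ValueError("language_weights must sum to 1.0")
    audio = cfg.get("audio_boundary_enabled", True)
    if not isinstance(audio, bool):
        raise ValueError("audio_boundary_enabled must be a bool")
    seed = body.get("seed")
    if seed is not None and not _is_int(seed):
        raise ValueError("seed must be an int or null")
    return ResetRequest(stage, dict(weights), audio, seed)
```

The sum check uses a tolerance because float weights such as `0.4 + 0.2 + 0.2 + 0.2` do not add up to exactly `1.0` in binary floating point.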
Response:
```json
{
"observation": { "...DriftCallObservation..." },
"episode_id": "uuid4-string",
"max_turns": 12
}
```
**`POST /step`**
Request:
```json
{ "action": { "...DriftCallAction..." } }
```
Response:
```json
{
"observation": { "...DriftCallObservation..." },
"reward": 0.0,
"done": false,
"info": { "...opaque..." }
}
```
- `reward`: `float | null` (null when reward is deferred to episode end)
**`GET /state`**
Response:
```json
{
"state": { "...DriftCallState..." },
"turn": 3
}
```
**`POST /close`**
Response:
```json
{
"closed": true,
"final_state": { "...DriftCallState... | null" }
}
```
- `final_state`: `object | null` (null if session was already evicted)
Deeper field-level detail for `DriftCallObservation`, `DriftCallAction`, and `DriftCallState` lives in `docs/modules/env.md` and `docs/modules/models.md`; do not duplicate it here.
### 2.2 Status code map
| Code | Meaning | Triggered by |
|---|---|---|
| 200 | Success | Normal return |
| 400 | Malformed JSON / missing header / invalid action shape | Parsing or dataclass validation failure |
| 401 | Missing or bad bearer | §3.5 auth check |
| 404 | `X-Session-Id` not in cache (for `/step` / `/state` / `/close`) | Session expired, evicted, or never created |
| 409 | Concurrent `/reset` on same session id (see §7, case 1) | Cache key collision during init |
| 413 | Request body exceeds 1 MiB | §5, mode M11 |
| 429 | Max concurrent sessions reached | §3.2 cap hit |
| 500 | Unhandled exception inside env step | Bug; logged, stack trace NOT returned in body |
| 503 | Model weights not yet loaded on cold-start | §7, case 3 |
All error bodies are `{"error": {"code": "<slug>", "message": "<user-safe string>"}}`. Internal stack traces never cross the wire.
### 2.3 Outbound network
The Space makes **zero outbound HTTP calls at runtime**. Kokoro and Whisper weights are baked into the image (§4.2): no HF Hub fetches, no telemetry, no phone-home. This is load-bearing because the HF Spaces free CPU tier often has slow or rate-limited egress, and because reproducibility demands an offline image.
### 2.4 Container entrypoint
```dockerfile
CMD ["uvicorn", "app:app", \
"--host", "0.0.0.0", \
"--port", "7860", \
"--workers", "2", \
"--timeout-keep-alive", "30", \
"--log-level", "info"]
```
Two uvicorn workers (not four): the CPU basic tier has 2 vCPUs, and Kokoro synthesis and Whisper transcription are CPU-bound, so extra worker processes would only contend for the same two cores.
---
## 3. Behavior Spec
### 3.1 Session lifecycle
A session is an instance of `DriftCallEnvironment` (the class whose full behavior lives in `docs/modules/env.md`). The deployment layer treats each session as an opaque object with `reset()`, `step()`, `state()`, `close()` methods and does not introspect it.
```
client                       Space (app.py)                      cache
  | POST /reset {seed, config}      |                              |
  | X-Session-Id: S1                |                              |
  |-------------------------------->| look up S1                   |
  |                                 |----------------------------->|
  |                                 |<---------- miss -------------|
  |                                 | construct env, bind seed     |
  |                                 | store (env, last_touched)    |
  |                                 |----------------------------->|
  |                                 | env.reset(...) -> obs        |
  |<------------ 200 obs -----------|                              |
  |                                 |                              |
  | POST /step                      |                              |
  |-------------------------------->| lookup S1 -> hit             |
  |                                 | touch last_touched = now     |
  |                                 | env.step(...) -> obs,r,done  |
  |<----------- 200 obs,r ----------|                              |
```
### 3.2 Cache policy (deployment-level invariants)
The cache is an in-process dict, keyed by `X-Session-Id`. The implementation lives in `app.py` (`docs/modules/env.md` §3 "session cache"), but this doc locks the policy:
| Invariant | Value | Source |
|---|---|---|
| Max concurrent sessions | **10** | DESIGN.md §3.3 |
| TTL (time since `last_touched`) | **3600 s = 1 hr** | DESIGN.md §3.3 |
| Storage | In-memory only (no Redis, no disk) | Free tier has no persistent disk writable at runtime; container state resets on Space rebuild |
| Eviction policy | LRU when cap reached; stale-TTL sweep every 60 s | §3.3 |
| Cross-process sharing | None β each uvicorn worker has its own cache | Acceptable because cache is advisory; clients that get routed to a different worker on re-connect re-issue `/reset` |
**Consequence of the "per-worker cache" choice:** a client's session id may land on worker W1 for `/reset` and W2 for `/step` (uvicorn uses round-robin-ish scheduling on the OS socket). In that case `/step` returns 404 and the client must re-`/reset`. This is acceptable for the hackathon because:
1. Training / eval runs keep a persistent HTTP connection via `requests.Session`, which typically pins to one worker for the life of the socket.
2. Judges use one session end-to-end; they hit `/reset` and then replay steps over the same connection.
3. Two-worker degradation is documented in the Space README so judges don't get silently surprised.
A future hardening path (out of scope for this hackathon) is to run `--workers 1` with a thread pool, or to share the cache via `multiprocessing.Manager`. §9 (open question 2) tracks the `--workers 1` option.
### 3.3 Eviction sweep
A background asyncio task (started in `app.py` `lifespan`) runs every 60 s:
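```python
import asyncio
import time

async def sweep_expired(cache: dict, ttl_s: float = 3600.0, interval_s: float = 60.0) -> None:
    while True:
        now = time.monotonic()  # same clock as SessionEntry.last_touched
        for sid, entry in list(cache.items()):
            if now - entry.last_touched > ttl_s:
                cache.pop(sid).env.close()  # frees whatever audio buffers the env holds
        await asyncio.sleep(interval_s)
```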
LRU eviction on `/reset` when `len(cache) >= 10` drops the oldest `last_touched` entry first; the new session replaces it.
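The §3.2 policy and its two eviction paths (TTL and LRU-on-reset) fit in a small class. This is a sketch, not the `app.py` implementation. The class and method names are hypothetical, and the clock is injectable for testing. `min_idle_s` is an assumed reading of "freshly `last_touched`" in error mode M5, which this doc does not quantify.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Entry:
    env: Any             # opaque DriftCallEnvironment
    last_touched: float


class SessionCache:
    """Sketch of the §3.2 policy: capped size, idle TTL, LRU eviction on /reset."""

    def __init__(self, cap: int = 10, ttl_s: float = 3600.0, min_idle_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cap = cap
        self.ttl_s = ttl_s
        self.min_idle_s = min_idle_s  # hypothetical "freshly touched" threshold for M5
        self.clock = clock
        self.entries: dict[str, Entry] = {}

    def get(self, sid: str) -> Any:
        """Return the env for /step or /state and refresh last_touched."""
        entry = self.entries.get(sid)
        if entry is None:
            raise LookupError("session_not_found")   # M3 -> 404
        now = self.clock()
        if now - entry.last_touched > self.ttl_s:    # TTL lapsed before the sweep ran
            del self.entries[sid]
            entry.env.close()
            raise LookupError("session_expired")     # M4 -> 404
        self.entries[sid] = replace(entry, last_touched=now)
        return entry.env

    def put(self, sid: str, env: Any) -> None:
        """Store a freshly reset env, evicting the least-recently-touched entry if full."""
        now = self.clock()
        old = self.entries.get(sid)
        if old is not None:
            old.env.close()                          # in-place reset (§7, case 1)
        elif len(self.entries) >= self.cap:
            victim = min(self.entries, key=lambda k: self.entries[k].last_touched)
            if now - self.entries[victim].last_touched < self.min_idle_s:
                raise OverflowError("max_sessions")  # M5 -> 429, Retry-After: 30
            self.entries.pop(victim).env.close()
        self.entries[sid] = Entry(env, now)
```

A route handler would translate `LookupError` into `404` with `error.code` taken from the exception message, and `OverflowError` into `429`.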
### 3.4 Streaming / keep-alive
All endpoint responses are single JSON bodies β **no SSE, no websockets, no chunked streaming**. OpenEnv's client library (`openenv.HTTPEnvClient`) uses blocking `POST` + `json()` and a shared `requests.Session`; anything exotic risks failing `openenv validate`. A `/step` call may take up to ~5 s when an audio pass is involved (Kokoro synth + Whisper transcribe on CPU), so we set `--timeout-keep-alive 30` to keep the socket alive comfortably below the 60 s HF Spaces proxy timeout.
### 3.5 Authentication
A single shared-secret bearer guards every endpoint except `/healthz` and `/`, including the read-only `/state`. The token is injected as an HF Space **Secret** named `DRIFTCALL_ENV_TOKEN` and read by `app.py` at import time. `/healthz` is **unauthenticated** (HF Space probes carry no bearer).
- Token format: 32+ byte URL-safe random (`secrets.token_urlsafe(32)`).
- Token rotation: delete the Space secret and push a new one; all in-flight sessions 401 on the next request.
- Missing secret at Space boot → container exits 1 (fail-fast).
- The token is bundled with the hackathon submission package so judges can exercise `openenv validate` against the live Space.
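The standard library covers both token generation and the fail-fast boot check. `secrets.token_urlsafe(32)` draws 32 random bytes and returns them base64url-encoded without padding, which gives 43 characters. `load_token_or_exit` is a hypothetical helper standing in for `app.py`'s import-time read.

```python
from __future__ import annotations

import os
import secrets
import sys
from typing import Mapping


def generate_token() -> str:
    """32 random bytes, base64url without padding: a 43-character token."""
    return secrets.token_urlsafe(32)


def load_token_or_exit(env: Mapping[str, str] | None = None) -> str:
    """Read DRIFTCALL_ENV_TOKEN at import time; exit with status 1 if it is missing."""
    source = os.environ if env is None else env
    token = source.get("DRIFTCALL_ENV_TOKEN", "")
    if not token:
        print("DRIFTCALL_ENV_TOKEN is not set; refusing to start", file=sys.stderr)
        sys.exit(1)
    return token
```

To mint a value for the Space secret, run `python -c "import secrets; print(secrets.token_urlsafe(32))"`.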
### 3.6 Determinism
The deployment does not itself introduce nondeterminism. `env.py` owns seed handling; the cache is a pass-through. However, **two CPU-bound sources of wall-clock variance** can change observable latency (`tool_results[i].latency_ms` is wall-clock, not simulated):
1. Kokoro synth time on the first call after cold start can be 2–3× steady-state due to JIT / lazy graph compilation.
2. Whisper VAD + decode time varies with input length.
Neither perturbs reward math: `latency_ms` is informational, never scored.
### 3.7 Logging
Structured JSON logs to stdout (HF Spaces captures stdout into the Logs tab). One log line per request, fields: `ts`, `level`, `session_id`, `endpoint`, `status`, `latency_ms`, `turn`, `err_code` (nullable). No PII, no audio bytes, no bearer token. The full `DriftCallAction` body is logged at DEBUG only, disabled by default.
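The sketch below emits one access-log line containing exactly the §3.7 field set. The logger name and the `access_line` helper are hypothetical. The handler writes bare JSON messages to stdout, so each record stays one parseable line in the Logs tab.

```python
from __future__ import annotations

import json
import logging
import sys
import time

# Hypothetical logger name; HF Spaces shows stdout in the Space's Logs tab.
logger = logging.getLogger("driftcall.access")
_handler = logging.StreamHandler(sys.stdout)
_handler.setFormatter(logging.Formatter("%(message)s"))  # the message is already JSON
logger.addHandler(_handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid a second, non-JSON copy via the root logger


def access_line(session_id: str | None, endpoint: str, status: int,
                latency_ms: float, turn: int | None, err_code: str | None,
                level: str = "info") -> str:
    """Serialize exactly the §3.7 fields; never pass bearer tokens or audio bytes in."""
    return json.dumps({
        "ts": time.time(),
        "level": level,
        "session_id": session_id,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "turn": turn,
        "err_code": err_code,
    })
```

A request handler would call `logger.info(access_line("demo-001", "/step", 200, 412.0, 1, None))` once it has computed the status.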
---
## 4. Data structures
### 4.1 `SessionEntry`
```python
from __future__ import annotations  # DriftCallEnvironment appears only as an annotation here

from dataclasses import dataclass


@dataclass(frozen=True)
class SessionEntry:
    env: DriftCallEnvironment   # opaque; see docs/modules/env.md
    created_at: float           # time.monotonic() at /reset
    last_touched: float         # time.monotonic() at every /step|/state
    reset_count: int            # incremented on in-place /reset (§7, case 1)
```
Frozen per project rule (CLAUDE.md §7). `last_touched` updates produce a new `SessionEntry`; the cache dict replaces the old entry.
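Because `SessionEntry` is frozen, every update goes through `dataclasses.replace`, which builds a new instance with the changed fields. The sketch below shows the two updates the cache performs. `env` is typed `Any` so the example runs without `DriftCallEnvironment`. The helper names are hypothetical, and resetting `created_at` on an in-place `/reset` is an assumption.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class SessionEntry:
    env: Any             # DriftCallEnvironment in app.py
    created_at: float
    last_touched: float
    reset_count: int


def touch(cache: dict[str, SessionEntry], sid: str) -> SessionEntry:
    """On /step or /state: swap in a copy with a fresh last_touched."""
    entry = replace(cache[sid], last_touched=time.monotonic())
    cache[sid] = entry
    return entry


def reset_in_place(cache: dict[str, SessionEntry], sid: str, new_env: Any) -> SessionEntry:
    """On a repeat /reset (§7, case 1): close the old env and bump reset_count."""
    old = cache[sid]
    old.env.close()
    now = time.monotonic()
    entry = replace(old, env=new_env, created_at=now, last_touched=now,
                    reset_count=old.reset_count + 1)
    cache[sid] = entry
    return entry
```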
### 4.2 Dockerfile layout
Multi-stage build. Stage 1 installs wheels into a throwaway image; stage 2 copies only the site-packages dir and the app code. Target final image < 2 GB (DESIGN.md Risk 10).
```dockerfile
# -------- Stage 1: builder --------
FROM python:3.11-slim AS builder
ENV PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential git libsndfile1 ffmpeg && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt
# Pre-pull model weights so first /reset is fast
RUN pip install --prefix=/install huggingface_hub
RUN PYTHONPATH=/install/lib/python3.11/site-packages \
python -c "from huggingface_hub import snapshot_download; \
snapshot_download('hexgrad/Kokoro-82M', cache_dir='/weights'); \
snapshot_download('Systran/faster-whisper-small', cache_dir='/weights')"
# -------- Stage 2: runtime --------
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 \
HF_HOME=/root/.cache/huggingface \
TRANSFORMERS_OFFLINE=1 \
HF_HUB_OFFLINE=1
RUN apt-get update && apt-get install -y --no-install-recommends \
libsndfile1 ffmpeg ca-certificates && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
# snapshot_download(cache_dir='/weights') writes the hub-cache layout, which huggingface_hub reads from $HF_HOME/hub
COPY --from=builder /weights /root/.cache/huggingface/hub
WORKDIR /app
COPY app.py openenv.yaml ./
COPY driftcall/ ./driftcall/
COPY data/ ./data/
EXPOSE 7860
HEALTHCHECK --interval=30s --timeout=5s --start-period=45s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:7860/healthz', timeout=4).read()" || exit 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2", "--timeout-keep-alive", "30", "--log-level", "info"]
```
Key decisions:
- `python:3.11-slim` base: the smallest stable glibc-based Python image. Alpine uses musl, which cannot install the prebuilt manylinux (glibc) wheels that `ctranslate2` (pulled in by `faster-whisper`) ships, forcing slow or failing source builds.
- `ffmpeg` is installed for audio tooling in the pipeline. Note that `faster-whisper` itself does not shell out to it: it decodes audio with PyAV, which bundles the FFmpeg libraries, so the system package is only needed if another module invokes the `ffmpeg` binary.
- `HF_HUB_OFFLINE=1` + `TRANSFORMERS_OFFLINE=1` are hard guarantees: if a download is attempted at runtime it raises instead of silently fetching or hanging (§5, mode M6).
- Weights are copied into `/root/.cache/huggingface/hub`, the hub cache under `HF_HOME`. Both Kokoro and faster-whisper resolve model files through `huggingface_hub`, which looks there by default.
### 4.3 `openenv.yaml`
```yaml
# openenv.yaml: consumed by `openenv validate`
# Schema source: https://github.com/meta-pytorch/OpenEnv
schema_version: "1.0"
env:
  id: driftcall
  version: "0.1.0"
  display_name: "DriftCall: Indic Voice Concierge under Schema Drift"
  description: >
    OpenEnv-compliant RL environment where a voice-first agent must complete
    Indic consumer concierge tasks while the vendor APIs undergo mid-episode
    schema, policy, T&C, pricing, and auth drift. Five independent reward
    components; deterministic seeded drift; Hindi/Tamil/Kannada/Hinglish
    briefs via Kokoro TTS + faster-whisper ASR.
  license: apache-2.0
  tags:
    - openenv
    - rl
    - voice
    - indic
    - schema-drift
entrypoint:
  type: http
  base_url: "https://<team>-driftcall-env.hf.space"
  endpoints:
    reset: "/reset"
    step: "/step"
    state: "/state"
    close: "/close"
    health: "/healthz"
  auth:
    type: bearer
    secret_env: DRIFTCALL_ENV_TOKEN
action_space:
  ref: "docs/modules/models.md#DriftCallAction"
observation_space:
  ref: "docs/modules/models.md#DriftCallObservation"
episode:
  max_turns: 16  # worst case, stage-3 curriculum (DESIGN.md §4.5)
  reset_config:
    seed: { type: int, required: false }
    curriculum_stage: { type: int, range: [1, 3], required: false }
    language_weights: { type: object, required: false }
reward:
  shape: scalar
  range: [-1.0, 1.0]
  components:
    ref: "docs/modules/rewards.md"
```
Field names match the OpenEnv v1.0 schema (`entrypoint.type`, `action_space.ref`, etc.). The `ref` pointers resolve to paths inside the repo; `openenv validate` reads them to assert the env is self-describing.
### 4.4 `README.md` (Space card)
```
---
title: DriftCall Env
emoji: 🎧
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: "OpenEnv: Indic voice concierge under schema drift."
---
```
Below the YAML header: one-paragraph description, `openenv validate` command, auth note, link to GitHub, link to the demo Space, link to the HF Hub model + dataset. The README is also rendered as the root `/` route's fallback (Docker Spaces serve nothing at `/` otherwise).
### 4.5 `requirements.txt`
```
fastapi==0.115.*
uvicorn[standard]==0.32.*
pydantic==2.*
openenv==0.2.* # or whatever is current at build time; version-pin in PR
kokoro==0.9.*
faster-whisper==1.1.*
ctranslate2==4.5.* # pinned to match faster-whisper's wheel
soundfile==0.12.*
numpy<2.0
huggingface_hub==0.26.* # only used at build time (snapshot_download)
```
The version set matches `docs/modules/audio.md` §6.1 (upstream consumer) exactly. Pinning is deliberate: the env Space is a reproducibility artifact, and judges may rebuild it months from now.
---
## 5. Error modes
Every failure path that can cross the HTTP boundary:
| ID | When | HTTP | Body `error.code` | Recovery |
|---|---|---|---|---|
| M1 | No `Authorization` header, or bad bearer | 401 | `unauthorized` | Client fixes token |
| M2 | No `X-Session-Id` on `/reset`/`/step`/`/state`/`/close` | 400 | `missing_session_id` | Client adds header |
| M3 | `/step`/`/state`/`/close` with unknown session id | 404 | `session_not_found` | Client re-issues `/reset` |
| M4 | Session was in cache but TTL expired between request and handler | 404 | `session_expired` | Client re-issues `/reset` |
| M5 | `/reset` when cache is full and LRU victim cannot be evicted (all 10 slots freshly `last_touched`) | 429 | `max_sessions` | Client backs off and retries; `Retry-After: 30` header set |
| M6 | Kokoro or Whisper model weights missing at startup (image build was broken) | 503 | `model_not_ready` | **Operator** fixes image; client cannot recover |
| M7 | Malformed JSON in request body | 400 | `bad_json` | Client fixes payload |
| M8 | Action fails pydantic / dataclass validation (wrong `ActionType`, missing `tool_name` for `TOOL_CALL`) | 400 | `invalid_action` | Client fixes action |
| M9 | Unhandled exception in `env.step` | 500 | `internal_error` | Logged with request id; client SHOULD NOT retry same action |
| M10 | Disk full writing tmp WAV in audio pipeline | 500 | `io_error` | Very rare on HF Spaces (no writable persistent disk, but /tmp is tmpfs and can fill); operator action |
| M11 | Request body exceeds 1 MiB | 413 | `payload_too_large` | Client trims (should never happen; actions are small) |
| M12 | Concurrent `/reset` on same session id (two requests race) | 409 | `reset_in_progress` | Client serializes resets on its side |
Rules:
- No stack traces in response bodies. A per-request `request_id` is included so operators can grep logs; uvicorn does not assign one, so `app.py` generates it (e.g. a uuid4 per request).
- All error responses include `Cache-Control: no-store`.
- M5 (`429`) is the **only** code that includes `Retry-After`. Others are terminal for the request.
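The error-body and header rules can live in one helper. In this sketch the status map is transcribed from the table above, and the helper name is hypothetical. Placing `request_id` next to the `error` object rather than inside it is an assumption, since §2.2 pins only the `error` object.

```python
from __future__ import annotations

import json

# error.code slug -> HTTP status, transcribed from the §5 table.
STATUS_BY_CODE = {
    "unauthorized": 401, "missing_session_id": 400, "session_not_found": 404,
    "session_expired": 404, "max_sessions": 429, "model_not_ready": 503,
    "bad_json": 400, "invalid_action": 400, "internal_error": 500,
    "io_error": 500, "payload_too_large": 413, "reset_in_progress": 409,
}


def error_response(code: str, message: str, request_id: str) -> tuple[int, dict[str, str], bytes]:
    """Build (status, headers, body) for an error; never includes a stack trace."""
    status = STATUS_BY_CODE[code]
    headers = {"Content-Type": "application/json", "Cache-Control": "no-store"}
    if status == 429:
        headers["Retry-After"] = "30"  # M5 is the only code that sets Retry-After
    body = {"error": {"code": code, "message": message}, "request_id": request_id}
    return status, headers, json.dumps(body).encode("utf-8")
```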
---
## 6. Dependencies
### 6.1 Upstream (consumed by the deployment artifact)
- **`docs/modules/env.md`**: defines `DriftCallEnvironment.__init__/reset/step/state/close` and the FastAPI route handlers. This doc references but does not duplicate env behavior.
- **`docs/modules/models.md`**: every dataclass crossing the HTTP boundary.
- **`docs/modules/audio.md`**: Kokoro + Whisper integration; tells this doc which weights to pre-pull and what CPU footprint to budget.
- **`docs/modules/rewards.md`**: cited from `openenv.yaml` `reward.components.ref`.
- **DESIGN.md §3.3, §9.1, §9.2, §11.1, §13, Risk 10**: authoritative.
### 6.2 External runtime dependencies (pinned in Β§4.5)
`fastapi`, `uvicorn[standard]`, `openenv`, `kokoro`, `faster-whisper`, `ctranslate2`, `soundfile`, `pydantic`, `numpy<2.0`, `huggingface_hub` (build-time only).
### 6.3 Hugging Face platform dependencies
- **Space SDK:** `docker` (NOT `gradio`/`static`). The Docker SDK is the only path that lets us bake weights into the image and pin `uvicorn` workers.
- **Space hardware:** `cpu-basic` (free). 2 vCPU, 16 GB RAM, 50 GB ephemeral disk, **no persistent storage**, no GPU.
- **Space secrets:** `DRIFTCALL_ENV_TOKEN` (required).
- **Space env vars:** none (all config is baked in or via `X-Session-Id`).
- **Space region:** default (us-east-1); we do not need region pinning for CPU-basic.
### 6.4 Downstream consumers (who pings this Space)
- `training/eval_baseline.py` and `training/eval_final.py` (DESIGN.md §12): the training-side `HTTPEnvClient`.
- `demo/app_gradio.py`: the demo Space (documented in `docs/modules/deploy_demo_space.md`) uses this env over HTTP for live runs.
- `openenv validate .`: run against the Space URL as part of the hackathon submission gate.
- Hackathon judges: direct HTTP exercise via curl / the `openenv` CLI.
### 6.5 Explicit non-dependencies
- **No GPU** at runtime (load-bearing; DESIGN.md §3.3).
- **No LLM weights** on the env Space (Gemma 4 lives on the demo Space or on the trainer's local V100).
- **No training code** (`training/` is NOT copied into the image; see the §4.2 `COPY` list).
- **No HF Hub network** at runtime (§2.3, §4.2 offline envs).
---
## 7. Edge cases
Six cases the deployment plan must handle correctly. Each is load-bearing for either the 30-min deploy window or the judge's `openenv validate` run.
### 7.1 Concurrent `/reset` on the same session id
Client A and client B both POST `/reset` with `X-Session-Id: S1` within the same ~100 ms window. The cache uses a per-session asyncio lock; the second request observes the session mid-construction.
**Handling:**
- If the first request is still inside `env.__init__`, the second request gets `409 reset_in_progress`. Client is expected to serialize on its side.
- If the first request has completed, the second request performs an in-place reset: the old env is `.close()`'d, a new env replaces it, `reset_count += 1`. This matches `gym`'s idempotent reset semantics.
- `seed` is honored on the winning reset; the losing (409'd) request's seed is discarded.
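The sketch below implements the per-session lock, assuming `app.py` runs the `/reset` handler on the asyncio event loop. The class `ResetGate` is hypothetical. No `await` separates the `lock.locked()` check from `async with lock`, so on a single event loop the check-then-acquire is atomic. As with the cache, the locks are per uvicorn worker.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class ResetGate:
    """Per-session-id locks: a second /reset during construction gets 409."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = {}

    async def reset(self, sid: str, build: Callable[[], Awaitable[Any]]) -> tuple[int, Any]:
        lock = self._locks.setdefault(sid, asyncio.Lock())
        if lock.locked():
            # The first /reset is still inside env construction: reject, don't queue.
            return 409, {"error": {"code": "reset_in_progress", "message": "reset already running"}}
        async with lock:
            return 200, await build()
```

Returning `409` instead of waiting on the lock matches the rule above that the losing request's seed is discarded and the client serializes its own resets.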
### 7.2 `/step` on an evicted session
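A client idles for 65 minutes between `/step` calls. The sweep task evicts the session shortly after minute 60 (TTL plus at most one 60 s sweep interval). By then the entry is gone, so the client's next `/step` returns `404 session_not_found` (M3); `session_expired` (M4) is only seen when the TTL lapses while a request is in flight.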
**Handling:**
- The client MUST re-issue `/reset` with the same or new seed; it cannot resume mid-episode. This is explicit in the Space README.
- No attempt is made to persist episode state across evictions. The free tier has no writable persistent disk, and replaying a seeded episode is cheap (< 1 s on the CPU basic tier).
- `env.close()` is called on eviction to release the Kokoro audio buffer (saves ~80 MB resident per lingering session).
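On the client side, the rule "re-`/reset`, never resume" fits in a small wrapper, which also covers the cross-worker `404` described in §3.2. The transport callable `post(path, body)` and the function name are hypothetical. In practice the transport would wrap a `requests.Session` that adds the bearer and `X-Session-Id` headers.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical transport: post(path, json_body) -> (http_status, json_response).
Post = Callable[[str, dict], tuple[int, dict]]


def step_or_restart(post: Post, action: dict, reset_body: dict) -> tuple[dict, bool]:
    """Send /step; on 404 (session gone), start a fresh episode with /reset.

    Returns (response_json, restarted). Statuses other than 404 are returned
    unchanged for the caller to handle. A restarted episode is new: its
    observation is not a continuation of the evicted one.
    """
    status, body = post("/step", {"action": action})
    if status != 404:
        return body, False
    status, body = post("/reset", reset_body)
    if status != 200:
        raise RuntimeError(f"/reset failed with HTTP {status}: {body}")
    return body, True
```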
### 7.3 Cold-start model-weight load race
The Space boots. Uvicorn workers start and each lazily triggers a Kokoro + Whisper load on the first audio-involving `/step`. Whisper's CTranslate2 model load takes ~3–5 s; Kokoro takes ~2 s. A `/step` arriving before load completes can block for up to ~8 s.
**Handling:**
- `app.py`'s `lifespan` startup hook performs an **eager** load of both models during container boot. This turns cold-start latency into Space "Starting…" time (which HF surfaces via the spinner) instead of a hung client request.
- If eager load fails (bad weights, disk corruption), the container exits 1 and HF's Space restart loop catches it; the operator sees the Space status as "Error" instead of a silent hang.
- The first `/healthz` probe is expected at +30 s (`--start-period=45s` on the HEALTHCHECK gives us a comfortable margin).
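The eager load in the first bullet can be written as an async context manager in the shape FastAPI accepts as `FastAPI(lifespan=...)`. The loader callables and the `MODELS` registry are hypothetical stand-ins for the `audio/*.py` loaders. Re-raising makes the ASGI server abort startup. Whether that becomes a container exit code of 1 under `--workers 2` depends on uvicorn's process supervision, so confirm it during the §8.2 local smoke run.

```python
from __future__ import annotations

import asyncio
import contextlib
from typing import Any, AsyncIterator, Callable

# Hypothetical per-worker registry, filled once at startup.
MODELS: dict[str, Any] = {}


def make_lifespan(loaders: dict[str, Callable[[], Any]]):
    """Return a lifespan factory that loads every model before the worker serves traffic."""

    @contextlib.asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        for name, load in loaders.items():
            try:
                MODELS[name] = load()
            except Exception as exc:  # bad weights, missing snapshot, corrupt disk
                MODELS.clear()
                raise RuntimeError(f"eager load of {name!r} failed") from exc
        yield                          # the app serves requests while suspended here
        MODELS.clear()

    return lifespan


if __name__ == "__main__":
    async def _smoke() -> None:
        async with make_lifespan({"whisper": lambda: "whisper-stub"})(None):
            print(sorted(MODELS))      # ['whisper']

    asyncio.run(_smoke())
```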
### 7.4 Kokoro voice pack missing for a language
Kokoro is loaded at startup but an individual voice pack for `language="kn"` (Kannada) is missing from the snapshot cache due to a partial download.
**Handling:**
- `audio/tts_kokoro.py` (per `docs/modules/audio.md` §5) raises `VoicePackMissingError`. The env treats this as a SPEAK-action failure and returns a `tool_results` entry with `status="schema_error"` and `response={"error": "voice_unavailable"}`. The episode continues; reward R4 (format compliance) may drop but R1/R2 are unaffected.
- The image build in §4.2 pre-pulls the **full** Kokoro snapshot (`snapshot_download('hexgrad/Kokoro-82M')`), which includes all voice packs. If a voice pack is missing at runtime, the image is broken: the operator fixes the Dockerfile and rebuilds.
### 7.5 HTTP timeout mid-`/step`
A `/step` takes 35 s because Whisper is processing a long utterance and the Space is also handling three concurrent episodes. The HF Space edge proxy has a 60 s idle timeout; we stay under it, but only barely.
**Handling:**
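- uvicorn does not time out an in-flight request (`--timeout-keep-alive 30` only closes idle keep-alive sockets between requests), so the binding limit is the proxy. The HTTP client's read timeout should be ≥ 60 s (the default `requests.Session` timeout is infinite, which is safe).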
- Inside `env.step`, audio ops have **hard caps** owned by `audio/*.py`: Whisper `max_duration_s=30`, Kokoro synth implicitly bounded by text length. The env cannot produce a `/step` longer than ~40 s at p99.
- If a `/step` does exceed 60 s (e.g., 10 concurrent sessions all doing audio at once on 2 vCPU), the proxy closes the socket and the client sees `ConnectionError`. The session is still in the cache. State is only mutated after all work succeeds (see `docs/modules/env.md` §3, transactional step semantics), so a step that failed mid-work leaves no trace; a step that finished after the proxy dropped the socket, however, has already been committed. Before re-issuing the action, the client should read `GET /state` and compare `turn` with the value it expected.
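The transactional-step property this case relies on is: compute on the current state, and commit only on success. A frozen state plus a single assignment as the commit point is enough to express it. `EpisodeState` and `TransactionalEnv` are hypothetical minimal stand-ins for `DriftCallState` and `DriftCallEnvironment`; the real semantics live in `docs/modules/env.md` §3.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class EpisodeState:
    # Hypothetical minimal stand-in for DriftCallState.
    turn: int
    history: tuple[str, ...]


class TransactionalEnv:
    """Commit-on-success: an exception during the work leaves self.state untouched."""

    def __init__(self, state: EpisodeState) -> None:
        self.state = state

    def step(self, action: str,
             work: Callable[[EpisodeState, str], EpisodeState]) -> EpisodeState:
        candidate = work(self.state, action)  # tool calls, TTS, ASR all happen here
        self.state = candidate                 # the single commit point
        return candidate


def advance(state: EpisodeState, action: str) -> EpisodeState:
    """Example work function: builds the next state with dataclasses.replace."""
    return replace(state, turn=state.turn + 1, history=state.history + (action,))
```

If `work` raises anywhere before the final assignment, the candidate is discarded and `state.turn` is unchanged, which is what lets a client compare `turn` to detect whether a step committed.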
### 7.6 Out-of-memory during concurrent audio
Five sessions simultaneously run audio-heavy `/step`s. Each Whisper int8 model takes ~250 MB RAM; Kokoro takes ~350 MB. Naive per-session loading would hit `5 × 600 MB = 3 GB` plus Python overhead. That is well within the 16 GB tier budget, but the Space can still OOM if the image unexpectedly loads fp32 weights.
**Handling:**
- Whisper is forced to `compute_type="int8"`; Kokoro runs at fp32, its default and already the smallest viable precision. `audio/*.py` asserts these at load time.
- The models are **singletons** shared across sessions (they are stateless w.r.t. concurrent calls; CTranslate2 releases the GIL during decode). Each uvicorn worker is a separate process holding its own copy, so the memory budget is ~600 MB per worker (~1.2 GB with `--workers 2`), not per session.
- If an OOM happens, the container is killed by the HF Space OOM-killer and auto-restarts. We lose all in-flight sessions; clients re-`/reset`. The eviction sweep and TTL ensure no permanently-dead sessions pile up.
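`functools.lru_cache(maxsize=1)` gives a simple version of the singleton rule: it returns the same object on every call within one worker process. The model construction is replaced by a stub so the example runs without weights. The docstring shows the `faster_whisper.WhisperModel` call this would wrap, with `compute_type="int8"` as required above. `lru_cache` does not stop two threads from both running the loader on a cold cache, which is one more reason for the eager startup load in §7.3.

```python
from __future__ import annotations

import functools
from typing import Any

LOAD_CALLS = {"whisper": 0}  # instrumentation for this sketch only


@functools.lru_cache(maxsize=1)
def get_whisper() -> Any:
    """Load Whisper once per worker process and share it across all sessions.

    The real body would be something like:
        faster_whisper.WhisperModel("small", device="cpu", compute_type="int8")
    """
    LOAD_CALLS["whisper"] += 1
    return object()  # stub so the sketch runs without weights
```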
---
## 8. Examples
### 8.1 End-to-end `/reset` β `/step` flow via curl
```bash
# Assume DRIFTCALL_ENV_TOKEN is set locally for scripting convenience.
TOKEN="${DRIFTCALL_ENV_TOKEN:?export DRIFTCALL_ENV_TOKEN first}"
BASE="https://<team>-driftcall-env.hf.space"
# 1. Reset with seed 42, stage 2 curriculum.
curl -sS -X POST "$BASE/reset" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001" \
-H "Content-Type: application/json" \
-d '{"seed": 42, "config": {"curriculum_stage": 2}}'
# → 200 {"observation": {"turn": 0, "goal": {...}, "last_transcript": "Bhai Friday ko...", ...}, "episode_id": "...", "max_turns": 12}
# 2. Step: call airline.search.
curl -sS -X POST "$BASE/step" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001" \
-H "Content-Type: application/json" \
-d '{
"action": {
"action_type": "tool_call",
"tool_name": "airline.search",
"tool_args": {"origin": "DEL", "destination": "BLR", "date": "2026-04-26"}
}
}'
# → 200 {"observation": {...}, "reward": 0.0, "done": false, "info": {"drift_fired": []}}
# 3. Inspect state (judge-only, optional).
curl -sS "$BASE/state" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001"
# → 200 {"state": {"episode_id": "...", "max_turns": 12, "drift_schedule": [...], ...}, "turn": 1}
# 4. Close.
curl -sS -X POST "$BASE/close" \
-H "Authorization: Bearer $TOKEN" \
-H "X-Session-Id: demo-001"
# → 200 {"closed": true, "final_state": {...}}
```
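For scripted runs, the curl sequence above maps directly onto the standard library. This is a sketch, not the `openenv` client. `build_request` and `call` are hypothetical helpers, and `/state` is sent as GET while every other path is sent as POST, to match §2.1. `urllib.request.urlopen` raises `urllib.error.HTTPError` for any 4xx/5xx, so the §5 error body has to be read from the exception.

```python
from __future__ import annotations

import json
import urllib.request


def build_request(base: str, path: str, token: str, session_id: str,
                  body: dict | None = None) -> urllib.request.Request:
    """Build one call from the curl flow: bearer and X-Session-Id on every request."""
    headers = {"Authorization": f"Bearer {token}", "X-Session-Id": session_id}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if path == "/state" else "POST"
    return urllib.request.Request(base + path, data=data, headers=headers, method=method)


def call(req: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode JSON; 4xx/5xx raise urllib.error.HTTPError."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call(build_request(base, "/reset", token, "demo-001", {"seed": 42, "config": {"curriculum_stage": 2}}))` performs step 1. The 60 s default timeout follows the client guidance in §7.5.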
### 8.2 Container build + smoke + push
```bash
# Local build (from DRIFTCALL/ repo root)
docker build -t driftcall-env:local .
# Local smoke (bind a dummy secret)
docker run --rm -p 7860:7860 \
-e DRIFTCALL_ENV_TOKEN=dev-local-token \
driftcall-env:local
# In another shell:
curl -sS http://localhost:7860/healthz   # → "ok"
curl -sS -X POST http://localhost:7860/reset \
-H "Authorization: Bearer dev-local-token" \
-H "X-Session-Id: smoke" \
-H "Content-Type: application/json" -d '{}'
# → 200 with initial observation
# Push to HF Space via the new `hf` CLI.
# The team-lead brief flags that `huggingface-cli` is deprecated; we migrate
# DriftCall/CLAUDE.md §6 row "HF push env" to `hf upload` in a follow-up PR.
hf upload <team>/driftcall-env . --repo-type=space
# (Requires `pip install "huggingface_hub>=0.34"`, the first release that ships the `hf` CLI, and a completed `hf auth login`.)
```
### 8.3 `openenv validate` against the live Space
```bash
# Against local container:
openenv validate http://localhost:7860 \
--auth-bearer dev-local-token
# Against deployed Space:
openenv validate https://<team>-driftcall-env.hf.space \
--auth-bearer "$DRIFTCALL_ENV_TOKEN"
# Expected output:
#   ✓ openenv.yaml parses, schema v1.0
#   ✓ GET  /healthz → 200 ok
#   ✓ POST /reset   → 200, observation matches observation_space.ref
#   ✓ POST /step    → 200, observation + reward + done
#   ✓ GET  /state   → 200, DriftCallState matches schema
#   ✓ POST /close   → 200
#   ✓ 6 endpoints validated, 0 errors
```
Running this before submission is the DESIGN.md §12.2 hour-16 gate. If it fails, we fix it before moving to training.
---
## 9. Open questions
1. **OpenEnv schema version pin:** `openenv==0.2.*` in §4.5 is a placeholder. Confirm the exact current release on the hackathon kickoff morning (Apr 25) and tighten the pin; `openenv validate` schema fields may have shifted between 0.1 and 0.2.
2. **Per-worker cache divergence:** documented in §3.2 as acceptable. Re-evaluate after local load-testing: if even training hits the cross-worker 404 path > 1% of the time, switch to `--workers 1` with a bigger thread pool.
3. **HF Space CPU cold-start time:** the free CPU basic tier can sleep on idle and take 60–120 s to wake. This doc assumes the Space is "always-on" because we exercise it during development; if the judge hits a cold Space, the first `/reset` may appear hung. Risk-register coverage is owned by `docs/modules/risk_book.md`.
4. **`DRIFTCALL_ENV_TOKEN` rotation during the hackathon:** if the token leaks mid-judging, rotating it 401s the judge mid-run. Do we need a two-token grace period? Likely no (hackathon is 48 h and we trust submission channels), but flag for Person D's risk book.
5. **CLAUDE.md §6 `hf upload` migration:** the hackathon briefing flags `huggingface-cli` as deprecated. Update `DRIFTCALL/CLAUDE.md` §6 rows ("HF push env", "HF push dataset") to `hf upload ... --repo-type=...` in a separate small PR so this design doc doesn't diverge from the command catalogue. Owner: Person D.
6. **Image-size margin vs the `docs/modules/audio.md` §1.1 Whisper upgrade path:** if the §1.1 WER bail-out triggers and we swap to `faster-whisper-medium`, the final image grows from ~1.2 GB to ~1.8 GB. That is still under the 2 GB Risk-10 bound, but with less slack. Re-check image size after any audio-weights change.
7. **`/state` access control:** should `/state` require the same bearer as mutating endpoints, or should we expose a narrower "episode summary" for judges without the full vendor-states dump? Current design keeps full state behind the bearer; revisit if leaderboard ops ask for a public read-only pane.