---
title: DriftCall Env
emoji: 🛫
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: true
license: apache-2.0
short_description: Indic voice concierge env under schema drift
tags:
  - openenv
  - rl
  - voice
  - indic
  - schema-drift
  - grpo
---

# DriftCall – OpenEnv Env Space

OpenEnv-compliant RL environment exposing **DriftCall**, a voice-first Indic consumer concierge env under schema / policy / pricing / auth drift.

## REST surface (OpenEnv v1.0)

| Method | Path       | Purpose                         |
|--------|------------|---------------------------------|
| `GET`  | `/healthz` | Health probe (unauthenticated). |
| `POST` | `/reset`   | Create or recycle a session.    |
| `POST` | `/step`    | Advance one turn.               |
| `GET`  | `/state`   | Read `DriftCallState`.          |
| `POST` | `/close`   | Evict a session.                |

All mutating endpoints require:

```
Authorization: Bearer <token>
X-Session-Id: [A-Za-z0-9_-]{1,64}
```

Error envelope:

```json
{ "error": { "code": "<code>", "message": "<message>", "request_id": "<request_id>" } }
```

Every response carries `Cache-Control: no-store`. Only the `M5 max_sessions` error carries `Retry-After: 30`. Stack traces never leak. A minimal client sketch is included in the examples at the end of this README.

## Action / observation schemas

- Action: `cells.step_04_models:DriftCallAction`
- Observation: `cells.step_04_models:DriftCallObservation`

## Reward function

The reward is a scalar in `[-1.0, 1.0]`, computed at episode termination from five independent components, which are combined → calibrated → clamped:

| ID | Component | Weight | Implementation |
|---:|---|---:|---|
| R1 | `task_completion` | 0.40 | `cells.step_08_rewards:task_completion` |
| R2 | `drift_detection` | 0.20 | `cells.step_08_rewards:drift_detection` |
| R3 | `constraint_adherence` | 0.20 | `cells.step_08_rewards:constraint_adherence` |
| R4 | `format_compliance` | 0.10 | `cells.step_08_rewards:format_compliance` |
| R5 | `anti_hack_penalty` | 0.10 | `cells.step_08_rewards:anti_hack_penalty` |

Pipeline:

```python
quality    = combine_quality([R1, R2, R3, R4, R5], weights)
brier      = brier_penalty(confidence, R1)
reward_raw = quality * (1 - brier)
reward     = apply_uncertain_floor(reward_raw, confidence, quality)  # floor = 0.50
final      = clamp(reward, -1.0, 1.0)
```

A self-contained sketch of this pipeline is included in the examples at the end of this README.

**Hard rule (CLAUDE.md §13):** no LLM judge anywhere in this pipeline. Every reward bit traces to deterministic, schema-grounded checks against the episode trace plus the (possibly drifted) vendor schemas in `data/`.

Full spec: `docs/modules/rewards.md` in the source repo.

## Episode params (passed in `/reset`)

| Field | Type | Range | Required |
|---|---|---|---|
| `seed` | int | – | no |
| `curriculum_stage` | int | 1–3 | no |
| `language_weights` | object | – | no |
| `audio_boundary_enabled` | bool | – | no |

`max_turns = 16` per episode.

## Build / deploy

```bash
# from repo root
bash deploy/env_space/build.sh          # builds deploy/env_space/build/
bash deploy/env_space/build.sh --push   # builds + uploads to HF_SPACE_REPO

# env vars
#   HF_SPACE_REPO   default: DGXAI/driftcall-env
#   HF_TOKEN        required for --push
```

## Sources

This Space is built by `deploy/env_space/build.sh`, which rsyncs the canonical sources at the repo root:

- `app.py` – FastAPI / OpenEnv server (786 LOC)
- `cells/` – importable modules (env, drift injector, rewards, …)
- `data/` – authored fixtures (briefs, drift patterns, schemas)
- `Dockerfile` – multi-stage CPU image; Kokoro + faster-whisper baked in
- `openenv.yaml` – manifest validated by `openenv validate .`
- `requirements.txt` – runtime deps (no training stack)

The model and LoRA adapter are **not** baked into the Space; eval calls fetch the trained adapter (`DGXAI/gemma-3n-e2b-driftcall-lora`) from HF Hub. A short fetch sketch is included in the examples below.
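
## Example: HTTP client sketch

The snippet below exercises the REST surface with `requests`. It is a minimal sketch: the base URL, the bearer token, and the `/reset` / `/step` request bodies shown here are illustrative assumptions, not the authoritative shapes (those are `cells.step_04_models:DriftCallAction` / `DriftCallObservation`).

```python
import uuid

import requests

BASE = "https://example-driftcall.hf.space"  # hypothetical Space URL
HEADERS = {
    "Authorization": "Bearer <token>",        # replace with a real token
    "X-Session-Id": uuid.uuid4().hex,         # matches [A-Za-z0-9_-]{1,64}
}

# Unauthenticated health probe.
requests.get(f"{BASE}/healthz", timeout=10).raise_for_status()

# Start an episode; every field in the body is optional (see episode params).
reset = requests.post(
    f"{BASE}/reset",
    headers=HEADERS,
    json={"seed": 7, "curriculum_stage": 2, "audio_boundary_enabled": True},
    timeout=30,
)
if "Retry-After" in reset.headers:
    # Only the M5 max_sessions error carries Retry-After: 30.
    print("env full, retry after", reset.headers["Retry-After"], "seconds")
elif not reset.ok:
    print("error envelope:", reset.json()["error"])
else:
    print("initial observation:", reset.json())

    # Advance one turn. The action body here is a placeholder, NOT the real
    # DriftCallAction schema.
    step = requests.post(
        f"{BASE}/step",
        headers=HEADERS,
        json={"action": {"utterance": "mera postpaid plan badal do"}},
        timeout=60,
    )
    print("step result:", step.json())

    # Inspect DriftCallState before tearing down (episodes also end at
    # max_turns = 16).
    print("state:", requests.get(f"{BASE}/state", headers=HEADERS, timeout=10).json())

# Evict the session.
requests.post(f"{BASE}/close", headers=HEADERS, timeout=10)
```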
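
## Example: reward pipeline sketch

The pipeline above names helpers whose internals are not spelled out in this README. The sketch below is one plausible, self-contained reading: `combine_quality` as a plain weighted sum, `brier_penalty` as the squared gap between stated confidence and the R1 outcome, and `apply_uncertain_floor` as a confidence gate that withholds positive reward below the 0.50 floor. The authoritative implementations live in `cells.step_08_rewards` and may differ.

```python
WEIGHTS = {
    "task_completion": 0.40,
    "drift_detection": 0.20,
    "constraint_adherence": 0.20,
    "format_compliance": 0.10,
    "anti_hack_penalty": 0.10,
}


def combine_quality(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the five per-component scores."""
    return sum(weights[name] * components[name] for name in weights)


def brier_penalty(confidence: float, task_completion: float) -> float:
    """Squared gap between stated confidence and the R1 outcome, in [0, 1]."""
    outcome = 1.0 if task_completion > 0 else 0.0
    return (confidence - outcome) ** 2


def apply_uncertain_floor(reward_raw: float, confidence: float, quality: float,
                          floor: float = 0.50) -> float:
    """Hypothetical gate: below the confidence floor, positive reward is withheld."""
    if confidence < floor and quality > 0:
        return min(reward_raw, 0.0)
    return reward_raw


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def final_reward(components: dict[str, float], confidence: float) -> float:
    """Combine -> calibrate -> clamp, mirroring the pipeline above."""
    quality = combine_quality(components, WEIGHTS)
    brier = brier_penalty(confidence, components["task_completion"])
    reward_raw = quality * (1 - brier)
    reward = apply_uncertain_floor(reward_raw, confidence, quality)
    return clamp(reward, -1.0, 1.0)


if __name__ == "__main__":
    # Illustrative scores only; sign conventions (e.g. for anti_hack_penalty)
    # are left to the real implementation.
    scores = {
        "task_completion": 1.0,
        "drift_detection": 1.0,
        "constraint_adherence": 1.0,
        "format_compliance": 1.0,
        "anti_hack_penalty": 0.0,
    }
    print(final_reward(scores, confidence=0.9))
```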
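
## Example: fetching the eval adapter

Because the adapter is not bundled with the Space, an eval run first pulls it from the Hub. `snapshot_download` is the standard `huggingface_hub` call for this; how the adapter is then attached to the base model (PEFT config, base checkpoint, dtype) is up to the eval harness and is not specified here.

```python
from huggingface_hub import snapshot_download

# Cache the LoRA adapter locally; set HF_TOKEN in the environment if the repo
# is gated or private.
adapter_dir = snapshot_download(repo_id="DGXAI/gemma-3n-e2b-driftcall-lora")
print("adapter files cached at:", adapter_dir)
```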