---
title: DriftCall Env
emoji: π«
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: true
license: apache-2.0
short_description: Indic voice concierge env under schema drift
tags:
  - openenv
  - rl
  - voice
  - indic
  - schema-drift
  - grpo
---
# DriftCall – OpenEnv Env Space

An OpenEnv-compliant RL environment exposing DriftCall, a voice-first Indic consumer concierge operating under schema / policy / pricing / auth drift.
## REST surface (OpenEnv v1.0)
| Method | Path | Purpose |
|---|---|---|
| GET | `/healthz` | Health probe (unauthenticated). |
| POST | `/reset` | Create or recycle a session. |
| POST | `/step` | Advance one turn. |
| GET | `/state` | Read `DriftCallState`. |
| POST | `/close` | Evict a session. |
All mutating endpoints require:

```
Authorization: Bearer <DRIFTCALL_ENV_TOKEN>
X-Session-Id: [A-Za-z0-9_-]{1,64}
```

Error envelope:

```json
{ "error": { "code": "<slug>", "message": "<str>", "request_id": "<asgi-id>" } }
```
`Cache-Control: no-store` is set on every response. Only the M5 `max_sessions` error carries `Retry-After: 30`. Stack traces never leak.
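As a client-side sketch (the helper names and placeholder token are illustrative assumptions, not part of the Space's API), the required headers and the error envelope can be handled like this:

```python
import json
import re

# Placeholder: real deployments read the token from the DRIFTCALL_ENV_TOKEN secret.
TOKEN = "dummy-token"

# Same pattern the server enforces for X-Session-Id.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")

def auth_headers(session_id: str) -> dict:
    """Headers required by every mutating endpoint (/reset, /step, /close)."""
    if not SESSION_ID_RE.fullmatch(session_id):
        raise ValueError(f"invalid X-Session-Id: {session_id!r}")
    return {
        "Authorization": f"Bearer {TOKEN}",
        "X-Session-Id": session_id,
    }

def parse_error(body: str) -> tuple:
    """Unpack the error envelope into (code, message, request_id)."""
    err = json.loads(body)["error"]
    return err["code"], err["message"], err["request_id"]
```

Pair these helpers with any HTTP client; only `/healthz` can be called without `auth_headers`.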
## Action / observation schemas

- Action: `cells.step_04_models:DriftCallAction`
- Observation: `cells.step_04_models:DriftCallObservation`
## Reward function

Reward is a scalar in [-1.0, 1.0], computed at episode termination from
five independent components, combined → calibrated → clamped:
| ID | Component | Weight | Implementation |
|---|---|---|---|
| R1 | task_completion | 0.40 | `cells.step_08_rewards:task_completion` |
| R2 | drift_detection | 0.20 | `cells.step_08_rewards:drift_detection` |
| R3 | constraint_adherence | 0.20 | `cells.step_08_rewards:constraint_adherence` |
| R4 | format_compliance | 0.10 | `cells.step_08_rewards:format_compliance` |
| R5 | anti_hack_penalty | 0.10 | `cells.step_08_rewards:anti_hack_penalty` |
Pipeline:

```
quality    = combine_quality(R1..R5, weights)
brier      = brier_penalty(confidence, R1)
reward_raw = quality * (1 - brier)
reward     = apply_uncertain_floor(reward_raw, confidence, quality)  # floor = 0.50
final      = clamp(reward, -1.0, 1.0)
```
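As a minimal end-to-end sketch of this pipeline in Python: the call order, the R1–R5 weights, the `floor=0.50` constant, and the final clamp come from the spec above, while the internals of the three helpers (weighted sum, squared-gap Brier penalty, confidence-based cap) are assumptions for illustration only.

```python
def combine_quality(components, weights):
    # Assumed form: plain weighted sum of the five component scores.
    return sum(w * c for w, c in zip(weights, components))

def brier_penalty(confidence, task_completion):
    # Assumed form: squared gap between stated confidence and the R1 outcome.
    return (confidence - task_completion) ** 2

def apply_uncertain_floor(reward_raw, confidence, quality, floor=0.50):
    # Assumed form: low-confidence episodes cannot score above the floor.
    if confidence < floor:
        return min(reward_raw, floor)
    return reward_raw

def episode_reward(r1, r2, r3, r4, r5, confidence):
    weights = (0.40, 0.20, 0.20, 0.10, 0.10)  # R1..R5 weights from the table
    quality = combine_quality((r1, r2, r3, r4, r5), weights)
    brier = brier_penalty(confidence, r1)
    reward = apply_uncertain_floor(quality * (1 - brier), confidence, quality)
    return max(-1.0, min(1.0, reward))  # clamp to [-1.0, 1.0]
```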
Hard rule (CLAUDE.md §13): no LLM judge anywhere in this pipeline.
Every reward bit traces to deterministic, schema-grounded checks against
the episode trace plus the (possibly drifted) vendor schemas in `data/`.
Full spec: `docs/modules/rewards.md` in the source repo.
## Episode params (passed in `/reset`)

| Field | Type | Range | Required |
|---|---|---|---|
| `seed` | int | – | no |
| `curriculum_stage` | int | 1–3 | no |
| `language_weights` | object | – | no |
| `audio_boundary_enabled` | bool | – | no |
`max_turns` = 16 per episode.
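A `/reset` body using these params might look like the following sketch (all values, and the language-code keys inside `language_weights`, are illustrative assumptions; every field is optional):

```json
{
  "seed": 42,
  "curriculum_stage": 2,
  "language_weights": { "hi": 0.5, "ta": 0.3, "en": 0.2 },
  "audio_boundary_enabled": true
}
```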
## Build / deploy

```bash
# from repo root
bash deploy/env_space/build.sh         # builds deploy/env_space/build/
bash deploy/env_space/build.sh --push  # builds + uploads to HF_SPACE_REPO
```

Env vars:

- `HF_SPACE_REPO` – default: `DGXAI/driftcall-env`
- `HF_TOKEN` – required for `--push`
## Sources

This Space is built from `deploy/env_space/build.sh`, which rsyncs the
canonical sources at the repo root:

- `app.py` – FastAPI / OpenEnv server (786 LOC)
- `cells/` – importable modules (env, drift injector, rewards, …)
- `data/` – authored fixtures (briefs, drift patterns, schemas)
- `Dockerfile` – multi-stage CPU image; Kokoro + faster-whisper baked in
- `openenv.yaml` – manifest validated by `openenv validate .`
- `requirements.txt` – runtime deps (no training stack)
The model + LoRA adapter are not baked into the Space – eval calls reach
out to HF Hub for the trained adapter (`DGXAI/gemma-3n-e2b-driftcall-lora`).