---
title: DriftCall Env
emoji: 🛫
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: true
license: apache-2.0
short_description: Indic voice concierge env under schema drift
tags:
  - openenv
  - rl
  - voice
  - indic
  - schema-drift
  - grpo
---

# DriftCall – OpenEnv Env Space

An OpenEnv-compliant RL environment exposing DriftCall, a voice-first Indic consumer concierge task under schema, policy, pricing, and auth drift.

## REST surface (OpenEnv v1.0)

| Method | Path       | Purpose                         |
|--------|------------|---------------------------------|
| GET    | `/healthz` | Health probe (unauthenticated). |
| POST   | `/reset`   | Create or recycle a session.    |
| POST   | `/step`    | Advance one turn.               |
| GET    | `/state`   | Read `DriftCallState`.          |
| POST   | `/close`   | Evict a session.                |

All mutating endpoints require:

```
Authorization: Bearer <DRIFTCALL_ENV_TOKEN>
X-Session-Id:  [A-Za-z0-9_-]{1,64}
```

Error envelope:

```json
{ "error": { "code": "<slug>", "message": "<str>", "request_id": "<asgi-id>" } }
```

`Cache-Control: no-store` is set on every response. Only the M5 `max_sessions` error carries `Retry-After: 30`. Stack traces never leak.
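
A minimal client sketch against this surface. The base URL is a placeholder, and the exact `/reset` and `/step` body shapes are assumptions; only the headers are documented above:

```python
import os
import requests

BASE_URL = "https://example-driftcall-env.hf.space"  # placeholder Space URL
HEADERS = {
    "Authorization": f"Bearer {os.environ['DRIFTCALL_ENV_TOKEN']}",
    "X-Session-Id": "demo-session-001",  # must match [A-Za-z0-9_-]{1,64}
}

# Health probe needs no auth.
requests.get(f"{BASE_URL}/healthz").raise_for_status()

# Assumed: episode params go directly in the /reset body (see table below).
obs = requests.post(f"{BASE_URL}/reset", headers=HEADERS, json={"seed": 7}).json()

# Assumed: /step takes a DriftCallAction-shaped JSON body.
action_payload = {}  # fill per the action schema below
step = requests.post(f"{BASE_URL}/step", headers=HEADERS, json=action_payload).json()

state = requests.get(f"{BASE_URL}/state", headers=HEADERS).json()
requests.post(f"{BASE_URL}/close", headers=HEADERS)
```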

## Action / observation schemas

  • Action: cells.step_04_models:DriftCallAction
  • Observation: cells.step_04_models:DriftCallObservation

## Reward function

Reward is a scalar in `[-1.0, 1.0]`, computed at episode termination from five independent components, combined → calibrated → clamped:

| ID | Component              | Weight | Implementation                                |
|----|------------------------|--------|-----------------------------------------------|
| R1 | `task_completion`      | 0.40   | `cells.step_08_rewards:task_completion`       |
| R2 | `drift_detection`      | 0.20   | `cells.step_08_rewards:drift_detection`       |
| R3 | `constraint_adherence` | 0.20   | `cells.step_08_rewards:constraint_adherence`  |
| R4 | `format_compliance`    | 0.10   | `cells.step_08_rewards:format_compliance`     |
| R5 | `anti_hack_penalty`    | 0.10   | `cells.step_08_rewards:anti_hack_penalty`     |

Pipeline:

```
quality    = combine_quality(R1..R5, weights)
brier      = brier_penalty(confidence, R1)
reward_raw = quality * (1 - brier)
reward     = apply_uncertain_floor(reward_raw, confidence, quality)  # floor=0.50
final      = clamp(reward, -1.0, 1.0)
```
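
A hedged Python sketch of that pipeline, assuming a weighted-sum combiner, a squared-error Brier term, and a simple low-confidence cap; the source confirms only the names, weights, and ordering, not these function bodies:

```python
WEIGHTS = {"R1": 0.40, "R2": 0.20, "R3": 0.20, "R4": 0.10, "R5": 0.10}

def combine_quality(components: dict[str, float]) -> float:
    # Weighted sum over the five component scores (weights from the table).
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

def brier_penalty(confidence: float, task_completion: float) -> float:
    # Assumed form: squared gap between stated confidence and the R1 outcome.
    return (confidence - task_completion) ** 2

def apply_uncertain_floor(reward_raw: float, confidence: float,
                          quality: float, floor: float = 0.50) -> float:
    # Assumed semantics: low-confidence episodes cannot keep more than
    # floor * quality of their raw reward.
    return min(reward_raw, floor * quality) if confidence < floor else reward_raw

def final_reward(components: dict[str, float], confidence: float) -> float:
    quality = combine_quality(components)
    reward_raw = quality * (1 - brier_penalty(confidence, components["R1"]))
    reward = apply_uncertain_floor(reward_raw, confidence, quality)
    return max(-1.0, min(1.0, reward))  # clamp to [-1.0, 1.0]
```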

Hard rule (CLAUDE.md §13): no LLM judge anywhere in this pipeline. Every reward bit traces to deterministic, schema-grounded checks against the episode trace plus the (possibly drifted) vendor schemas in `data/`.
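
For illustration, the kind of deterministic check this implies, sketched with the standard `jsonschema` library; the helper name and tool-call shape are hypothetical, and the real checks live in `cells.step_08_rewards`:

```python
import json
import jsonschema

def tool_call_conforms(tool_args: dict, schema_path: str) -> bool:
    # True iff the episode's tool call validates against the (possibly
    # drifted) vendor schema on disk. Purely deterministic, no LLM involved.
    with open(schema_path) as f:
        vendor_schema = json.load(f)
    try:
        jsonschema.validate(instance=tool_args, schema=vendor_schema)
        return True
    except jsonschema.ValidationError:
        return False
```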

Full spec: `docs/modules/rewards.md` in the source repo.

## Episode params (passed in `/reset`)

| Field                    | Type   | Range | Required |
|--------------------------|--------|-------|----------|
| `seed`                   | int    | –     | no       |
| `curriculum_stage`       | int    | 1–3   | no       |
| `language_weights`       | object | –     | no       |
| `audio_boundary_enabled` | bool   | –     | no       |

`max_turns = 16` per episode.
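
Since every field is optional, a `/reset` body can carry any subset. An illustrative body exercising them all, reusing `BASE_URL` and `HEADERS` from the client sketch above; the `language_weights` keys are assumed, not documented:

```python
import requests  # BASE_URL and HEADERS as in the client sketch above

reset_body = {
    "seed": 42,
    "curriculum_stage": 2,                                  # 1-3
    "language_weights": {"hi": 0.5, "ta": 0.3, "en": 0.2},  # assumed keys
    "audio_boundary_enabled": True,
}
obs = requests.post(f"{BASE_URL}/reset", headers=HEADERS, json=reset_body).json()
```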

## Build / deploy

```bash
# from repo root
bash deploy/env_space/build.sh           # builds deploy/env_space/build/
bash deploy/env_space/build.sh --push    # builds + uploads to HF_SPACE_REPO
```

Env vars:

| Var             | Purpose                                        |
|-----------------|------------------------------------------------|
| `HF_SPACE_REPO` | target Space (default: `DGXAI/driftcall-env`)  |
| `HF_TOKEN`      | required for `--push`                          |

## Sources

This Space is built by `deploy/env_space/build.sh`, which rsyncs the canonical sources from the repo root:

  • app.py β€” FastAPI / OpenEnv server (786 LOC)
  • cells/ β€” importable modules (env, drift injector, rewards, …)
  • data/ β€” authored fixtures (briefs, drift patterns, schemas)
  • Dockerfile β€” multi-stage CPU image; Kokoro + faster-whisper baked in
  • openenv.yaml β€” manifest validated by openenv validate .
  • requirements.txt β€” runtime deps (no training stack)

The model and LoRA adapter are not baked into the Space; eval calls fetch the trained adapter (`DGXAI/gemma-3n-e2b-driftcall-lora`) from the HF Hub.
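
A sketch of that fetch using the standard `huggingface_hub` call; how the adapter is then attached depends on which Gemma 3n E2B base checkpoint the eval harness loads, which the source does not specify:

```python
from huggingface_hub import snapshot_download

# Pull the LoRA adapter weights from the Hub into the local cache; the
# returned path can then be handed to peft's PeftModel.from_pretrained
# alongside the base checkpoint.
adapter_dir = snapshot_download("DGXAI/gemma-3n-e2b-driftcall-lora")
print(adapter_dir)
```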