---
title: DriftCall Env
emoji: 🛫
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: true
license: apache-2.0
short_description: Indic voice concierge env under schema drift
tags:
  - openenv
  - rl
  - voice
  - indic
  - schema-drift
  - grpo
---

# DriftCall: OpenEnv Env Space

OpenEnv-compliant RL environment exposing **DriftCall**, a voice-first Indic
consumer concierge env under schema / policy / pricing / auth drift.

## REST surface (OpenEnv v1.0)

| Method | Path        | Purpose |
|--------|-------------|---------|
| `GET`  | `/healthz`  | Health probe (unauthenticated). |
| `POST` | `/reset`    | Create or recycle a session. |
| `POST` | `/step`     | Advance one turn. |
| `GET`  | `/state`    | Read `DriftCallState`. |
| `POST` | `/close`    | Evict a session. |

All mutating endpoints require:

```
Authorization: Bearer <DRIFTCALL_ENV_TOKEN>
X-Session-Id:  [A-Za-z0-9_-]{1,64}
```
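As a minimal client-side sketch of the header contract above: the helper below validates the session id against the stated pattern and builds the two headers. The function name `build_headers` is hypothetical and not part of the Space's code.

```python
import re

# Session-id pattern copied from the header spec above.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_headers(token: str, session_id: str) -> dict[str, str]:
    """Return the two headers required by mutating endpoints (hypothetical helper)."""
    # fullmatch ensures the whole id fits the pattern, not just a prefix.
    if not SESSION_ID_RE.fullmatch(session_id):
        raise ValueError(f"invalid X-Session-Id: {session_id!r}")
    return {
        "Authorization": f"Bearer {token}",
        "X-Session-Id": session_id,
    }
```

Rejecting a malformed id before sending avoids a round trip that the server would presumably refuse anyway.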

Error envelope:

```json
{ "error": { "code": "<slug>", "message": "<str>", "request_id": "<asgi-id>" } }
```
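A client might unpack that envelope roughly as follows. This is only a sketch: the `EnvError` class and `parse_error` function are hypothetical names, and only the three fields shown in the envelope are assumed to exist.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvError:
    """Hypothetical container for the three documented envelope fields."""
    code: str
    message: str
    request_id: str


def parse_error(body: str) -> EnvError:
    """Parse a JSON error envelope; raises KeyError if a field is missing."""
    err = json.loads(body)["error"]
    return EnvError(
        code=err["code"],
        message=err["message"],
        request_id=err["request_id"],
    )
```

Keeping `request_id` on the parsed object makes it easy to quote when reporting a failure.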

Every response sets `Cache-Control: no-store`. Only `M5 max_sessions`
responses carry `Retry-After: 30`. Stack traces never leak into responses.

## Action / observation schemas

- Action:      `cells.step_04_models:DriftCallAction`
- Observation: `cells.step_04_models:DriftCallObservation`

## Reward function

Reward is a scalar in `[-1.0, 1.0]`, computed at episode termination from
five independent components, combined → calibrated → clamped:

| ID | Component | Weight | Implementation |
|---:|---|---:|---|
| R1 | `task_completion`      | 0.40 | `cells.step_08_rewards:task_completion` |
| R2 | `drift_detection`      | 0.20 | `cells.step_08_rewards:drift_detection` |
| R3 | `constraint_adherence` | 0.20 | `cells.step_08_rewards:constraint_adherence` |
| R4 | `format_compliance`    | 0.10 | `cells.step_08_rewards:format_compliance` |
| R5 | `anti_hack_penalty`    | 0.10 | `cells.step_08_rewards:anti_hack_penalty` |

Pipeline:

```python
quality        = combine_quality(R1..R5, weights)
brier          = brier_penalty(confidence, R1)
reward_raw     = quality * (1 - brier)
reward         = apply_uncertain_floor(reward_raw, confidence, quality)  # floor=0.50
final          = clamp(reward, -1.0, 1.0)
```
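To make the arithmetic of the pipeline above concrete, here is a rough sketch under explicit assumptions. The weights come from the component table. The bodies are guesses for illustration only: `combine_quality` is assumed to be a plain weighted sum and `brier_penalty` a squared error. The real implementations live in `cells.step_08_rewards` and may differ. The uncertain-floor step is left out because its rule is not stated here.

```python
# Weights copied from the component table above.
WEIGHTS = {
    "task_completion": 0.40,
    "drift_detection": 0.20,
    "constraint_adherence": 0.20,
    "format_compliance": 0.10,
    "anti_hack_penalty": 0.10,
}


def combine_quality(scores: dict[str, float]) -> float:
    """ASSUMPTION: quality is a plain weighted sum of the component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def brier_penalty(confidence: float, outcome: float) -> float:
    """ASSUMPTION: squared error between stated confidence and the outcome."""
    return (confidence - outcome) ** 2


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def sketch_reward(scores: dict[str, float], confidence: float) -> float:
    """Illustrative stand-in for the pipeline, without the uncertain floor."""
    quality = combine_quality(scores)
    brier = brier_penalty(confidence, scores["task_completion"])
    reward_raw = quality * (1 - brier)
    # apply_uncertain_floor is skipped: its rule is not stated in this README.
    return clamp(reward_raw, -1.0, 1.0)
```

Under these assumptions, an episode with a quality of 0.8 and a confidence of 0.9 against a task-completion score of 1.0 is scaled by `1 - 0.01`, giving roughly 0.792.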

**Hard rule (CLAUDE.md §13):** No LLM judge anywhere in this pipeline.
Every reward bit traces to deterministic, schema-grounded checks against
the episode trace + the (possibly drifted) vendor schemas in `data/`.

Full spec: `docs/modules/rewards.md` in the source repo.

## Episode params (passed in `/reset`)

| Field | Type | Range | Required |
|---|---|---|---|
| `seed` | int | - | no |
| `curriculum_stage` | int | 1–3 | no |
| `language_weights` | object | - | no |
| `audio_boundary_enabled` | bool | - | no |

`max_turns = 16` per episode.
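The sketch below shows one way a client could assemble these optional fields before calling `/reset`. The helper name `build_reset_params` is hypothetical. Sending the fields as a flat JSON object is an assumption, as is the shape of `language_weights` (a mapping from language code to weight).

```python
from typing import Any, Optional


def build_reset_params(
    seed: Optional[int] = None,
    curriculum_stage: Optional[int] = None,
    language_weights: Optional[dict[str, float]] = None,
    audio_boundary_enabled: Optional[bool] = None,
) -> dict[str, Any]:
    """Collect the optional episode params, omitting any that are unset."""
    # The table above gives 1-3 as the only documented range.
    if curriculum_stage is not None and curriculum_stage not in (1, 2, 3):
        raise ValueError("curriculum_stage must be 1, 2, or 3")
    candidates = {
        "seed": seed,
        "curriculum_stage": curriculum_stage,
        "language_weights": language_weights,
        "audio_boundary_enabled": audio_boundary_enabled,
    }
    # All fields are optional, so unset ones are left out of the payload.
    return {key: value for key, value in candidates.items() if value is not None}
```

Because every field is optional, calling the helper with no arguments yields an empty object.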

## Build / deploy

```bash
# from repo root
bash deploy/env_space/build.sh           # builds deploy/env_space/build/
bash deploy/env_space/build.sh --push    # builds + uploads to HF_SPACE_REPO

# env vars
HF_SPACE_REPO  default: DGXAI/driftcall-env
HF_TOKEN       required for --push
```

## Sources

This Space is built from `deploy/env_space/build.sh` which rsyncs the
canonical sources at the repo root:

- `app.py`: FastAPI / OpenEnv server (786 LOC)
- `cells/`: importable modules (env, drift injector, rewards, …)
- `data/`: authored fixtures (briefs, drift patterns, schemas)
- `Dockerfile`: multi-stage CPU image; Kokoro + faster-whisper baked in
- `openenv.yaml`: manifest validated by `openenv validate .`
- `requirements.txt`: runtime deps (no training stack)

The model + LoRA adapter are **not** baked into the Space; eval calls reach
out to HF Hub for the trained adapter (`DGXAI/gemma-3n-e2b-driftcall-lora`).