# ConfigDebuggerEnv

ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: fixing mistakes in Docker Compose, Kubernetes, and ML training configurations within a step limit.

## Why this environment

Configuration bugs are expensive and common in real systems. A broken config often still parses as valid YAML while being semantically wrong (type mismatches, missing units, violated interdependent constraints). This environment provides dense per-step rewards so an agent can learn corrective behaviors rather than relying only on terminal success or failure.

## OpenEnv API

The server exposes the standard lifecycle:

- POST /reset
- POST /step
- GET /state
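
A minimal client-side sketch of this lifecycle, using only the Python standard library. The `/reset` payload mirrors the curl example in Local setup. The `/step` body is an assumption: it sends the ConfigAction fields as a flat JSON object, and the Compose path and image value are hypothetical. Check server/main.py for the actual request shape.

```python
import json
from typing import Optional
from urllib import request


def build_request(base_url: str, endpoint: str,
                  payload: Optional[dict] = None) -> request.Request:
    """Build a request for one lifecycle endpoint (POST with a body, GET without)."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if payload is None:
        # GET /state carries no body.
        return request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(req: request.Request) -> dict:
    """Send a prepared request to a running server and decode the JSON reply."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


BASE_URL = "http://localhost:8000"

reset_req = build_request(BASE_URL, "/reset", {"task_id": "easy_docker"})
# Assumed body shape: the ConfigAction fields as a flat JSON object.
step_req = build_request(
    BASE_URL,
    "/step",
    {"operation": "edit", "path": "services.web.image", "value": "nginx:1.25"},
)
state_req = build_request(BASE_URL, "/state")

# With the server running: observation = send(reset_req)
```

Building requests separately from sending them keeps the sketch inspectable without a live server.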

### Typed models

- Action model: ConfigAction
- Observation model: ConfigObservation
- Reward model: ConfigReward
- State model: EnvState

Models are defined in server/models.py and validated with Pydantic.

## Action space

ConfigAction fields:

- operation: edit | add | delete
- path: dot path with optional list indexes (example: spec.template.spec.containers.0.image)
- value: JSON-serializable payload for edit/add
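
To illustrate how a dot path with list indexes can address a nested config, here is a sketch of applying an action to a parsed config. `apply_action` is a hypothetical helper, not the code in server/env.py. It assumes a segment is a list index only when the node being traversed is a list, and that `edit` requires the target to exist while `add` creates it.

```python
import copy
from typing import Any


def _child_key(node: Any, segment: str) -> Any:
    """Treat a segment as an integer index only when the current node is a list."""
    return int(segment) if isinstance(node, list) else segment


def apply_action(config: dict, operation: str, path: str, value: Any = None) -> dict:
    """Return a copy of config with one edit/add/delete applied at the dot path."""
    result = copy.deepcopy(config)  # never mutate the caller's config
    segments = path.split(".")
    parent: Any = result
    for segment in segments[:-1]:
        parent = parent[_child_key(parent, segment)]
    key = _child_key(parent, segments[-1])

    if operation == "edit":
        _ = parent[key]  # raises KeyError/IndexError if the target is missing
        parent[key] = value
    elif operation == "add":
        if isinstance(parent, list):
            parent.insert(key, value)
        else:
            parent[key] = value
    elif operation == "delete":
        del parent[key]
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return result


# Hypothetical broken Kubernetes fragment with a typo in the image tag.
broken = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "nginx:latst"},
    ]}}}
}
fixed = apply_action(
    broken, "edit", "spec.template.spec.containers.0.image", "nginx:1.25"
)
```

Returning a copy rather than mutating in place makes it straightforward to compare the state before and after a step.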

## Observation space

ConfigObservation fields:

- task_id
- task_description
- current_config (YAML string)
- syntax_valid
- validation_errors
- schema_score (0.0 to 1.0)
- logic_score (0.0 to 1.0)
- overall_score (0.0 to 1.0)
- step_count
- max_steps
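
As a rough illustration of the observation's shape, the fields above can be mirrored in a standard-library dataclass. This is a sketch, not the Pydantic model in server/models.py: the field types (for example `validation_errors` as a list of strings) and the `steps_remaining` helper are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObservationSketch:
    """Stand-in for ConfigObservation; field types are assumed, not confirmed."""
    task_id: str
    task_description: str
    current_config: str          # YAML string
    syntax_valid: bool
    validation_errors: List[str]
    schema_score: float
    logic_score: float
    overall_score: float
    step_count: int
    max_steps: int

    def __post_init__(self) -> None:
        # All three scores are documented as lying in [0.0, 1.0].
        for name in ("schema_score", "logic_score", "overall_score"):
            score = getattr(self, name)
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"{name} out of range: {score}")
        if not 0 <= self.step_count <= self.max_steps:
            raise ValueError("step_count must be between 0 and max_steps")

    @property
    def steps_remaining(self) -> int:
        """Hypothetical convenience: steps left before the limit is hit."""
        return self.max_steps - self.step_count


obs = ObservationSketch(
    task_id="easy_docker",
    task_description="Fix the Compose file",  # hypothetical description
    current_config="services: {}\n",
    syntax_valid=True,
    validation_errors=[],
    schema_score=0.5,
    logic_score=0.25,
    overall_score=0.4,
    step_count=3,
    max_steps=10,
)
```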

## Tasks and graders

Three deterministic tasks are included:

1. easy_docker (easy)
2. medium_k8s (medium)
3. hard_ml_config (hard)

Each task has:

- A broken starting configuration
- A target configuration
- Weighted required paths for schema grading
- Deterministic logic checks

Grading always returns normalized values in [0.0, 1.0].

## Reward design

The reward is dense: every step yields a signal, shaped by bonuses and penalties:

- Base reward is current overall score
- Positive delta bonus on improvement
- Regression penalty on negative delta
- Loop penalty for repeated states
- Penalty for invalid actions
- Penalty for destructive top-level deletes
- Small completion bonus when solved

This creates meaningful signals across the full episode, not only at termination.
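
The components listed above could be combined as in the following sketch. The coefficient values and the `RewardWeights` and `shaped_reward` names are hypothetical, chosen only for illustration; the final clamp to [0.0, 1.0] is assumed from the reward range stated in the validation checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical coefficients; the real values live in the environment code."""
    delta_bonus: float = 0.5
    regression_penalty: float = 0.5
    loop_penalty: float = 0.1
    invalid_penalty: float = 0.2
    destructive_penalty: float = 0.3
    completion_bonus: float = 0.1


def shaped_reward(
    prev_score: float,
    score: float,
    *,
    repeated_state: bool = False,
    invalid_action: bool = False,
    destructive_delete: bool = False,
    solved: bool = False,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Combine the base score with bonuses and penalties, clamped to [0, 1]."""
    reward = score                      # base reward: current overall score
    delta = score - prev_score
    if delta > 0:
        reward += w.delta_bonus * delta             # bonus on improvement
    elif delta < 0:
        reward -= w.regression_penalty * (-delta)   # penalty on regression
    if repeated_state:
        reward -= w.loop_penalty
    if invalid_action:
        reward -= w.invalid_penalty
    if destructive_delete:
        reward -= w.destructive_penalty
    if solved:
        reward += w.completion_bonus
    return min(1.0, max(0.0, reward))


improved = shaped_reward(0.4, 0.6)
regressed = shaped_reward(0.6, 0.4)
```

With these illustrative coefficients, an improving step is rewarded above its raw score and a regressing step below it, so the agent gets a directional signal on every step.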

## Project structure

- openenv.yaml
- Dockerfile
- requirements.txt
- inference.py
- server/
  - data.py
  - env.py
  - main.py
  - models.py

## Local setup

1. Install dependencies

```bash
pip install -r requirements.txt
```

2. Run server

```bash
python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
```

3. Quick API check

```bash
curl -X POST "http://localhost:8000/reset" -H "Content-Type: application/json" -d "{\"task_id\":\"easy_docker\"}"
```

## Baseline inference

Heuristic baseline (fully reproducible):

```bash
python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42
```

OpenAI baseline (uses OpenAI Python client and OPENAI_API_KEY):

```bash
# On Windows cmd, use: set OPENAI_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here
python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42
```

The script evaluates all three tasks and prints per-task and average scores.

## Docker

Build:

```bash
docker build -t configdebugger-env .
```

Run:

```bash
docker run -p 7860:7860 configdebugger-env
```

## Hugging Face Spaces notes

- Use Docker SDK
- Ensure Space port maps to 7860
- Add tag: openenv
- Include environment variables for external evaluation if needed

## Validation checklist

- Typed Observation/Action/Reward models: yes
- reset/step/state implemented: yes
- 3 tasks with deterministic graders: yes
- Reward in range [0.0, 1.0] with partial progress: yes
- Baseline inference script with OpenAI client: yes
- Dockerfile included: yes
- OpenEnv metadata file included: yes