---
title: Data Validation Pipeline
emoji: 🧹
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
tags:
  - openenv
---

# Data Validation Pipeline: OpenEnv Environment

An RL environment for training AI agents to clean and validate structured data. Built on the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework for the Meta-PyTorch Hackathon.

## 🌐 Environment Overview

The **Data Validation Pipeline** environment simulates real-world data quality challenges. An agent is presented with a "dirty" dataset containing various errors (missing values, type mismatches, format violations, range errors, and duplicates) and must systematically identify and fix each issue.

### Motivation

Data quality is a critical challenge in every organization. Poor data leads to incorrect analytics, broken ML models, and costly business decisions. This environment trains RL agents to become automated data stewards, capable of:
- Detecting and classifying data errors
- Applying appropriate fixes
- Optimizing their correction strategy for efficiency

## 🎯 Action Space

The agent can take the following **discrete actions**:

| Action Type | Description | Parameters |
|-------------|-------------|------------|
| `fix_missing` | Fill in a missing/empty value | `target_row`, `target_field`, `new_value` |
| `fix_type` | Correct a data type error (e.g., string → float) | `target_row`, `target_field`, `new_value` |
| `fix_range` | Fix an out-of-range value | `target_row`, `target_field`, `new_value` |
| `fix_format` | Fix a format violation (e.g., date format) | `target_row`, `target_field`, `new_value` |
| `fix_duplicate` | Resolve a duplicate entry | `target_row`, `target_field`, `new_value` |
| `validate` | Check current progress | none |
| `skip` | Skip (no action) | none |

### Action JSON Schema
```json
{
  "action_type": "fix_missing|fix_type|fix_range|fix_format|fix_duplicate|validate|skip",
  "target_field": "column_name",
  "target_row": 0,
  "new_value": "corrected_value"
}
```
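
As a rough illustration, a payload matching the schema above could be assembled and checked in Python before it is sent. The `build_action` helper and its validation rules are assumptions made for this sketch; the environment's actual models live in `env/models.py` and may differ. In particular, whether the server accepts a payload containing only `action_type` for `validate` and `skip` is assumed, not confirmed.

```python
import json

# Action types listed in the Action Space table.
FIX_ACTIONS = {"fix_missing", "fix_type", "fix_range", "fix_format", "fix_duplicate"}
NO_ARG_ACTIONS = {"validate", "skip"}


def build_action(action_type, target_row=None, target_field=None, new_value=None):
    """Return a dict shaped like the Action JSON Schema (hypothetical helper)."""
    if action_type in NO_ARG_ACTIONS:
        # The table lists no parameters for validate/skip (assumed to be optional).
        return {"action_type": action_type}
    if action_type not in FIX_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    if target_row is None or target_field is None or new_value is None:
        raise ValueError(f"{action_type} needs target_row, target_field and new_value")
    return {
        "action_type": action_type,
        "target_field": target_field,
        "target_row": target_row,
        "new_value": new_value,
    }


action = build_action("fix_missing", target_row=1, target_field="email",
                      new_value="bob@example.com")
print(json.dumps(action))
```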

## 👁️ Observation Space

Each observation includes:

| Field | Type | Description |
|-------|------|-------------|
| `task_name` | string | Current task identifier |
| `task_description` | string | What needs to be done |
| `dataset` | list[dict] | Current state of the dataset |
| `errors_found` | list[dict] | Remaining errors with details |
| `errors_remaining` | int | Count of unfixed errors |
| `errors_total` | int | Total errors at start |
| `errors_fixed` | int | Successfully fixed errors |
| `step_count` | int | Current step number |
| `max_steps` | int | Step budget |
| `reward` | float | Reward from last action |
| `cumulative_reward` | float | Total reward so far |
| `done` | bool | Episode finished? |
| `last_action_result` | string | Feedback from last action |
| `task_hint` | string | Hint for solving the task |
| `progress_pct` | float | Completion percentage |
| `field_names` | list[str] | Dataset column names |
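
On the agent side, the fields above could be mirrored with a typed dictionary, as in the sketch below. The `Observation` class, the two helper functions, and the sample values are illustrative assumptions, not the environment's own code; the rule that each remaining error costs at least one step is also an assumption.

```python
from typing import Any, Dict, List, TypedDict


class Observation(TypedDict):
    """Field names and types copied from the table above (illustrative only)."""
    task_name: str
    task_description: str
    dataset: List[Dict[str, Any]]
    errors_found: List[Dict[str, Any]]
    errors_remaining: int
    errors_total: int
    errors_fixed: int
    step_count: int
    max_steps: int
    reward: float
    cumulative_reward: float
    done: bool
    last_action_result: str
    task_hint: str
    progress_pct: float
    field_names: List[str]


def steps_left(obs: Observation) -> int:
    """Remaining step budget."""
    return obs["max_steps"] - obs["step_count"]


def can_still_finish(obs: Observation) -> bool:
    """Assumes each remaining error costs at least one step to fix."""
    return not obs["done"] and obs["errors_remaining"] <= steps_left(obs)


# Made-up observation: Task 1 after one successful fix (all values illustrative).
sample: Observation = {
    "task_name": "easy_missing_values", "task_description": "",
    "dataset": [], "errors_found": [],
    "errors_remaining": 2, "errors_total": 3, "errors_fixed": 1,
    "step_count": 1, "max_steps": 10,
    "reward": 1 / 3, "cumulative_reward": 1 / 3, "done": False,
    "last_action_result": "", "task_hint": "",
    "progress_pct": 100 / 3, "field_names": [],
}
print(steps_left(sample), can_still_finish(sample))  # prints: 9 True
```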

## 📋 Tasks

### Task 1: Easy - Missing Values (difficulty: ⭐)
- **Dataset**: 5-row employee table
- **Errors**: 3 missing values (empty strings)
- **Max Steps**: 10
- **Strategy**: Find empty fields and fill with correct values
- **Solvable in**: ≤5 steps

### Task 2: Medium - Mixed Errors (difficulty: ⭐⭐)
- **Dataset**: 7-row product inventory
- **Errors**: 6 errors (type, format, missing, range, duplicate)
- **Max Steps**: 15
- **Strategy**: Classify error type, match to correct action
- **Requires**: Type awareness + format rules

### Task 3: Hard - Multi-Constraint (difficulty: ⭐⭐⭐)
- **Dataset**: 10-row customer orders
- **Errors**: 10 interrelated errors across all types
- **Max Steps**: 20
- **Strategy**: Plan error resolution order, handle dependencies
- **Requires**: Domain knowledge + planning

## 🏗️ Setup & Usage

### Docker (Recommended)
```bash
docker build -t data-validation-env .
docker run -p 8000:8000 data-validation-env
```

### Local Development
```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 8000
```

### Test Endpoints
```bash
# Health check
curl http://localhost:8000/health

# Reset with easy task
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_name": "easy_missing_values", "seed": 42}'

# Take a step
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "fix_missing", "target_field": "email", "target_row": 1, "new_value": "bob@example.com"}'

# Check state
curl http://localhost:8000/state
```
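
The same calls can be scripted from Python using only the standard library. The sketch below is a minimal client under stated assumptions: the helper names (`make_request`, `call`, `smoke_test`) are hypothetical, and the server is assumed to reply with JSON on every endpoint.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl commands above


def make_request(path, payload=None):
    """Build a GET request, or a JSON POST when a payload is given."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def call(path, payload=None):
    """Send the request and decode the reply (assumes a JSON response body)."""
    with urllib.request.urlopen(make_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test():
    """Replay the four curl calls above; requires the server to be running."""
    print(call("/health"))
    print(call("/reset", {"task_name": "easy_missing_values", "seed": 42}))
    print(call("/step", {
        "action_type": "fix_missing",
        "target_field": "email",
        "target_row": 1,
        "new_value": "bob@example.com",
    }))
    print(call("/state"))
```

Calling `smoke_test()` with the container running should print one response per endpoint; nothing is sent until that function is invoked.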

### Run Inference Agent
```bash
export HF_TOKEN=your_token_here
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4.1-mini
python inference.py
```

## 📊 Baseline Performance

| Task | Model | Avg Reward | Steps Used | Success Rate |
|------|-------|-----------|------------|-------------|
| easy_missing_values | gpt-4.1-mini | 0.85 | 4/10 | 90% |
| medium_mixed_errors | gpt-4.1-mini | 0.70 | 9/15 | 75% |
| hard_multi_constraint | gpt-4.1-mini | 0.55 | 15/20 | 50% |

## 🏆 Reward Design

- **Correct fix**: `+1.0 / total_errors` (so fixing every error sums to `1.0`)
- **Wrong value**: `-0.05` penalty
- **Wrong action type**: `-0.05` penalty
- **Repeated action**: `-0.1` penalty
- **Skip/Validate**: `0.0` (neutral)

The reward design encourages:
1. **Accuracy**: Correct fixes get proportional positive reward
2. **Efficiency**: Penalties for wrong attempts
3. **Exploration**: No penalty for validation checks
4. **Diversity**: Penalizes repeated identical actions
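
The reward rules listed above can be sketched as a small function. This is an illustration, not the grader in `env/tasks.py`: the outcome labels are hypothetical names, and the sketch assumes each step falls into exactly one category (the source does not say how a step that is both repeated and wrong is scored).

```python
# Outcome labels are hypothetical names for the cases in the list above.
PENALTIES = {
    "wrong_value": -0.05,
    "wrong_action_type": -0.05,
    "repeated_action": -0.1,
    "skip": 0.0,
    "validate": 0.0,
}


def step_reward(outcome, total_errors):
    """Reward for a single step, given the task's total error count."""
    if outcome == "correct_fix":
        if total_errors <= 0:
            raise ValueError("total_errors must be positive")
        return 1.0 / total_errors
    return PENALTIES[outcome]


def episode_reward(outcomes, total_errors):
    """Cumulative reward over a sequence of step outcomes."""
    return sum(step_reward(o, total_errors) for o in outcomes)


# Task 1 has 3 errors: one wrong value, then three correct fixes.
total = episode_reward(
    ["wrong_value", "correct_fix", "correct_fix", "correct_fix"], total_errors=3
)
print(round(total, 2))  # 0.95
```

Under these assumptions, a flawless episode scores 1.0 regardless of task size, and each mistake subtracts a fixed amount from that ceiling.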

## 📁 Project Structure
```
├── inference.py          ← LLM agent loop
├── openenv.yaml          ← OpenEnv metadata
├── Dockerfile            ← Container config
├── requirements.txt      ← Python dependencies
├── server.py             ← FastAPI app
├── README.md             ← This file
└── env/
    ├── __init__.py
    ├── models.py          ← Pydantic models
    ├── tasks.py           ← Task registry & graders
    └── environment.py     ← Core environment
```

## 📜 License

BSD-3-Clause