---
title: Data Cleaning OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# 🧹 Data Cleaning OpenEnv

A complete **OpenEnv-compatible reinforcement learning environment** for data cleaning tasks.

**Team:** Soham Sandeep Kamathi, Manas Mahendra Patil, Shivam Jha

**Hackathon:** Meta x PyTorch OpenEnv Hackathon 2026

---

## What This Environment Does

An AI agent receives messy tabular datasets and must apply the correct cleaning operations to earn rewards. Three tasks of increasing difficulty simulate real-world data quality problems.

## Tasks

| Task | Difficulty | What the agent must do | Reward |
|------|-----------|----------------------|--------|
| `remove_nulls` | Easy | Drop rows containing null/missing values | 0.0–1.0 |
| `fix_dates` | Medium | Standardise inconsistent date formats to YYYY-MM-DD | 0.0–1.0 |
| `remove_outliers` | Hard | Remove statistical outliers from the salary and age columns using the IQR method | 0.0–1.0 |
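
To make the three tasks concrete, here is a minimal sketch of what each cleaning operation could look like. It uses only the Python standard library and is not the environment's actual implementation. The function names, the list of accepted date formats, and the 1.5 × IQR fence are assumptions made for illustration.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical input formats; the real environment may accept others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def remove_nulls(rows):
    """Drop rows in which any value is None or an empty string."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


def fix_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def remove_outliers(rows, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]
```

Rows are modelled as plain dictionaries here; a pandas-based version would follow the same three steps with `dropna`, `to_datetime`, and a quantile filter.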

## Observation Space

Each observation returns a `DatasetObservation` with:
- `dataset_preview` — first 5 rows as string
- `null_count` — number of missing values
- `date_format_errors` — number of non-standard dates
- `outlier_count` — number of outliers detected
- `task_description` — plain-English task description
- `hint` — suggested action
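
As a rough illustration of the fields listed above, the observation could be modelled as a dataclass. The field names come from this README; the types and the example values are assumptions, and the real `DatasetObservation` may be defined differently (for example, as a Pydantic model).

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetObservation:
    dataset_preview: str      # first 5 rows rendered as text
    null_count: int           # number of missing values
    date_format_errors: int   # number of non-standard dates
    outlier_count: int        # number of outliers detected
    task_description: str     # plain-English task description
    hint: str                 # suggested action


# Illustrative values only; not real output from the environment.
example = DatasetObservation(
    dataset_preview="name,hire_date,salary\nAda,05/03/2021,52000",
    null_count=0,
    date_format_errors=1,
    outlier_count=0,
    task_description="Standardise dates to YYYY-MM-DD",
    hint="fix_dates",
)
print(asdict(example))
```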

## Action Space

A `CleaningAction` with:
- `task_id` — 1, 2, or 3
- `action_type` — one of `remove_nulls`, `fix_dates`, `remove_outliers`
- `column` — optional column name (e.g. `hire_date`, `salary`, `all`)
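
A sketch of the action with basic validation might look like the following. The allowed values are taken from the list above; the use of a dataclass, the `None` default for `column`, and the validation behaviour are assumptions rather than the environment's actual code.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TASK_IDS = {1, 2, 3}
VALID_ACTION_TYPES = {"remove_nulls", "fix_dates", "remove_outliers"}


@dataclass
class CleaningAction:
    task_id: int
    action_type: str
    column: Optional[str] = None  # e.g. "hire_date", "salary", "all"

    def __post_init__(self):
        # Reject values outside the documented action space.
        if self.task_id not in VALID_TASK_IDS:
            raise ValueError(f"task_id must be one of {sorted(VALID_TASK_IDS)}")
        if self.action_type not in VALID_ACTION_TYPES:
            raise ValueError(
                f"action_type must be one of {sorted(VALID_ACTION_TYPES)}"
            )


action = CleaningAction(task_id=3, action_type="remove_outliers", column="salary")
```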

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset?task_id=1` | Start new episode |
| `POST` | `/step` | Submit cleaning action |
| `GET`  | `/state?task_id=1` | Get session metadata |
| `GET`  | `/tasks` | List all tasks |
| `GET`  | `/health` | Health check |
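
The endpoints above could be called from Python roughly as follows. This is a sketch under assumptions: the base URL assumes a local server on port 7860, and the JSON body for `/step` is assumed to mirror the `CleaningAction` fields, since the exact request schema is not documented here. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumed local address


def reset_request(task_id, base_url=BASE_URL):
    """Build the POST /reset request for a given task."""
    query = urlencode({"task_id": task_id})
    return Request(f"{base_url}/reset?{query}", data=b"", method="POST")


def step_request(task_id, action_type, column=None, base_url=BASE_URL):
    """Build the POST /step request; the body layout is an assumption."""
    payload = {"task_id": task_id, "action_type": action_type, "column": column}
    return Request(
        f"{base_url}/step",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the JSON response (needs a running server)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, an episode would start with `send(reset_request(1))` followed by `send(step_request(1, "remove_nulls", "all"))`.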

## Setup & Run

```bash
# Install dependencies
pip install -r requirements.txt
# Run locally
uvicorn app:app --host 0.0.0.0 --port 7860
```