---
title: Data Cleaning OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# 🧹 Data Cleaning OpenEnv
A complete **OpenEnv-compatible reinforcement learning environment** for data cleaning tasks.

**Team:** Soham Sandeep Kamathi, Manas Mahendra Patil, Shivam Jha

**Hackathon:** Meta x PyTorch OpenEnv Hackathon 2026
---
## What This Environment Does
An AI agent receives messy tabular datasets and must apply the correct cleaning operations to earn rewards. Three tasks of increasing difficulty simulate real-world data quality problems.
## Tasks
| Task | Difficulty | What the agent must do | Reward |
|------|-----------|----------------------|--------|
| `remove_nulls` | Easy | Drop rows containing null/missing values | 0.0–1.0 |
| `fix_dates` | Medium | Standardise inconsistent date formats to YYYY-MM-DD | 0.0–1.0 |
| `remove_outliers` | Hard | Remove statistical outliers from the salary and age columns using the IQR method | 0.0–1.0 |
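
For orientation, the three operations in the table can be sketched in plain Python. This is an illustration under stated assumptions, not the environment's own implementation: the function names, the `DATE_FORMATS` list, and the conventional `k = 1.5` fence multiplier are all hypothetical choices, and rows are modelled as dictionaries rather than whatever tabular type the environment uses internally.

```python
from datetime import datetime
from statistics import quantiles

# Assumed input formats; the formats the environment actually generates
# are not documented in this README.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y", "%d-%m-%Y")


def remove_nulls(rows):
    """Drop every row that contains a None or empty-string value."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def fix_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if low <= r[column] <= high]
```

Note that `%d/%m/%Y` is tried before any month-first format, so an ambiguous value such as `03/04/2024` is read as 3 April; a real implementation would need a documented rule for such cases.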
## Observation Space
Each observation returns a `DatasetObservation` with:
- `dataset_preview` — first 5 rows as string
- `null_count` — number of missing values
- `date_format_errors` — number of non-standard dates
- `outlier_count` — number of outliers detected
- `task_description` — plain-English task description
- `hint` — suggested action
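
A client might model the observation along these lines. The field names come from the list above; the field types, the `parse_observation` function, and the `remaining_issues` helper are assumptions made for illustration and may not match the environment's actual class.

```python
from dataclasses import dataclass


@dataclass
class DatasetObservation:
    dataset_preview: str      # first 5 rows rendered as text
    null_count: int           # number of missing values
    date_format_errors: int   # number of non-standard dates
    outlier_count: int        # number of outliers detected
    task_description: str
    hint: str

    def remaining_issues(self) -> int:
        """Hypothetical helper: total number of problems still reported."""
        return self.null_count + self.date_format_errors + self.outlier_count


def parse_observation(payload: dict) -> DatasetObservation:
    """Build an observation from a decoded JSON object, ignoring extra keys."""
    names = DatasetObservation.__dataclass_fields__
    return DatasetObservation(**{name: payload[name] for name in names})
```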
## Action Space
A `CleaningAction` with:
- `task_id` — 1, 2, or 3
- `action_type` — one of `remove_nulls`, `fix_dates`, `remove_outliers`
- `column` — optional column name (e.g. `hire_date`, `salary`, `all`)
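
The action can be sketched the same way, with validation against the values listed above. The class body, the `to_payload` method, and the choice to omit `column` when it is unset are illustrative assumptions, not the environment's confirmed behaviour.

```python
from dataclasses import dataclass, asdict
from typing import Optional

ACTION_TYPES = ("remove_nulls", "fix_dates", "remove_outliers")


@dataclass
class CleaningAction:
    task_id: int
    action_type: str
    column: Optional[str] = None  # e.g. "hire_date", "salary", "all"

    def __post_init__(self):
        # Reject values outside the documented action space early.
        if self.task_id not in (1, 2, 3):
            raise ValueError(f"task_id must be 1, 2, or 3, got {self.task_id!r}")
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action_type {self.action_type!r}")

    def to_payload(self) -> dict:
        """Return a JSON-serialisable dict, omitting `column` when unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `CleaningAction(2, "fix_dates", "hire_date").to_payload()` yields a dictionary with all three keys, while leaving `column` unset yields only `task_id` and `action_type`.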
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset?task_id=1` | Start new episode |
| `POST` | `/step` | Submit cleaning action |
| `GET` | `/state?task_id=1` | Get session metadata |
| `GET` | `/tasks` | List all tasks |
| `GET` | `/health` | Health check |
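
A minimal standard-library client for these endpoints might look like the sketch below. It assumes the server is reachable at `http://localhost:7860` (the port used in the run command) and that `/step` accepts the `CleaningAction` fields as a JSON body; the README does not specify the request or response schema, so treat both as assumptions. `build_request` and `call` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumed local address


def build_request(method, path, query=None, body=None):
    """Assemble a request for one endpoint without sending it."""
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


def call(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One episode of task 1: reset, then submit a single cleaning action.
reset_req = build_request("POST", "/reset", query={"task_id": 1})
step_req = build_request(
    "POST",
    "/step",
    body={"task_id": 1, "action_type": "remove_nulls", "column": "all"},  # assumed body shape
)
health_req = build_request("GET", "/health")
```

With the server running, `call(health_req)` checks availability, and `call(reset_req)` followed by `call(step_req)` plays one step of the first task.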
## Setup & Run
```bash
# Install dependencies
pip install -r requirements.txt
# Run locally
uvicorn app:app --host 0.0.0.0 --port 7860
```