---
title: Data Cleaning OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# 🧹 Data Cleaning OpenEnv

A complete **OpenEnv-compatible reinforcement learning environment** for data cleaning tasks.

**Team:** Soham Sandeep Kamathi, Manas Mahendra Patil, Shivam Jha

**Hackathon:** Meta x PyTorch OpenEnv Hackathon 2026

---
## What This Environment Does

An AI agent receives messy tabular datasets and must apply the correct cleaning operations to earn rewards. Three tasks of increasing difficulty simulate real-world data quality problems.
## Tasks

| Task | Difficulty | What the agent must do | Reward |
|------|-----------|----------------------|--------|
| `remove_nulls` | Easy | Drop rows containing null/missing values | 0.0–1.0 |
| `fix_dates` | Medium | Standardise inconsistent date formats to YYYY-MM-DD | 0.0–1.0 |
| `remove_outliers` | Hard | Remove statistical outliers from the salary and age columns using the IQR method | 0.0–1.0 |
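
As a rough illustration of what the three tasks ask for, the sketch below implements each operation with the Python standard library only. It is not the environment's own code: the function names, the list-of-dicts row representation, the `DATE_FORMATS` list, and the conventional 1.5 × IQR fence are all assumptions made for this example.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical set of accepted input formats; the real environment may differ.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%B %d, %Y")


def remove_nulls(rows):
    """Drop every row in which any value is None or an empty string."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def fix_date(value):
    """Return `value` as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None


def remove_outliers(rows, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two rows, because statistics.quantiles requires it.
    """
    values = [r[column] for r in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]
```

Ambiguous inputs such as `03/04/2021` are resolved by the order of `DATE_FORMATS` (day first here), so a real implementation would need an explicit policy for them.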
## Observation Space

Each observation is a `DatasetObservation` with the following fields:

- `dataset_preview`: first 5 rows as a string
- `null_count`: number of missing values
- `date_format_errors`: number of non-standard dates
- `outlier_count`: number of outliers detected
- `task_description`: plain-English task description
- `hint`: suggested action
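
To make the shape of an observation concrete, here is a minimal stand-in written as a plain dataclass. The field names come from the list above; the field types and the `is_clean` helper are assumptions for illustration, and the project's actual class may be defined differently (for example as a Pydantic model).

```python
from dataclasses import dataclass


@dataclass
class DatasetObservation:
    """Illustrative stand-in; field types are assumed, not taken from the project."""
    dataset_preview: str       # first 5 rows rendered as a string
    null_count: int            # number of missing values
    date_format_errors: int    # number of non-standard dates
    outlier_count: int         # number of outliers detected
    task_description: str      # plain-English task description
    hint: str                  # suggested action


def is_clean(obs: DatasetObservation) -> bool:
    """Hypothetical helper: True when all three problem counters are zero."""
    return (
        obs.null_count == 0
        and obs.date_format_errors == 0
        and obs.outlier_count == 0
    )
```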
## Action Space

A `CleaningAction` with:

- `task_id`: 1, 2, or 3
- `action_type`: one of `remove_nulls`, `fix_dates`, `remove_outliers`
- `column`: optional column name (e.g. `hire_date`, `salary`, `all`)
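
The constraints above can be sketched as a small validated dataclass. This is only an illustration: the validation logic, the `to_json` helper, and the assumption that the action is serialised as a flat JSON object are not confirmed by the project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

VALID_ACTION_TYPES = {"remove_nulls", "fix_dates", "remove_outliers"}


@dataclass
class CleaningAction:
    """Illustrative stand-in for the action described above."""
    task_id: int
    action_type: str
    column: Optional[str] = None  # e.g. "hire_date", "salary", "all"

    def __post_init__(self):
        # Reject values outside the documented action space.
        if self.task_id not in (1, 2, 3):
            raise ValueError(f"task_id must be 1, 2, or 3, got {self.task_id!r}")
        if self.action_type not in VALID_ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type!r}")

    def to_json(self) -> str:
        """Serialise as a flat JSON object (assumed wire format)."""
        return json.dumps(asdict(self))
```

For example, `CleaningAction(task_id=2, action_type="fix_dates", column="hire_date").to_json()` yields a JSON object with those three keys.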
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset?task_id=1` | Start new episode |
| `POST` | `/step` | Submit cleaning action |
| `GET` | `/state?task_id=1` | Get session metadata |
| `GET` | `/tasks` | List all tasks |
| `GET` | `/health` | Health check |
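
A client for these endpoints might look roughly like the sketch below, which uses only `urllib` from the standard library. Several things here are assumptions rather than documented behaviour: the base URL (a local server on port 7860), the JSON body sent to `/step`, and that responses are JSON. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumption: server running locally


def build_reset(task_id, base_url=BASE_URL):
    """Build the POST /reset?task_id=... request that starts a new episode."""
    query = urlencode({"task_id": task_id})
    return Request(f"{base_url}/reset?{query}", method="POST")


def build_step(action, base_url=BASE_URL):
    """Build the POST /step request; `action` is a dict sent as JSON (assumed)."""
    body = json.dumps(action).encode("utf-8")
    return Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request, timeout=10):
    """Send a request and decode the response body as JSON (assumed format)."""
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, an episode would start with `call(build_reset(1))`, followed by `call(build_step({"task_id": 1, "action_type": "remove_nulls", "column": "all"}))`.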
## Setup & Run

```bash
# Install dependencies
pip install -r requirements.txt
# Run locally
uvicorn app:app --host 0.0.0.0 --port 7860