metadata
title: Data Cleaning OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
🧹 Data Cleaning OpenEnv
A complete OpenEnv-compatible reinforcement learning environment for data cleaning tasks.
Team: Soham Sandeep Kamathi, Manas Mahendra Patil, Shivam Jha

Hackathon: Meta x PyTorch OpenEnv Hackathon 2026
What This Environment Does
An AI agent receives messy tabular datasets and must apply the correct cleaning operations to earn rewards. Three tasks of increasing difficulty simulate real-world data quality problems.
Tasks
| Task | Difficulty | What the agent must do | Reward |
|---|---|---|---|
| `remove_nulls` | Easy | Drop rows containing null/missing values | 0.0–1.0 |
| `fix_dates` | Medium | Standardise inconsistent date formats to YYYY-MM-DD | 0.0–1.0 |
| `remove_outliers` | Hard | Remove statistical outliers via the IQR method from the salary and age columns | 0.0–1.0 |
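As a rough illustration of what each task asks for, the three operations can be sketched with the Python standard library alone. This is not the environment's actual implementation: the function names, the row representation (a list of dicts), the accepted input date formats, and the 1.5 × IQR fence are all assumptions made for the example.

```python
from datetime import datetime
from statistics import quantiles

# Assumed input formats; the formats the environment really generates
# are not documented here. Note that "%d/%m/%Y" reads dates day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def remove_nulls(rows):
    """Keep only rows in which no value is None or an empty string."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


def fix_date(value):
    """Return `value` as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def remove_outliers(rows, column, k=1.5):
    """Drop rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]
```

Quartile conventions differ between libraries, so the exact fences computed here may not match those used by the environment's grader.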
Observation Space
Each observation returns a DatasetObservation with:
- `dataset_preview`: first 5 rows as a string
- `null_count`: number of missing values
- `date_format_errors`: number of non-standard dates
- `outlier_count`: number of outliers detected
- `task_description`: plain-English task description
- `hint`: suggested action
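For orientation, the fields listed above could be modelled roughly as follows. This is a minimal sketch using a standard-library dataclass; the field types are inferred from the descriptions, the sample values are made up, and the real `DatasetObservation` may well be defined differently (for example as a Pydantic model).

```python
from dataclasses import asdict, dataclass


@dataclass
class DatasetObservation:
    dataset_preview: str      # first 5 rows rendered as a string
    null_count: int           # number of missing values
    date_format_errors: int   # number of non-standard dates
    outlier_count: int        # number of outliers detected
    task_description: str     # plain-English task description
    hint: str                 # suggested action


# Illustrative values only; not output from the real environment.
example_observation = DatasetObservation(
    dataset_preview="name,hire_date,salary\nAda,2021-03-01,52000",
    null_count=2,
    date_format_errors=0,
    outlier_count=0,
    task_description="Drop rows containing null/missing values",
    hint="remove_nulls",
)

# asdict() gives a plain dict that is easy to serialise or log.
observation_dict = asdict(example_observation)
```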
Action Space
A CleaningAction with:
- `task_id`: 1, 2, or 3
- `action_type`: one of `remove_nulls`, `fix_dates`, `remove_outliers`
- `column`: optional column name (e.g. `hire_date`, `salary`, `all`)
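The action fields above can be sketched as a small validated dataclass. The validation rules simply mirror the allowed values listed here; the class layout, the `ValueError` behaviour, and the default of `None` for `column` are assumptions rather than the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TASK_IDS = (1, 2, 3)
VALID_ACTION_TYPES = ("remove_nulls", "fix_dates", "remove_outliers")


@dataclass
class CleaningAction:
    task_id: int
    action_type: str
    column: Optional[str] = None  # e.g. "hire_date", "salary", "all"

    def __post_init__(self):
        # Reject values outside the documented action space early.
        if self.task_id not in VALID_TASK_IDS:
            raise ValueError(f"task_id must be one of {VALID_TASK_IDS}")
        if self.action_type not in VALID_ACTION_TYPES:
            raise ValueError(f"action_type must be one of {VALID_ACTION_TYPES}")


# Example: the action an agent might submit for the date task.
example_action = CleaningAction(task_id=2, action_type="fix_dates", column="hire_date")
```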
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | `/reset?task_id=1` | Start new episode |
| POST | `/step` | Submit cleaning action |
| GET | `/state?task_id=1` | Get session metadata |
| GET | `/tasks` | List all tasks |
| GET | `/health` | Health check |
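A client might drive one episode along these lines. This is a hedged sketch using only the standard library: the base URL assumes the local server from Setup & Run on port 7860, and the JSON body for `/step` is assumed to carry the `CleaningAction` fields directly, which this README does not confirm. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumed local address


def reset_request(task_id):
    """Build the POST /reset request that starts a new episode."""
    query = urlencode({"task_id": task_id})
    return Request(f"{BASE_URL}/reset?{query}", data=b"", method="POST")


def step_request(task_id, action_type, column=None):
    """Build the POST /step request (the body shape is an assumption)."""
    payload = {"task_id": task_id, "action_type": action_type, "column": column}
    return Request(
        f"{BASE_URL}/step",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server running, an episode could look like:
#   observation = send(reset_request(1))
#   result = send(step_request(1, "remove_nulls", column="all"))
```

Building the requests separately from sending them makes the URL and payload easy to inspect before anything touches the network.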
Setup & Run
```bash
# Install dependencies
pip install -r requirements.txt

# Run locally
uvicorn app:app --host 0.0.0.0 --port 7860
```