---
title: OpenEnv Data Cleaner
emoji: 🧹
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
base_path: /web
---

# OpenEnv Data Cleaner

An OpenEnv-compliant AI-powered data cleaning environment built on `openenv-core`.

## Features

- **OpenEnv-native**: Built using `openenv-core` base classes
- **Data Cleaning Actions**: Drop nulls, fill nulls, remove duplicates, filter rows, drop columns, convert types, validate emails, remove outliers, normalize values
- **Task-based Learning**: Three difficulty levels (easy, medium, hard)
- **Grading System**: Deterministic scoring based on data quality improvements
- **Reward System**: Structured rewards with quality, progress, and penalty components
- **Web Interface**: Interactive UI for manual data cleaning
- **Docker Ready**: Deployable to Hugging Face Spaces
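
To make the grading and reward features more concrete, here is a minimal sketch of how a deterministic quality score and a three-part reward could fit together. It is not the environment's actual implementation: the `Reward` class, `quality_score`, `step_reward`, and the constants `0.1` and `0.05` are all hypothetical, and the score formula is just one simple choice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reward:
    """Hypothetical reward with the three components named above."""
    quality: float   # change in data quality caused by the step
    progress: float  # bonus for a step that improves quality
    penalty: float   # cost for a step that does not improve quality

    @property
    def total(self) -> float:
        return self.quality + self.progress - self.penalty


def quality_score(rows: list[dict]) -> float:
    """Fraction of non-null cells times fraction of unique rows, in [0, 1]."""
    cells = [value for row in rows for value in row.values()]
    if not rows or not cells:
        return 0.0
    non_null = sum(value is not None for value in cells) / len(cells)
    unique = len({tuple(sorted(row.items())) for row in rows}) / len(rows)
    return non_null * unique


def step_reward(before: list[dict], after: list[dict]) -> Reward:
    """Same inputs always give the same reward, so grading stays deterministic."""
    delta = quality_score(after) - quality_score(before)
    return Reward(
        quality=delta,
        progress=0.1 if delta > 0 else 0.0,   # assumed bonus size
        penalty=0.05 if delta <= 0 else 0.0,  # assumed penalty size
    )
```

Because the score depends only on the dataset contents, replaying the same actions on the same data yields the same result.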

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py
```

## API Endpoints

- `GET /` - Web interface
- `GET /health` - Health check
- `POST /reset` - Initialize a new task
- `POST /step` - Execute a cleaning action
- `POST /submit` - Submit solution for grading
- `POST /revert` - Revert last action
- `GET /tasks` - List available tasks
- `GET /state` - Get current environment state
- `GET /dataset` - Get dataset information
- `GET /history` - Get action history
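
The endpoints above could be driven from a small Python client like the sketch below, which uses only the standard library. Several details are assumptions, since this README does not specify them: the base URL and port, the JSON field names `task_id` and `action_type`, and that every endpoint returns JSON.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: host and port are not documented here


def build_request(path: str, payload: dict | None = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = BASE_URL.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call(path: str, payload: dict | None = None) -> dict:
    """Send the request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode() -> dict:
    """One reset/step/submit cycle; payload field names are hypothetical."""
    call("/reset", {"task_id": "easy_001"})
    call("/step", {"action_type": "remove_duplicates"})
    return call("/submit", {})
```

With the server running, `call("/health")` is a quick way to confirm it is reachable before calling `run_episode()`.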

## Tasks

| Task ID | Difficulty | Description |
|---------|------------|-------------|
| easy_001 | Easy | Basic cleaning: drop nulls and remove duplicates |
| medium_001 | Medium | Intermediate: handle nulls, validate emails, remove outliers |
| hard_001 | Hard | Advanced: full pipeline with type conversion and normalization |
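
As an illustration of what the `easy_001` task asks for, the sketch below drops rows with nulls and then removes duplicates from a small in-memory dataset. It is plain Python rather than the environment's own action code, and the helper names and sample rows are made up for the example.

```python
def drop_nulls(rows):
    """Keep only rows in which every value is present."""
    return [row for row in rows if all(value is not None for value in row.values())]


def remove_duplicates(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


raw = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Ada", "email": "ada@example.com"},  # exact duplicate
    {"name": "Bob", "email": None},               # missing email
]

cleaned = remove_duplicates(drop_nulls(raw))
print(cleaned)  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```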

## Deployment

Deploy to Hugging Face Spaces:

```bash
openenv push ./env
```