---
title: OpenEnv Data Cleaner
emoji: 🧹
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
base_path: /web
---

OpenEnv Data Cleaner

An OpenEnv-compliant AI-powered data cleaning environment built on openenv-core.

Features

  • OpenEnv-native: Built using openenv-core base classes
  • Data Cleaning Actions: Drop nulls, fill nulls, remove duplicates, filter rows, drop columns, convert types, validate emails, remove outliers, normalize values
  • Task-based Learning: Three difficulty levels (easy, medium, hard)
  • Grading System: Deterministic scoring based on data quality improvements
  • Reward System: Structured rewards with quality, progress, and penalty components
  • Web Interface: Interactive UI for manual data cleaning
  • Docker Ready: Deployable to Hugging Face Spaces
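
The grading and reward bullets above can be illustrated with a small sketch. It is not the environment's actual implementation: the `quality_score` metric (non-null cell ratio multiplied by unique-row ratio), the `RewardBreakdown` and `compute_reward` names, and the penalty weight are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


def quality_score(rows: list[dict]) -> float:
    """Hypothetical deterministic metric in [0, 1].

    Multiplies the fraction of non-null cells by the fraction of unique rows.
    None marks a missing value.
    """
    if not rows:
        return 0.0
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    non_null = sum(value is not None for value in cells) / len(cells)
    unique = len({tuple(sorted(row.items())) for row in rows}) / len(rows)
    return non_null * unique


@dataclass(frozen=True)
class RewardBreakdown:
    quality: float   # quality of the dataset after the action
    progress: float  # change in quality caused by the action
    penalty: float   # cost charged for an invalid action

    @property
    def total(self) -> float:
        return self.quality + self.progress - self.penalty


def compute_reward(
    before: list[dict],
    after: list[dict],
    action_valid: bool = True,
    invalid_penalty: float = 0.1,  # assumed weight, not taken from the project
) -> RewardBreakdown:
    """Combine the three components named in the feature list."""
    q_before = quality_score(before)
    q_after = quality_score(after)
    penalty = 0.0 if action_valid else invalid_penalty
    return RewardBreakdown(quality=q_after, progress=q_after - q_before, penalty=penalty)
```

Because the score depends only on the rows passed in, the same pair of datasets always yields the same reward, which is the property the "deterministic scoring" bullet describes.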

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py

API Endpoints

  • GET / - Web interface
  • GET /health - Health check
  • POST /reset - Initialize a new task
  • POST /step - Execute a cleaning action
  • POST /submit - Submit solution for grading
  • POST /revert - Revert last action
  • GET /tasks - List available tasks
  • GET /state - Get current environment state
  • GET /dataset - Get dataset information
  • GET /history - Get action history
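
A minimal client sketch for the endpoints listed above, using only the Python standard library. The README does not document request or response bodies, so the base URL, the JSON payload fields (`task_id`, `action`), and the assumption that every endpoint returns JSON are hypothetical; check the server code for the real schema.

```python
import json
from typing import Optional
from urllib import request

# Assumption: the server started in Quick Start listens here; adjust as needed.
BASE_URL = "http://localhost:7860"


def build_request(
    method: str,
    path: str,
    payload: Optional[dict] = None,
    base_url: str = BASE_URL,
) -> request.Request:
    """Build (but do not send) a request for one of the listed endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = base_url.rstrip("/") + path
    return request.Request(url, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None,
         base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and decode the response, assuming a JSON body."""
    req = build_request(method, path, payload, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_easy_episode(base_url: str = BASE_URL):
    """Reset, apply two cleaning actions, and submit (payload shapes are guesses)."""
    call("POST", "/reset", {"task_id": "easy_001"}, base_url)
    call("POST", "/step", {"action": "drop_nulls"}, base_url)
    call("POST", "/step", {"action": "remove_duplicates"}, base_url)
    return call("POST", "/submit", {}, base_url)
```

With the server running, `call("GET", "/health")` is a quick way to confirm it is reachable before starting an episode with `run_easy_episode()`.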

Tasks

| Task ID | Difficulty | Description |
|---|---|---|
| easy_001 | Easy | Basic cleaning: drop nulls and remove duplicates |
| medium_001 | Medium | Intermediate: handle nulls, validate emails, remove outliers |
| hard_001 | Hard | Advanced: full pipeline with type conversion and normalization |
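
To make the task descriptions concrete, here is a sketch of the operations named for `easy_001`, plus the email check from `medium_001`, written in plain Python over a list of dictionaries. The function names, the `email` column name, and the loose email pattern are illustrative assumptions; the environment's own actions may behave differently.

```python
import re

# Deliberately loose pattern for illustration; not a full email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def drop_nulls(rows):
    """Keep only rows in which every value is present (None marks a null)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable row key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def keep_valid_emails(rows, column="email"):
    """Keep rows whose `column` holds a string matching EMAIL_RE."""
    return [
        row for row in rows
        if isinstance(row.get(column), str) and EMAIL_RE.match(row[column])
    ]


raw = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Ada", "email": "ada@example.com"},  # exact duplicate
    {"name": "Bob", "email": None},               # null value
    {"name": "Eve", "email": "not-an-email"},     # malformed address
]

easy_result = remove_duplicates(drop_nulls(raw))
medium_result = keep_valid_emails(easy_result)
```

Here `easy_result` keeps one Ada row and the Eve row, and `medium_result` additionally drops Eve because her address fails the pattern.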

Deployment

Deploy to Hugging Face Spaces:

openenv push ./env