sairaj2 committed
Commit 8d340f1 · verified · Parent(s): 21b61ef

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +80 -339
  2. __init__.py +2 -2
  3. client.py +5 -83
  4. openenv.yaml +1 -1
  5. pyproject.toml +18 -70
  6. server/app.py +11 -11
README.md CHANGED
@@ -1,55 +1,48 @@
1
  ---
2
- title: HallucinationGuard-Env
3
- emoji: 🛡️
4
- colorFrom: blue
5
- colorTo: green
6
  sdk: docker
7
  app_port: 7860
8
  pinned: true
9
  tags:
10
  - openenv
11
  - reinforcement-learning
12
- - hallucination-detection
13
- - grounded-generation
14
- - question-answering
15
- - fact-checking
16
  - llm-training
17
- - llm-evaluation
18
  - benchmark
19
  - ai-safety
 
 
20
  base_path: /web
21
  ---
22
 
23
- # 🛡️ HallucinationGuard-Env
24
 
25
- > **The production-grade OpenEnv RL environment for training and evaluating LLMs on hallucination avoidance.**
26
 
27
- **Server Version:** v4.2.0
28
 
29
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
30
  [![Python](https://img.shields.io/badge/Python-3.10%20%7C%203.11%20%7C%203.12-blue)](#-quick-start)
31
  [![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
32
- [![Dataset](https://img.shields.io/badge/Dataset-1M%2B_examples-orange)](#-datasets)
33
 
34
  ---
35
 
36
- ## 💡 The Inspiration
37
 
38
- During research for a Hackathon, an AI model confidently hallucinated a **"golden ticket backdoor"** claiming that Ideathon winners could skip directly to the Grand Finale. This information existed nowhere in the official sources. The AI stated it with high confidence and even fabricated a supporting quote.
39
-
40
- That moment made one thing clear: hallucination isn't just an academic problem. It causes real confusion in high-stakes situations.
41
-
42
- **HallucinationGuard-Env** was built to fix that — training AI models to say *"I don't know"* when they don't, cite real sources when they do, and never fabricate with confidence.
43
-
44
- ---
45
 
46
  ## 🚀 Quick Start
47
 
48
  ### Run Locally
49
 
50
  ```bash
51
- git clone https://huggingface.co/spaces/SamSankar/hallucination-guard-env
52
- cd hallucination-guard-env
53
  pip install -e .
54
  uvicorn server.app:app --host 0.0.0.0 --port 7860
55
  curl http://localhost:7860/health
@@ -60,50 +53,38 @@ curl http://localhost:7860/health
60
  ```python
61
  import requests
62
 
63
- BASE = "https://samsankar-hallucination-guard-env.hf.space"
64
 
65
  # 1. Start episode
66
  obs = requests.post(f"{BASE}/reset", json={"difficulty": "beginner"}).json()
67
- print(obs["question"], obs["context"])
68
 
69
- # 2. Answer from context only
70
  result = requests.post(f"{BASE}/step", json={
71
- "answer": "your answer from context",
72
- "confidence": 0.85,
73
- "source_quote": "verbatim quote from context",
 
74
  "session_id": obs.get("session_id"),
75
  }).json()
76
- print(f"Reward: {result['reward']}, Hallucinated: {result['is_hallucination']}")
77
 
78
  # 3. Score the episode
79
  grade = requests.post(f"{BASE}/grader", json={
80
- "task_id": "task_1_factual_grounding",
81
  "step_rewards": [result['reward']],
82
- "step_infos": [{"correctness": result.get('grounding_score', 0), "is_hallucination": result.get('is_hallucination', False)}],
83
  }).json()
84
  print(f"Episode score: {grade['score']}")
85
  ```
86
 
87
- ### Run Baseline
88
-
89
- ```bash
90
- # Heuristic baseline (no API key needed)
91
- python inference.py --heuristic --env-url http://localhost:7860
92
-
93
- # With an LLM (Groq, Ollama, OpenAI-compatible)
94
- export API_BASE_URL=https://api.groq.com/openai/v1
95
- export MODEL_NAME=llama-3.3-70b-versatile
96
- export HF_TOKEN=your_key_here
97
- python inference.py --env-url http://localhost:7860 --episodes 3 --steps 5
98
- ```
99
-
100
  ### Validate OpenEnv Compliance
101
 
102
  ```bash
103
  # Local structure check
104
  openenv validate
105
 
106
- # Runtime check against live server (must pass all 6 criteria)
107
  openenv validate --url http://localhost:7860 --verbose
108
  ```
109
 
@@ -111,32 +92,29 @@ openenv validate --url http://localhost:7860 --verbose
111
 
112
  ## 🎯 Tasks
113
 
114
- 3 named tasks in difficulty order:
115
 
116
- | # | task_id | Difficulty | Primary Datasets | Frontier LLM Score |
117
- |---|---------|-----------|-----------------|-------------------|
118
- | 1 | `task_1_factual_grounding` | 🟢 Beginner | SQuAD, BoolQ, ARC, OpenBookQA | 0.70–0.85 |
119
- | 2 | `task_2_multi_hop_synthesis` | 🟡 Intermediate | HotpotQA, CoQA, NQ-Open, MS-MARCO | 0.55–0.70 |
120
- | 3 | `task_3_adversarial_resistance` | 🔴 Advanced | HaluEval, TruthfulQA, FEVER, AdversarialQA | 0.40–0.60 |
121
 
122
  ---
123
 
124
- ## 🎮 How The Environment Works
125
 
126
- The agent receives a **question** and a **source document**. It must answer using only what the document says, provide a direct quote supporting its answer, and state how confident it is.
127
 
128
  ### Action Space
129
 
130
- Every `POST /step` call accepts this JSON body (only `answer` is required):
131
-
132
  ```json
133
  {
134
- "answer": "string derived ONLY from the provided context",
135
- "confidence": 0.5,
136
- "source_quote": "string — verbatim phrase from context supporting the answer",
137
- "reasoning": "string optional chain-of-thought",
138
- "uncertainty_flags": [],
139
- "session_id": "string — from /reset response, for session isolation"
140
  }
141
  ```
142
 
@@ -144,136 +122,56 @@ Every `POST /step` call accepts this JSON body (only `answer` is required):
144
 
145
  ```json
146
  {
147
- "question": "The question to answer",
148
- "context": "Source document to answer from",
149
- "reward": 0.75,
150
- "feedback": "Detailed human-readable feedback",
151
- "is_hallucination": false,
152
- "hallucination_type": "none",
153
- "hallucination_severity": "NONE",
154
- "grounding_score": 0.85,
155
- "done": false,
156
- "session_id": "ses_a1b2c3d4"
157
  }
158
  ```
159
 
160
- ### Episode Flow
161
-
162
- ```
163
- POST /reset → Sample question + context from dataset (curriculum-aware)
164
- Return observation with session_id
165
-
166
- POST /step → Grade answer across 9 components
167
- Detect hallucination type and severity
168
- Compute reward with ROUGE + BERTScore + AlignScore
169
- Adapt difficulty based on performance
170
- Return observation with reward + feedback
171
-
172
- POST /grader → Aggregate per-step rewards into 0.0–1.0 task score
173
- ```
174
-
175
  ---
176
 
177
- ## 📊 Reward System (9 Components)
178
 
179
  | Component | Weight | Description |
180
  |-----------|--------|-------------|
181
- | Factual correctness | 0.35 | Exact/fuzzy match + semantic similarity to ground truth |
182
- | Source grounding | 0.20 | Verifies answer is supported by context (reduced for wrong answers) |
183
- | Citation accuracy | 0.10 | `source_quote` found verbatim in context |
184
- | Confidence calibration | 0.10 | ECE between stated confidence and correctness (overconfidence penalized more) |
185
- | Semantic consistency | 0.10 | NLI entailment score (DeBERTa-v3 CrossEncoder) |
186
- | Hallucination penalty | 0.10 | Penalises detected hallucinations by type and severity |
187
- | ROUGE (1/2/L) | 0.02 | Surface-form overlap with reference answer |
188
- | BERTScore | 0.02 | Token-level semantic similarity (roberta-base) |
189
- | AlignScore | 0.01 | Faithfulness to context (RoBERTa, ACL 2023; optional — falls back to 0.5) |
190
-
191
- Difficulty multiplier: `beginner × 0.9`, `intermediate × 1.0`, `advanced × 1.1`, `expert × 1.2`
192
-
193
- **Key behavior:**
194
- - Wrong answers capped at ~0.4 reward regardless of grounding
195
- - Grounding contribution reduced for incorrect answers
196
- - Consistency bonus for maintaining performance above 0.7
197
 
198
  ---
199
 
200
- ## 🔬 Hallucination Detection
201
-
202
- ### 8 Types Classified
203
-
204
- | Type | What It Catches |
205
- |---|---|
206
- | `FABRICATED_FACT` | Information stated that is not in the source |
207
- | `FALSE_CITATION` | `source_quote` that does not exist in the document |
208
- | `OVERCONFIDENT_WRONG` | High confidence on an incorrect answer |
209
- | `CONTEXT_DRIFT` | Answer gradually drifts away from source |
210
- | `NUMERICAL_FABRICATION` | Made-up statistics or numbers |
211
- | `ENTITY_CONFUSION` | Wrong names, organisations, or places |
212
- | `TEMPORAL_ERROR` | Incorrect dates or timelines |
213
- | `RELATIONSHIP_ERROR` | Incorrect relationships between entities |
214
-
215
- ### "I Don't Know" Refusal Handling
216
-
217
- The grader detects when a model appropriately refuses to answer unanswerable questions:
218
-
219
- | Scenario | Reward | Behavior |
220
- |----------|--------|----------|
221
- | Proper refusal on unanswerable | 0.65–0.80 | Rewarded for honesty |
222
- | Refusal with low confidence | 0.50 | Partial credit |
223
- | Underconfident refusal (answer exists) | 0.30 | Penalized for not trying |
224
 
225
- **Detected refusal phrases:** "I cannot answer", "not in the context", "I don't know", "cannot determine", "insufficient information", etc.
226
-
227
- ### 5 Severity Levels
228
-
229
- | Level | Score | Meaning |
230
- |---|---|---|
231
- | NONE | 0.0 | Fully grounded answer |
232
- | MINOR | 0.1–0.3 | Slight deviation from source |
233
- | MODERATE | 0.3–0.5 | Noticeable unsupported claims |
234
- | SEVERE | 0.5–0.7 | Significantly fabricated content |
235
- | CRITICAL | 0.7+ | Answer largely invented |
236
 
237
  ---
238
 
239
- ## 📚 Datasets
240
-
241
- **1,090,163 total examples** across 38 real-world QA datasets — cached permanently, instant boot:
242
-
243
- | Source | Examples | Domain |
244
- |---|---|---|
245
- | SQuAD + SQuAD-v2 | 100,000 | Reading comprehension |
246
- | TriviaQA | 50,000 | Open-domain factual QA |
247
- | HotpotQA | 50,000 | Multi-hop reasoning |
248
- | DROP | 50,000 | Numerical reasoning |
249
- | RACE | 50,000 | Exam reading comprehension |
250
- | NewsQA | 50,000 | News article QA |
251
- | FaithDial | 49,649 | Faithful dialogue |
252
- | FEVER | 49,947 | Fact verification |
253
- | NQ Open | 50,000 | Natural questions |
254
- | AQUA-RAT | 97,467 | Math word problems |
255
- | XSum | 49,994 | Extreme summarisation |
256
- | CNN/DailyMail | 50,000 | News summarisation |
257
- | HellaSwag | 39,905 | Commonsense completion |
258
- | AdversarialQA | 30,000 | Adversarial reading comprehension |
259
- | WinoGrande | 40,398 | Commonsense inference |
260
- | CommonsenseQA | 9,741 | Commonsense reasoning |
261
- | BoolQ | 9,427 | Boolean yes/no QA |
262
- | CoQA | 7,199 | Conversational QA |
263
- | MedQA | 10,000 | Medical licensing exam |
264
- | MedMCQA | 20,000 | Medical entrance exam |
265
- | SciTail | 23,596 | Science entailment |
266
- | HaluEval | 10,000 | Hallucination evaluation |
267
- | TruthfulQA | 817 | Factuality benchmark |
268
- | SciQ | 11,679 | Science QA |
269
- | Arc | 2,590 | Science exam |
270
- | OpenBookQA | 4,957 | Common knowledge |
271
- | AG News | 50,000 | News classification |
272
- | Climate-FEVER | 881 | Climate fact verification |
273
- | MS MARCO | 30,568 | Web search QA |
274
- | + 10 more | ... | Medical, math, dialogue, summarisation |
275
-
276
- Datasets load from `SamSankar/hallucination-guard-cache` on HF Hub. Core 5 datasets load synchronously at startup (~86K examples); remaining 33 load in a background thread.
277
 
278
  ---
279
 
@@ -289,145 +187,15 @@ Datasets load from `SamSankar/hallucination-guard-cache` on HF Hub. Core 5 datas
289
  | `GET` | `/metadata` | Environment name, version, description |
290
  | `GET` | `/schema` | Action, observation, and state JSON schemas |
291
  | `GET` | `/health` | Health check |
292
- | `POST` | `/mcp` | MCP JSON-RPC endpoint |
293
 
294
  ### Environment
295
 
296
  | Method | Endpoint | Description |
297
  |--------|----------|-------------|
298
- | `POST` | `/reset` | Start new episode (returns `session_id`) |
299
- | `POST` | `/step` | Submit answer (accepts `session_id` for isolation) |
300
  | `GET` | `/state` | Get current episode state |
301
 
302
- ### Evaluation & Leaderboard
303
-
304
- | Method | Endpoint | Description |
305
- |--------|----------|-------------|
306
- | `POST` | `/batch/evaluate` | Evaluate multiple Q&A pairs |
307
- | `GET` | `/leaderboard` | View ranked model performance |
308
- | `POST` | `/leaderboard/submit` | Submit evaluation results |
309
- | `GET` | `/datasets` | Dataset statistics |
310
-
311
- ---
312
-
313
- ## 📋 Baseline Scores
314
-
315
- All benchmarks: **3 episodes × 5 steps, seed=42**, against deployed HF Space.
316
-
317
- ### Full Benchmark Results
318
-
319
- | # | Model | Provider | Overall | Task 1 | Task 2 | Task 3 | Time |
320
- |---|-------|----------|---------|--------|--------|--------|------|
321
- | 1 | Nemotron-3-Super 120B | OpenRouter | **0.553** | 0.599 | 0.535 | 0.524 | 10m 57s |
322
- | 2 | Llama 3.3 70B | Groq | **0.514** | 0.542 | 0.449 | 0.552 | 1m 12s |
323
- | 3 | Qwen3 32B | Groq | **0.513** | 0.564 | 0.453 | 0.522 | 4m 41s |
324
- | 4 | GPT-OSS 20B | Groq | **0.498** | 0.552 | 0.406 | 0.537 | 3m 53s |
325
- | 5 | Qwen2.5 72B Instruct | HF Router | **0.480** | 0.594 | 0.431 | 0.417 | 3m 05s |
326
- | 6 | GLM-4.5 Air | OpenRouter | **0.350** | 0.436 | 0.311 | 0.303 | 14m 01s |
327
- | 7 | Heuristic (no LLM) | — | **0.131** | 0.162 | 0.144 | 0.087 | 30s |
328
-
329
- ### Heuristic Baseline (no LLM required)
330
-
331
- The heuristic baseline is a deterministic agent that extracts the first sentence of the context as the answer. It establishes a performance floor — any real LLM should beat this.
332
-
333
- ```bash
334
- python inference.py --heuristic --env-url http://localhost:7860 --episodes 3 --steps 5 --seed 42
335
- ```
336
-
337
- ### Run LLM Baselines
338
-
339
- ```bash
340
- # Groq (fast inference)
341
- export API_BASE_URL=https://api.groq.com/openai/v1
342
- export MODEL_NAME=llama-3.3-70b-versatile
343
- export HF_TOKEN=gsk_your_key
344
- python inference.py --env-url https://samsankar-hallucination-guard-env.hf.space --episodes 3 --steps 5
345
-
346
- # HF Router (open models)
347
- export API_BASE_URL=https://router.huggingface.co/v1
348
- export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
349
- export HF_TOKEN=hf_your_token
350
- python inference.py --env-url https://samsankar-hallucination-guard-env.hf.space --episodes 3 --steps 5
351
-
352
- # OpenRouter (free-tier models)
353
- export API_BASE_URL=https://openrouter.ai/api/v1
354
- export MODEL_NAME=nvidia/nemotron-3-super-120b-a12b:free
355
- export HF_TOKEN=sk-or-v1-your_key
356
- python inference.py --env-url https://samsankar-hallucination-guard-env.hf.space --episodes 3 --steps 5
357
- ```
358
-
359
- ---
360
-
361
- ## 🌐 Deployment
362
-
363
- ### HuggingFace Spaces
364
-
365
- The environment uses a **two-phase loading strategy**:
366
-
367
- 1. **Core datasets** (~86K examples) load synchronously at startup
368
- 2. **Extended datasets** (~1M+ examples) load in background after server is healthy
369
-
370
- ML models (sentence-transformers, NLI CrossEncoder, ROUGE, BERTScore) preload during Docker build to avoid cold-start delays.
371
-
372
- ### Configuration
373
-
374
- | Variable | Description | Default |
375
- |----------|-------------|---------|
376
- | `USE_LARGE_NLI` | Use large NLI model (more accurate, more memory) | `false` |
377
- | `HF_HOME` | HuggingFace cache directory | `/tmp/hf_cache` |
378
-
379
- ---
380
-
381
- ## 🔌 Integration Examples
382
-
383
- ### OpenAI SDK
384
-
385
- ```python
386
- # See examples/openai_integration.py for full implementation
387
- from openai import OpenAI
388
- import requests
389
-
390
- client = OpenAI()
391
- ENV_URL = "https://samsankar-hallucination-guard-env.hf.space"
392
-
393
- # 1. Reset
394
- obs = requests.post(f"{ENV_URL}/reset", json={"difficulty": "beginner"}).json()
395
-
396
- # 2. Get answer from GPT-4
397
- response = client.chat.completions.create(
398
- model="gpt-4o-mini",
399
- messages=[{"role": "user", "content": f"Answer ONLY from context.\n\nContext: {obs['context']}\n\nQuestion: {obs['question']}"}],
400
- temperature=0.1
401
- )
402
-
403
- # 3. Submit to environment
404
- result = requests.post(f"{ENV_URL}/step", json={
405
- "answer": response.choices[0].message.content,
406
- "confidence": 0.8,
407
- "session_id": obs.get("session_id"),
408
- }).json()
409
- print(f"Reward: {result['reward']}")
410
- ```
411
-
412
- ### Groq (Cloud — Best Performance)
413
-
414
- ```bash
415
- export API_BASE_URL=https://api.groq.com/openai/v1
416
- export MODEL_NAME=llama-3.3-70b-versatile
417
- export HF_TOKEN=gsk_your_key_here
418
- python inference.py --env-url http://localhost:7860 --episodes 3 --steps 5 --seed 42
419
- ```
420
-
421
- ### Ollama (Local)
422
-
423
- ```bash
424
- ollama pull qwen2.5:7b
425
- export API_BASE_URL=http://localhost:11434/v1
426
- export MODEL_NAME=qwen2.5:7b
427
- export HF_TOKEN=ollama # Any non-empty value triggers LLM mode
428
- python inference.py --env-url http://localhost:7860 --episodes 3 --steps 5 --seed 42
429
- ```
430
-
431
  ---
432
 
433
  ## 💻 Development
@@ -441,9 +209,6 @@ pytest tests/ -v
441
 
442
  # Validate OpenEnv compliance
443
  openenv validate --url http://localhost:7860 --verbose
444
-
445
- # Lint
446
- ruff check . --ignore E501,F401,F403
447
  ```
448
 
449
  ---
@@ -452,34 +217,10 @@ ruff check . --ignore E501,F401,F403
452
 
453
  | | |
454
  |---|---|
455
- | 🤗 HuggingFace Space | https://huggingface.co/spaces/SamSankar/hallucination-guard-env |
456
- | 📖 Interactive API Docs | https://samsankar-hallucination-guard-env.hf.space/redoc |
457
  | 🔧 OpenEnv Framework | https://github.com/meta-pytorch/OpenEnv |
458
 
459
  ---
460
 
461
- ## Changelog
462
-
463
- ### v4.2.0 (2026-04)
464
-
465
- - **Fixed** BERTScore crash on HF Spaces — switched from `microsoft/deberta-v3-base` to `roberta-base` (fast tokenizer incompatibility with transformers>=4.57)
466
- - **Fixed** OpenEnv validation failures — `/metadata` now returns `description`, `/schema` now returns `state` schema
467
- - **Fixed** Thread safety — `/reset` and `/step` use per-session environments with shared dataset loader
468
- - **Fixed** Numerical fabrication detection — numbers now extracted from original text before normalization replaces them with `NUM`
469
- - **Fixed** `inference.py` step_infos mapping — `correctness` and `grounding` no longer conflated
470
- - **Fixed** `/baseline` endpoint — proper `step_infos` with separate correctness/grounding/calibration keys
471
- - **Fixed** Leaderboard file I/O — proper `with` statements and UTF-8 encoding
472
- - **Fixed** `client.py` default port — changed from 8000 to 7860
473
- - **Fixed** Version mismatch — `openenv.yaml` updated to v4.2.0
474
- - **Added** Test suite — 42 tests across `test_grader.py` and `test_tasks.py`
475
-
476
- ### v4.1.0 (2026-03)
477
-
478
- - OpenEnv compliant with `/tasks`, `/grader`, `/baseline` endpoints
479
- - `inference.py` hackathon submission script
480
- - 9-component reward system with ROUGE + BERTScore + AlignScore
481
- - 38 datasets, 1M+ examples
482
-
483
- ---
484
-
485
- *Built to train models to stop hallucination · MIT License*
 
1
  ---
2
+ title: AutoClean-Ai
3
+ emoji: 🧹
4
+ colorFrom: green
5
+ colorTo: blue
6
  sdk: docker
7
  app_port: 7860
8
  pinned: true
9
  tags:
10
  - openenv
11
  - reinforcement-learning
12
+ - data-cleaning
13
+ - data-preprocessing
 
 
14
  - llm-training
 
15
  - benchmark
16
  - ai-safety
17
+ - data-quality
18
+ - mlops
19
  base_path: /web
20
  ---
21
 
22
+ # 🧹 AutoClean-Ai
23
 
24
+ > **Production-grade OpenEnv RL environment for training AI models to clean tabular data automatically.**
25
 
26
+ **Server Version:** v1.0.0
27
 
28
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
29
  [![Python](https://img.shields.io/badge/Python-3.10%20%7C%203.11%20%7C%203.12-blue)](#-quick-start)
30
  [![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
31
+ [![Dataset](https://img.shields.io/badge/Dataset-Realistic%20Generated-orange)](#-datasets)
32
 
33
  ---
34
 
35
+ ## 💡 The Problem
36
 
37
+ Data scientists reportedly spend around 80% of their time cleaning data, and bad data is a leading cause of ML project failure. AutoClean-Ai was built to train AI agents that automatically detect and fix common data quality issues in tabular datasets.
38
 
39
  ## 🚀 Quick Start
40
 
41
  ### Run Locally
42
 
43
  ```bash
44
+ git clone https://github.com/SairajMN/WorkflowOps.git
45
+ cd WorkflowOps
46
  pip install -e .
47
  uvicorn server.app:app --host 0.0.0.0 --port 7860
48
  curl http://localhost:7860/health
 
53
  ```python
54
  import requests
55
 
56
+ BASE = "http://localhost:7860"
57
 
58
  # 1. Start episode
59
  obs = requests.post(f"{BASE}/reset", json={"difficulty": "beginner"}).json()
60
+ print(obs["dataset_preview"], obs["column_info"])
61
 
62
+ # 2. Submit cleaning action
63
  result = requests.post(f"{BASE}/step", json={
64
+ "action_type": "fix_missing_values",
65
+ "column_index": 2,
66
+ "confidence": 0.92,
67
+ "reasoning": "Mean imputation for numerical column",
68
  "session_id": obs.get("session_id"),
69
  }).json()
70
+ print(f"Reward: {result['reward']}, Cleaned: {result['rows_cleaned']}")
71
 
72
  # 3. Score the episode
73
  grade = requests.post(f"{BASE}/grader", json={
74
+ "task_id": "task_1_basic_cleaning",
75
  "step_rewards": [result['reward']],
76
+ "step_infos": [result],
77
  }).json()
78
  print(f"Episode score: {grade['score']}")
79
  ```
80
 
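The quick start above submits a single cleaning step. A full episode loop against the same endpoints might look like the sketch below; the `done` flag and field names come from the README's examples, while `max_steps` and the fixed action choice are illustrative assumptions (a real agent would pick actions from the observation). The `post` parameter is any callable behaving like `requests.post(url, json=...).json()`, so the loop can run against the live server or a stub.

```python
# Sketch: drive one AutoClean-Ai episode end to end.
# `post(url, payload)` must return the decoded JSON response as a dict.

def run_episode(post, base="http://localhost:7860", max_steps=10):
    obs = post(f"{base}/reset", {"difficulty": "beginner"})
    rewards = []
    for _ in range(max_steps):
        result = post(f"{base}/step", {
            "action_type": "fix_missing_values",  # placeholder policy; choose per observation
            "column_index": 0,
            "confidence": 0.5,
            "session_id": obs.get("session_id"),
        })
        rewards.append(result["reward"])
        if result.get("done"):
            break
    grade = post(f"{base}/grader", {
        "task_id": "task_1_basic_cleaning",
        "step_rewards": rewards,
        "step_infos": [],
    })
    return grade["score"], rewards
```

Because the transport is injected, the same loop works with `requests` in production and a fake in tests.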
81
  ### Validate OpenEnv Compliance
82
 
83
  ```bash
84
  # Local structure check
85
  openenv validate
86
 
87
+ # Runtime check against live server
88
  openenv validate --url http://localhost:7860 --verbose
89
  ```
90
 
 
92
 
93
  ## 🎯 Tasks
94
 
95
+ 3 progressive difficulty tasks:
96
 
97
+ | # | task_id | Difficulty | Description | Expected Agent Score |
98
+ |---|---------|-----------|-------------|-------------------|
99
+ | 1 | `task_1_basic_cleaning` | 🟢 Beginner | Fix missing values, standardize formats | 0.70–0.85 |
100
+ | 2 | `task_2_advanced_cleaning` | 🟡 Intermediate | Handle outliers, correct data types, deduplication | 0.55–0.70 |
101
+ | 3 | `task_3_full_pipeline` | 🔴 Advanced | Complete end-to-end data cleaning pipeline | 0.40–0.60 |
102
 
103
  ---
104
 
105
+ ## 🎮 Environment Workflow
106
 
107
+ The agent receives a **tabular dataset** with known quality issues. It must select the appropriate cleaning operation, apply it correctly, and justify its choice.
108
 
109
  ### Action Space
110
 
 
 
111
  ```json
112
  {
113
+ "action_type": "fix_missing_values | remove_outliers | standardize | deduplicate | correct_types | fill_dates",
114
+ "column_index": 3,
115
+ "confidence": 0.85,
116
+ "reasoning": "string explaining the choice",
117
+ "session_id": "session id from reset"
 
118
  }
119
  ```
120
 
 
122
 
123
  ```json
124
  {
125
+ "dataset_preview": "First 5 rows of data",
126
+ "column_info": "Column names, types, missing stats",
127
+ "reward": 0.75,
128
+ "feedback": "Detailed human-readable feedback",
129
+ "rows_cleaned": 12,
130
+ "issues_remaining": 3,
131
+ "done": false,
132
+ "session_id": "ses_a1b2c3d4"
 
 
133
  }
134
  ```
135
 
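The action payload above is easy to get subtly wrong; a minimal client-side validator is sketched below. The allowed `action_type` values are taken from the README's action space, while the range checks are assumptions about what the server accepts.

```python
# Sketch: validate an AutoClean-Ai action payload before POSTing it.
ACTION_TYPES = {
    "fix_missing_values", "remove_outliers", "standardize",
    "deduplicate", "correct_types", "fill_dates",
}

def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    if action.get("action_type") not in ACTION_TYPES:
        errors.append(f"unknown action_type: {action.get('action_type')!r}")
    col = action.get("column_index")
    if not isinstance(col, int) or col < 0:
        errors.append("column_index must be a non-negative integer")
    conf = action.get("confidence", 0.5)
    if not (0.0 <= conf <= 1.0):
        errors.append("confidence must be in [0, 1]")
    return errors
```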
136
  ---
137
 
138
+ ## 📊 Reward System (5 Components)
139
 
140
  | Component | Weight | Description |
141
  |-----------|--------|-------------|
142
+ | Correctness | 0.35 | Operation actually fixed the issue |
143
+ | Appropriate action | 0.25 | Right operation selected for the problem |
144
+ | Confidence calibration | 0.15 | Confidence matches actual correctness |
145
+ | No side effects | 0.15 | Cleaning didn't break other columns |
146
+ | Efficiency | 0.10 | Minimum steps to clean dataset |
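The weights in the table sum to 1.0, which suggests the scalar reward is a weighted sum of per-component scores. A sketch under that assumption (the component keys are illustrative names, not the server's actual field names):

```python
# Hypothetical reward combination using the published weights.
WEIGHTS = {
    "correctness": 0.35,
    "appropriate_action": 0.25,
    "confidence_calibration": 0.15,
    "no_side_effects": 0.15,
    "efficiency": 0.10,
}

def combine_reward(scores: dict) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return round(sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS), 4)
```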
147
 
148
  ---
149
 
150
+ ## 📈 Metrics
151
 
152
+ - ✅ Data Quality Score
+ - ✅ Completeness Ratio
+ - ✅ Uniqueness Ratio
+ - ✅ Type Consistency
+ - ✅ Cleaning Efficiency
+ - ✅ Action Appropriateness
158
 
159
  ---
160
 
161
+ ## 📋 Supported Data Cleaning Operations
162
+
163
+ | Operation | Description |
164
+ |-----------|-------------|
165
+ | `fix_missing_values` | Mean/median/mode imputation |
166
+ | `remove_outliers` | IQR / Z-score outlier removal |
167
+ | `standardize` | Normalize numerical columns |
168
+ | `deduplicate` | Remove duplicate rows |
169
+ | `correct_types` | Fix incorrect data types |
170
+ | `fill_dates` | Standardize date formats |
171
+ | `handle_categories` | Encode categorical columns |
172
+ | `remove_duplicates` | Drop identical rows |
173
+ | `trim_strings` | Clean whitespace from text columns |
174
+ | `correct_values` | Fix known invalid values |
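Two of the operations above, implemented over a list-of-dicts table as a pure-Python sketch (mean imputation and exact-row deduplication). A real implementation would likely use pandas; these function names mirror the table but are not the environment's actual internals.

```python
def fix_missing_values(rows, column):
    """Mean-impute None values in a numerical column (sketch)."""
    values = [r[column] for r in rows if r[column] is not None]
    mean = sum(values) / len(values) if values else 0.0
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]

def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```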
175
 
176
  ---
177
 
 
187
  | `GET` | `/metadata` | Environment name, version, description |
188
  | `GET` | `/schema` | Action, observation, and state JSON schemas |
189
  | `GET` | `/health` | Health check |
 
190
 
191
  ### Environment
192
 
193
  | Method | Endpoint | Description |
194
  |--------|----------|-------------|
195
+ | `POST` | `/reset` | Start new episode |
196
+ | `POST` | `/step` | Submit cleaning action |
197
  | `GET` | `/state` | Get current episode state |
198
 
199
  ---
200
 
201
  ## 💻 Development
 
209
 
210
  # Validate OpenEnv compliance
211
  openenv validate --url http://localhost:7860 --verbose
 
212
  ```
213
 
214
  ---
 
217
 
218
  | | |
219
  |---|---|
220
+ | 📦 GitHub | https://github.com/SairajMN/WorkflowOps |
221
+ | 📖 Interactive API Docs | http://localhost:7860/redoc |
222
  | 🔧 OpenEnv Framework | https://github.com/meta-pytorch/OpenEnv |
223
 
224
  ---
225
 
226
+ *Built for Data Cleaning AI Agents · MIT License*
__init__.py CHANGED
@@ -1,3 +1,3 @@
1
- """WorkflowOps OpenEnv Environment"""
2
 
3
- __version__ = "0.1.0"
 
1
+ """AutoClean-AI OpenEnv Environment"""
2
 
3
+ __version__ = "1.0.0"
client.py CHANGED
@@ -1,83 +1,5 @@
1
- """HTTP/WebSocket client for HallucinationGuard-Env."""
2
-
3
- import requests
4
- from typing import Optional, Dict, Any
5
-
6
- from models import HallucinationAction, HallucinationObservation, HallucinationState
7
-
8
-
9
- class HallucinationClient:
10
- """Client for interacting with the HallucinationGuard environment."""
11
-
12
- def __init__(self, base_url: str = "http://localhost:7860"):
13
- self.base_url = base_url.rstrip("/")
14
- self.session = requests.Session()
15
-
16
- def health_check(self) -> Dict[str, Any]:
17
- """Check if the server is healthy."""
18
- response = self.session.get(f"{self.base_url}/health")
19
- response.raise_for_status()
20
- return response.json()
21
-
22
- def reset(self) -> HallucinationObservation:
23
- """Reset the environment and get initial observation."""
24
- response = self.session.post(f"{self.base_url}/reset")
25
- response.raise_for_status()
26
- data = response.json()
27
- self._session_id = data.get("session_id")
28
- return HallucinationObservation(**data)
29
-
30
- def step(self, action: HallucinationAction) -> HallucinationObservation:
31
- """Take a step in the environment."""
32
- action_dict = {
33
- "answer": action.answer,
34
- "confidence": action.confidence,
35
- "source_quote": action.source_quote,
36
- "metadata": action.metadata
37
- }
38
- if getattr(self, '_session_id', None):
39
- action_dict["session_id"] = self._session_id
40
- response = self.session.post(
41
- f"{self.base_url}/step",
42
- json=action_dict
43
- )
44
- response.raise_for_status()
45
- data = response.json()
46
- return HallucinationObservation(**data)
47
-
48
- def get_state(self) -> HallucinationState:
49
- """Get the current environment state."""
50
- response = self.session.get(f"{self.base_url}/state")
51
- response.raise_for_status()
52
- data = response.json()
53
- return HallucinationState(**data)
54
-
55
- def close(self) -> None:
56
- """Close the client session."""
57
- self.session.close()
58
-
59
-
60
- # Example usage
61
- if __name__ == "__main__":
62
- client = HallucinationClient()
63
-
64
- # Check health
65
- print("Health:", client.health_check())
66
-
67
- # Reset environment
68
- obs = client.reset()
69
- print(f"\nQuestion: {obs.question}")
70
- print(f"Context: {obs.context[:200]}...")
71
-
72
- # Take a step with a sample action
73
- action = HallucinationAction(
74
- answer="This is a test answer",
75
- confidence=0.8,
76
- source_quote="test quote"
77
- )
78
- obs = client.step(action)
79
- print(f"\nReward: {obs.reward}")
80
- print(f"Feedback: {obs.feedback}")
81
- print(f"Is Hallucination: {obs.is_hallucination}")
82
-
83
- client.close()
 
1
+ """AutoClean-AI Client Module"""
2
+
3
+ class AutoCleanClient:
4
+ """Client interface for AutoClean environment"""
5
+ pass
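The new `client.py` replaces the old full client with an empty stub. A minimal working client matching the endpoints and field names in the README might look like the sketch below (stdlib-only; the server-side behavior is assumed from the README examples).

```python
import json
import urllib.request

class AutoCleanClient:
    """Minimal HTTP client for the AutoClean environment (illustrative sketch)."""

    def __init__(self, base_url="http://localhost:7860"):
        self.base_url = base_url.rstrip("/")
        self.session_id = None

    def _post(self, path, payload):
        # POST JSON and decode the JSON response.
        req = urllib.request.Request(
            f"{self.base_url}{path}",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def reset(self, difficulty="beginner"):
        obs = self._post("/reset", {"difficulty": difficulty})
        self.session_id = obs.get("session_id")  # reuse for session isolation
        return obs

    def step(self, action_type, column_index, confidence=0.5, reasoning=""):
        return self._post("/step", {
            "action_type": action_type,
            "column_index": column_index,
            "confidence": confidence,
            "reasoning": reasoning,
            "session_id": self.session_id,
        })
```

Overriding `_post` makes the client testable without a running server.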
openenv.yaml CHANGED
@@ -30,7 +30,7 @@ openenv:
30
 
31
  entry_points:
32
  server: server.app:app
33
- client: client:HallucinationClient
34
 
35
  # Tasks (easy → medium → hard)
36
  tasks:
 
30
 
31
  entry_points:
32
  server: server.app:app
33
+ client:
34
 
35
  # Tasks (easy → medium → hard)
36
  tasks:
pyproject.toml CHANGED
@@ -1,70 +1,18 @@
1
- [build-system]
2
- requires = ["hatchling"]
3
- build-backend = "hatchling.build"
4
-
5
- [project]
6
- name = "hallucination-guard-env"
7
- version = "4.2.0"
8
- description = "Production RL environment for training LLMs on hallucination avoidance — 1M+ examples across 38 datasets"
- readme = "README.md"
- license = {text = "MIT"}
- requires-python = ">=3.10"
- authors = [
-     {name = "HallucinationGuard-Env Contributors"}
- ]
- keywords = [
-     "openenv",
-     "reinforcement-learning",
-     "hallucination-detection",
-     "question-answering",
-     "ai-safety"
- ]
- classifiers = [
-     "Development Status :: 5 - Production/Stable",
-     "Intended Audience :: Developers",
-     "Intended Audience :: Science/Research",
-     "License :: OSI Approved :: MIT License",
-     "Programming Language :: Python :: 3",
-     "Programming Language :: Python :: 3.10",
-     "Programming Language :: Python :: 3.11",
-     "Programming Language :: Python :: 3.12",
-     "Topic :: Scientific/Engineering :: Artificial Intelligence",
- ]
- dependencies = [
-     "openenv-core>=0.2.0",
-     "fastapi>=0.100.0",
-     "uvicorn>=0.23.0",
-     "requests>=2.31.0",
-     "huggingface_hub>=0.20.0",
-     "datasets>=2.14.0",
-     "sentence-transformers>=2.7.0,<3.0.0",
-     "transformers>=4.35.0,<5.0.0",
-     "numpy>=1.24.0,<2.0.0",
-     "protobuf>=3.20.0,<5.0.0",
-     "rouge-score>=0.1.2",
-     "bert-score>=0.3.13",
-     "pydantic>=2.0.0",
-     "aiofiles>=23.0.0",
-     "python-json-logger>=2.0.0",
- ]
-
- [project.optional-dependencies]
- dev = [
-     "pytest>=7.0.0",
-     "pytest-asyncio>=0.21.0",
-     "httpx>=0.24.0",
- ]
-
- [project.scripts]
- server = "server.app:main"
-
- [project.urls]
- Homepage = "https://huggingface.co/spaces/SamSankar/hallucination-guard-env"
- Repository = "https://github.com/SS-360/hallucination-guard-env"
- Documentation = "https://samsankar-hallucination-guard-env.hf.space/docs"
-
- [tool.hatch.build.targets.wheel]
- packages = ["server", "models.py", "client.py"]
-
- [tool.pytest.ini_options]
- testpaths = ["tests"]
+ [project]
+ name = "AutoClean-AI"
+ version = "1.0.0"
+ description = "OpenEnv environment for AI data cleaning tasks"
+ authors = [
+     { name = "WorkflowOps" }
+ ]
+ license = { file = "LICENSE" }
+ dependencies = [
+     "openenv-core>=0.2.0",
+     "fastapi>=0.100.0",
+     "uvicorn>=0.23.0",
+     "requests>=2.31.0",
+     "openai>=1.0.0"
+ ]
+
+ [tool.openenv]
+ version = ">=0.2.0"
server/app.py CHANGED
@@ -1,5 +1,5 @@
  """
- HallucinationGuard-Env v4.2 — Production FastAPI Server
+ AutoClean-Ai v1.0.0 — Production FastAPI Server
 
  Endpoints:
    Standard : POST /reset   POST /step   GET /state   GET /health
@@ -1040,11 +1040,11 @@ function copyCode(btn, id) {
  # FASTAPI APP — session-isolated environments for thread safety
  # ═══════════════════════════════════════════════════════════════════════════════
 
- _default_env: Optional[HallucinationEnvironment] = None
+ _default_env: Optional[DataCleaningEnvironment] = None
  _env_loading = False
  _env_lock = threading.Lock()
 
- def _get_default_env() -> HallucinationEnvironment:
+ def _get_default_env() -> DataCleaningEnvironment:
      """Get or create the shared dataset-loader environment (used only for dataset access)."""
      global _default_env, _env_loading
      if _default_env is not None:
@@ -1054,8 +1054,8 @@ def _get_default_env() -> HallucinationEnvironment:
          return _default_env
      _env_loading = True
      try:
-         logger.info("Creating HallucinationEnvironment (dataset loader)...")
-         _default_env = HallucinationEnvironment()
+         logger.info("Creating DataCleaningEnvironment (dataset loader)...")
+         _default_env = DataCleaningEnvironment()
          logger.info(f"Environment ready — {_default_env.dataset_loader.get_total_examples():,} examples loaded.")
          return _default_env
      except Exception as e:
@@ -1077,22 +1077,22 @@ def _get_default_env() -> HallucinationEnvironment:
          _env_loading = False
 
 
- def _create_session_env(session_id: str) -> HallucinationEnvironment:
+ def _create_session_env(session_id: str) -> DataCleaningEnvironment:
      """Create a fresh per-session environment that shares the dataset loader
      (expensive to load) but has its own episode state (safe for concurrent use)."""
      loader_env = _get_default_env()
      # Pass the shared loader directly into __init__ so we skip the expensive
      # DatasetLoader() construction and dataset loading that would otherwise
      # happen inside HallucinationEnvironment.__init__
-     env = HallucinationEnvironment(session_id=session_id, dataset_loader=loader_env.dataset_loader)
+     env = DataCleaningEnvironment(session_id=session_id, dataset_loader=loader_env.dataset_loader)
      return env
 
 
- _sessions: Dict[str, HallucinationEnvironment] = {}
+ _sessions: Dict[str, DataCleaningEnvironment] = {}
  _session_lock = threading.Lock()
 
 
- def _get_session(session_id: str) -> Optional[HallucinationEnvironment]:
+ def _get_session(session_id: str) -> Optional[DataCleaningEnvironment]:
      """Retrieve an existing session environment."""
      with _session_lock:
          return _sessions.get(session_id)
@@ -1230,8 +1230,8 @@ async def step(action_data: Dict[str, Any]):
      if env is None:
          # Fallback: use default env (single-user mode)
          env = _get_default_env()
-     valid = set(HallucinationAction.model_fields.keys()) if hasattr(HallucinationAction, 'model_fields') else set(HallucinationAction.__fields__.keys())
-     action = HallucinationAction(**{k: v for k, v in action_data.items() if k in valid})
+     valid = set(DataCleaningAction.model_fields.keys()) if hasattr(DataCleaningAction, 'model_fields') else set(DataCleaningAction.__fields__.keys())
+     action = DataCleaningAction(**{k: v for k, v in action_data.items() if k in valid})
      result = _safe_dict(env.step(action))
      # If episode is done, clean up session
      if result.get("done", False) and session_id:
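The `server/app.py` changes above follow two reusable patterns: one expensive shared resource (the dataset loader) built once behind a lock and reused by cheap per-session environments, and filtering of incoming action payloads to the fields the action schema actually declares (the real server does this via Pydantic's `model_fields` on v2, falling back to `__fields__` on v1). A minimal stdlib-only sketch of both patterns, with purely illustrative class and field names rather than the environment's real API:

```python
import threading
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class CleanAction:
    """Illustrative action schema; the real DataCleaningAction has more fields."""
    text: str = ""
    strategy: str = "default"


class SessionEnv:
    """Per-session episode state wrapping a shared, expensive-to-build loader."""
    def __init__(self, session_id: str, loader: dict):
        self.session_id = session_id
        self.loader = loader   # shared across sessions, treated as read-only
        self.steps_taken = 0   # private per-session state


_loader: Optional[dict] = None
_loader_lock = threading.Lock()
_sessions: Dict[str, SessionEnv] = {}
_session_lock = threading.Lock()


def _get_loader() -> dict:
    """Build the shared loader once, with double-checked locking."""
    global _loader
    if _loader is None:
        with _loader_lock:
            if _loader is None:
                _loader = {"examples": ["row-1", "row-2"]}  # stand-in for real dataset loading
    return _loader


def get_session(session_id: str) -> SessionEnv:
    """Create or fetch a session env; the loader is shared, the state is not."""
    with _session_lock:
        if session_id not in _sessions:
            _sessions[session_id] = SessionEnv(session_id, _get_loader())
        return _sessions[session_id]


def parse_action(payload: Dict) -> CleanAction:
    """Drop unknown keys so arbitrary client JSON cannot break construction."""
    valid = {f.name for f in fields(CleanAction)}
    return CleanAction(**{k: v for k, v in payload.items() if k in valid})
```

Two sessions created this way hold the same loader object but distinct episode state, which is why the server can safely serve concurrent clients while paying the dataset-loading cost only once.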