SamSankar committed on
Commit
9d28801
·
verified ·
1 Parent(s): b2d2dc5

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +168 -280
  2. evaluate_groq.py +64 -0
  3. hallucination_guard_sdk.py +345 -0
  4. server/app.py +243 -70
  5. server/dataset_loader.py +0 -0
README.md CHANGED
@@ -13,359 +13,247 @@ tags:
13
  - grounded-generation
14
  - question-answering
15
  - fact-checking
 
16
  - llm-training
 
17
  ---
18
 
19
- # 🛡️ HallucinationGuard-Env
20
 
21
- > **An OpenEnv reinforcement learning environment that trains AI models to answer only from verified context — penalizing hallucination and rewarding factual grounding.**
22
 
23
- [![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
24
- [![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
25
- [![Dataset](https://img.shields.io/badge/Dataset-2000%2B_examples-orange)](#datasets)
 
26
 
27
  ---
28
 
29
- ## 💡 The Inspiration
30
 
31
- During research for the Meta PyTorch OpenEnv Hackathon, an AI model confidently hallucinated a **"golden ticket backdoor"** claiming that Ideathon winners could skip directly to the Grand Finale. This information existed nowhere in the official sources. The AI stated it with high confidence and even fabricated a supporting quote.
32
 
33
- That moment made one thing clear: hallucination isn't just an academic problem. It causes real confusion in high-stakes situations.
34
 
35
- **HallucinationGuard-Env** was built to fix that — training AI models to say *"I don't know"* when they don't, cite real sources when they do, and never fabricate with confidence.
 
 
 
36
 
37
  ---
38
 
39
- ## 🚀 Quick Start
40
 
41
- ```bash
42
- # Install
43
- pip install -e .
44
-
45
- # Run locally
46
- uvicorn server.app:app --reload
47
-
48
- # Health check
49
- curl http://localhost:8000/health
50
- # → {"status": "healthy", "service": "HallucinationGuard-Env"}
51
-
52
- # Deploy to HuggingFace Spaces
53
- openenv push --repo-id your-username/hallucination-guard-env
54
- ```
55
-
56
- ---
57
-
58
- ## 🎮 How The Environment Works
59
-
60
- The agent receives a **question** and a **source document**. It must answer using only what the document says, provide a direct quote supporting its answer, and state how confident it is.
61
-
62
- ### Action Space
63
 
64
  ```python
65
- @dataclass
66
- class HallucinationAction(Action):
67
- answer: str # The agent's answer
68
- confidence: float # Certainty 0.0 → 1.0
69
- source_quote: str # Direct quote from context supporting the answer
70
  ```
71
 
72
- ### Observation Space
73
-
74
  ```python
75
- @dataclass
76
- class HallucinationObservation(Observation):
77
- question: str # The question to answer
78
- context: str # Source document to answer from
79
- reward: float # Step reward
80
- feedback: str # Detailed human-readable feedback
81
- is_hallucination: bool # Was hallucination detected?
82
- hallucination_type: str # Type of hallucination detected
83
- hallucination_severity: str # NONE / MINOR / MODERATE / SEVERE / CRITICAL
84
- grounding_score: float # How well answer is grounded in context
85
- accuracy_so_far: float # Running accuracy this episode
86
- skill_rating: float # ELO-style skill rating
87
- attempts_remaining: int # Steps left in episode
88
- done: bool # Episode complete?
89
  ```
90
 
91
- ### Episode Flow
92
 
93
- ```
94
- reset()
95
- → Sample question + context from dataset (curriculum-aware)
96
- → Return initial observation
97
-
98
- step(action)
99
- → Grade answer across 6 components
100
- → Detect hallucination type and severity
101
- → Compute multi-factor reward
102
- → Adapt difficulty based on performance
103
- → Return observation with reward + rich feedback
104
-
105
- state()
106
- → Return episode metadata: ID, step count, skill rating, curriculum stage
107
- ```
108
-
109
- ---
110
 
111
- ## 🏆 Reward System
 
112
 
113
- Six components combine into a single reward signal in **[0.0, 1.0]**:
 
 
 
114
 
115
- | Component | Weight | What It Measures |
116
- |---|---|---|
117
- | **Factual Correctness** | 30% | Semantic similarity + entity overlap vs ground truth |
118
- | **Source Grounding** | 20% | Word coverage and context matching |
119
- | **Citation Accuracy** | 15% | Is source_quote actually in the document? |
120
- | **Confidence Calibration** | 15% | Does stated confidence match actual accuracy? |
121
- | **Semantic Consistency** | 10% | Logical coherence with context |
122
- | **Hallucination Penalty** | 10% | Penalty for fabricated content |
123
 
124
- **Difficulty multipliers:** beginner 0.9×, expert 1.2×
125
- **Consistency bonus:** up to +0.05 for sustained high performance
126
 
127
- ```
128
- reward = clamp(Σ(weight × score) × difficulty_multiplier + consistency_bonus, 0.0, 1.0)
129
  ```
130
 
131
- **In practice:**
132
- - Hallucinated answer with false citation → reward ≈ **0.002–0.10**, CRITICAL severity
133
- - Grounded correct answer with real quote → reward ≈ **0.85–1.00**
134
-
135
  ---
136
 
137
- ## 🔬 Hallucination Detection
138
-
139
- ### 8 Types Classified
140
 
141
- | Type | What It Catches |
142
- |---|---|
143
- | `FABRICATED_FACT` | Information stated that is not in the source |
144
- | `FALSE_CITATION` | source_quote that does not exist in the document |
145
- | `OVERCONFIDENT_WRONG` | High confidence on an incorrect answer |
146
- | `CONTEXT_DRIFT` | Answer gradually drifts away from source |
147
- | `NUMERICAL_FABRICATION` | Made-up statistics or numbers |
148
- | `ENTITY_CONFUSION` | Wrong names, organisations, or places |
149
- | `TEMPORAL_ERROR` | Incorrect dates or timelines |
150
- | `RELATIONSHIP_ERROR` | Incorrect relationships between entities |
151
 
152
- ### 5 Severity Levels
153
 
154
- | Level | Score | Meaning |
155
- |---|---|---|
156
- | NONE | 0.0 | Fully grounded answer |
157
- | MINOR | 0.1–0.3 | Slight deviation from source |
158
- | MODERATE | 0.3–0.5 | Noticeable unsupported claims |
159
- | SEVERE | 0.5–0.7 | Significantly fabricated content |
160
- | CRITICAL | 0.7+ | Answer largely invented |
161
 
162
- ### Detection Algorithms
163
 
164
- - **Word coverage** — fraction of meaningful content words in answer found in context
165
- - **Entity hallucination** — novel entities in answer not found in source
166
- - **Numerical fabrication** — numbers in answer absent from context
167
- - **Sliding window fuzzy matching** — citation verification (threshold 0.7)
168
- - **Negation mismatch** — contradiction detection via negation word analysis
169
- - **Confidence calibration error** — `|confidence − correctness|` with 50% overconfidence surcharge
170
 
171
  ---
172
 
173
- ## 📚 Datasets
174
 
175
- 2,140+ total examples loaded at runtime across four difficulty levels:
176
-
177
- | Source | Examples | Type | Difficulty |
178
- |---|---|---|---|
179
- | Synthetic (built-in) | 140 | Hallucination traps, edge cases | All levels |
180
- | **SQuAD** | ~500 | Reading comprehension | Intermediate |
181
- | **TriviaQA** | ~500 | Open-domain factual QA | Intermediate |
182
- | **HaluEval** | ~500 | Hallucination evaluation | Advanced |
183
- | **TruthfulQA** | ~500 | Factuality benchmark | Advanced/Expert |
184
-
185
- Datasets load from Hugging Face automatically on first start (`pip install datasets`).
186
- A local disk cache (`server/cache/`) is used on subsequent starts for instant loading.
187
-
188
- ### Built-in Synthetic Dataset Breakdown
189
-
190
- | Difficulty | Count | Focus |
191
- |---|---|---|
192
- | Beginner | 60 | Simple factual recall, API concepts, basic science |
193
- | Intermediate | 60 | Multi-hop reasoning, history, technology, biology |
194
- | Advanced | 10 | Hallucination traps, common misconceptions |
195
- | Expert | 10 | System mechanics, algorithms, quantum physics |
196
 
197
- ### Add Custom Datasets
198
 
199
- ```python
200
- from server.dataset_loader import DatasetLoader
201
 
202
- loader = DatasetLoader()
203
- loader.load_from_json("my_dataset.json") # Custom JSON
204
- loader.load_from_huggingface("squad") # Any HF dataset
205
  ```
206
-
207
- Custom JSON format:
208
- ```json
209
- [
210
- {
211
- "question": "What is the prize pool?",
212
- "context": "The hackathon has a total prize pool of $30,000 USD...",
213
- "answer": "$30,000 USD",
214
- "id": "q001",
215
- "source": "custom",
216
- "difficulty": "intermediate",
217
- "category": "factual_recall"
218
- }
219
- ]
220
  ```
221
 
222
- ---
223
-
224
- ## 🎓 Curriculum Learning
225
-
226
- The environment adapts difficulty in real-time using an ELO-style skill rating:
227
-
228
- | Trigger | Action |
229
- |---|---|
230
- | Recent avg reward > 0.7 | Increase difficulty |
231
- | Recent avg reward < 0.3 | Decrease difficulty |
232
- | Overall accuracy > 0.8 | EXPERT ceiling |
233
- | Overall accuracy > 0.6 | ADVANCED ceiling |
234
- | Overall accuracy > 0.4 | INTERMEDIATE ceiling |
235
-
236
- Episodes can use progressive difficulty mixing (beginner → expert within one episode) for maximum learning efficiency.
237
 
238
  ---
239
 
240
- ## 🔌 Model-Agnostic Adapters
241
 
242
- Works with any LLM out of the box:
243
 
244
  ```python
245
- from model_adapters import create_adapter
246
-
247
- # OpenAI
248
- adapter = create_adapter("openai", model_name="gpt-4", api_key="sk-...")
249
-
250
- # Anthropic Claude
251
- adapter = create_adapter("anthropic", model_name="claude-sonnet-4-6", api_key="sk-ant-...")
252
-
253
- # HuggingFace (Llama, Mistral, Qwen...)
254
- adapter = create_adapter("huggingface", model_name="meta-llama/Llama-3-8B-Instruct")
255
-
256
- # Local Ollama
257
- adapter = create_adapter("ollama", model_name="llama3", api_base="http://localhost:11434")
258
-
259
- # Use it
260
- response = adapter.generate_response(
261
- question="What is the prize pool?",
262
- context="The hackathon has $30,000 USD in prizes...",
263
- require_citation=True,
264
- require_confidence=True
265
- )
266
  ```
267
 
268
- ---
269
-
270
- ## 📊 Metrics & Monitoring
271
-
272
  ```bash
273
- curl http://localhost:8000/metrics # Live metrics
274
- curl http://localhost:8000/metrics/training-curves # Reward curves
275
- curl http://localhost:8000/metrics/heatmap # Hallucination heatmap
276
- curl http://localhost:8000/metrics/export?format=json # Export data
277
- ```
278
-
279
- Sample output after training:
280
- ```
281
- Episodes: 15 | Steps: 150
282
- Accuracy: 78.5% | Avg Reward: 0.742 | Hallucination Rate: 12.3%
283
- Reward Trend: IMPROVING ↑ | Recent Hallucination Rate: 8.2%
284
  ```
285
 
286
  ---
287
 
288
- ## 🏗️ Project Structure
289
 
290
- ```
291
- hallucination_guard_env/
292
- ├── models.py # HallucinationAction, Observation, State, Config
293
- ├── client.py # HTTP/WebSocket client
294
- ├── model_adapters.py # OpenAI, Anthropic, HuggingFace, Ollama adapters
295
- ├── test_env.py # Full test suite
296
- ├── openenv.yaml # Manifest
297
- ├── pyproject.toml # Package metadata
298
- └── server/
299
- ├── environment.py # Core RL environment logic
300
- ├── app.py # FastAPI server (stateless + session endpoints)
301
- ├── grader.py # 6-component reward + hallucination detection
302
- ├── dataset_loader.py # Multi-source dataset loader with caching
303
- ├── metrics.py # Real-time metrics tracker
304
- ├── cache/ # Pre-built dataset cache (instant startup)
305
- ├── requirements.txt
306
- └── Dockerfile
307
- ```
308
 
309
- ---
 
310
 
311
- ## ⚙️ Configuration
 
312
 
313
- ```python
314
- from models import EnvironmentConfig
315
-
316
- config = EnvironmentConfig(
317
- max_questions_per_episode=10,
318
- reward_weights={
319
- "factual_correctness": 0.30,
320
- "source_grounding": 0.20,
321
- "citation_accuracy": 0.15,
322
- "confidence_calibration": 0.15,
323
- "semantic_consistency": 0.10,
324
- "hallucination_penalty": 0.10,
325
- },
326
- adaptive_difficulty=True,
327
- difficulty_threshold_increase=0.7,
328
- difficulty_threshold_decrease=0.3,
329
- curriculum_enabled=True,
330
- )
331
-
332
- env = HallucinationEnvironment(config=config)
333
- ```
334
 
335
  ---
336
 
337
- ## 🧪 Tests
338
 
339
- ```bash
340
- python test_env.py
341
  ```
342
-
343
- Covers: dataset loading, grader components, reset/step/state, episode completion, hallucination type classification, curriculum difficulty, metrics tracking, model adapter factory.
344
-
345
- ---
346
-
347
- ## 🔗 Links
348
-
349
- | | |
350
- |---|---|
351
- | 📖 OpenEnv Docs | https://github.com/meta-pytorch/OpenEnv |
352
- | 🎓 OpenEnv Course | https://github.com/huggingface/openenv-course |
 
 
 
353
 
354
  ---
355
 
356
- ## 🏆 Why This Environment Stands Out
357
 
358
- | | |
359
- |---|---|
360
- | **Real-world origin** | Born from an actual AI hallucination experience during hackathon research |
361
- | **Solves the #1 LLM problem** | Hallucination is the most critical reliability issue in production AI |
362
- | **Novel** | First OpenEnv environment targeting hallucination and grounding |
363
- | **Rich reward signal** | 6-component system gives models precise, actionable feedback |
364
- | **2,140+ diverse examples** | SQuAD, TriviaQA, HaluEval, TruthfulQA + curated synthetic traps |
365
- | **Model-agnostic** | Works with GPT-4, Claude, Llama, Mistral, or any LLM |
366
- | **Production-ready** | Session management, metrics, caching, Dockerfile included |
367
- | **Adaptive** | ELO-based curriculum scales difficulty with the agent's skill |
368
 
369
  ---
370
 
371
- *Built for the Meta PyTorch OpenEnv Hackathon 2026 · MIT License*
 
13
  - grounded-generation
14
  - question-answering
15
  - fact-checking
16
+ - llm-evaluation
17
  - llm-training
18
+ - benchmark
19
  ---
20
 
21
+ # 🛡️ HallucinationGuard-Env v3.0
22
 
23
+ > **The production-grade OpenEnv RL environment for training and evaluating LLMs on hallucination avoidance.**
24
 
25
+ [![Running](https://img.shields.io/badge/status-running-brightgreen)](https://huggingface.co/spaces/SamSankar/hallucination-guard-env)
26
+ [![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-blue)](https://github.com/meta-pytorch/OpenEnv)
27
+ [![Datasets](https://img.shields.io/badge/datasets-50k%2B%20examples-orange)](#datasets)
28
+ [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
29
 
30
  ---
31
 
32
+ ## Why HallucinationGuard?
33
 
34
+ Large language models hallucinate: they confidently state false information that is not supported by any evidence. This is a critical problem for companies deploying LLMs in production.
35
 
36
+ **HallucinationGuard-Env** provides a standardized, reproducible RL environment to:
37
 
38
+ - 📊 **Benchmark** any LLM's hallucination rate across 50,000+ real-world QA examples
39
+ - 🎯 **Train** models to stay grounded in provided context
40
+ - 🏆 **Compare** models on a public leaderboard
41
+ - 🔧 **Integrate** into any ML pipeline via REST API or Python SDK
42
 
43
  ---
44
 
45
+ ## Quick Start
46
 
47
+ ### Option 1 — Python SDK (recommended)
 
48
 
49
  ```bash
50
+ pip install requests
51
  ```
52
 
 
 
53
  ```python
54
+ from hallucination_guard_sdk import HallucinationGuardEnv
55
+ import anthropic
56
+
57
+ client = anthropic.Anthropic(api_key="YOUR_KEY")
58
+
59
+ def my_model(question: str, context: str) -> str:
60
+     """Your model function takes question + context, returns answer."""
61
+     msg = client.messages.create(
62
+         model="claude-3-haiku-20240307",
63
+         max_tokens=256,
64
+         messages=[{
65
+             "role": "user",
66
+             "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer using ONLY the context above."
67
+         }]
68
+     )
69
+     return msg.content[0].text
70
+
71
+ # Evaluate in 3 lines
72
+ env = HallucinationGuardEnv()
73
+ results = env.evaluate(my_model, episodes=10, model_name="claude-3-haiku")
74
+ env.submit_to_leaderboard(results, organization="Anthropic")
75
  ```
76
 
77
+ ### Option 2 — REST API
78
 
79
+ ```bash
80
+ BASE="https://samsankar-hallucination-guard-env.hf.space"
81
 
82
+ # Start episode
83
+ curl -X POST $BASE/reset
84
 
85
+ # Submit answer
86
+ curl -X POST $BASE/step \
87
+ -H "Content-Type: application/json" \
88
+ -d '{"answer": "Your answer based only on the context"}'
89
 
90
+ # View leaderboard
91
+ curl $BASE/leaderboard
92
+ ```
 
 
 
 
 
93
 
94
+ ### Option 3 — OpenAI-compatible clients
 
95
 
96
+ ```python
97
+ from openai import OpenAI
98
+ from hallucination_guard_sdk import HallucinationGuardEnv
99
+
100
+ client = OpenAI(api_key="YOUR_KEY")
101
+
102
+ def gpt4_model(question, context):
103
+ response = client.chat.completions.create(
104
+ model="gpt-4o-mini",
105
+ messages=[
106
+ {"role": "system", "content": "Answer ONLY from the provided context."},
107
+ {"role": "user", "content": f"Context: {context}\n\nQ: {question}"}
108
+ ]
109
+ )
110
+ return response.choices[0].message.content
111
+
112
+ env = HallucinationGuardEnv()
113
+ results = env.evaluate(gpt4_model, episodes=10, model_name="gpt-4o-mini")
114
+ env.submit_to_leaderboard(results, organization="OpenAI")
115
  ```
116
 
 
 
 
 
117
  ---
118
 
119
+ ## API Reference
120
+
121
+ | Method | Endpoint | Description |
122
+ |--------|----------|-------------|
123
+ | `POST` | `/reset` | Start a new episode, receive first question + context |
124
+ | `POST` | `/step` | Submit answer, receive reward + next question |
125
+ | `GET` | `/state` | Current episode state |
126
+ | `GET` | `/health` | Health check |
127
+ | `POST` | `/session/reset` | Create a stateful multi-turn session |
128
+ | `POST` | `/session/step` | Step within a named session |
129
+ | `GET` | `/leaderboard` | Public model leaderboard |
130
+ | `POST` | `/leaderboard/submit` | Submit evaluation results |
131
+ | `GET` | `/datasets` | Dataset statistics |
132
+ | `GET` | `/metrics` | Real-time usage metrics |
133
+ | `GET` | `/docs` | Interactive Swagger UI |
134
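
The endpoints above compose into a simple reset → step loop. The sketch below is a hedged illustration (not the official SDK): field names such as `question`, `context`, `done`, and `reward` follow the observation schema documented in this README, and the loop is written transport-agnostically so the HTTP layer can be swapped or stubbed.

```python
import json
import urllib.request

BASE = "https://samsankar-hallucination-guard-env.hf.space"  # live Space URL from this README

def run_episode(reset_fn, step_fn, model_fn):
    """Drive one /reset → /step loop; collects per-step rewards until done."""
    obs = reset_fn()
    rewards = []
    while not obs.get("done", False) and obs.get("question"):
        answer = model_fn(obs["question"], obs.get("context", ""))
        obs = step_fn(answer)
        rewards.append(obs.get("reward", 0.0) or 0.0)
    return rewards

def _post(path, body=None):
    """Minimal stdlib POST helper for the endpoints in the table above."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(body or {}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Live usage (requires network access):
# rewards = run_episode(lambda: _post("/reset"),
#                       lambda a: _post("/step", {"answer": a}),
#                       lambda q, c: c[:100])
```

Because `run_episode` takes plain callables, the same loop works against a local `uvicorn` instance or a fully mocked environment.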
 
135
+ ---
136
 
137
+ ## Reward System
138
 
139
+ Each answer is scored across 6 dimensions:
 
 
 
 
 
 
140
 
141
+ | Component | Weight | Description |
142
+ |-----------|--------|-------------|
143
+ | Factual correctness | 35% | Does the answer match the ground truth? |
144
+ | Source grounding | 30% | Is the answer supported by the context? |
145
+ | Citation accuracy | 15% | Does the answer cite specific context passages? |
146
+ | Confidence calibration | 10% | Is confidence appropriate to accuracy? |
147
+ | Semantic consistency | 5% | Is the answer semantically coherent? |
148
+ | Hallucination penalty | 5% | Was any fabricated content detected? |
149
 
150
+ **Reward range:** -1.0 (complete hallucination) to +1.0 (perfect grounded answer)
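
As a back-of-the-envelope sketch of how the table's weights could combine, the snippet below rescales a weighted sum of per-component scores (each assumed to lie in [0, 1]) onto the documented [-1.0, +1.0] range. This is illustrative only; the authoritative scoring logic lives in `server/grader.py`.

```python
# Sketch only: combines the six component scores from the table above.
# The linear rescale to [-1, +1] is an assumption for illustration,
# not the server's exact formula.
WEIGHTS = {
    "factual_correctness": 0.35,
    "source_grounding": 0.30,
    "citation_accuracy": 0.15,
    "confidence_calibration": 0.10,
    "semantic_consistency": 0.05,
    "hallucination_penalty": 0.05,
}

def combine_reward(scores: dict) -> float:
    """Weighted sum of component scores, rescaled from [0, 1] to [-1, +1]."""
    weighted = sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items())
    return 2.0 * weighted - 1.0
```

All-perfect component scores map to +1.0 and all-zero scores to -1.0, matching the stated reward range.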
151
 
152
  ---
153
 
154
+ ## Datasets
155
+
156
+ 50,000+ examples across 13 real-world QA datasets:
157
+
158
+ | Dataset | Size | Category | Difficulty |
159
+ |---------|------|----------|------------|
160
+ | SQuAD | 5,000 | Reading comprehension | Intermediate |
161
+ | TriviaQA | 5,000 | Trivia / general knowledge | Intermediate |
162
+ | HaluEval | 2,000 | Hallucination detection | Advanced |
163
+ | TruthfulQA | 817 | Factuality benchmark | Expert |
164
+ | Natural Questions | 5,000 | Open-domain QA | Intermediate |
165
+ | HotpotQA | 5,000 | Multi-hop reasoning | Advanced |
166
+ | BoolQ | 5,000 | Yes/No questions | Beginner |
167
+ | FaithDial | 5,000 | Hallucination in dialogue | Advanced |
168
+ | FEVER | 5,000 | Fact verification | Advanced |
169
+ | ARC-Challenge | 2,000 | Science exam | Advanced |
170
+ | OpenBookQA | 2,000 | Science facts | Intermediate |
171
+ | MS MARCO | 5,000 | Web search QA | Intermediate |
172
+ | CoQA | 5,000 | Conversational QA | Intermediate |
173
 
174
+ ---
175
 
176
+ ## Curriculum Learning
177
 
178
+ The environment implements adaptive difficulty:
 
179
 
 
 
 
180
  ```
181
+ Beginner  →  Intermediate  →  Advanced   →  Expert
182
+ BoolQ        SQuAD            HotpotQA      TruthfulQA
183
+ (yes/no)     (reading)        (multi-hop)   (factuality)
184
  ```
185
 
186
+ Difficulty adjusts automatically based on the agent's rolling skill rating.
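
A minimal sketch of what threshold-based difficulty adaptation can look like. The 0.7 / 0.3 promote/demote thresholds mirror the environment's configurable defaults (`difficulty_threshold_increase` / `difficulty_threshold_decrease`); the actual server logic lives in `server/environment.py`.

```python
# Illustrative sketch of curriculum adaptation, assuming the default
# 0.7 / 0.3 thresholds; not the server's exact implementation.
LEVELS = ["beginner", "intermediate", "advanced", "expert"]

def adjust_difficulty(current: str, recent_rewards: list) -> str:
    """Move up a level when the agent does well, down when it struggles."""
    if not recent_rewards:
        return current
    avg = sum(recent_rewards) / len(recent_rewards)
    idx = LEVELS.index(current)
    if avg > 0.7 and idx < len(LEVELS) - 1:
        return LEVELS[idx + 1]
    if avg < 0.3 and idx > 0:
        return LEVELS[idx - 1]
    return current
```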
187
 
188
  ---
189
 
190
+ ## Leaderboard
191
 
192
+ Submit your model's results to the public leaderboard:
193
 
194
  ```python
195
+ env = HallucinationGuardEnv()
196
+ results = env.evaluate(my_model, episodes=10)
197
+ env.submit_to_leaderboard(results, organization="YourCompany")
198
  ```
199
 
200
+ Or via API:
 
 
 
201
  ```bash
202
+ curl -X POST https://samsankar-hallucination-guard-env.hf.space/leaderboard/submit \
203
+ -H "Content-Type: application/json" \
204
+ -d '{
205
+ "model_name": "gpt-4o",
206
+ "avg_reward": 0.72,
207
+ "avg_accuracy": 0.81,
208
+ "hallucination_rate": 0.19,
209
+ "total_episodes": 10,
210
+ "total_steps": 100,
211
+ "organization": "OpenAI"
212
+ }'
213
  ```
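
The same submission can be made from Python with only the standard library. This is a hedged equivalent of the curl call above; the metric values are placeholders, not real evaluation results.

```python
import json
import urllib.request

# Payload fields mirror the documented /leaderboard/submit body.
payload = {
    "model_name": "gpt-4o",
    "avg_reward": 0.72,
    "avg_accuracy": 0.81,
    "hallucination_rate": 0.19,
    "total_episodes": 10,
    "total_steps": 100,
    "organization": "OpenAI",
}

def submit(base_url: str, body: dict) -> bytes:
    """POST the results payload to /leaderboard/submit and return the raw response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/leaderboard/submit",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()

# Live usage (requires network access):
# submit("https://samsankar-hallucination-guard-env.hf.space", payload)
```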
214
 
215
  ---
216
 
217
+ ## Use Cases
218
 
219
+ ### For AI Companies
220
+ Benchmark your models before deployment. Compare across model versions. Track hallucination regression.
221
 
222
+ ### For Researchers
223
+ Standardized evaluation protocol. 50k+ diverse examples. Reproducible results via seed parameter.
224
 
225
+ ### For Developers
226
+ REST API — works with any language. Python SDK — 3 lines to evaluate. Per-dataset caching for fast iteration.
227
 
228
+ ### For RL Training
229
+ Full OpenEnv-compatible interface. Curriculum learning built-in. Reward signal optimized for RL training loops.
230
 
231
  ---
232
 
233
+ ## Architecture
234
 
 
 
235
  ```
236
+ ┌─────────────────────────────────────────────────┐
237
+ │                  FastAPI Server                 │
238
+ │  /reset → /step → reward signal → /leaderboard  │
239
+ ├─────────────────────────────────────────────────┤
240
+ │            HallucinationEnvironment             │
241
+ │    Episode management · Curriculum learning     │
242
+ ├─────────────────────────────────────────────────┤
243
+ │                     Grader                      │
244
+ │  Semantic similarity · NLI · Citation detection │
245
+ ├─────────────────────────────────────────────────┤
246
+ │                 Dataset Loader                  │
247
+ │   13 datasets · 50k+ examples · Per-file cache  │
248
+ └─────────────────────────────────────────────────┘
249
+ ```
250
 
251
  ---
252
 
253
+ ## License
254
 
255
+ MIT License — free for research and commercial use.
256
 
257
  ---
258
 
259
+ *Built for the Meta PyTorch OpenEnv Hackathon 2026*
evaluate_groq.py ADDED
@@ -0,0 +1,64 @@
+ """
+ HallucinationGuard-Env — Groq/Llama Evaluator (SDK version)
+ Uses the HallucinationGuard SDK + Groq free tier
+
+ Setup:
+     pip install groq requests
+     Get free key at https://console.groq.com
+     python evaluate_groq.py --api-key YOUR_GROQ_KEY --episodes 5
+ """
+
+ import argparse
+ import sys
+
+ try:
+     from groq import Groq
+ except ImportError:
+     print("Run: pip install groq requests")
+     sys.exit(1)
+
+ from hallucination_guard_sdk import HallucinationGuardEnv
+
+ MODEL = "llama-3.1-8b-instant"
+
+ SYSTEM = """Answer questions using ONLY the provided context.
+ If the context lacks real information, say: "The context does not contain enough information."
+ Never use outside knowledge. Be concise."""
+
+
+ def make_model_fn(client):
+     def model_fn(question: str, context: str) -> str:
+         r = client.chat.completions.create(
+             model=MODEL,
+             messages=[
+                 {"role": "system", "content": SYSTEM},
+                 {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
+             ],
+             max_tokens=200,
+             temperature=0.1,
+         )
+         return r.choices[0].message.content.strip()
+     return model_fn
+
+
+ def main():
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--api-key", required=True)
+     parser.add_argument("--episodes", type=int, default=5)
+     parser.add_argument("--model-name", default="llama-3.1-8b-groq")
+     parser.add_argument("--organization", default="")
+     parser.add_argument("--submit", action="store_true",
+                         help="Submit results to leaderboard")
+     args = parser.parse_args()
+
+     client = Groq(api_key=args.api_key)
+     model_fn = make_model_fn(client)
+
+     env = HallucinationGuardEnv()
+     results = env.evaluate(model_fn, episodes=args.episodes,
+                            model_name=args.model_name)
+     env.save_results(results)
+
+     if args.submit:
+         env.submit_to_leaderboard(results, organization=args.organization)
+
+
+ if __name__ == "__main__":
+     main()
hallucination_guard_sdk.py ADDED
@@ -0,0 +1,345 @@
+ """
+ HallucinationGuard SDK v3.0
+ ===========================
+ The easiest way to evaluate any LLM for hallucination using HallucinationGuard-Env.
+
+ Install:
+     pip install requests
+
+ Usage (3 lines):
+     from hallucination_guard_sdk import HallucinationGuardEnv
+     env = HallucinationGuardEnv()
+     results = env.evaluate(your_model_fn, episodes=5)
+
+ Full example:
+     import anthropic
+     client = anthropic.Anthropic(api_key="...")
+
+     def my_model(question, context):
+         msg = client.messages.create(
+             model="claude-3-haiku-20240307",
+             max_tokens=256,
+             messages=[{"role": "user", "content": f"Context: {context}\\n\\nQuestion: {question}\\n\\nAnswer using ONLY the context."}]
+         )
+         return msg.content[0].text
+
+     env = HallucinationGuardEnv()
+     results = env.evaluate(my_model, episodes=5, model_name="claude-3-haiku")
+     env.print_report(results)
+     env.submit_to_leaderboard(results)
+ """
+
+ import time
+ import json
+ import sys
+ from typing import Callable, Optional, Dict, Any, List
+
+ try:
+     import requests
+ except ImportError:
+     print("Run: pip install requests")
+     sys.exit(1)
+
+
+ class HallucinationGuardEnv:
+     """
+     Python SDK for HallucinationGuard-Env.
+
+     Parameters
+     ----------
+     base_url : str
+         URL of the deployed environment. Defaults to the live HF Space.
+     verbose : bool
+         Print step-by-step output during evaluation.
+     """
+
+     BASE_URL = "https://samsankar-hallucination-guard-env.hf.space"
+
+     def __init__(
+         self,
+         base_url: str = BASE_URL,
+         verbose: bool = True,
+     ):
+         self.base_url = base_url.rstrip("/")
+         self.verbose = verbose
+         self._check_health()
+
+     # ── Core methods ──────────────────────────────────────────────────────────
+
+     def reset(self, difficulty: Optional[str] = None, seed: Optional[int] = None) -> Dict:
+         """Reset the environment. Returns the first observation."""
+         body = {}
+         if difficulty:
+             body["difficulty"] = difficulty
+         if seed is not None:
+             body["seed"] = seed
+         return self._post("/reset", body)
+
+     def step(self, answer: str) -> Dict:
+         """Submit an answer. Returns reward, hallucination flag, feedback, next question."""
+         return self._post("/step", {"answer": answer})
+
+     def health(self) -> Dict:
+         """Check if the environment is running."""
+         return self._get("/health")
+
+     def leaderboard(self) -> Dict:
+         """Get the current leaderboard."""
+         return self._get("/leaderboard")
+
+     def dataset_info(self) -> Dict:
+         """Get statistics about loaded datasets."""
+         return self._get("/datasets")
+
+     # ── High-level evaluate() ─────────────────────────────────────────────────
+
+     def evaluate(
+         self,
+         model_fn: Callable[[str, str], str],
+         episodes: int = 3,
+         difficulty: Optional[str] = None,
+         model_name: str = "my_model",
+         delay: float = 0.5,
+     ) -> Dict[str, Any]:
+         """
+         Run a full evaluation of your model against the environment.
+
+         Parameters
+         ----------
+         model_fn : callable
+             Function that takes (question: str, context: str) → answer: str
+         episodes : int
+             Number of episodes to run (default: 3)
+         difficulty : str, optional
+             Force a difficulty level: beginner | intermediate | advanced | expert
+         model_name : str
+             Name for the leaderboard
+         delay : float
+             Seconds to wait between API calls (be gentle with free tier)
+
+         Returns
+         -------
+         dict with summary stats and full episode logs
+
+         Example
+         -------
+         >>> def my_model(question, context):
+         ...     # call your LLM here
+         ...     return "answer from context"
+         >>> env = HallucinationGuardEnv()
+         >>> results = env.evaluate(my_model, episodes=5)
+         """
+         if self.verbose:
+             print(f"\n🛡️ HallucinationGuard-Env — Evaluating: {model_name}")
+             print(f"   Episodes  : {episodes}")
+             print(f"   Difficulty: {difficulty or 'mixed'}")
+             print(f"   Endpoint  : {self.base_url}\n")
+
+         all_episodes = []
+
+         for ep_num in range(1, episodes + 1):
+             if self.verbose:
+                 print(f"{'='*60}")
+                 print(f"  EPISODE {ep_num}/{episodes}")
+                 print(f"{'='*60}")
+
+             ep_result = self._run_episode(model_fn, ep_num, difficulty, delay)
+             all_episodes.append(ep_result)
+
+             if self.verbose:
+                 print(f"  ─ Episode {ep_num} complete │ "
+                       f"accuracy: {ep_result['accuracy']*100:.0f}% │ "
+                       f"reward: {ep_result['avg_reward']:.3f} │ "
+                       f"hallucinations: {ep_result['hallucinations']}/{ep_result['steps']}")
+
+             time.sleep(delay)
+
+         # ── Aggregate ─────────────────────────────────────────────────────────
+         total_steps = sum(e["steps"] for e in all_episodes)
+         total_halluc = sum(e["hallucinations"] for e in all_episodes)
+         avg_accuracy = sum(e["accuracy"] for e in all_episodes) / len(all_episodes)
+         avg_reward = sum(e["avg_reward"] for e in all_episodes) / len(all_episodes)
+         avg_skill = sum(e["final_skill"] for e in all_episodes) / len(all_episodes)
+         best_streak = max(e["best_streak"] for e in all_episodes)
+         halluc_rate = total_halluc / max(total_steps, 1)
+
+         results = {
+             "model_name": model_name,
+             "episodes": episodes,
+             "total_steps": total_steps,
+             "avg_accuracy": round(avg_accuracy, 4),
+             "avg_reward": round(avg_reward, 4),
+             "hallucination_rate": round(halluc_rate, 4),
+             "best_streak": best_streak,
+             "avg_skill_rating": round(avg_skill, 4),
+             "episode_logs": all_episodes,
+         }
+
+         if self.verbose:
+             self.print_report(results)
+
+         return results
+
+     def _run_episode(self, model_fn, ep_num, difficulty, delay) -> Dict:
+         obs = self.reset(difficulty=difficulty)
+         step_logs = []
+         step = 0
+
+         while not obs.get("done", False):
+             question = obs.get("question", "")
+             context = obs.get("context", "")
+             step += 1
+
+             if not question:
+                 break
+
+             if self.verbose:
+                 q_display = question[:75] + "..." if len(question) > 75 else question
+                 print(f"\n  Step {step} [{obs.get('source_dataset', '?')}]")
+                 print(f"  Q: {q_display}")
+
+             # Call the model
+             try:
+                 answer = model_fn(question, context)
+             except Exception as e:
+                 answer = f"Error calling model: {e}"
+
+             if self.verbose:
+                 a_display = answer[:90] + "..." if len(answer) > 90 else answer
+                 print(f"  A: {a_display}")
+
+             obs = self.step(answer)
+
+             reward = obs.get("reward", 0) or 0
+             is_halluc = obs.get("is_hallucination", False)
+             status = "❌ HALLUCINATION" if is_halluc else "✅ OK"
+
+             if self.verbose:
+                 print(f"  {status} │ reward: {reward:.3f} │ skill: {obs.get('skill_rating', 0):.3f}")
+
+             step_logs.append({
+                 "step": step,
+                 "question": question,
+                 "answer": answer,
+                 "reward": reward,
+                 "is_hallucination": is_halluc,
+                 "hallucination_type": obs.get("hallucination_type"),
+                 "source": obs.get("source_dataset", ""),
+             })
+
+             time.sleep(delay)
+
+         accuracy = obs.get("accuracy_so_far", 0)
+         best_streak = obs.get("best_streak", 0)
+         final_skill = obs.get("skill_rating", 0)
+         avg_reward = sum(s["reward"] for s in step_logs) / max(len(step_logs), 1)
+         hallucinations = sum(1 for s in step_logs if s["is_hallucination"])
+
+         return {
+             "episode": ep_num,
+             "steps": len(step_logs),
+             "accuracy": accuracy,
+             "avg_reward": avg_reward,
+             "best_streak": best_streak,
+             "hallucinations": hallucinations,
+             "final_skill": final_skill,
+             "step_logs": step_logs,
+         }
+
+     # ── Reporting ──────────────────────────────────────────────────────────────
+
+     def print_report(self, results: Dict) -> None:
+         """Print a formatted evaluation report."""
+         print(f"\n{'='*60}")
+         print(f"  📊 EVALUATION REPORT — {results['model_name']}")
+         print(f"{'='*60}")
+         print(f"  Episodes run        : {results['episodes']}")
+         print(f"  Total steps         : {results['total_steps']}")
+         print(f"  Avg accuracy        : {results['avg_accuracy']*100:.1f}%")
+         print(f"  Avg reward          : {results['avg_reward']:.4f}")
+         print(f"  Hallucination rate  : {results['hallucination_rate']*100:.1f}%")
259
+ print(f" Best answer streak : {results['best_streak']}")
260
+ print(f" Avg skill rating : {results['avg_skill_rating']:.4f}")
261
+ print(f"{'='*60}\n")
262
+
263
+ def save_results(self, results: Dict, filepath: str = "evaluation_results.json") -> None:
264
+ """Save evaluation results to a JSON file."""
265
+ with open(filepath, "w") as f:
266
+ json.dump(results, f, indent=2)
267
+ print(f"Results saved to: {filepath}")
268
+
269
+ def submit_to_leaderboard(
270
+ self,
271
+ results: Dict,
272
+ organization: str = "",
273
+ notes: str = "",
274
+ ) -> Dict:
275
+ """
276
+ Submit your evaluation results to the public leaderboard.
277
+
278
+ Parameters
279
+ ----------
280
+ results : dict
281
+ Output from evaluate()
282
+ organization : str
283
+ Your company/institution name
284
+ notes : str
285
+ Any notes about the evaluation setup
286
+ """
287
+ payload = {
288
+ "model_name": results["model_name"],
289
+ "avg_reward": results["avg_reward"],
290
+ "avg_accuracy": results["avg_accuracy"],
291
+ "hallucination_rate": results["hallucination_rate"],
292
+ "total_episodes": results["episodes"],
293
+ "total_steps": results["total_steps"],
294
+ "organization": organization,
295
+ "notes": notes,
296
+ }
297
+ response = self._post("/leaderboard/submit", payload)
298
+ if self.verbose:
299
+ print(f"🏆 Submitted to leaderboard: {results['model_name']}")
300
+ print(f" View at: {self.base_url}/leaderboard")
301
+ return response
302
+
303
+ # ── HTTP helpers ───────────────────────────────────────────────────────────
304
+
305
+ def _get(self, path: str) -> Dict:
306
+ try:
307
+ r = requests.get(f"{self.base_url}{path}", timeout=30)
308
+ r.raise_for_status()
309
+ return r.json()
310
+ except Exception as e:
311
+ raise ConnectionError(f"GET {path} failed: {e}")
312
+
313
+ def _post(self, path: str, body: Optional[Dict] = None) -> Dict:
314
+ try:
315
+ r = requests.post(f"{self.base_url}{path}", json=body or {}, timeout=30)
316
+ r.raise_for_status()
317
+ return r.json()
318
+ except Exception as e:
319
+ raise ConnectionError(f"POST {path} failed: {e}")
320
+
321
+ def _check_health(self) -> None:
322
+ try:
323
+ h = self._get("/health")
324
+ if self.verbose:
325
+ print(f"✅ Connected to HallucinationGuard-Env ({h.get('version','?')})")
326
+ except Exception as e:
327
+ print(f"⚠️ Could not reach {self.base_url}: {e}")
328
+
329
+
330
+ # ── CLI quick-test ─────────────────────────────────────────────────────────────
331
+
332
+ if __name__ == "__main__":
333
+ """Quick smoke-test using a simple rule-based 'model'."""
334
+
335
+ def dummy_model(question: str, context: str) -> str:
336
+ """Answers only from context — extracts a key phrase."""
337
+ words = context.split()
338
+ if len(words) > 5:
339
+ return " ".join(words[:10])
340
+ return context
341
+
342
+ env = HallucinationGuardEnv()
343
+ results = env.evaluate(dummy_model, episodes=2, model_name="dummy-baseline")
344
+ env.save_results(results, "dummy_results.json")
345
+ env.submit_to_leaderboard(results, organization="Test Org", notes="Baseline run")
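`evaluate()` only needs a callable with the signature `model_fn(question: str, context: str) -> str`. Before wiring up a real LLM, the harness can be smoke-tested with an extractive baseline that can only answer from the supplied context; `context_only_model` below is a hypothetical helper, not part of the SDK:

```python
import re

def context_only_model(question: str, context: str) -> str:
    """Return the context sentence that shares the most words with the question.

    By construction this 'model' cannot fabricate: every answer is a verbatim
    sentence of the provided context.
    """
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    q_words = set(re.findall(r"\w+", question.lower()))
    return max(sentences,
               key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))
```

Passed as `env.evaluate(context_only_model, episodes=2, model_name="extractive-baseline")`, it gives a grounded floor to compare real models against.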
server/app.py CHANGED
@@ -1,35 +1,89 @@
1
- """FastAPI server for HallucinationGuard-Env with session management.
2
-
3
- Standard endpoints (/reset, /step, /state, /health) — stateless, new env per request.
4
- Session endpoints (/session/reset, /session/step) — stateful, env persists across calls.

5
  """
6
 
7
- import sys, os, uuid, logging, dataclasses, enum
8
  sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
9
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
10
 
11
  from fastapi import FastAPI, HTTPException, Header
12
  from fastapi.responses import JSONResponse, RedirectResponse
13
- from typing import Dict, Any, Optional
 
14
 
15
  from models import HallucinationAction, HallucinationObservation, HallucinationState
16
  from environment import HallucinationEnvironment
17
  from metrics import get_tracker
18
 
19
- logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
20
  logger = logging.getLogger(__name__)
21
 
22
  app = FastAPI(
23
  title="HallucinationGuard-Env",
24
- description="OpenEnv RL environment for training AI to avoid hallucinations",
25
- version="2.0.0",

26
  )
27
 
28
- # Session storage for stateful HTTP interactions
29
  _sessions: Dict[str, HallucinationEnvironment] = {}
30
- # Shared stateless env instance for standard endpoints
31
  _default_env: Optional[HallucinationEnvironment] = None
32
 
 
 
 
33
 
34
  def _get_default_env() -> HallucinationEnvironment:
35
  global _default_env
@@ -39,12 +93,8 @@ def _get_default_env() -> HallucinationEnvironment:
39
 
40
 
41
  def _safe_dict(obj):
42
- """Recursively convert dataclass/enum/dict to JSON-safe structure."""
43
  if dataclasses.is_dataclass(obj):
44
- result = {}
45
- for f in dataclasses.fields(obj):
46
- result[f.name] = _safe_dict(getattr(obj, f.name))
47
- return result
48
  elif isinstance(obj, enum.Enum):
49
  return obj.value
50
  elif isinstance(obj, dict):
@@ -56,11 +106,27 @@ def _safe_dict(obj):
56
  return str(obj)
57
 
58

  # ── Standard stateless endpoints ──────────────────────────────────────────────
60
 
61
- @app.post("/reset")
62
  async def reset(body: Dict[str, Any] = {}):
63
- """Reset environment and return initial observation."""

64
  try:
65
  env = _get_default_env()
66
  obs = env.reset(**{k: v for k, v in body.items()
@@ -71,9 +137,19 @@ async def reset(body: Dict[str, Any] = {}):
71
  raise HTTPException(status_code=500, detail=str(e))
72
 
73
 
74
- @app.post("/step")
75
  async def step(action_data: Dict[str, Any]):
76
- """Take a step with the provided action."""

77
  try:
78
  env = _get_default_env()
79
  valid = {f.name for f in dataclasses.fields(HallucinationAction)}
@@ -84,23 +160,25 @@ async def step(action_data: Dict[str, Any]):
84
  raise HTTPException(status_code=500, detail=str(e))
85
 
86
 
87
- @app.get("/state")
88
  async def get_state():
89
- """Get current environment state."""
90
  try:
91
  return JSONResponse(content=_safe_dict(_get_default_env().state()))
92
  except Exception as e:
93
  raise HTTPException(status_code=500, detail=str(e))
94
 
95
 
96
- # ── Session-based stateful endpoints ──────────────────────────────────────────
97
 
98
- @app.post("/session/reset")
99
- async def session_reset(
100
- body: Dict[str, Any] = {},
101
- x_session_id: Optional[str] = Header(None),
102
- ) -> Dict[str, Any]:
103
- """Create or reset a named session."""
 
 
104
  session_id = x_session_id or str(uuid.uuid4())
105
  if session_id in _sessions:
106
  _sessions[session_id].close()
@@ -110,16 +188,13 @@ async def session_reset(
110
  "enable_multi_turn", "enable_context_retrieval")})
111
  result = _safe_dict(obs)
112
  result["session_id"] = session_id
113
- logger.info(f"Created session {session_id}")
114
  return result
115
 
116
 
117
- @app.post("/session/step")
118
- async def session_step(
119
- action_data: Dict[str, Any],
120
- x_session_id: str = Header(...),
121
- ) -> Dict[str, Any]:
122
- """Execute a step in an existing session."""
123
  if x_session_id not in _sessions:
124
  raise HTTPException(status_code=404,
125
  detail=f"Session {x_session_id} not found. Call /session/reset first.")
@@ -131,8 +206,8 @@ async def session_step(
131
  return result
132
 
133
 
134
- @app.delete("/session")
135
- async def close_session(x_session_id: str = Header(...)) -> Dict[str, str]:
136
  """Close and clean up a session."""
137
  if x_session_id in _sessions:
138
  _sessions[x_session_id].close()
@@ -140,19 +215,140 @@ async def close_session(x_session_id: str = Header(...)) -> Dict[str, str]:
140
  return {"status": "closed", "session_id": x_session_id}
141
 
142
 
143
- @app.get("/session/list")
144
- async def list_sessions() -> Dict[str, Any]:
145
  return {"active_sessions": len(_sessions), "session_ids": list(_sessions.keys())}
146
 
147
 
148
- # ── Utility endpoints ──────────────────────────────────────────────────────────

149
 
150
- @app.get("/health")
 
 
151
  async def health():
152
- return {"status": "healthy", "service": "HallucinationGuard-Env", "version": "2.0.0"}

153
 
154
 
155
- @app.get("/metrics")
156
  async def get_metrics():
157
  try:
158
  return get_tracker().get_real_time_metrics()
@@ -160,7 +356,7 @@ async def get_metrics():
160
  raise HTTPException(status_code=500, detail=str(e))
161
 
162
 
163
- @app.get("/metrics/summary")
164
  async def metrics_summary():
165
  try:
166
  return {"summary": get_tracker().generate_summary_report()}
@@ -168,24 +364,7 @@ async def metrics_summary():
168
  raise HTTPException(status_code=500, detail=str(e))
169
 
170
 
171
- @app.get("/environment/info")
172
- async def env_info():
173
- return {
174
- "name": "HallucinationGuard-Env",
175
- "version": "2.0.0",
176
- "endpoints": {
177
- "standard": ["/reset", "/step", "/state", "/health"],
178
- "session": ["/session/reset", "/session/step", "/session", "/session/list"],
179
- "metrics": ["/metrics", "/metrics/summary"],
180
- },
181
- "difficulty_levels": ["beginner", "intermediate", "advanced", "expert"],
182
- "hallucination_types": [
183
- "fabricated_fact", "false_citation", "overconfident_wrong",
184
- "context_drift", "numerical_fabrication", "entity_confusion",
185
- ],
186
- "supported_models": ["openai", "anthropic", "huggingface", "ollama", "generic"],
187
- }
188
-
189
 
190
  @app.middleware("http")
191
  async def log_requests(request, call_next):
@@ -194,12 +373,6 @@ async def log_requests(request, call_next):
194
  return response
195
 
196
 
197
-
198
- @app.get("/")
199
- async def root():
200
- return RedirectResponse(url="/docs")
201
-
202
-
203
  if __name__ == "__main__":
204
  import uvicorn
205
- uvicorn.run(app, host="0.0.0.0", port=8000)
 
1
+ """
2
+ HallucinationGuard-Env v3.0 — Production FastAPI Server
3
+
4
+ Endpoints:
5
+ Standard : POST /reset POST /step GET /state GET /health
6
+ Session : POST /session/reset POST /session/step DELETE /session
7
+ Leaderboard: GET /leaderboard POST /leaderboard/submit DELETE /leaderboard/{model}
8
+ Info : GET / GET /docs GET /environment/info GET /datasets
9
+ GET /metrics GET /metrics/summary
10
  """
11
 
12
+ import sys, os, uuid, logging, dataclasses, enum, time
13
  sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
14
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
15
 
16
  from fastapi import FastAPI, HTTPException, Header
17
  from fastapi.responses import JSONResponse, RedirectResponse
18
+ from fastapi.middleware.cors import CORSMiddleware
19
+ from typing import Dict, Any, Optional, List
20
 
21
  from models import HallucinationAction, HallucinationObservation, HallucinationState
22
  from environment import HallucinationEnvironment
23
  from metrics import get_tracker
24
 
25
+ logging.basicConfig(level=logging.INFO,
26
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
27
  logger = logging.getLogger(__name__)
28
 
29
  app = FastAPI(
30
  title="HallucinationGuard-Env",
31
+ description="""
32
+ ## 🛡️ HallucinationGuard-Env v3.0
33
+
34
+ **The production-grade OpenEnv RL environment for training and evaluating LLMs on hallucination avoidance.**
35
+
36
+ Built on 50,000+ examples across 13 real-world QA datasets:
37
+ SQuAD · TriviaQA · HaluEval · TruthfulQA · Natural Questions · HotpotQA ·
38
+ BoolQ · FaithDial · FEVER · ARC · OpenBookQA · MS MARCO · CoQA
39
+
40
+ ### Quick Start
41
+
42
+ ```python
43
+ # pip install requests
44
+ import requests
45
+
46
+ BASE = "https://samsankar-hallucination-guard-env.hf.space"
47
+
48
+ # 1. Start episode
49
+ obs = requests.post(f"{BASE}/reset").json()
50
+ print(obs["question"], obs["context"])
51
+
52
+ # 2. Answer from context only
53
+ result = requests.post(f"{BASE}/step", json={"answer": "your answer"}).json()
54
+ print(result["reward"], result["is_hallucination"])
55
+ ```
56
+
57
+ ### Python SDK
58
+
59
+ ```python
60
+ # pip install hallucination-guard-sdk  (coming soon)
61
+ from hallucination_guard import HallucinationGuardEnv
62
+ env = HallucinationGuardEnv()
63
+ obs = env.reset()
64
+ result = env.step(your_model(obs["question"], obs["context"]))
65
+ ```
66
+ """,
67
+ version="3.0.0",
68
+ contact={"name": "HallucinationGuard", "url": "https://huggingface.co/spaces/SamSankar/hallucination-guard-env"},
69
+ license_info={"name": "MIT"},
70
+ )
71
+
72
+ # CORS — allow all origins so any company/researcher can call this API
73
+ app.add_middleware(
74
+ CORSMiddleware,
75
+ allow_origins=["*"],
76
+ allow_methods=["*"],
77
+ allow_headers=["*"],
78
  )
79
 
80
+ # ── State ──────────────────────────────────────────────────────────────────────
81
  _sessions: Dict[str, HallucinationEnvironment] = {}
 
82
  _default_env: Optional[HallucinationEnvironment] = None
83
 
84
+ # Leaderboard: { model_name: {score, hallucination_rate, episodes, submitted_at} }
85
+ _leaderboard: Dict[str, Dict[str, Any]] = {}
86
+
87
 
88
  def _get_default_env() -> HallucinationEnvironment:
89
  global _default_env
 
93
 
94
 
95
  def _safe_dict(obj):
 
96
  if dataclasses.is_dataclass(obj):
97
+ return {f.name: _safe_dict(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
 
 
 
98
  elif isinstance(obj, enum.Enum):
99
  return obj.value
100
  elif isinstance(obj, dict):
 
106
  return str(obj)
107
 
108
 
109
+ # ── Root ───────────────────────────────────────────────────────────────────────
110
+
111
+ @app.get("/", include_in_schema=False)
112
+ async def root():
113
+ return RedirectResponse(url="/docs")
114
+
115
+
116
  # ── Standard stateless endpoints ──────────────────────────────────────────────
117
 
118
+ @app.post("/reset", summary="Start a new episode", tags=["Environment"])
119
  async def reset(body: Dict[str, Any] = {}):
120
+ """
121
+ Reset the environment and receive the first question + context.
122
+
123
+ **Returns:** question, context, difficulty, attempts_remaining, skill_rating
124
+
125
+ **Optional body params:**
126
+ - `seed` (int): reproducible episode
127
+ - `difficulty` (str): beginner | intermediate | advanced | expert
128
+ - `episode_id` (str): custom episode ID
129
+ """
130
  try:
131
  env = _get_default_env()
132
  obs = env.reset(**{k: v for k, v in body.items()
 
137
  raise HTTPException(status_code=500, detail=str(e))
138
 
139
 
140
+ @app.post("/step", summary="Submit an answer", tags=["Environment"])
141
  async def step(action_data: Dict[str, Any]):
142
+ """
143
+ Submit an answer to the current question.
144
+
145
+ **Body:**
146
+ ```json
147
+ {"answer": "Your answer based ONLY on the provided context"}
148
+ ```
149
+
150
+ **Returns:** reward (-1 to 1), is_hallucination, hallucination_type,
151
+ grounding_score, feedback, next question + context
152
+ """
153
  try:
154
  env = _get_default_env()
155
  valid = {f.name for f in dataclasses.fields(HallucinationAction)}
 
160
  raise HTTPException(status_code=500, detail=str(e))
161
 
162
 
163
+ @app.get("/state", summary="Get current episode state", tags=["Environment"])
164
  async def get_state():
165
+ """Returns full episode state: step count, accuracy, skill rating, streaks."""
166
  try:
167
  return JSONResponse(content=_safe_dict(_get_default_env().state()))
168
  except Exception as e:
169
  raise HTTPException(status_code=500, detail=str(e))
170
 
171
 
172
+ # ── Session endpoints ──────────────────────────────────────────────────────────
173
 
174
+ @app.post("/session/reset", summary="Create a stateful session", tags=["Sessions"])
175
+ async def session_reset(body: Dict[str, Any] = {},
176
+ x_session_id: Optional[str] = Header(None)):
177
+ """
178
+ Create a persistent session for multi-turn evaluation.
179
+ Pass `X-Session-Id` header to reuse an existing session.
180
+ Returns a `session_id` to use in subsequent calls.
181
+ """
182
  session_id = x_session_id or str(uuid.uuid4())
183
  if session_id in _sessions:
184
  _sessions[session_id].close()
 
188
  "enable_multi_turn", "enable_context_retrieval")})
189
  result = _safe_dict(obs)
190
  result["session_id"] = session_id
 
191
  return result
192
 
193
 
194
+ @app.post("/session/step", summary="Step in a session", tags=["Sessions"])
195
+ async def session_step(action_data: Dict[str, Any],
196
+ x_session_id: str = Header(...)):
197
+ """Submit an answer within a named session. Requires `X-Session-Id` header."""
 
 
198
  if x_session_id not in _sessions:
199
  raise HTTPException(status_code=404,
200
  detail=f"Session {x_session_id} not found. Call /session/reset first.")
 
206
  return result
207
 
208
 
209
+ @app.delete("/session", summary="Close a session", tags=["Sessions"])
210
+ async def close_session(x_session_id: str = Header(...)):
211
  """Close and clean up a session."""
212
  if x_session_id in _sessions:
213
  _sessions[x_session_id].close()
 
215
  return {"status": "closed", "session_id": x_session_id}
216
 
217
 
218
+ @app.get("/session/list", summary="List active sessions", tags=["Sessions"])
219
+ async def list_sessions():
220
  return {"active_sessions": len(_sessions), "session_ids": list(_sessions.keys())}
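The `X-Session-Id` contract above (create via `/session/reset`, reuse the returned id on every `/session/step`, clean up with `DELETE /session`) can be wrapped in a thin client. A sketch assuming only the endpoints defined in this file; `SessionClient` is illustrative, not a shipped class:

```python
import uuid
import requests

class SessionClient:
    """Minimal client for the stateful /session endpoints."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        self.session_id = str(uuid.uuid4())

    def _headers(self) -> dict:
        # Every session call must carry the X-Session-Id header.
        return {"X-Session-Id": self.session_id}

    def reset(self, **params) -> dict:
        r = requests.post(f"{self.base_url}/session/reset",
                          json=params, headers=self._headers())
        r.raise_for_status()
        return r.json()

    def step(self, answer: str) -> dict:
        r = requests.post(f"{self.base_url}/session/step",
                          json={"answer": answer}, headers=self._headers())
        r.raise_for_status()
        return r.json()

    def close(self) -> dict:
        r = requests.delete(f"{self.base_url}/session", headers=self._headers())
        r.raise_for_status()
        return r.json()
```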
221
 
222
 
223
+ # ── Leaderboard ────────────────────────────────────────────────────────────────
224
+
225
+ @app.get("/leaderboard", summary="Model leaderboard", tags=["Leaderboard"])
226
+ async def get_leaderboard():
227
+ """
228
+ Returns ranked leaderboard of all submitted model evaluations.
229
+ Ranked by avg_reward descending.
230
+ """
231
+ if not _leaderboard:
232
+ return {"leaderboard": [], "total_models": 0,
233
+ "message": "No models submitted yet. Use POST /leaderboard/submit"}
234
+ ranked = sorted((dict(e) for e in _leaderboard.values()), key=lambda x: x.get("avg_reward", 0), reverse=True)  # copy so "rank" isn't written back into _leaderboard
235
+ for i, entry in enumerate(ranked):
236
+ entry["rank"] = i + 1
237
+ return {
238
+ "leaderboard": ranked,
239
+ "total_models": len(ranked),
240
+ "last_updated": max(e.get("submitted_at", 0) for e in ranked),
241
+ }
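The ranking step is small enough to factor into a pure helper; copying each entry before annotating means `rank` is never written back into the callers' dicts. A sketch of that variant:

```python
from typing import Any, Dict, List

def rank_entries(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort leaderboard entries by avg_reward (desc) and add a 1-based rank.

    Operates on copies, so the input dicts are left untouched.
    """
    ranked = sorted((dict(e) for e in entries),
                    key=lambda e: e.get("avg_reward", 0), reverse=True)
    for i, e in enumerate(ranked, start=1):
        e["rank"] = i
    return ranked
```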
242
+
243
+
244
+ @app.post("/leaderboard/submit", summary="Submit model evaluation results", tags=["Leaderboard"])
245
+ async def submit_to_leaderboard(data: Dict[str, Any]):
246
+ """
247
+ Submit your model's evaluation results to the leaderboard.
248
+
249
+ **Required fields:**
250
+ ```json
251
+ {
252
+ "model_name": "gpt-4o",
253
+ "avg_reward": 0.72,
254
+ "avg_accuracy": 0.81,
255
+ "hallucination_rate": 0.19,
256
+ "total_episodes": 10,
257
+ "total_steps": 100
258
+ }
259
+ ```
260
+ **Optional:** `organization`, `model_version`, `notes`
261
+ """
262
+ required = ["model_name", "avg_reward", "avg_accuracy",
263
+ "hallucination_rate", "total_episodes", "total_steps"]
264
+ missing = [f for f in required if f not in data]
265
+ if missing:
266
+ raise HTTPException(status_code=422,
267
+ detail=f"Missing required fields: {missing}")
268
+ model_name = data["model_name"]
269
+ _leaderboard[model_name] = {
270
+ "model_name": model_name,
271
+ "organization": data.get("organization", ""),
272
+ "model_version": data.get("model_version", ""),
273
+ "avg_reward": round(float(data["avg_reward"]), 4),
274
+ "avg_accuracy": round(float(data["avg_accuracy"]), 4),
275
+ "hallucination_rate": round(float(data["hallucination_rate"]), 4),
276
+ "total_episodes": int(data["total_episodes"]),
277
+ "total_steps": int(data["total_steps"]),
278
+ "notes": data.get("notes", ""),
279
+ "submitted_at": time.time(),
280
+ }
281
+ logger.info(f"Leaderboard submission: {model_name} reward={data['avg_reward']:.3f}")
282
+ return {"status": "submitted", "model_name": model_name,
283
+ "message": f"'{model_name}' added to leaderboard. View at /leaderboard"}
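Clients can mirror the server's required-field check before POSTing, turning a 422 round-trip into a local error. `build_submission` is a hypothetical client-side helper:

```python
REQUIRED_FIELDS = ["model_name", "avg_reward", "avg_accuracy",
                   "hallucination_rate", "total_episodes", "total_steps"]

def build_submission(results: dict, organization: str = "", notes: str = "") -> dict:
    """Validate and assemble a /leaderboard/submit payload locally."""
    missing = [f for f in REQUIRED_FIELDS if f not in results]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    payload = {f: results[f] for f in REQUIRED_FIELDS}
    payload["organization"] = organization
    payload["notes"] = notes
    return payload
```

The result is what `requests.post(f"{BASE}/leaderboard/submit", json=build_submission(results))` would send.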
284
+
285
+
286
+ @app.delete("/leaderboard/{model_name}", summary="Remove from leaderboard", tags=["Leaderboard"])
287
+ async def remove_from_leaderboard(model_name: str):
288
+ """Remove a model entry from the leaderboard."""
289
+ if model_name not in _leaderboard:
290
+ raise HTTPException(status_code=404, detail=f"Model '{model_name}' not found")
291
+ del _leaderboard[model_name]
292
+ return {"status": "removed", "model_name": model_name}
293
+
294
 
295
+ # ── Info & metrics ─────────────────────────────────────────────────────────────
296
+
297
+ @app.get("/health", summary="Health check", tags=["Info"])
298
  async def health():
299
+ return {"status": "healthy", "service": "HallucinationGuard-Env", "version": "3.0.0"}
300
+
301
+
302
+ @app.get("/environment/info", summary="Full environment spec", tags=["Info"])
303
+ async def env_info():
304
+ return {
305
+ "name": "HallucinationGuard-Env",
306
+ "version": "3.0.0",
307
+ "description": "Production RL environment for hallucination detection & prevention",
308
+ "datasets": {
309
+ "count": 13,
310
+ "total_examples": "50,000+",
311
+ "sources": [
312
+ "squad", "trivia_qa", "halueval", "truthful_qa",
313
+ "natural_questions", "hotpotqa", "boolq", "faithdial",
314
+ "fever", "arc", "openbookqa", "ms_marco", "coqa",
315
+ ],
316
+ },
317
+ "endpoints": {
318
+ "environment": ["/reset", "/step", "/state"],
319
+ "sessions": ["/session/reset", "/session/step", "/session/list", "/session"],
320
+ "leaderboard": ["/leaderboard", "/leaderboard/submit"],
321
+ "info": ["/health", "/environment/info", "/datasets", "/metrics"],
322
+ },
323
+ "difficulty_levels": ["beginner", "intermediate", "advanced", "expert"],
324
+ "hallucination_types": [
325
+ "fabricated_fact", "false_citation", "overconfident_wrong",
326
+ "context_drift", "numerical_fabrication", "entity_confusion",
327
+ ],
328
+ "reward_range": [-1.0, 1.0],
329
+ "supported_frameworks": ["OpenAI Gym", "OpenEnv", "custom Python", "REST API"],
330
+ }
331
+
332
+
333
+ @app.get("/datasets", summary="Dataset statistics", tags=["Info"])
334
+ async def dataset_info():
335
+ """Returns breakdown of loaded datasets by source, difficulty, and category."""
336
+ try:
337
+ env = _get_default_env()
338
+ stats = env.dataset_loader.get_statistics()
339
+ return {
340
+ "total_examples": stats.total_examples,
341
+ "by_source": stats.examples_by_source,
342
+ "by_difficulty": stats.examples_by_difficulty,
343
+ "by_category": stats.examples_by_category,
344
+ "avg_context_length": round(stats.average_context_length, 1),
345
+ "avg_question_length": round(stats.average_question_length, 1),
346
+ }
347
+ except Exception as e:
348
+ raise HTTPException(status_code=500, detail=str(e))
349
 
350
 
351
+ @app.get("/metrics", summary="Real-time metrics", tags=["Metrics"])
352
  async def get_metrics():
353
  try:
354
  return get_tracker().get_real_time_metrics()
 
356
  raise HTTPException(status_code=500, detail=str(e))
357
 
358
 
359
+ @app.get("/metrics/summary", summary="Metrics summary report", tags=["Metrics"])
360
  async def metrics_summary():
361
  try:
362
  return {"summary": get_tracker().generate_summary_report()}
 
364
  raise HTTPException(status_code=500, detail=str(e))
365
 
366
 
367
+ # ── Middleware ─────────────────────────────────────────────────────────────────

368
 
369
  @app.middleware("http")
370
  async def log_requests(request, call_next):
 
373
  return response
374
 
375

376
  if __name__ == "__main__":
377
  import uvicorn
378
+ uvicorn.run(app, host="0.0.0.0", port=7860)
server/dataset_loader.py CHANGED
The diff for this file is too large to render. See raw diff