SamSankar committed on
Commit 7e29df1 · 1 Parent(s): 4c3ad6e

Streamline CLAUDE.md: remove session history, add sync workflow info

Files changed (1): CLAUDE.md (+168 lines, new file)
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

HallucinationGuard-Env is an OpenEnv RL environment for training LLMs to avoid hallucinations. It runs as a FastAPI server on HuggingFace Spaces with 3 graded tasks (beginner → advanced) and a 9-component reward system.

## Key Commands

```bash
# Install dependencies
pip install -r server/requirements.txt

# Run server locally (port 7860)
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload

# Run heuristic baseline (no API key needed)
python inference.py --heuristic --env-url http://localhost:7860

# Run tests
pytest tests/                                         # All tests
pytest tests/test_grader.py -v                        # Specific test file
pytest tests/test_grader.py::TestGraderScoreRange -v  # Specific test class

# Lint (CI uses this)
ruff check . --ignore E501,F401,F403

# Docker build
docker build -t hallucination-guard-env .
docker run -p 7860:7860 hallucination-guard-env
```

## Running with LLM APIs

### Groq (cloud)

```bash
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
export HF_TOKEN=gsk_your_key_here
python inference.py --env-url http://localhost:7860 --episodes 3 --steps 5 --seed 42
```

### Ollama (local)

```bash
ollama pull qwen2.5:7b
export API_BASE_URL=http://localhost:11434/v1
export MODEL_NAME=qwen2.5:7b
export HF_TOKEN=ollama  # Any non-empty value triggers LLM mode
python inference.py --env-url http://localhost:7860 --episodes 3 --steps 5 --seed 42
```

## Critical Dependencies

- **NumPy must be <2.0.0** — Pre-compiled packages (sentence-transformers, bert-score) crash with NumPy 2.x. Pinned in requirements.
- **Protobuf required** — BERTScore dependency; explicitly listed in requirements.

## Architecture

```
server/
├── app.py             # FastAPI endpoints
├── environment.py     # HallucinationEnvironment class (OpenEnv step/reset/state)
├── grader.py          # 9-component reward calculation + refusal handling + explanations
├── dataset_loader.py  # Loads 38 datasets from HF cache
└── tasks.py           # Task registry with difficulty-weighted graders

models.py     # Pydantic models: HallucinationAction, HallucinationObservation, HallucinationState
inference.py  # Hackathon submission script (OpenAI-compatible client)
```

### Data Flow

1. **reset()** → Samples a question from dataset_loader and returns a HallucinationObservation
2. **step(HallucinationAction)** → Grades the answer via grader.py and returns reward + feedback
3. **grader.calculate_reward()** → Combines 9 components (see Reward System below)
4. **tasks.compute_task_score()** → Aggregates per-step rewards into a 0.0–1.0 task score

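The loop above can be sketched end-to-end with a stubbed environment. This is a toy illustration of the control flow only: `StubEnv`, its grading rule, and the action fields (`answer`, `confidence`, `source_quote`) are assumptions, not the real server or grader logic.

```python
# Toy, self-contained sketch of the reset -> step -> grade -> aggregate flow.
# StubEnv stands in for the HTTP server; the grading rule here (reward citing
# a source_quote) is purely illustrative.

class StubEnv:
    def reset(self):
        # ~ POST /reset: sample a question, return an observation
        self.rewards = []
        return {"question": "What year was X founded?"}

    def step(self, action):
        # ~ POST /step: grade the submitted action
        reward = 0.8 if action.get("source_quote") else 0.4
        self.rewards.append(reward)
        return {"reward": reward, "feedback": "graded"}

    def task_score(self):
        # ~ tasks.compute_task_score(): mean per-step reward in 0.0-1.0
        return sum(self.rewards) / len(self.rewards)

env = StubEnv()
obs = env.reset()
env.step({"answer": "1998", "confidence": 0.7, "source_quote": "founded in 1998"})
env.step({"answer": "unsure", "confidence": 0.2})
print(round(env.task_score(), 2))  # 0.6
```
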
### API Endpoints

| Category | Method | Endpoint | Description |
|----------|--------|----------|-------------|
| Environment | POST | `/reset` | Start new episode |
| Environment | POST | `/step` | Submit answer |
| Environment | GET | `/state` | Get episode state |
| Batch | POST | `/batch/evaluate` | Evaluate multiple Q&A pairs |
| Batch | POST | `/batch/stream` | Streaming batch (NDJSON) |
| Metrics | GET | `/metrics/timing` | Time-per-step latency stats |
| Leaderboard | GET | `/leaderboard/viz` | Chart data (bar, scatter, tiers) |
| OpenEnv | GET | `/tasks` | List tasks + action schema |
| OpenEnv | POST | `/grader` | Score completed episode |
| OpenEnv | POST | `/baseline` | Run heuristic baseline |

### Dataset Loading

Datasets load from the `SamSankar/hallucination-guard-cache` HF Dataset repo. Core datasets load synchronously on startup; extended datasets load in a background thread. Cached locally at `/tmp/halluguard_cache/`.

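The two-phase loading strategy can be sketched as follows. `load_core`, `load_extended`, and the `DATASETS` dict are hypothetical stand-ins for dataset_loader.py internals, not its real API:

```python
import threading

# Sketch: core datasets block startup; extended datasets load in a daemon
# thread so the server can start serving before they finish.

DATASETS = {}

def load_core():
    # Stand-in for downloading core datasets from the HF cache repo.
    DATASETS["core"] = ["squad_sample"]

def load_extended():
    # Stand-in for the slower extended-dataset download.
    DATASETS["extended"] = ["halueval_sample"]

def startup():
    load_core()  # synchronous: server startup waits for this
    t = threading.Thread(target=load_extended, daemon=True)
    t.start()    # background: runs while the server serves requests
    return t

t = startup()
t.join()  # only for this demo; the real server would not join
print(sorted(DATASETS))  # ['core', 'extended']
```
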
### Model Preloading

ML models (sentence-transformers, CrossEncoder/NLI, ROUGE, BERTScore) preload at server startup in `lifespan()` to avoid 30–60s cold-start delays. The environment variable `HF_HOME=/tmp/hf_cache` replaces the deprecated `TRANSFORMERS_CACHE`.

## Reward System (grader.py)

The reward is a weighted combination of 9 components:

| Component | Weight | Description |
|-----------|--------|-------------|
| factual_correctness | 0.35 | Exact/fuzzy match + semantic similarity to ground truth |
| source_grounding | 0.20 | Answer supported by context (reduced for wrong answers) |
| citation_accuracy | 0.10 | source_quote found verbatim in context |
| confidence_calibration | 0.10 | ECE between stated confidence and correctness |
| semantic_consistency | 0.10 | NLI entailment score (DeBERTa-v3) |
| hallucination_penalty | 0.10 | Penalizes detected hallucinations |
| rouge_score | 0.02 | ROUGE-1/2/L overlap with reference |
| bertscore | 0.02 | Token-level semantic similarity |
| alignscore | 0.01 | Faithfulness to context (RoBERTa) |

**Key behavior:**
- Wrong answers are capped at ~0.4 reward regardless of grounding
- Grounding contribution is reduced for incorrect answers
- Difficulty multiplier: beginner ×0.9, intermediate ×1.0, advanced ×1.1, expert ×1.2

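A minimal sketch of how these pieces might combine, using the table's weights (which sum to 1.0) and the multipliers above. The function name, the `correct` flag, and the order of cap-then-multiplier are assumptions, not grader.py's actual API:

```python
# Hypothetical reward combination: weighted sum, wrong-answer cap, then
# difficulty multiplier. Weights and multipliers come from the tables above;
# everything else is an illustrative assumption.

WEIGHTS = {
    "factual_correctness": 0.35,
    "source_grounding": 0.20,
    "citation_accuracy": 0.10,
    "confidence_calibration": 0.10,
    "semantic_consistency": 0.10,
    "hallucination_penalty": 0.10,
    "rouge_score": 0.02,
    "bertscore": 0.02,
    "alignscore": 0.01,
}

DIFFICULTY_MULT = {"beginner": 0.9, "intermediate": 1.0, "advanced": 1.1, "expert": 1.2}

def combine_reward(components, difficulty="intermediate", correct=True):
    reward = sum(WEIGHTS[name] * score for name, score in components.items())
    if not correct:
        reward = min(reward, 0.4)  # wrong answers capped regardless of grounding
    reward *= DIFFICULTY_MULT[difficulty]
    return round(reward, 4)

perfect = {name: 1.0 for name in WEIGHTS}
print(combine_reward(perfect, "expert"))                    # 1.2
print(combine_reward(perfect, "beginner", correct=False))   # 0.36
```
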
## Refusal Handling

The grader detects when models appropriately refuse to answer unanswerable questions:

| Scenario | Reward | Behavior |
|----------|--------|----------|
| Proper refusal on unanswerable question | 0.65–0.80 | Rewarded for honesty |
| Refusal with low confidence | 0.50 | Partial credit |
| Underconfident refusal (answer exists) | 0.30 | Penalized for not trying |

Detected phrases: "I cannot answer", "not in the context", "I don't know", "cannot determine", "insufficient information". See `is_refusal_answer()` in grader.py.

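Phrase detection along these lines can be sketched as follows; the phrase list comes from the section above, while the matching details (lowercasing, substring search) are assumptions about the real `is_refusal_answer()`:

```python
# Sketch of phrase-based refusal detection. Case-insensitive substring
# matching is an assumption about the real implementation in grader.py.

REFUSAL_PHRASES = (
    "i cannot answer",
    "not in the context",
    "i don't know",
    "cannot determine",
    "insufficient information",
)

def is_refusal_answer(answer: str) -> bool:
    text = answer.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)

print(is_refusal_answer("I don't know based on the given passage."))  # True
print(is_refusal_answer("The capital of France is Paris."))           # False
```
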
## Pydantic Models

All models inherit from `openenv.core.env_server.Action`, `Observation`, and `State` (Pydantic BaseModel, not dataclass). When modifying:
- Use `Field(default_factory=...)`, not `field(default_factory=...)`
- Use `str` for enum values in model fields (e.g., `difficulty: str = "intermediate"`)
- Serialization uses `_safe_dict()` in app.py, which handles Pydantic models via `model_dump()`

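A hedged sketch of what a `_safe_dict()`-style helper might do, based on the description above: recursively serialize anything exposing `model_dump()` while passing plain containers through. The real app.py implementation may differ:

```python
# Hypothetical recursive serializer in the spirit of _safe_dict(). Uses duck
# typing on model_dump() (Pydantic v2 BaseModel) so the sketch stays
# dependency-free; FakeObservation below is a stand-in, not a real model.

def safe_dict(obj):
    if hasattr(obj, "model_dump"):  # Pydantic v2 model
        return safe_dict(obj.model_dump())
    if isinstance(obj, dict):
        return {k: safe_dict(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [safe_dict(v) for v in obj]
    return obj

class FakeObservation:
    # Stand-in for a Pydantic model exposing model_dump().
    def model_dump(self):
        return {"question": "Q1", "difficulty": "intermediate"}

print(safe_dict({"obs": FakeObservation(), "step": 1}))
```
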
## Test Structure

```
tests/
├── test_grader.py          # 20 tests: reward calculation, refusal handling, hallucination detection
├── test_adversarial.py     # 18 tests: HaluEval, TruthfulQA edge cases
├── test_endpoints.py       # 15 tests: batch eval, metrics, leaderboard endpoints
├── test_environment.py     # 13 tests: reset/step behavior
└── test_dataset_loader.py  # 14 tests: dataset loading, caching
```

Run with `pytest tests/ -v`. CI runs automatically via `.github/workflows/test.yml`.

## Repositories

- **GitHub:** https://github.com/SS-360/hallucination-guard-env
- **HuggingFace Space:** https://huggingface.co/spaces/SamSankar/hallucination-guard-env

Changes pushed to GitHub automatically sync to HuggingFace Spaces via `.github/workflows/sync-to-hf.yml`. Requires an `HF_TOKEN` secret with write permissions in the GitHub repo settings.

## Baseline Scores

Heuristic agent (seed=42, 3 episodes × 5 steps):

- task_1_factual_grounding: 0.29 (±0.15)
- task_2_multi_hop_synthesis: 0.25 (±0.14)
- task_3_adversarial_resistance: 0.22 (±0.16)
- Overall: 0.25
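
The Overall figure is consistent with an unweighted mean of the three task scores; that aggregation rule is an inference from the numbers above, not documented behavior:

```python
# Assuming "Overall" is the unweighted mean of the three task scores
# (an inference from the reported numbers, not documented behavior).
scores = {
    "task_1_factual_grounding": 0.29,
    "task_2_multi_hop_synthesis": 0.25,
    "task_3_adversarial_resistance": 0.22,
}
overall = sum(scores.values()) / len(scores)
print(round(overall, 2))  # 0.25
```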