kumar6591 committed on
Commit 640f531 · verified · 1 Parent(s): c10fded

Update README.md

Files changed (1)
  1. README.md +2 -255
README.md CHANGED
@@ -1,262 +1,9 @@
1
- # DataQualityEnv
2
  ---
3
  title: data-quality-env
 
4
  emoji: 🚀
5
  colorFrom: blue
6
  colorTo: green
7
- sdk: docker
8
- pinned: false
9
- ---
10
- ## Environment description
11
- DataQualityEnv is an OpenEnv-compliant RL environment where an agent acts as a data quality auditor.
12
- For each episode, the environment generates a seeded dirty relational dataset, loads it into in-memory DuckDB, and exposes schema + row count.
13
- The agent performs multi-turn SQL `SELECT` investigation and submits a structured JSON audit report for deterministic grading.
14
-
15
- ## Plain-English summary
16
- This project trains and evaluates an AI agent that behaves like a data quality analyst.
17
-
18
- - The environment creates broken data on purpose.
19
- - The agent investigates the data with safe SQL queries.
20
- - The agent writes a final audit report.
21
- - The grader scores how accurately the report matches the hidden faults.
22
-
23
- In short: **inspect the data, reason about the problems, and submit a correct audit report**.
24
-
25
- ### Motivation (real-world utility)
26
- Modern analytics pipelines fail silently when null explosions, schema drift, and referential drift go unnoticed.
27
- This environment simulates a real data quality analyst workflow: inspect tables, run targeted SQL diagnostics, and submit an actionable incident report.
28
-
29
- ### Why this is useful
30
- - It models a real job that people actually do in production.
31
- - It gives agents a meaningful multi-step reasoning task.
32
- - It provides deterministic scores, which makes it suitable for RL training and benchmarking.
33
- - It is safe by design because only non-destructive SQL is allowed.
34
-
35
- ## How the environment works
36
- 1. Call `reset(task_id, seed)`.
37
- 2. The environment creates a reproducible dirty dataset and loads it into DuckDB.
38
- 3. The agent reads the schema and row count.
39
- 4. The agent uses `step(query)` to inspect the data.
40
- 5. The environment returns query results and partial reward signals.
41
- 6. When the agent is ready, it submits `step(submit_report)`.
42
- 7. The grader compares the report with the hidden truth and returns the final score.
43
-
44
- ### Score meaning
45
- - `1.0` = perfect audit report
46
- - `0.7` = partially correct, some key evidence missing
47
- - `0.0` = wrong or empty report
48
-
49
- ## Action space
50
- - query: `{"action_type": "query", "sql": "SELECT ..."}`
51
- - submit_report: `{"action_type": "submit_report", "report": AuditReport}`
52
-
53
- ## Observation space
54
- `task_description`, `table_name`, `schema`, `row_count`, `step`, `max_steps`, `last_query_result`, `last_action_error`
55
-
56
- ## Tasks
57
- | ID | Name | Difficulty | What agent must find |
58
- |----|------|-----------|---------------------|
59
- | 1 | Null & duplicate detection | Easy | Null counts per column, duplicate rows |
60
- | 2 | Schema violation repair | Medium | Type mismatches, range violations |
61
- | 3 | Silent data drift | Hard | Statistical shift, new categories, referential drift |
62
-
63
- ## What each task teaches
64
- - Task 1: basic data profiling and deduplication logic
65
- - Task 2: schema validation and data cleaning checks
66
- - Task 3: cross-snapshot drift analysis and anomaly detection
67
-
68
- ## Reward design
69
- - Final reward (on `submit_report`) is task score in `[0.0, 1.0]` from deterministic graders.
70
- - Intermediate query reward gives partial credit for meaningful investigative probes.
71
- - Example: detecting null-focused SQL probes, duplicate-analysis queries, cross-snapshot drift probes.
72
- - Safety penalty: destructive SQL attempts (`DROP`, `TRUNCATE`, etc.) return `-0.2`.
73
- - Efficiency penalty: repeating the exact same query incurs a small negative penalty.
74
-
75
- ## Recommended way to run this project
76
- If you are starting from the `meta` folder, use the helper scripts:
77
-
78
- ```bash
79
- ./run_env_server.sh
80
- ./run_high_grade_agent.sh
81
- ```
82
-
83
- If you want to run the environment directly:
84
-
85
- ```bash
86
- cd /Users/hemanthkunta/meta/data-quality-env
87
- python3 -m uvicorn env.app:app --app-dir /Users/hemanthkunta/meta/data-quality-env --host 0.0.0.0 --port 7860
88
- ```
89
-
90
- Then verify it:
91
-
92
- ```bash
93
- curl http://localhost:7860/health
94
- ```
95
-
96
- ## Baseline scores (seed=42, model=meta-llama/Llama-3.1-8B-Instruct)
97
- Task 1: ~0.82
98
- Task 2: ~0.61
99
- Task 3: ~0.34
100
-
101
- ## Setup
102
- ```bash
103
- docker build -t data-quality-env .
104
- docker run -p 7860:7860 \
105
- -e API_BASE_URL=https://router.huggingface.co/v1 \
106
- -e MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct \
107
- -e HF_TOKEN=your_token \
108
- -e ENV_URL=http://localhost:7860 \
109
- data-quality-env
110
- ```
111
-
112
- ## Local server run
113
- If you are running from the `meta` folder, start the server with the helper script:
114
-
115
- ```bash
116
- ./run_env_server.sh
117
- ```
118
-
119
- Or directly:
120
-
121
- ```bash
122
- cd /Users/hemanthkunta/meta/data-quality-env
123
- python3 -m uvicorn env.app:app --app-dir /Users/hemanthkunta/meta/data-quality-env --host 0.0.0.0 --port 7860
124
- ```
125
-
126
- ## Running inference
127
- ```bash
128
- python inference.py
129
- ```
130
-
131
- ### Judge compatibility (important)
132
- - Judges set `API_BASE_URL`, `HF_TOKEN`, and `MODEL_NAME`, then run `python inference.py`.
133
- - `inference.py` must use those env vars directly and execute real OpenAI-compatible chat calls.
134
- - If env vars are ignored (or values are hardcoded), the run can produce no LLM output and score `0`.
135
- - Deterministic fallback is intended only for missing/invalid credentials or explicit override.
136
- - For Phase 2 behavior, avoid replacing successful LLM execution with heuristic shortcuts.
137
-
138
- ## Chat-style assistant mode (ChatGPT/Gemini/Claude-like UX)
139
- You can run a conversational wrapper over the same OpenEnv backend:
140
-
141
- ```bash
142
- python chat_agent.py --task-id 1 --seed 42
143
- ```
144
-
145
- This adds a natural chat loop while preserving hackathon-required endpoints (`/reset`, `/step`, `/state`) and graders.
146
-
147
- ## High-grade hybrid tool agent
148
- For a stronger agentic runner (policy-guided query ordering + OpenAI report polishing):
149
-
150
- ```bash
151
- python high_grade_agent.py
152
- ```
153
-
154
- Optional:
155
- - train local RL policy first and reuse it for ordering probes:
156
- ```bash
157
- python scripts/train_rl_agent.py train --episodes 300 --output outputs/rl_policy.json
158
- RL_POLICY_PATH=outputs/rl_policy.json python high_grade_agent.py
159
- ```
160
-
161
- Advanced mode details:
162
- - Query planning uses an explicit bank of `100,000` deterministic algorithm configurations.
163
- - Each candidate algorithm is checked against environment safety/step constraints before selection.
164
- - Selection balances coverage, statistical signal, novelty, safety risk, and efficiency.
165
- - SQL planning is augmented with a reusable SQL probe library (`env/sql_brain.py`) and reference guide (`SQL_AGENT_MIND.md`).
166
-
167
- Validate the 100k bank:
168
- ```bash
169
- python scripts/check_100k_algorithms.py
170
- ```
171
-
172
- Read the full SQL command/function guide:
173
- ```bash
174
- cat SQL_AGENT_MIND.md
175
- ```
176
-
177
- Run deeper multi-seed scoring (robust test):
178
- ```bash
179
- python scripts/deep_evaluate_agent.py --seed-start 42 --runs 5
180
- ```
181
-
182
- If you are in the `meta` folder:
183
- ```bash
184
- python3 deep_evaluate_agent.py --seed-start 42 --runs 5
185
- ```
186
-
187
- ## Advanced shield architecture
188
- This project now includes all requested advanced components while staying hackathon-compliant:
189
-
190
- - **LLM reasoning**: hypothesis hints before planning (`high_grade_agent.py`)
191
- - **Planner-Executor-Critic loop**: LLM planner proposes extra probes, executor runs SQL tools, critic repairs final report schema
192
- - **RL fine-tuning**: tabular Q-learning policy training (`scripts/train_rl_agent.py`)
193
- - **Tool use**: SQL querying + report submission via `/step`
194
- - **Memory**: persistent successful plans (`env/agent_memory.py`, `outputs/agent_memory.json`)
195
- - **Knowledge brain**: deterministic evidence-to-report auto-fixer (`env/knowledge_brain.py`)
196
- - **Self-improvement loop**: iterative train + evaluate (`scripts/self_improve_loop.py`)
197
- - **Chat-style assistant**: multi-agent conversation wrapper (`chat_agent.py`) with planner/critic behavior
198
-
199
- If `API_BASE_URL` / `MODEL_NAME` / `HF_TOKEN` are missing, the advanced agent runs in deterministic fallback mode (no LLM calls) and still functions.
200
-
201
- Run full self-improvement cycle:
202
- ```bash
203
- python scripts/self_improve_loop.py --cycles 3 --episodes-per-cycle 200
204
- ```
205
-
206
- Or via make:
207
- ```bash
208
- make self-improve
209
- ```
210
-
211
- ## Self-learning RL policy (optional advanced track)
212
- This repo includes a lightweight tabular Q-learning trainer that learns a query policy from shaped rewards:
213
-
214
- ```bash
215
- python scripts/train_rl_agent.py train --episodes 300 --output outputs/rl_policy.json
216
- python scripts/train_rl_agent.py eval --policy outputs/rl_policy.json --episodes-per-task 5
217
- ```
218
-
219
- If you are in the `meta` folder, you can also run the root wrapper:
220
-
221
- ```bash
222
- python3 train_rl_agent.py train --episodes 300 --output data-quality-env/outputs/rl_policy.json
223
- ```
224
-
225
- Notes:
226
- - This is a practical local RL loop over a compact action set (SQL probe selection + submit).
227
- - It is designed for hackathon constraints (2 vCPU / 8GB RAM, <20 minute runtime).
228
- - Frontier-scale LLM RL (GRPO/PPO over billions of params) is out of scope for the submission runtime budget, but this environment is compatible with external RL trainers.
229
-
230
- ## Validate before submission
231
- ```bash
232
- openenv validate
233
- ./validate-submission.sh http://localhost:7860
234
- python scripts/local_qa.py
235
- python scripts/check_graders.py
236
- ```
237
-
238
- ## Troubleshooting
239
- - If you see `ModuleNotFoundError: No module named 'env'`, you started the server from the wrong directory. Use `./run_env_server.sh`.
240
- - If you see `address already in use`, the server is already running on port `7860`.
241
- - If the agent says the server is unreachable, run `curl http://localhost:7860/health` first.
242
- - If you want LLM-backed behavior, set `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN`.
243
-
244
- ## Hugging Face Spaces deployment (Docker SDK)
245
- 1. Create a public Docker Space.
246
- 2. Add `openenv` tag in Space settings.
247
- 3. Set variables/secrets:
248
- - `API_BASE_URL`
249
- - `MODEL_NAME`
250
- - `HF_TOKEN`
251
- - `ENV_URL` is not required for the Space UI path.
252
- - Keep these set in the Space even during evaluation so the app remains healthy 24/7.
253
- - Judges still inject their own values when they run `inference.py`.
254
- 4. The Space entrypoint is `space_app.py`, which mounts a Gradio UI and calls the environment in-process.
255
- 4. Verify:
256
- - `GET /health`
257
- - `POST /reset`
258
- - run `validate-submission.sh` against the Space URL.
259
-
260
  ---
261
 
262
  ## Description
@@ -352,4 +99,4 @@ python inference.py
352
  ## Validation
353
  ```bash
354
  ./validate-submission.sh https://your-space.hf.space
355
- ```
 
 
1
  ---
2
  title: data-quality-env
3
+ sdk: docker
4
  emoji: 🚀
5
  colorFrom: blue
6
  colorTo: green
7
  ---
8
 
9
  ## Description
 
99
  ## Validation
100
  ```bash
101
  ./validate-submission.sh https://your-space.hf.space
102
+ ```
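
The README text removed in this commit describes an episode as `reset(task_id, seed)`, a series of `query` actions, and a final `submit_report` action. A minimal Python sketch of a client for that loop follows. The `/reset` and `/step` paths and the two action payloads come from the removed text; the JSON body sent to `/reset`, the helper names `http_post` and `run_episode`, and the shape of the report dictionary are assumptions made for illustration, not the project's confirmed API.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterable, List

Json = Dict[str, Any]
PostFn = Callable[[str, Json], Json]


def http_post(base_url: str) -> PostFn:
    """Return a function that POSTs JSON to base_url + path and decodes the JSON reply."""
    def post(path: str, payload: Json) -> Json:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return post


def run_episode(post: PostFn, task_id: int, seed: int,
                queries: Iterable[str], report: Json) -> List[Json]:
    """Reset, send each SELECT as a query action, then submit the report.

    Returns every raw response so the caller can inspect whatever fields
    the server actually sends back.
    """
    # Assumed request body for /reset; the README only names the arguments.
    responses = [post("/reset", {"task_id": task_id, "seed": seed})]
    for sql in queries:
        # Query action shape as listed under "Action space".
        responses.append(post("/step", {"action_type": "query", "sql": sql}))
    # The report is passed through unchanged; its schema (AuditReport) is not shown here.
    responses.append(post("/step", {"action_type": "submit_report", "report": report}))
    return responses
```

Against a locally running server this could be called as `run_episode(http_post("http://localhost:7860"), 1, 42, queries, report)`, with the table name for the queries taken from the observation returned by the reset call. Passing the transport in as a function keeps the loop testable without a live server.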