Siddhesh-Ai9797 committed on
Commit · 5cc7a6c
Parent(s): 6ef4315
Add README config
README.md
CHANGED
@@ -1,78 +1,11 @@
-# 🤖 Self-Correcting Data Validation Agent
-### Schema-Enforced, No-Hallucination Agentic Data Pipeline
-
-A production-style AI system that converts messy employee text into schema-valid JSON using:
-
-- LangGraph (state machine orchestration)
-- OpenAI LLMs (structured extraction)
-- Pydantic v2 (strict schema validation)
-- Pandas (deterministic execution)
-- Streamlit (interactive UI)
-
----
-
-## Problem
-
-LLMs hallucinate missing fields and fabricate identifiers.
-
-This project enforces:
-
-- Strict schema validation
-- Bounded self-correction retries
-- Deterministic query execution
-- Explicit rejection handling
-- Zero fabrication of required fields
-
 ---
-
-
-
-
-
-
-↓ Self-Correction Loop (if needed)
-↓ Final Valid JSON
-
-Records missing required fields are rejected, never hallucinated.
-
 ---
 
-#
-
-extract → validate
-if fail → correct → validate → repeat
-finalize
-
-Retry attempts are limited and controlled.
-
----
-
-## Deterministic Query Engine
-
-1. LLM generates structured query plan
-2. Pandas executes it
-3. LLM summarizes computed results
-
-No synthetic answers.
-
----
-
-## Run Locally
-
-```bash
-git clone git@github.com:Siddhesh-Ai9797/self-correcting-data-validation-agent.git
-cd self-correcting-data-validation-agent
-
-python -m venv .venv
-source .venv/bin/activate
-pip install -r requirements.txt
-
-export OPENAI_API_KEY="your_key_here"
-streamlit run app.py
-
----
-
-# Stress Testing
-python -m src.eval.run_agent_suite
-
-
 ---
+title: Self Correcting Data Validation Agent
+emoji: 🤖
+colorFrom: green
+colorTo: blue
+sdk: docker
+pinned: false
 ---
 
+# Self-Correcting Data Validation Agent
+Production-style LLM agent using LangGraph, Pydantic v2 and self-correction loops.
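The removed README's Problem section describes strict schema validation with explicit rejection: a record that lacks a required field is rejected rather than completed by the model. This commit does not show the project's actual schema, so the sketch below assumes a hypothetical `EMPLOYEE_SCHEMA` with the fields `employee_id`, `name` and `department`, and uses standard-library checks as a stand-in for the Pydantic v2 model the README names. The helper names `validate_record` and `partition` are illustrative, not taken from the repository.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical schema (field name -> required type). Stand-in for a Pydantic v2 model;
# the project's real fields are not shown in this commit.
EMPLOYEE_SCHEMA: dict[str, type] = {
    "employee_id": str,
    "name": str,
    "department": str,
}


def validate_record(record: Record) -> list[str]:
    """Return error messages; an empty list means the record is valid."""
    errors: list[str] = []
    for field, expected in EMPLOYEE_SCHEMA.items():
        if record.get(field) is None:
            # Missing required field: report it, never invent a value.
            errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected):
            # Strict typing: no coercion, so 7 is not accepted as "7".
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(record[field]).__name__}"
            )
    # Extra keys are rejected too (sorted so the messages are deterministic).
    for field in sorted(record.keys() - EMPLOYEE_SCHEMA.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def partition(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Split records into accepted ones and (record, errors) rejections."""
    accepted: list[Record] = []
    rejected: list[tuple[Record, list[str]]] = []
    for record in records:
        errors = validate_record(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = partition([
        {"employee_id": "E-001", "name": "Ada", "department": "R&D"},
        {"name": "Bob", "department": "Sales"},
    ])
    print(len(ok), bad[0][1])  # 1 ['missing required field: employee_id']
```

With Pydantic v2 the same policy would roughly correspond to declaring the fields without defaults, setting `model_config = ConfigDict(strict=True, extra="forbid")`, and treating a raised `ValidationError` as a rejection.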
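The removed self-correction section outlines the loop extract → validate, then correct → validate → repeat on failure, then finalize, with a limited number of retries. Below is a minimal, dependency-free sketch of that control flow under stated assumptions: `extract` and `correct` are hypothetical callables standing in for the LLM calls, `REQUIRED_FIELDS` is a hypothetical schema, and `max_retries` caps the number of correction rounds. The README lists LangGraph for orchestration; the plain `while` loop here only illustrates the branching, not how the project wires its graph.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Record = dict[str, Any]

# Hypothetical required fields; the project's real schema is not shown in this commit.
REQUIRED_FIELDS = ("employee_id", "name", "department")


def validate(record: Record) -> list[str]:
    """Stand-in validator: report every required field that is missing or empty."""
    return [f"missing required field: {f}" for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


@dataclass
class Outcome:
    record: Optional[Record]  # None means the input was rejected
    errors: list[str]         # errors from the final validation pass
    attempts: int             # total validation passes performed


def run_with_self_correction(
    text: str,
    extract: Callable[[str], Record],
    correct: Callable[[str, Record, list[str]], Record],
    max_retries: int = 2,
) -> Outcome:
    """extract -> validate; while invalid, correct -> validate, at most max_retries times."""
    record = extract(text)
    errors = validate(record)
    attempts = 1
    while errors and attempts <= max_retries:
        # The (hypothetical) corrector sees the source text, the current record and the errors.
        record = correct(text, record, errors)
        errors = validate(record)
        attempts += 1
    if errors:
        # Explicit rejection: return the errors, never a partially guessed record.
        return Outcome(record=None, errors=errors, attempts=attempts)
    return Outcome(record=record, errors=[], attempts=attempts)  # finalize


if __name__ == "__main__":
    # Stub "LLM" calls for illustration only.
    def fake_extract(text: str) -> Record:
        name, emp_id, *_ = [part.strip() for part in text.split(",")]
        return {"employee_id": emp_id, "name": name}  # first pass misses the department

    def fake_correct(text: str, record: Record, errors: list[str]) -> Record:
        fixed = dict(record)
        parts = [part.strip() for part in text.split(",")]
        if len(parts) > 2:  # only fill a value the source text actually contains
            fixed["department"] = parts[2]
        return fixed

    print(run_with_self_correction("Ada, E-001, R&D", fake_extract, fake_correct))
    print(run_with_self_correction("Bob, E-002", fake_extract, fake_correct))
```

The cap is what keeps the loop bounded: a corrector that cannot find the missing value in the source text would otherwise keep retrying, whereas here the record is rejected with the last validation errors once `max_retries` correction rounds have run.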
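The removed Deterministic Query Engine section splits a question into three steps: the LLM produces a structured query plan, Pandas executes it, and the LLM only summarizes the computed result. A minimal sketch of the middle step follows, assuming a hypothetical plan format with `filter`, `group_by`, `metric` and `column` keys, and using plain Python in place of Pandas so it runs without dependencies; with Pandas the same plan would map onto a boolean mask followed by `groupby(...).agg(...)`. Plans that name unknown columns or operations raise an error instead of being silently reinterpreted.

```python
from collections import defaultdict
from statistics import mean
from typing import Any

# Plain-Python stand-in for the Pandas execution step; the plan format is hypothetical.
Row = dict[str, Any]
Plan = dict[str, Any]

# Hypothetical whitelist: the executor runs only these aggregations.
ALLOWED_METRICS = {"count", "sum", "mean"}


def check_plan(plan: Plan, columns: set[str]) -> None:
    """Refuse plans with unknown operations or columns instead of guessing what was meant."""
    metric = plan.get("metric")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    referenced = set(plan.get("filter", {})) | {plan.get("group_by")}
    if metric != "count":
        referenced.add(plan.get("column"))
    unknown = referenced - columns
    if unknown:
        raise ValueError(f"unknown columns: {sorted(map(str, unknown))}")


def execute_plan(plan: Plan, rows: list[Row]) -> dict[Any, float]:
    """Deterministically apply equality filters, then one grouped aggregate."""
    columns = set().union(*(row.keys() for row in rows))
    check_plan(plan, columns)
    filters = plan.get("filter", {})
    selected = [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
    groups: dict[Any, list[Row]] = defaultdict(list)
    for row in selected:
        groups[row.get(plan["group_by"])].append(row)
    result: dict[Any, float] = {}
    for key, members in groups.items():
        if plan["metric"] == "count":
            result[key] = len(members)
        else:
            values = [member[plan["column"]] for member in members]
            result[key] = sum(values) if plan["metric"] == "sum" else mean(values)
    return result


if __name__ == "__main__":
    rows = [
        {"name": "Ada", "department": "R&D", "salary": 120},
        {"name": "Bob", "department": "Sales", "salary": 90},
        {"name": "Cy", "department": "R&D", "salary": 100},
    ]
    # A plan as an LLM might emit it (hypothetical format); the numbers come from execution.
    plan = {"group_by": "department", "metric": "mean", "column": "salary"}
    print(execute_plan(plan, rows))
```

Because every figure comes out of `execute_plan`, the summarizing LLM call can be handed the returned dictionary and asked only to describe it, which is one way to read the README's "No synthetic answers." claim.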