Buckets:
166 kB
39 files
Updated about 1 month ago
Ctrl+K
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| __pycache__ | 1 items | ||
| agent | 6 items | ||
| frontend | 3 items | ||
| model_server | 4 items | ||
| rag | 19 items | ||
| .env | 353 Bytes xet | db3f11a9 | |
| .env.example | 628 Bytes xet | 1f9b2e43 | |
| .gitignore | 72 Bytes xet | aa9bf21c | |
| README.md | 4.94 kB xet | f66408a1 | |
| main.py | 15.6 kB xet | aa63dc2a | |
| requirements.txt | 284 Bytes xet | f4c6412b |
Insurance RAG System (Production-Ready)
A complete Retrieval-Augmented Generation (RAG) pipeline for insurance brokers:
- Ingests 3 policy documents (health, auto, home)
- Chunks and embeds locally with Hugging Face BGE
- Stores vectors in FAISS
- Retrieves + optional cross-encoder reranking
- Generates grounded answers via local Hugging Face LLM
- Adds source attribution (document + section)
- Supports continuous multi-turn chat sessions
- Ships with a frontend to test quality and performance
1) Setup
cd insurance-rag
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Create .env from template:
cp .env.example .env
Set at least:
HF_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct
HF_LLM_DEVICE=auto
HF_LLM_DTYPE=auto
HF_LLM_LOAD_IN_4BIT=false
CHUNK_SIZE=512
CHUNK_OVERLAP=64
TOP_K=4
GPU profile recommendations:
4GB VRAM:HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct,HF_LLM_LOAD_IN_4BIT=false16GB VRAM(better quality):HF_LLM_MODEL=Raj-Maharajwala/Open-Insurance-LLM-Llama3-8B,HF_LLM_LOAD_IN_4BIT=true
2) Build Index (One-time)
python -c "from rag.vector_store import build_index; build_index()"
Or through API:
curl -X POST http://localhost:8000/index/build
3) Run API + Frontend
python -m uvicorn main:app --reload --port 8000
Open browser:
http://localhost:8000/(frontend test console)- API docs:
http://localhost:8000/docs
4) Quick Tests
Health:
curl http://localhost:8000/health
RAG query:
curl -X POST http://localhost:8000/rag/query \
-H "Content-Type: application/json" \
-d '{"question": "Is flood damage covered in home insurance?", "session_id": "test-1", "use_reranker": true}'
Continuous chat:
curl -X POST http://localhost:8000/chat/continuous \
-H "Content-Type: application/json" \
-d '{"message": "What is the flood deductible?", "session_id": "broker-42", "use_rag_context": true, "use_reranker": true}'
Agent chat (tool-calling loop):
curl -X POST http://localhost:8000/agent/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is my premium for coverage 120000 and risk score 1.2?","session_id":"agent-1","use_reranker":true,"max_turns":6}'
5) Architecture
- Broker asks question
BAAI/bge-small-en-v1.5creates query embedding (local)- FAISS retrieves top-k policy chunks
- Cross-encoder reranker reorders chunks (bonus layer)
- Prompt injects context + question
- Local Hugging Face LLM produces grounded answer with citations
6) Chunking Strategy (Justification)
- Split preference: paragraph (
\n\n) -> line (\n) -> sentence (.) -> word () CHUNK_SIZE=512: insurance clauses are usually 300-500 words, so one clause fits in one chunkCHUNK_OVERLAP=64: avoids orphaned tail phrases like "without rider" when they straddle boundaries
Example:
Section 8: Exclusions ... Flood (without rider)- Overlap preserves the clause boundary so retrieval still captures meaning and context
7) Continuous Chat
- Session memory keyed by
session_id - Stores recent turns up to
MAX_HISTORY - Optional
use_rag_context=trueinjects retrieved policy context every turn - Streaming variant:
/chat/continuous/stream
8) Tool-Calling Agent (Part 3)
- Endpoint:
/agent/chat - Stateful per-session history (in-memory)
- Hand-rolled loop with max 6 turns
- Tool set:
search_policy(query)-> RAG retrievalcalculate_premium(coverage, risk_score)->coverage * 0.002 * risk_scorecheck_claim_status(claim_id)-> hardcoded mock claim map
- Malformed tool-call JSON is handled gracefully and retried inside the loop
- Session management:
GET /agent/sessionsDELETE /agent/session/{session_id}
9) Performance Testing
Use frontend + API:
/rag/benchmarkfor repeated run stats (avg,p50,p95,min,max)/rag/evaluate/jsonto validate model answers against a JSON QA set (example:../rag.json)/metricsfor rolling endpoint latency- Frontend includes one-click benchmark and metrics refresh
10) Project Layout
insurance-rag/
├── .env
├── .env.example
├── requirements.txt
├── README.md
├── main.py
├── rag/
│ ├── __init__.py
│ ├── documents/
│ │ ├── health_policy.txt
│ │ ├── auto_policy.txt
│ │ └── home_policy.txt
│ ├── embedder.py
│ ├── chunker.py
│ ├── vector_store.py
│ ├── reranker.py
│ └── pipeline.py
├── model_server/
│ ├── __init__.py
│ └── llm_client.py
├── agent/
│ ├── __init__.py
│ ├── engine.py
│ └── tools.py
└── frontend/
├── index.html
├── styles.css
└── app.js
- Total size
- 166 kB
- Files
- 39
- Last updated
- Apr 10
- Pre-warmed CDN
- US EU US EU