Buckets:
| # Insurance RAG System (Production-Ready) | |
| A complete Retrieval-Augmented Generation (RAG) pipeline for insurance brokers: | |
| - Ingests 3 policy documents (health, auto, home) | |
| - Chunks and embeds locally with Hugging Face BGE | |
| - Stores vectors in FAISS | |
| - Retrieves + optional cross-encoder reranking | |
| - Generates grounded answers via local Hugging Face LLM | |
| - Adds source attribution (document + section) | |
| - Supports continuous multi-turn chat sessions | |
| - Ships with a frontend to test quality and performance | |
| ## 1) Setup | |
| ```bash | |
| cd insurance-rag | |
| python -m venv .venv | |
| source .venv/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| Create `.env` from template: | |
| ```bash | |
| cp .env.example .env | |
| ``` | |
| Set at least: | |
| ```env | |
| HF_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 | |
| HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct | |
| HF_LLM_DEVICE=auto | |
| HF_LLM_DTYPE=auto | |
| HF_LLM_LOAD_IN_4BIT=false | |
| CHUNK_SIZE=512 | |
| CHUNK_OVERLAP=64 | |
| TOP_K=4 | |
| ``` | |
| GPU profile recommendations: | |
| - `4GB VRAM`: `HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct`, `HF_LLM_LOAD_IN_4BIT=false` | |
| - `16GB VRAM` (better quality): `HF_LLM_MODEL=Raj-Maharajwala/Open-Insurance-LLM-Llama3-8B`, `HF_LLM_LOAD_IN_4BIT=true` | |
| ## 2) Build Index (One-time) | |
| ```bash | |
| python -c "from rag.vector_store import build_index; build_index()" | |
| ``` | |
| Or through API: | |
| ```bash | |
| curl -X POST http://localhost:8000/index/build | |
| ``` | |
| ## 3) Run API + Frontend | |
| ```bash | |
| python -m uvicorn main:app --reload --port 8000 | |
| ``` | |
| Open browser: | |
| - `http://localhost:8000/` (frontend test console) | |
| - API docs: `http://localhost:8000/docs` | |
| ## 4) Quick Tests | |
| Health: | |
| ```bash | |
| curl http://localhost:8000/health | |
| ``` | |
| RAG query: | |
| ```bash | |
| curl -X POST http://localhost:8000/rag/query \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"question": "Is flood damage covered in home insurance?", "session_id": "test-1", "use_reranker": true}' | |
| ``` | |
| Continuous chat: | |
| ```bash | |
| curl -X POST http://localhost:8000/chat/continuous \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"message": "What is the flood deductible?", "session_id": "broker-42", "use_rag_context": true, "use_reranker": true}' | |
| ``` | |
| Agent chat (tool-calling loop): | |
| ```bash | |
| curl -X POST http://localhost:8000/agent/chat \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"message":"What is my premium for coverage 120000 and risk score 1.2?","session_id":"agent-1","use_reranker":true,"max_turns":6}' | |
| ``` | |
| ## 5) Architecture | |
| 1. Broker asks question | |
| 2. `BAAI/bge-small-en-v1.5` creates query embedding (local) | |
| 3. FAISS retrieves top-k policy chunks | |
| 4. Cross-encoder reranker reorders chunks (bonus layer) | |
| 5. Prompt injects context + question | |
| 6. Local Hugging Face LLM produces grounded answer with citations | |
| ## 6) Chunking Strategy (Justification) | |
| - Split preference: paragraph (`\n\n`) -> line (`\n`) -> sentence (`.`) -> word (` `) | |
| - `CHUNK_SIZE=512`: insurance clauses are usually 300-500 words, so one clause fits in one chunk | |
| - `CHUNK_OVERLAP=64`: avoids orphaned tail phrases like "without rider" when they straddle boundaries | |
| Example: | |
| - `Section 8: Exclusions ... Flood (without rider)` | |
| - Overlap preserves the clause boundary so retrieval still captures meaning and context | |
| ## 7) Continuous Chat | |
| - Session memory keyed by `session_id` | |
| - Stores recent turns up to `MAX_HISTORY` | |
| - Optional `use_rag_context=true` injects retrieved policy context every turn | |
| - Streaming variant: `/chat/continuous/stream` | |
| ## 8) Tool-Calling Agent (Part 3) | |
| - Endpoint: `/agent/chat` | |
| - Stateful per-session history (in-memory) | |
| - Hand-rolled loop with max 6 turns | |
| - Tool set: | |
| - `search_policy(query)` -> RAG retrieval | |
| - `calculate_premium(coverage, risk_score)` -> `coverage * 0.002 * risk_score` | |
| - `check_claim_status(claim_id)` -> hardcoded mock claim map | |
| - Malformed tool-call JSON is handled gracefully and retried inside the loop | |
| - Session management: | |
| - `GET /agent/sessions` | |
| - `DELETE /agent/session/{session_id}` | |
| ## 9) Performance Testing | |
| Use frontend + API: | |
| - `/rag/benchmark` for repeated run stats (`avg`, `p50`, `p95`, `min`, `max`) | |
| - `/rag/evaluate/json` to validate model answers against a JSON QA set (example: `../rag.json`) | |
| - `/metrics` for rolling endpoint latency | |
| - Frontend includes one-click benchmark and metrics refresh | |
| ## 10) Project Layout | |
| ```text | |
| insurance-rag/ | |
| ├── .env | |
| ├── .env.example | |
| ├── requirements.txt | |
| ├── README.md | |
| ├── main.py | |
| ├── rag/ | |
| │ ├── __init__.py | |
| │ ├── documents/ | |
| │ │ ├── health_policy.txt | |
| │ │ ├── auto_policy.txt | |
| │ │ └── home_policy.txt | |
| │ ├── embedder.py | |
| │ ├── chunker.py | |
| │ ├── vector_store.py | |
| │ ├── reranker.py | |
| │ └── pipeline.py | |
| ├── model_server/ | |
| │ ├── __init__.py | |
| │ └── llm_client.py | |
| ├── agent/ | |
| │ ├── __init__.py | |
| │ ├── engine.py | |
| │ └── tools.py | |
| └── frontend/ | |
| ├── index.html | |
| ├── styles.css | |
| └── app.js | |
| ``` | |
Xet Storage Details
- Size:
- 4.94 kB
- Xet hash:
- f66408a139332175ee3206a7168686e61754e334c9ae49e805d332f19ffc3c07
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.