166 kB
39 files
Updated about 1 month ago
NameSize
__pycache__
agent
frontend
model_server
rag
.env353 Bytes
xet
.env.example628 Bytes
xet
.gitignore72 Bytes
xet
README.md4.94 kB
xet
main.py15.6 kB
xet
requirements.txt284 Bytes
xet
README.md

Insurance RAG System (Production-Ready)

A complete Retrieval-Augmented Generation (RAG) pipeline for insurance brokers:

  • Ingests 3 policy documents (health, auto, home)
  • Chunks and embeds locally with Hugging Face BGE
  • Stores vectors in FAISS
  • Retrieves + optional cross-encoder reranking
  • Generates grounded answers via local Hugging Face LLM
  • Adds source attribution (document + section)
  • Supports continuous multi-turn chat sessions
  • Ships with a frontend to test quality and performance

1) Setup

cd insurance-rag
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Create .env from template:

cp .env.example .env

Set at least:

HF_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct
HF_LLM_DEVICE=auto
HF_LLM_DTYPE=auto
HF_LLM_LOAD_IN_4BIT=false
CHUNK_SIZE=512
CHUNK_OVERLAP=64
TOP_K=4

GPU profile recommendations:

  • 4GB VRAM: HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct, HF_LLM_LOAD_IN_4BIT=false
  • 16GB VRAM (better quality): HF_LLM_MODEL=Raj-Maharajwala/Open-Insurance-LLM-Llama3-8B, HF_LLM_LOAD_IN_4BIT=true

2) Build Index (One-time)

python -c "from rag.vector_store import build_index; build_index()"

Or through API:

curl -X POST http://localhost:8000/index/build

3) Run API + Frontend

python -m uvicorn main:app --reload --port 8000

Open browser:

  • http://localhost:8000/ (frontend test console)
  • API docs: http://localhost:8000/docs

4) Quick Tests

Health:

curl http://localhost:8000/health

RAG query:

curl -X POST http://localhost:8000/rag/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Is flood damage covered in home insurance?", "session_id": "test-1", "use_reranker": true}'

Continuous chat:

curl -X POST http://localhost:8000/chat/continuous \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the flood deductible?", "session_id": "broker-42", "use_rag_context": true, "use_reranker": true}'

Agent chat (tool-calling loop):

curl -X POST http://localhost:8000/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"What is my premium for coverage 120000 and risk score 1.2?","session_id":"agent-1","use_reranker":true,"max_turns":6}'

5) Architecture

  1. Broker asks question
  2. BAAI/bge-small-en-v1.5 creates query embedding (local)
  3. FAISS retrieves top-k policy chunks
  4. Cross-encoder reranker reorders chunks (bonus layer)
  5. Prompt injects context + question
  6. Local Hugging Face LLM produces grounded answer with citations

6) Chunking Strategy (Justification)

  • Split preference: paragraph (\n\n) -> line (\n) -> sentence (.) -> word ( )
  • CHUNK_SIZE=512: insurance clauses are usually 300-500 words, so one clause fits in one chunk
  • CHUNK_OVERLAP=64: avoids orphaned tail phrases like "without rider" when they straddle boundaries

Example:

  • Section 8: Exclusions ... Flood (without rider)
  • Overlap preserves the clause boundary so retrieval still captures meaning and context

7) Continuous Chat

  • Session memory keyed by session_id
  • Stores recent turns up to MAX_HISTORY
  • Optional use_rag_context=true injects retrieved policy context every turn
  • Streaming variant: /chat/continuous/stream

8) Tool-Calling Agent (Part 3)

  • Endpoint: /agent/chat
  • Stateful per-session history (in-memory)
  • Hand-rolled loop with max 6 turns
  • Tool set:
    • search_policy(query) -> RAG retrieval
    • calculate_premium(coverage, risk_score) -> coverage * 0.002 * risk_score
    • check_claim_status(claim_id) -> hardcoded mock claim map
  • Malformed tool-call JSON is handled gracefully and retried inside the loop
  • Session management:
    • GET /agent/sessions
    • DELETE /agent/session/{session_id}

9) Performance Testing

Use frontend + API:

  • /rag/benchmark for repeated run stats (avg, p50, p95, min, max)
  • /rag/evaluate/json to validate model answers against a JSON QA set (example: ../rag.json)
  • /metrics for rolling endpoint latency
  • Frontend includes one-click benchmark and metrics refresh

10) Project Layout

insurance-rag/
├── .env
├── .env.example
├── requirements.txt
├── README.md
├── main.py
├── rag/
│   ├── __init__.py
│   ├── documents/
│   │   ├── health_policy.txt
│   │   ├── auto_policy.txt
│   │   └── home_policy.txt
│   ├── embedder.py
│   ├── chunker.py
│   ├── vector_store.py
│   ├── reranker.py
│   └── pipeline.py
├── model_server/
│   ├── __init__.py
│   └── llm_client.py
├── agent/
│   ├── __init__.py
│   ├── engine.py
│   └── tools.py
└── frontend/
    ├── index.html
    ├── styles.css
    └── app.js
Total size
166 kB
Files
39
Last updated
Apr 10
Pre-warmed CDN
US EU US EU

Contributors