Buckets:

meet4150
/

insurence_model1

Files

xet

meet4150/insurence_model1 / README.md

meet4150

about 2 months ago

preview code

download

raw

4.94 kB

Insurance RAG System (Production-Ready)

A complete Retrieval-Augmented Generation (RAG) pipeline for insurance brokers:

Ingests 3 policy documents (health, auto, home)
Chunks and embeds locally with Hugging Face BGE
Stores vectors in FAISS
Retrieves + optional cross-encoder reranking
Generates grounded answers via local Hugging Face LLM
Adds source attribution (document + section)
Supports continuous multi-turn chat sessions
Ships with a frontend to test quality and performance

1) Setup

cd insurance-rag
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Create .env from template:

cp .env.example .env

Set at least:

HF_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct
HF_LLM_DEVICE=auto
HF_LLM_DTYPE=auto
HF_LLM_LOAD_IN_4BIT=false
CHUNK_SIZE=512
CHUNK_OVERLAP=64
TOP_K=4

GPU profile recommendations:

4GB VRAM: HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct, HF_LLM_LOAD_IN_4BIT=false
16GB VRAM (better quality): HF_LLM_MODEL=Raj-Maharajwala/Open-Insurance-LLM-Llama3-8B, HF_LLM_LOAD_IN_4BIT=true

2) Build Index (One-time)

python -c "from rag.vector_store import build_index; build_index()"

Or through API:

curl -X POST http://localhost:8000/index/build

3) Run API + Frontend

python -m uvicorn main:app --reload --port 8000

Open browser:

http://localhost:8000/ (frontend test console)
API docs: http://localhost:8000/docs

4) Quick Tests

Health:

curl http://localhost:8000/health

RAG query:

curl -X POST http://localhost:8000/rag/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Is flood damage covered in home insurance?", "session_id": "test-1", "use_reranker": true}'

Continuous chat:

curl -X POST http://localhost:8000/chat/continuous \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the flood deductible?", "session_id": "broker-42", "use_rag_context": true, "use_reranker": true}'

Agent chat (tool-calling loop):

curl -X POST http://localhost:8000/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"What is my premium for coverage 120000 and risk score 1.2?","session_id":"agent-1","use_reranker":true,"max_turns":6}'

5) Architecture

Broker asks question
BAAI/bge-small-en-v1.5 creates query embedding (local)
FAISS retrieves top-k policy chunks
Cross-encoder reranker reorders chunks (bonus layer)
Prompt injects context + question
Local Hugging Face LLM produces grounded answer with citations

6) Chunking Strategy (Justification)

Split preference: paragraph (\n\n) -> line (\n) -> sentence (.) -> word ( )
CHUNK_SIZE=512: insurance clauses are usually 300-500 words, so one clause fits in one chunk
CHUNK_OVERLAP=64: avoids orphaned tail phrases like "without rider" when they straddle boundaries

Example:

Section 8: Exclusions ... Flood (without rider)
Overlap preserves the clause boundary so retrieval still captures meaning and context

7) Continuous Chat

Session memory keyed by session_id
Stores recent turns up to MAX_HISTORY
Optional use_rag_context=true injects retrieved policy context every turn
Streaming variant: /chat/continuous/stream

8) Tool-Calling Agent (Part 3)

Endpoint: /agent/chat
Stateful per-session history (in-memory)
Hand-rolled loop with max 6 turns
Tool set:
- search_policy(query) -> RAG retrieval
- calculate_premium(coverage, risk_score) -> coverage * 0.002 * risk_score
- check_claim_status(claim_id) -> hardcoded mock claim map
Malformed tool-call JSON is handled gracefully and retried inside the loop
Session management:
- GET /agent/sessions
- DELETE /agent/session/{session_id}

9) Performance Testing

Use frontend + API:

/rag/benchmark for repeated run stats (avg, p50, p95, min, max)
/rag/evaluate/json to validate model answers against a JSON QA set (example: ../rag.json)
/metrics for rolling endpoint latency
Frontend includes one-click benchmark and metrics refresh

10) Project Layout

insurance-rag/
├── .env
├── .env.example
├── requirements.txt
├── README.md
├── main.py
├── rag/
│   ├── __init__.py
│   ├── documents/
│   │   ├── health_policy.txt
│   │   ├── auto_policy.txt
│   │   └── home_policy.txt
│   ├── embedder.py
│   ├── chunker.py
│   ├── vector_store.py
│   ├── reranker.py
│   └── pipeline.py
├── model_server/
│   ├── __init__.py
│   └── llm_client.py
├── agent/
│   ├── __init__.py
│   ├── engine.py
│   └── tools.py
└── frontend/
    ├── index.html
    ├── styles.css
    └── app.js

Xet Storage Details

Size:: 4.94 kB
Xet hash:: f66408a139332175ee3206a7168686e61754e334c9ae49e805d332f19ffc3c07

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.