Buckets:

meet4150
/

insurence_model1

Files

xet

meet4150/insurence_model1 / README.md

meet4150

about 2 months ago

preview code

download

raw

4.94 kB

	# Insurance RAG System (Production-Ready)

	A complete Retrieval-Augmented Generation (RAG) pipeline for insurance brokers:

	- Ingests 3 policy documents (health, auto, home)
	- Chunks and embeds locally with Hugging Face BGE
	- Stores vectors in FAISS
	- Retrieves + optional cross-encoder reranking
	- Generates grounded answers via local Hugging Face LLM
	- Adds source attribution (document + section)
	- Supports continuous multi-turn chat sessions
	- Ships with a frontend to test quality and performance

	## 1) Setup

	```bash
	cd insurance-rag
	python -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```

	Create `.env` from template:

	```bash
	cp .env.example .env
	```

	Set at least:

	```env
	HF_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
	HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct
	HF_LLM_DEVICE=auto
	HF_LLM_DTYPE=auto
	HF_LLM_LOAD_IN_4BIT=false
	CHUNK_SIZE=512
	CHUNK_OVERLAP=64
	TOP_K=4
	```

	GPU profile recommendations:

	- `4GB VRAM`: `HF_LLM_MODEL=Qwen/Qwen2.5-1.5B-Instruct`, `HF_LLM_LOAD_IN_4BIT=false`
	- `16GB VRAM` (better quality): `HF_LLM_MODEL=Raj-Maharajwala/Open-Insurance-LLM-Llama3-8B`, `HF_LLM_LOAD_IN_4BIT=true`

	## 2) Build Index (One-time)

	```bash
	python -c "from rag.vector_store import build_index; build_index()"
	```

	Or through API:

	```bash
	curl -X POST http://localhost:8000/index/build
	```

	## 3) Run API + Frontend

	```bash
	python -m uvicorn main:app --reload --port 8000
	```

	Open browser:

	- `http://localhost:8000/` (frontend test console)
	- API docs: `http://localhost:8000/docs`

	## 4) Quick Tests

	Health:

	```bash
	curl http://localhost:8000/health
	```

	RAG query:

	```bash
	curl -X POST http://localhost:8000/rag/query \
	-H "Content-Type: application/json" \
	-d '{"question": "Is flood damage covered in home insurance?", "session_id": "test-1", "use_reranker": true}'
	```

	Continuous chat:

	```bash
	curl -X POST http://localhost:8000/chat/continuous \
	-H "Content-Type: application/json" \
	-d '{"message": "What is the flood deductible?", "session_id": "broker-42", "use_rag_context": true, "use_reranker": true}'
	```

	Agent chat (tool-calling loop):

	```bash
	curl -X POST http://localhost:8000/agent/chat \
	-H "Content-Type: application/json" \
	-d '{"message":"What is my premium for coverage 120000 and risk score 1.2?","session_id":"agent-1","use_reranker":true,"max_turns":6}'
	```

	## 5) Architecture

	1. Broker asks question
	2. `BAAI/bge-small-en-v1.5` creates query embedding (local)
	3. FAISS retrieves top-k policy chunks
	4. Cross-encoder reranker reorders chunks (bonus layer)
	5. Prompt injects context + question
	6. Local Hugging Face LLM produces grounded answer with citations

	## 6) Chunking Strategy (Justification)

	- Split preference: paragraph (`\n\n`) -> line (`\n`) -> sentence (`.`) -> word (` `)
	- `CHUNK_SIZE=512`: insurance clauses are usually 300-500 words, so one clause fits in one chunk
	- `CHUNK_OVERLAP=64`: avoids orphaned tail phrases like "without rider" when they straddle boundaries

	Example:

	- `Section 8: Exclusions ... Flood (without rider)`
	- Overlap preserves the clause boundary so retrieval still captures meaning and context

	## 7) Continuous Chat

	- Session memory keyed by `session_id`
	- Stores recent turns up to `MAX_HISTORY`
	- Optional `use_rag_context=true` injects retrieved policy context every turn
	- Streaming variant: `/chat/continuous/stream`

	## 8) Tool-Calling Agent (Part 3)

	- Endpoint: `/agent/chat`
	- Stateful per-session history (in-memory)
	- Hand-rolled loop with max 6 turns
	- Tool set:
	- `search_policy(query)` -> RAG retrieval
	- `calculate_premium(coverage, risk_score)` -> `coverage * 0.002 * risk_score`
	- `check_claim_status(claim_id)` -> hardcoded mock claim map
	- Malformed tool-call JSON is handled gracefully and retried inside the loop
	- Session management:
	- `GET /agent/sessions`
	- `DELETE /agent/session/{session_id}`

	## 9) Performance Testing

	Use frontend + API:

	- `/rag/benchmark` for repeated run stats (`avg`, `p50`, `p95`, `min`, `max`)
	- `/rag/evaluate/json` to validate model answers against a JSON QA set (example: `../rag.json`)
	- `/metrics` for rolling endpoint latency
	- Frontend includes one-click benchmark and metrics refresh

	## 10) Project Layout

	```text
	insurance-rag/
	├── .env
	├── .env.example
	├── requirements.txt
	├── README.md
	├── main.py
	├── rag/
	│ ├── __init__.py
	│ ├── documents/
	│ │ ├── health_policy.txt
	│ │ ├── auto_policy.txt
	│ │ └── home_policy.txt
	│ ├── embedder.py
	│ ├── chunker.py
	│ ├── vector_store.py
	│ ├── reranker.py
	│ └── pipeline.py
	├── model_server/
	│ ├── __init__.py
	│ └── llm_client.py
	├── agent/
	│ ├── __init__.py
	│ ├── engine.py
	│ └── tools.py
	└── frontend/
	├── index.html
	├── styles.css
	└── app.js
	```

Xet Storage Details

Size:: 4.94 kB
Xet hash:: f66408a139332175ee3206a7168686e61754e334c9ae49e805d332f19ffc3c07

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.