# Architecture

**Canonical source: [`README.md`](README.md) §4 "How it works, end to end".**

The full, maintained architecture (request flow diagram, the single-brain
design, the fallback chain, voice, retrieval, profile/personalisation) lives
inline in the README so there is exactly **one** place to keep accurate. The
historical split between this file and `70-docs/` is what allowed both to
drift out of date. This file is a one-screen orientation; the README is the
authority.

## One-screen summary

- **Frontend**: Next.js 16 / React 19 / Tailwind v4, static export, served by
  the backend. `frontend/src/app/page.tsx`. Voice = Web Speech (interim) +
  `MediaRecorder` (authoritative) → Sarvam STT; Sarvam TTS replies.
- **Backend**: FastAPI (`backend/main.py`); `uvicorn` on port 7860 in the
  Space. Endpoints: `/api/chat`, `/api/transcribe`, `/api/upload-policy`,
  `/api/coverage`, `/api/profile*`, `/api/scorecard`, `/api/session*`,
  `/api/admin/*`.
- **Brain**: one LLM call per turn. Google Gemini (`gemini-2.5-flash`) with
  function-calling tools (`save_profile_field`, `retrieve_policies`,
  `get_policy_facts`, `mark_recommendation`) in `backend/single_brain.py` /
  `backend/brain_tools.py`. A single call owns the whole turn: fact-find,
  retrieval, QA, and recommendation. On a transient Gemini error or a
  cold-start 503, the turn falls back to the small `backend/nim_fallback.py`
  (NVIDIA NIM) so it still completes. Fail-loud, never silently wrong. The
  legacy multi-pass design (orchestrator / sales-brain / QA-brain / separate
  faithfulness judge / profile_extractor / tiered brain) was removed; it does
  not exist in the codebase.
- **Retrieval**: Chroma vector store, BGE-small-en-v1.5 local 384-d
  embeddings (`rag/retrieve.py`). Shared `policies` collection (148
  catalogued plans across 21 insurers, ~7.3K chunks) + a per-session
  `user_uploads_quarantine` collection (24h TTL, session-isolated). Per
  ADR-044 (2026-05-27), uploaded PDFs dual-write into both collections: the
  upload becomes a first-class marketplace card with the same scorecard /
  premium / RAG endpoints as the catalogued 148.
- **Upload safety**: `backend/security.py`, 8 gates, before any embedding.
- **Data**: three repos. Code (HF Space `origin` + GitHub `github` mirror),
  the `rohitsar567/insurance-bot-data` HF dataset (corpus + vectors, pulled
  at Docker build), and `40-data/` curated facts versioned with the code.
- **Deploy**: HF Space Docker; `entrypoint.sh` runs `uvicorn`; the build
  `snapshot_download`s the data dataset.
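
The fallback behaviour described under **Brain** can be pictured with a minimal sketch. This is not the code in `backend/single_brain.py` or `backend/nim_fallback.py`; every name below (`TransientBrainError`, `BrainUnavailableError`, `run_turn`, the stub callables) is hypothetical and stands in for the real Gemini and NIM clients. It assumes only what the summary states: a transient primary failure hands the turn to a fallback, and a failure of both is raised loudly instead of returning a made-up answer.

```python
from typing import Callable, Tuple


class TransientBrainError(Exception):
    """Hypothetical marker for retryable failures, e.g. a cold-start 503."""


class BrainUnavailableError(Exception):
    """Hypothetical error raised when primary and fallback both fail."""


def run_turn(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (reply, source) for one turn.

    Only transient primary errors trigger the fallback. Any other primary
    exception propagates unchanged, so genuine bugs are not masked.
    """
    try:
        return primary(message), "primary"
    except TransientBrainError as primary_err:
        try:
            return fallback(message), "fallback"
        except Exception as fallback_err:
            # Fail loud: surface both failures rather than invent a reply.
            raise BrainUnavailableError(
                f"primary failed ({primary_err}); fallback failed ({fallback_err})"
            ) from fallback_err


# Stub callables standing in for the real model clients (assumptions).
def flaky_primary(message: str) -> str:
    raise TransientBrainError("503 cold start")


def echo_fallback(message: str) -> str:
    return f"fallback reply to: {message}"


def broken_fallback(message: str) -> str:
    raise RuntimeError("fallback down")
```

Calling `run_turn("hi", flaky_primary, echo_fallback)` returns the fallback's reply tagged `"fallback"`, while `run_turn("hi", flaky_primary, broken_fallback)` raises `BrainUnavailableError`. Returning the source alongside the reply is a design choice of this sketch, made so a caller could log which path served the turn.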

For anything beyond this, read `README.md`. Do not treat older
`70-docs/`/ADR prose as the present-state map (it predates the single-brain
rewrite and is being reconciled).
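
As a rough illustration of the **Retrieval** bullet's upload handling, the sketch below models the dual-write and the session-isolated, 24h-TTL quarantine with plain in-memory lists. It deliberately does not use the Chroma API and is not the code in `rag/retrieve.py`; `Chunk`, `Stores`, `dual_write`, and `visible_quarantine` are hypothetical names. How the shared `policies` collection scopes uploaded chunks is not stated in this summary, so the sketch simply appends to both stores.

```python
from dataclasses import dataclass, field
from typing import List

QUARANTINE_TTL_SECONDS = 24 * 60 * 60  # the 24h TTL named in the summary


@dataclass
class Chunk:
    text: str
    session_id: str
    created_at: float  # Unix seconds


@dataclass
class Stores:
    """In-memory stand-ins for the two collections (assumption, not Chroma)."""
    policies: List[Chunk] = field(default_factory=list)
    user_uploads_quarantine: List[Chunk] = field(default_factory=list)


def dual_write(stores: Stores, text: str, session_id: str, now: float) -> None:
    """Write one uploaded chunk into both collections."""
    chunk = Chunk(text=text, session_id=session_id, created_at=now)
    stores.policies.append(chunk)
    stores.user_uploads_quarantine.append(chunk)


def visible_quarantine(stores: Stores, session_id: str, now: float) -> List[str]:
    """Return quarantine texts for this session that are younger than the TTL."""
    return [
        c.text
        for c in stores.user_uploads_quarantine
        if c.session_id == session_id
        and now - c.created_at < QUARANTINE_TTL_SECONDS
    ]
```

In this model a chunk written by one session is never returned for another session's quarantine query, and it stops being returned once its age reaches the TTL.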