# Architecture

**Canonical source: [`README.md`](README.md) §4 "How it works, end to end".** The full, maintained architecture (request flow diagram, the single-brain design, the fallback chain, voice, retrieval, profile/personalisation) lives inline in the README so there is exactly **one** place to keep accurate (the historical split between this file and `70-docs/` is what allowed both to drift out of date). This file is a one-screen orientation; the README is the authority.

## One-screen summary

- **Frontend**: Next.js 16 / React 19 / Tailwind v4, static export, served by the backend. `frontend/src/app/page.tsx`. Voice = Web Speech (interim) + `MediaRecorder` (authoritative) → Sarvam STT; Sarvam TTS replies.
- **Backend**: FastAPI (`backend/main.py`); `uvicorn` on port 7860 in the Space. Endpoints: `/api/chat`, `/api/transcribe`, `/api/upload-policy`, `/api/coverage`, `/api/profile*`, `/api/scorecard`, `/api/session*`, `/api/admin/*`.
- **Brain**: one LLM call per turn: Google Gemini (`gemini-2.5-flash`) + function-calling tools (`save_profile_field`, `retrieve_policies`, `get_policy_facts`, `mark_recommendation`) in `backend/single_brain.py` / `backend/brain_tools.py`. A single call owns the whole turn: fact-find, retrieval, QA, and recommendation. On a transient Gemini error / cold-start 503 → small `backend/nim_fallback.py` (NVIDIA NIM) so the turn still completes. Fail-loud, never silently wrong. The legacy multi-pass design (orchestrator / sales-brain / QA-brain / separate faithfulness judge / profile_extractor / tiered brain) was removed; it does not exist in the codebase.
- **Retrieval**: Chroma vector store, BGE-small-en-v1.5 local 384-d embeddings (`rag/retrieve.py`). Shared `policies` collection (148 catalogued plans across 21 insurers, ~7.3K chunks) + a per-session `user_uploads_quarantine` collection (24h TTL, session-isolated). Per ADR-044 (2026-05-27), uploaded PDFs dual-write into both collections: the upload becomes a first-class marketplace card with the same scorecard / premium / RAG endpoints as the catalogued 148.
- **Upload safety**: `backend/security.py`, 8 gates, before any embedding.
- **Data**: three repos: code (HF Space `origin` + GitHub `github` mirror), the `rohitsar567/insurance-bot-data` HF dataset (corpus + vectors, pulled at Docker build), and `40-data/` curated facts versioned with the code.
- **Deploy**: HF Space Docker; `entrypoint.sh` runs `uvicorn`; the build `snapshot_download`s the data dataset.

For anything beyond this, read `README.md`; do not treat older `70-docs/`/ADR prose as the present-state map (it predates the single-brain rewrite and is being reconciled).
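
The fallback behaviour described under **Brain** can be sketched roughly as below. This is an illustration only, not the code in `backend/single_brain.py` or `backend/nim_fallback.py`: the names `run_turn`, `TransientBrainError`, and `TurnFailed` are hypothetical, and the two providers are passed in as plain callables so the sketch runs without any API keys. It assumes that only transient primary failures trigger the fallback, and that "fail-loud" means raising an error rather than returning a degraded answer.

```python
from typing import Callable, Tuple


class TransientBrainError(Exception):
    """Hypothetical marker for a retryable primary failure (e.g. a cold-start 503)."""


class TurnFailed(Exception):
    """Hypothetical error raised when neither provider could complete the turn."""


def run_turn(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (provider_label, reply) for one chat turn."""
    try:
        return "primary", primary(message)
    except TransientBrainError as primary_err:
        # Only transient failures reach the fallback. Any other exception
        # from the primary propagates unchanged, so bugs stay visible.
        try:
            return "fallback", fallback(message)
        except Exception as fallback_err:
            # Fail loud: surface both failures instead of inventing a reply.
            raise TurnFailed(
                f"primary failed ({primary_err}); fallback failed ({fallback_err})"
            ) from fallback_err


# Stand-in providers for demonstration (assumptions, not the real clients).
def flaky_primary(message: str) -> str:
    raise TransientBrainError("503 cold start")


def small_fallback(message: str) -> str:
    return f"fallback reply to: {message}"


result = run_turn("hello", flaky_primary, small_fallback)
```

Returning the provider label alongside the reply is one way to keep the fallback observable to callers and logs; whether the real backend records this is not stated here.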