Architecture
Canonical source: README.md §4 "How it works, end to end".
The full, maintained architecture (request flow diagram, the single-brain
design, the fallback chain, voice, retrieval, profile/personalisation) lives
inline in the README so there is exactly one place to keep accurate (the
historical split between this file and 70-docs/ is what allowed both to
drift out of date). This file is a one-screen orientation; the README is the
authority.
One-screen summary
- Frontend: Next.js 16 / React 19 / Tailwind v4, static export, served by
  the backend. `frontend/src/app/page.tsx`. Voice = Web Speech (interim) +
  `MediaRecorder` (authoritative) → Sarvam STT; Sarvam TTS replies.
- Backend: FastAPI (`backend/main.py`); `uvicorn` on port 7860 in the Space.
  Endpoints: `/api/chat`, `/api/transcribe`, `/api/upload-policy`,
  `/api/coverage`, `/api/profile*`, `/api/scorecard`, `/api/session*`,
  `/api/admin/*`.
- Brain: one LLM call per turn, Google Gemini (`gemini-2.5-flash`) +
  function-calling tools (`save_profile_field`, `retrieve_policies`,
  `get_policy_facts`, `mark_recommendation`) in `backend/single_brain.py` /
  `backend/brain_tools.py`. A single call owns the whole turn: fact-find,
  retrieval, QA, and recommendation. On a transient Gemini error / cold-start
  503 → small `backend/nim_fallback.py` (NVIDIA NIM) so the turn still
  completes. Fail-loud, never silently wrong. The legacy multi-pass design
  (orchestrator / sales-brain / QA-brain / separate faithfulness judge /
  profile_extractor / tiered brain) was removed; it does not exist in the
  codebase.
- Retrieval: Chroma vector store, BGE-small-en-v1.5 local 384-d embeddings
  (`rag/retrieve.py`). Shared `policies` collection (148 catalogued plans
  across 21 insurers, ~7.3K chunks) + a per-session `user_uploads_quarantine`
  collection (24h TTL, session-isolated). Per ADR-044 (2026-05-27), uploaded
  PDFs dual-write into both collections, so the upload becomes a first-class
  marketplace card with the same scorecard / premium / RAG endpoints as the
  catalogued 148.
- Upload safety: `backend/security.py`, 8 gates, before any embedding.
- Data: three repos: code (HF Space `origin` + GitHub `github` mirror), the
  `rohitsar567/insurance-bot-data` HF dataset (corpus + vectors, pulled at
  Docker build), and `40-data/` curated facts versioned with the code.
- Deploy: HF Space Docker; `entrypoint.sh` runs `uvicorn`; the build
  `snapshot_download`s the data dataset.
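
The Brain bullet's fallback behaviour (one primary call per turn, a secondary provider only on a transient error, everything else surfaced loudly) can be sketched roughly as below. This is an illustrative sketch, not the code in `backend/single_brain.py` or `backend/nim_fallback.py`: `TransientBrainError`, `TurnResult`, `run_turn`, and the stub providers are hypothetical names invented for this example, and no real Gemini or NIM client is called.

```python
from dataclasses import dataclass
from typing import Callable


class TransientBrainError(Exception):
    """Hypothetical marker for retryable failures (e.g. a cold-start 503)."""


@dataclass
class TurnResult:
    text: str
    provider: str  # which provider produced the reply: "primary" or "fallback"


def run_turn(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> TurnResult:
    """Call the primary brain once; on a transient error, use the fallback.

    Any non-transient exception, and any failure of the fallback itself,
    propagates to the caller (fail loud) instead of being papered over.
    """
    try:
        return TurnResult(primary(message), provider="primary")
    except TransientBrainError:
        return TurnResult(fallback(message), provider="fallback")


# Usage with stub providers standing in for the real clients (assumptions).
def flaky_primary(message: str) -> str:
    raise TransientBrainError("503 cold start")


def stub_fallback(message: str) -> str:
    return f"fallback reply to: {message}"


result = run_turn("compare two plans", flaky_primary, stub_fallback)
print(result.provider, "|", result.text)
```

Recording which provider answered is one way to keep a degraded turn visible to logs and callers; whether the real backend tracks this is not stated here.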
For anything beyond this, read README.md; do not treat older
70-docs/ or ADR prose as the present-state map (it predates the single-brain
rewrite and is being reconciled).