
Architecture

Canonical source: README.md §4 "How it works, end to end".

The full, maintained architecture (request flow diagram, the single-brain design, the fallback chain, voice, retrieval, profile/personalisation) lives inline in the README so there is exactly one place to keep accurate; the historical split between this file and 70-docs/ is what allowed both to drift out of date. This file is a one-screen orientation; the README is the authority.

One-screen summary

  • Frontend: Next.js 16 / React 19 / Tailwind v4, static export, served by the backend. frontend/src/app/page.tsx. Voice = Web Speech (interim) + MediaRecorder (authoritative) → Sarvam STT; Sarvam TTS replies.
  • Backend: FastAPI (backend/main.py); uvicorn on port 7860 in the Space. Endpoints: /api/chat, /api/transcribe, /api/upload-policy, /api/coverage, /api/profile*, /api/scorecard, /api/session*, /api/admin/*.
  • Brain: one LLM call per turn. Google Gemini (gemini-2.5-flash) with function-calling tools (save_profile_field, retrieve_policies, get_policy_facts, mark_recommendation) in backend/single_brain.py / backend/brain_tools.py. A single call owns the whole turn: fact-find, retrieval, QA, and recommendation. On a transient Gemini error or cold-start 503, the turn falls back to the small backend/nim_fallback.py (NVIDIA NIM) so it still completes. Fail-loud, never silently wrong. The legacy multi-pass design (orchestrator / sales-brain / QA-brain / separate faithfulness judge / profile_extractor / tiered brain) was removed; it does not exist in the codebase.
  • Retrieval: Chroma vector store, BGE-small-en-v1.5 local 384-d embeddings (rag/retrieve.py). Shared policies collection (148 catalogued plans across 21 insurers, ~7.3K chunks) + a per-session user_uploads_quarantine collection (24h TTL, session-isolated). Per ADR-044 (2026-05-27), uploaded PDFs dual-write into both collections, so the upload becomes a first-class marketplace card with the same scorecard / premium / RAG endpoints as the catalogued 148.
  • Upload safety: backend/security.py runs 8 gates before any embedding.
  • Data: three repos. Code (HF Space as the origin remote, GitHub as the github mirror remote), the rohitsar567/insurance-bot-data HF dataset (corpus + vectors, pulled at Docker build), and 40-data/ curated facts versioned with the code.
  • Deploy: HF Space Docker; entrypoint.sh runs uvicorn; the build pulls the data dataset with snapshot_download.
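
The Brain bullet above describes a primary call with a fallback on transient errors, and a fail-loud rule otherwise. A minimal sketch of that control flow follows. It is not the code in backend/single_brain.py or backend/nim_fallback.py: the names TransientBrainError, TurnFailed, TurnResult and run_turn are hypothetical, and the two model calls are passed in as plain callables so that no real Gemini or NIM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable


class TransientBrainError(Exception):
    """Hypothetical marker for a retryable primary failure (e.g. a cold-start 503)."""


class TurnFailed(Exception):
    """Hypothetical error raised when neither primary nor fallback could answer."""


@dataclass
class TurnResult:
    text: str
    served_by: str  # "primary" or "fallback"


def run_turn(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> TurnResult:
    """Try the primary brain once; fall back only on a transient error."""
    try:
        return TurnResult(primary(message), "primary")
    except TransientBrainError as primary_err:
        # Only transient failures reach the fallback. Any other exception
        # propagates unchanged, which is the fail-loud part.
        try:
            return TurnResult(fallback(message), "fallback")
        except Exception as fallback_err:
            raise TurnFailed(
                f"primary: {primary_err}; fallback: {fallback_err}"
            ) from fallback_err
```

Recording which path served the turn (served_by here) is one way to keep a fallback reply from being mistaken for a primary one; whether the real backend exposes this is not stated in this file.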
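
The Retrieval bullet above mentions a dual-write of uploads into a shared collection and a session-isolated, 24h-TTL quarantine collection. The sketch below illustrates that idea with in-memory dictionaries standing in for the two Chroma collections. Everything in it (Chunk, Stores, dual_write, visible_uploads) is hypothetical and chosen for illustration; the real logic lives in the repository, and details such as how expiry is enforced are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

QUARANTINE_TTL_SECONDS = 24 * 60 * 60  # the 24h TTL from the summary


@dataclass
class Chunk:
    chunk_id: str
    text: str
    session_id: str
    created_at: float


@dataclass
class Stores:
    # Plain dicts stand in for the shared and per-session collections.
    policies: dict = field(default_factory=dict)
    quarantine: dict = field(default_factory=dict)


def dual_write(stores: Stores, session_id: str, chunk_id: str, text: str,
               now: Optional[float] = None) -> Chunk:
    """Write one uploaded chunk into both stores."""
    now = time.time() if now is None else now
    chunk = Chunk(chunk_id, text, session_id, now)
    stores.policies[chunk_id] = chunk
    stores.quarantine[chunk_id] = chunk
    return chunk


def visible_uploads(stores: Stores, session_id: str,
                    now: Optional[float] = None) -> list:
    """Quarantine reads: same session only, and not older than the TTL."""
    now = time.time() if now is None else now
    return [
        c for c in stores.quarantine.values()
        if c.session_id == session_id
        and now - c.created_at < QUARANTINE_TTL_SECONDS
    ]
```

In this sketch the TTL is applied as a read-time filter; a periodic delete would be an equally plausible design, and this file does not say which one the backend uses.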

For anything beyond this, read README.md; do not treat older 70-docs/ or ADR prose as the present-state map (it predates the single-brain rewrite and is being reconciled).