
backend/: FastAPI orchestrator

FastAPI + Pydantic service that fronts every chat turn. The HTTP entry point is main.py; everything else is a typed module the orchestrator composes per turn.

Entry points

| File | Role |
|---|---|
| main.py | FastAPI app and all HTTP routes (/api/chat, /api/profile, /api/policies, admin endpoints). Wires CORS and request validation, and returns the typed responses that the frontend's openapi-typescript codegen consumes. |
| orchestrator.py | The brain of a turn: classify_intent, pick_brain, fact-find routing, profile-RAG injection, faithfulness-gate dispatch. Pinned by tests/test_routing_regression.py. |
| config.py | pydantic-settings: the single source of truth for env vars, model IDs, chunk sizes, and chain budgets. |
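
config.py builds on pydantic-settings. To illustrate the single-source-of-truth idea without that dependency, here is a minimal stdlib stand-in: a frozen dataclass whose defaults can be overridden by upper-cased environment variables. The field names and defaults are hypothetical, not the ones config.py actually defines.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields and defaults, for illustration only.
    brain_model_id: str = "example-model-id"
    chunk_size: int = 512
    chain_budget: int = 3

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings, letting FIELD_NAME env vars override the defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if raw is not None:
                # Coerce the raw string to the field's declared type (str or int here).
                overrides[f.name] = f.type(raw)
        return cls(**overrides)
```

Modules would import one shared instance (for example `settings = Settings.from_env()`) instead of reading `os.environ` themselves; pydantic-settings layers validation and `.env`-file support on top of the same idea.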

Per-turn helpers

| File | Role | Related ADR |
|---|---|---|
| needs_finder.py | 9-slot fact-find SCHEMA (GRAPH). Since KI-167 the prompt_en / prompt_hi strings are dead text: the LLM owns voice and cadence end-to-end via its system prompt. The data structure is retained as the schema source for single_brain.py. | ADR-039, ADR-040 |
| single_brain.py | KI-225 single-LLM brain (2026-05-15). One Gemini function-calling call per turn. Tools: save_profile_field, retrieve_policies, get_policy_facts, mark_recommendation. Replaced the multi-brain stack (sales_brain + qa_brain + faithfulness + persona + translator + profile_extractor + orchestrator; ~5,200 LOC removed). | ADR-040, KI-225 |
| sales_brain_normalizer.py | KI-167 deterministic post-processor. Pure-rule mapping of the LLM's loose captures dict to canonical {field: validated_value}: alias resolution (location → location_tier), enum coercion (Bangalore → metro), INR parsing, type/bounds validation, KI-094 null-drop. No LLM calls. | ADR-039 |
| fact_find_brain.py (deleted in KI-167) | Was the ADR-030 one-call brain with the `<FF>...</FF>` trailer convention and `_canonical_fallback`. Removed entirely. | superseded by ADR-039 |
| question_paraphraser.py (deleted in KI-070) | Was an LLM rewrite of canonical slot questions (ADR-027). Superseded first by fact_find_brain.py, then by sales_brain.py (itself replaced by single_brain.py in KI-225). | superseded by ADR-039 |
| fact_find_normalizer.py | LLM-driven free-text → slot-value coercion (e.g. "32 lakh" → 3200000). Goes through NimChainLLM, not a single client (KI-033). | - |
| profile_extractor.py | LLM extractor that pulls profile updates out of conversational asides ("by the way, my dad has diabetes"). Uses the chain pattern, never a hardcoded model. | ADR-022 |
| profile_store.py | New in KI-040. Persistent named-profile JSON store under 40-data/profiles/. O(1) name-keyed lookup; mirrors into profile_rag on every save. | - |
| profile_rag.py | Embeds the user's profile as a Chroma chunk so the brain sees it alongside policy chunks on "what's best for me?" turns. Session isolation (KI-102): per-chunk session_id metadata, doc_type=profile exclusion from main retrieval, and a Python-side triple-check on per-session lookup. All collection.get(...) calls are wrapped in `_safe_collection_get`, so never-existed sessions return None instead of raising (KI-107). | ADR-022 |
| session_state.py | In-memory session map; tracks fact-find progress and chat history per session_id. | - |
| faithfulness.py | 4-gate hallucination guard (retrieval floor → citation integrity → regex numeric grounding → LLM judge). Every block is logged to logs/hallucinations.jsonl. | - |
| scorecard.py | Pure-function scorer that computes 6 sub-scores over the 62-field extracted JSON. No LLM. | - |
| translator.py | Sarvam-M Indic ↔ English translator wrapper. | ADR-006 |
| translation_check.py | Post-hoc detector for mixed-script replies; flags Hinglish leakage. | - |
| persona.py | The consultative-advisor system prompt plus view-aware prompt overlays. | ADR-008, ADR-021 |
| voice_format.py | Strips markdown, lists, and bullet glyphs so TTS output sounds natural. tts_preprocess also removes CoT leakage (`<think>...</think>` blocks, `**Reasoning:**` / `**Thought:**` labels, `[INTERNAL]` blocks, sentence-anchored CoT starters) and falls back to an emergency acknowledger if the whole reply is CoT-shaped (KI-104). | - |
| premium_calculator.py | Looks up 40-data/premiums/illustrative_premiums.json and applies the documented scaling factors. Never claims a real quote. | ADR-007 |
| security.py | Request rate limiting and input sanitisation. | - |
| admin.py | Admin-only routes (live LLM health, usage rollups, tail of the hallucination log). | ADR-023 |
| llm_health.py | Lightweight probe that pings each provider and writes 40-data/llm_health.json for the admin tab. | - |
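
The sales_brain_normalizer.py rule chain (alias resolution, enum coercion, INR parsing, type/bounds validation, null-drop) can be sketched as one pure function. All tables below (aliases, the city-to-tier map, tier names, bounds, INR units) are hypothetical; only the overall shape follows the description above.

```python
import re
from typing import Any, Optional

# Hypothetical tables; the real module defines its own aliases, enums and bounds.
FIELD_ALIASES = {"location": "location_tier", "city": "location_tier"}
CITY_TO_TIER = {"bangalore": "metro", "bengaluru": "metro", "mumbai": "metro"}
LOCATION_TIERS = {"metro", "tier2", "tier3"}
INT_BOUNDS = {"age": (18, 100), "annual_income": (0, 10**10)}
INR_UNITS = {"lakh": 100_000, "lakhs": 100_000, "crore": 10_000_000, "crores": 10_000_000}

_INR_RE = re.compile(r"\s*(?:rs\.?|₹|inr)?\s*([\d,]*\.?\d+)\s*([a-z]*)\s*")


def parse_inr(text: str) -> Optional[int]:
    """Parse '32 lakh', '1.5 crore' or '12,00,000' into whole rupees."""
    m = _INR_RE.fullmatch(text.lower())
    if not m:
        return None
    number = float(m.group(1).replace(",", ""))
    unit = m.group(2)
    if unit and unit not in INR_UNITS:
        return None  # unknown unit word: refuse rather than guess
    return round(number * INR_UNITS.get(unit, 1))


def normalize_captures(captures: dict[str, Any]) -> dict[str, Any]:
    """Map a loose captures dict to {canonical_field: validated_value}."""
    out: dict[str, Any] = {}
    for raw_key, value in captures.items():
        if value is None or value == "":
            continue  # null-drop: an empty capture never overwrites a slot
        key = FIELD_ALIASES.get(raw_key, raw_key)  # alias resolution
        if key == "location_tier":
            city = str(value).strip().lower()
            value = CITY_TO_TIER.get(city, city)  # enum coercion
            if value not in LOCATION_TIERS:
                continue  # not a known tier: drop
        elif key == "annual_income" and isinstance(value, str):
            value = parse_inr(value)
        elif key in INT_BOUNDS and isinstance(value, str):
            value = int(value) if value.strip().isdigit() else None
        if key in INT_BOUNDS:
            lo, hi = INT_BOUNDS[key]
            if not isinstance(value, int) or not lo <= value <= hi:
                continue  # type / bounds validation failed: drop
        out[key] = value
    return out
```

For example, `{"location": "Bangalore", "annual_income": "32 lakh", "age": "34", "smoker": None}` normalizes to `{"location_tier": "metro", "annual_income": 3200000, "age": 34}`. Dropping bad values instead of coercing them keeps a wrong capture from silently overwriting a good slot.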
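
Of the four faithfulness.py gates, the regex numeric-grounding check is the easiest to show in isolation. A minimal sketch, assuming the gate compares the numbers in the reply against the numbers in the retrieved chunks. The function names and the exact-string matching rule are hypothetical, and the retrieval-floor, citation-integrity and LLM-judge gates are not modeled.

```python
import re

# Digits with optional Indian/Western comma grouping and an optional decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def _numbers(text: str) -> set[str]:
    """Extract numbers as comma-free strings, e.g. '5,00,000' becomes '500000'."""
    return {m.group().replace(",", "") for m in _NUMBER.finditer(text)}


def numeric_grounding_gate(reply: str, chunks: list[str]) -> tuple[bool, set[str]]:
    """Pass only if every number in the reply also appears in some retrieved chunk."""
    grounded = set().union(*(_numbers(c) for c in chunks))
    ungrounded = _numbers(reply) - grounded
    return (not ungrounded, ungrounded)
```

Exact matching is deliberately strict: a reply saying "20%" is blocked if the source only says "twenty percent", trading some recall for fewer invented figures.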
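
A minimal sketch of the kind of regex pass voice_format.py's tts_preprocess performs. It covers `<think>` blocks, the `**Reasoning:**` / `**Thought:**` labels, headings, bullets and emphasis, plus the empty-reply fallback. `[INTERNAL]` blocks and sentence-anchored CoT starters are omitted because their exact shapes are not documented here; the patterns, the whole-line treatment of labels, and the fallback text are all assumptions.

```python
import re

# Hypothetical fallback line; the real acknowledger text is not documented here.
FALLBACK = "Sorry, could you say that again?"

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Assumption: a **Reasoning:** / **Thought:** label drops its whole line.
_COT_LABEL_LINE = re.compile(r"^[ \t]*\*\*(?:Reasoning|Thought):\*\*.*$", re.MULTILINE)
_HEADING = re.compile(r"^[ \t]*#+[ \t]*", re.MULTILINE)
_BULLET = re.compile(r"^[ \t]*(?:[-*•]|\d+[.)])[ \t]+", re.MULTILINE)
_EMPHASIS = re.compile(r"\*\*|\*|`")


def tts_preprocess(reply: str) -> str:
    """Turn a markdown-ish LLM reply into one plain spoken paragraph."""
    text = _THINK_BLOCK.sub("", reply)
    text = _COT_LABEL_LINE.sub("", text)
    text = _HEADING.sub("", text)
    text = _BULLET.sub("", text)  # runs before _EMPHASIS so "* item" bullets are caught
    text = _EMPHASIS.sub("", text)
    spoken = " ".join(line.strip() for line in text.splitlines() if line.strip())
    # If nothing speakable survives (the reply was all CoT), use the fallback.
    return spoken or FALLBACK
```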

Subdirectory

providers/: concrete STT, TTS, LLM, and embeddings client implementations:

  • sarvam_stt.py / sarvam_tts.py: Sarvam Saarika v2.5 (STT) and Bulbul v2 (TTS).
  • google_gemini_llm.py: Google AI Studio (gemini-2.0-flash, gemini-2.5-flash); Tier 0 primary for Brain Fast and Brain Main (ADR-040).
  • nvidia_nim_llm.py: NIM client plus the NimChainLLM dispatcher (the elector and cross-provider BACKUP machinery from KI-080). Tier 1 fallback across all chains; the judge's primary stays on NIM Mistral Large 3 675B.
  • openrouter_llm.py: OpenRouter client; Tier 2 cross-provider diversity via OpenRouter's server-side `models: [...]` fallback (KI-176).
  • local_embeddings.py: BGE-small-en-v1.5 via sentence-transformers.

All LLM access goes through NimChainLLM(chain=...); never instantiate a single-provider client directly. See backend/providers/README.md.
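
The provider tiering above (Gemini as tier 0 on the brain chains, NIM as tier 1, OpenRouter as tier 2) implies an ordered fallback. A minimal sketch of that pattern follows. `ChainLLM` and `ProviderError` are hypothetical names, not NimChainLLM's actual API, and the KI-080 elector and cross-provider BACKUP machinery are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error a provider call raises to trigger fallback."""


@dataclass
class ChainLLM:
    """Hypothetical stand-in for NimChainLLM: try tiers in order until one answers."""

    tiers: Sequence[tuple[str, Callable[[str], str]]]
    errors: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.errors.clear()
        for name, call in self.tiers:
            try:
                return call(prompt)
            except ProviderError as exc:
                # Record why this tier failed, then fall through to the next one.
                self.errors.append(f"{name}: {exc}")
        raise ProviderError("all tiers failed: " + "; ".join(self.errors))
```

Callers depend only on the chain object, never on a concrete client, so adding, removing, or reordering providers changes the tier list rather than every call site.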

Where to read for what

  • System tour: root README.md
  • Stable contracts a new contributor must know: root CLAUDE.md
  • Decisions with alternatives: 70-docs/60-decisions/ADR-*.md
  • Routing invariants: tests/test_routing_regression.py
  • Defect register: 80-audit/ENTERPRISE_AUDIT.md