Spaces:
Sleeping
Sleeping
backend/ β FastAPI orchestrator
FastAPI + Pydantic service that fronts every chat turn. The HTTP entry point is main.py; everything else is a typed module the orchestrator composes per turn.
Entry points
| File | Role |
|---|---|
main.py |
FastAPI app, all HTTP routes (/api/chat, /api/profile, /api/policies, admin endpoints). Wires CORS, request validation, returns the typed response that the frontend's openapi-typescript codegen consumes. |
orchestrator.py |
The brain of a turn: classify_intent, pick_brain, fact-find routing, profile-RAG injection, faithfulness gate dispatch. Pinned by tests/test_routing_regression.py. |
config.py |
Pydantic-settings β single source of truth for env-vars, model IDs, chunk sizes, chain budgets. |
Per-turn helpers
| File | Role | Related ADR |
|---|---|---|
needs_finder.py |
9-slot fact-find SCHEMA (GRAPH). Post-KI-167 the prompt_en / prompt_hi strings are dead text β the LLM owns voice + cadence end-to-end via its system prompt. The data structure is retained as the schema source for single_brain.py. |
ADR-039, ADR-040 |
single_brain.py |
KI-225 single-LLM brain (2026-05-15). One Gemini-with-function-calling call per turn. Tools: save_profile_field, retrieve_policies, get_policy_facts, mark_recommendation. Replaced the multi-brain stack (sales_brain + qa_brain + faithfulness + persona + translator + profile_extractor + orchestrator, ~5,200 LOC removed). |
ADR-040, KI-225 |
sales_brain_normalizer.py |
KI-167 deterministic post-processor. Pure-rule mapping of the LLM's loose captures dict to canonical {field: validated_value}: alias resolution (location β location_tier), enum coercion (Bangalore β metro), INR parsing, type / bounds validation, KI-094 null-drop. No LLM calls. |
ADR-039 |
fact_find_brain.py |
Was the ADR-030 one-call brain with <FF>...</FF> trailer convention + _canonical_fallback. Removed entirely. |
superseded by ADR-039 |
question_paraphraser.py |
Was an LLM rewrite of canonical slot questions (ADR-027). Superseded first by fact_find_brain.py, now by sales_brain.py. |
superseded by ADR-039 |
fact_find_normalizer.py |
LLM-driven free-text β slot-value coercion (e.g. "32 lakh" β 3200000). Goes through NimChainLLM, not a single client (KI-033). |
β |
profile_extractor.py |
LLM extractor that pulls profile updates out of conversational asides ("by the way, my dad has diabetes"). Chain-pattern, never a hardcoded model. | ADR-022 |
profile_store.py |
NEW (KI-040). Persistent named-profile JSON store under 40-data/profiles/. O(1) name-keyed lookup; mirrors into profile_rag on every save. |
β |
profile_rag.py |
Embeds the user's profile as a Chroma chunk so the brain sees it alongside policy chunks for "what's best for me?" turns. Per-chunk session_id metadata + doc_type=profile exclusion from main retrieval + Python-side triple-check on per-session lookup (KI-102). All collection.get(...) calls wrapped in _safe_collection_get so never-existed sessions return None instead of raising (KI-107). |
ADR-022 |
session_state.py |
In-memory session map; tracks fact-find progress + chat history per session_id. |
|
faithfulness.py |
4-gate hallucination guard (retrieval floor β citation integrity β regex numeric grounding β LLM judge). Blocks land in logs/hallucinations.jsonl. |
β |
scorecard.py |
Pure-function 6-sub-score scorer over the 62-field extracted JSON. No LLM. | β |
translator.py |
Sarvam-M Indic β English translator wrapper. | ADR-006 |
translation_check.py |
Post-hoc detector for mixed-script replies; flags Hinglish leakage. | β |
persona.py |
The consultative-advisor system prompt + view-aware prompt overlays. | ADR-008, ADR-021 |
voice_format.py |
Strips markdown / lists / bullet glyphs so TTS sounds natural. tts_preprocess also kills CoT leakage: <think>...</think> blocks, **Reasoning:** / **Thought:** labels, [INTERNAL] blocks, sentence-anchored CoT starters; emergency-fallback acknowledger if the whole reply is CoT-shaped (KI-104). |
β |
premium_calculator.py |
Looks up 40-data/premiums/illustrative_premiums.json + applies the documented scaling factors. Never claims a real quote. |
ADR-007 |
security.py |
Request rate-limiting, input sanitisation. | β |
admin.py |
Admin-only routes (live LLM-health, usage rollups, hallucination tail). | ADR-023 |
llm_health.py |
Lightweight probe that pings each provider and writes 40-data/llm_health.json for the admin tab. |
β |
Subdirectory
providers/ β concrete STT / TTS / LLM / embeddings client implementations:
sarvam_stt.py/sarvam_tts.pyβ Sarvam Saarika v2.5 + Bulbul v2.google_gemini_llm.pyβ Google AI Studio (gemini-2.0-flash,gemini-2.5-flash); Tier 0 primary on Brain Fast + Brain Main (ADR-040).nvidia_nim_llm.pyβ NIM client +NimChainLLMdispatcher (the elector + cross-provider BACKUP machinery from KI-080). Tier 1 fallback across all chains; judge primary stays on NIM Mistral Large 3 675B.openrouter_llm.pyβ OpenRouter client; Tier 2 cross-provider diversity using OR'smodels: [...]server-side fallback (KI-176).local_embeddings.pyβ BGE-small-en-v1.5 sentence-transformers.
All LLM access goes through NimChainLLM(chain=...) β never instantiate a single-provider client directly. See backend/providers/README.md.
Where to read for what
- System tour: root
README.md - Stable contracts a new contributor must know: root
CLAUDE.md - Decisions with alternatives:
70-docs/60-decisions/ADR-*.md - Routing invariants:
tests/test_routing_regression.py - Defect register:
80-audit/ENTERPRISE_AUDIT.md