# `backend/` — FastAPI orchestrator

FastAPI + Pydantic service that fronts every chat turn. The HTTP entry point is `main.py`; everything else is a typed module that the orchestrator composes per turn.

## Entry points

| File | Role |
| --- | --- |
| `main.py` | FastAPI app and all HTTP routes (`/api/chat`, `/api/profile`, `/api/policies`, admin endpoints). Wires CORS and request validation, and returns the typed responses consumed by the frontend's `openapi-typescript` codegen. |
| `orchestrator.py` | The brain of a turn: `classify_intent`, `pick_brain`, fact-find routing, profile-RAG injection, faithfulness-gate dispatch. Pinned by `tests/test_routing_regression.py`. |
| `config.py` | Pydantic Settings: single source of truth for env vars, model IDs, chunk sizes, and chain budgets. |

## Per-turn helpers

| File | Role | Related ADR |
| --- | --- | --- |
| `needs_finder.py` | 9-slot fact-find schema (`GRAPH`). Since KI-167 the `prompt_en` / `prompt_hi` strings are dead text: the LLM owns voice and cadence end-to-end via its system prompt. The data structure is retained as the schema source for `single_brain.py`. | [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md), [ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md) |
| `single_brain.py` | **KI-225 single-LLM brain (2026-05-15).** One Gemini function-calling call per turn. Tools: `save_profile_field`, `retrieve_policies`, `get_policy_facts`, `mark_recommendation`. Replaced the multi-brain stack (`sales_brain` + `qa_brain` + `faithfulness` + `persona` + `translator` + `profile_extractor` + `orchestrator`, ~5,200 LOC removed). | [ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md), KI-225 |
| `sales_brain_normalizer.py` | **KI-167 deterministic post-processor.** Pure-rule mapping of the LLM's loose `captures` dict to canonical `{field: validated_value}`: alias resolution (`location` → `location_tier`), enum coercion (`Bangalore` → `metro`), INR parsing, type and bounds validation, and KI-094 null-drop. No LLM calls. | [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| ~~`fact_find_brain.py`~~ *(deleted in KI-167)* | Was the ADR-030 one-call brain with the `...` trailer convention and `_canonical_fallback`. Removed entirely. | superseded by [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| ~~`question_paraphraser.py`~~ *(deleted in KI-070)* | Was an LLM rewrite of canonical slot questions (ADR-027). Superseded first by `fact_find_brain.py`, then by `sales_brain.py` (itself replaced by `single_brain.py` in KI-225). | superseded by [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| `fact_find_normalizer.py` | LLM-driven free-text → slot-value coercion (e.g. "32 lakh" → `3200000`). Goes through `NimChainLLM`, not a single-provider client (KI-033). | — |
| `profile_extractor.py` | LLM extractor that pulls profile updates out of conversational asides ("by the way, my dad has diabetes"). Uses the chain pattern; never a hardcoded model. | [ADR-022](../70-docs/60-decisions/ADR-022-conversational-profile-updates.md) |
| `profile_store.py` | **NEW (KI-040).** Persistent named-profile JSON store under `40-data/profiles/`. O(1) name-keyed lookup; mirrors into `profile_rag` on every save. | — |
| `profile_rag.py` | Embeds the user's profile as a Chroma chunk so the brain sees it alongside policy chunks on "what's best for me?" turns. Session isolation (KI-102): per-chunk `session_id` metadata, `doc_type=profile` chunks excluded from main retrieval, and a Python-side triple-check on per-session lookup. All `collection.get(...)` calls are wrapped in `_safe_collection_get`, so sessions that never existed return `None` instead of raising (KI-107). | [ADR-022](../70-docs/60-decisions/ADR-022-conversational-profile-updates.md) |
| `session_state.py` | In-memory session map; tracks fact-find progress and chat history per `session_id`. | — |
| `faithfulness.py` | 4-gate hallucination guard (retrieval floor → citation integrity → regex numeric grounding → LLM judge). Blocked replies are logged to `logs/hallucinations.jsonl`. | — |
| `scorecard.py` | Pure-function scorer that computes 6 sub-scores over the 62-field extracted JSON. No LLM. | — |
| `translator.py` | Sarvam-M Indic ↔ English translator wrapper. | [ADR-006](../70-docs/60-decisions/ADR-006-sarvam-first-stack.md) |
| `translation_check.py` | Post-hoc detector for mixed-script replies; flags Hinglish leakage. | — |
| `persona.py` | The consultative-advisor system prompt plus view-aware prompt overlays. | [ADR-008](../70-docs/60-decisions/ADR-008-consultative-advisor-persona.md), [ADR-021](../70-docs/60-decisions/ADR-021-view-aware-system-prompt.md) |
| `voice_format.py` | Strips markdown, lists, and bullet glyphs so TTS output sounds natural. `tts_preprocess` also removes chain-of-thought (CoT) leakage: `...` blocks, `**Reasoning:**` / `**Thought:**` labels, `[INTERNAL]` blocks, and sentence-anchored CoT starters; if the whole reply is CoT-shaped it falls back to an emergency acknowledger (KI-104). | — |
| `premium_calculator.py` | Looks up `40-data/premiums/illustrative_premiums.json` and applies the documented scaling factors. Never claims a real quote. | [ADR-007](../70-docs/60-decisions/ADR-007-illustrative-pricing.md) |
| `security.py` | Request rate-limiting and input sanitisation. | — |
| `admin.py` | Admin-only routes (live LLM health, usage rollups, hallucination-log tail). | ADR-023 |
| `llm_health.py` | Lightweight probe that pings each provider and writes `40-data/llm_health.json` for the admin tab. | — |

## Subdirectory `providers/`

Concrete STT / TTS / LLM / embeddings client implementations:

- `sarvam_stt.py` / `sarvam_tts.py` — Sarvam Saarika v2.5 (STT) and Bulbul v2 (TTS).
- `google_gemini_llm.py` — Google AI Studio (`gemini-2.0-flash`, `gemini-2.5-flash`); Tier 0 primary for Brain Fast and Brain Main ([ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md)).
- `nvidia_nim_llm.py` — NIM client plus the `NimChainLLM` dispatcher (the elector and cross-provider BACKUP machinery from KI-080). Tier 1 fallback across all chains; the judge primary stays on NIM Mistral Large 3 675B.
- `openrouter_llm.py` — OpenRouter client; Tier 2 cross-provider diversity using OpenRouter's server-side `models: [...]` fallback (KI-176).
- `local_embeddings.py` — BGE-small-en-v1.5 via sentence-transformers.

All LLM access goes through `NimChainLLM(chain=...)`; never instantiate a single-provider client directly. See `backend/providers/README.md`.

## Where to read for what

- **System tour:** root `README.md`
- **Stable contracts a new contributor must know:** root `CLAUDE.md`
- **Decisions with alternatives:** `70-docs/60-decisions/ADR-*.md`
- **Routing invariants:** `tests/test_routing_regression.py`
- **Defect register:** `80-audit/ENTERPRISE_AUDIT.md`
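The `sales_brain_normalizer.py` row in the per-turn helpers table describes a pure-rule mapping from the LLM's loose `captures` to canonical fields. A minimal sketch of that idea follows. The alias and city tables, the `sum_insured` and `age` field names, the `tier1`/`tier2`/`tier3` enum values and the age bounds are all hypothetical; only `location` → `location_tier`, `Bangalore` → `metro` and the "32 lakh" → `3200000` example come from this README.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical tables for illustration; the real ones live in sales_brain_normalizer.py.
FIELD_ALIASES = {"location": "location_tier", "city": "location_tier"}
CITY_TO_TIER = {"bangalore": "metro", "mumbai": "metro", "delhi": "metro"}
LOCATION_TIERS = {"metro", "tier1", "tier2", "tier3"}
INR_UNITS = {"lakh": 100_000, "lakhs": 100_000, "lac": 100_000,
             "crore": 10_000_000, "crores": 10_000_000, "cr": 10_000_000}
AGE_BOUNDS = (18, 100)

# Optional currency prefix, an amount that starts with a digit, optional unit word.
_INR_RE = re.compile(r"^\s*(?:₹|rs\.?|inr)?\s*(\d[\d,]*(?:\.\d+)?)\s*([a-z]+)?\s*$", re.IGNORECASE)


def parse_inr(text: str) -> Optional[int]:
    """Parse '32 lakh', '₹5,00,000' or '1.5 crore' into whole rupees; None if unparseable."""
    match = _INR_RE.match(text)
    if match is None:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    if unit and unit not in INR_UNITS:
        return None
    return round(amount * INR_UNITS.get(unit, 1))


def normalize_captures(captures: Dict[str, Any]) -> Dict[str, Any]:
    """Map loose captures to validated canonical fields using rules only (no LLM calls)."""
    out: Dict[str, Any] = {}
    for raw_key, value in captures.items():
        # Null-drop: an empty capture never overwrites a slot.
        if value is None or (isinstance(value, str) and not value.strip()):
            continue
        field = FIELD_ALIASES.get(raw_key, raw_key)  # alias resolution
        if field == "location_tier":
            text = str(value).strip().lower()
            tier = CITY_TO_TIER.get(text, text)  # enum coercion
            if tier in LOCATION_TIERS:
                out[field] = tier
        elif field == "sum_insured":
            amount = value if isinstance(value, int) else parse_inr(str(value))
            if amount is not None and amount > 0:
                out[field] = amount
        elif field == "age":
            try:
                age = int(value)
            except (TypeError, ValueError):
                continue
            if AGE_BOUNDS[0] <= age <= AGE_BOUNDS[1]:  # bounds validation
                out[field] = age
        # Any other key is dropped rather than passed through unvalidated.
    return out
```

Keeping the normaliser rule-only means a malformed capture is dropped instead of being "repaired" by a second LLM call.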
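The KI-102 / KI-107 behaviour described for `profile_rag.py` can be illustrated with a small sketch. It relies only on the `where=` and `include=` keyword arguments of Chroma's `Collection.get`, which returns parallel `ids` / `documents` / `metadatas` lists. `get_profile_chunk` and `InMemoryCollection` are hypothetical names, and the sketch shows a metadata filter plus one Python-side re-check, not the module's exact triple-check.

```python
from typing import Any, Dict, List, Optional, Tuple


def _safe_collection_get(collection: Any, **kwargs: Any) -> Optional[Dict[str, Any]]:
    """Call collection.get(...), returning None instead of raising or returning no hits."""
    try:
        result = collection.get(**kwargs)
    except Exception:  # e.g. a session whose collection or chunk never existed
        return None
    if not result or not result.get("ids"):
        return None
    return result


def get_profile_chunk(collection: Any, session_id: str) -> Optional[str]:
    """Return this session's profile chunk text, or None."""
    result = _safe_collection_get(
        collection,
        where={"$and": [{"session_id": session_id}, {"doc_type": "profile"}]},
        include=["documents", "metadatas"],
    )
    if result is None:
        return None
    for doc, meta in zip(result["documents"], result["metadatas"]):
        # Python-side re-check: do not rely on the store's filter alone.
        if meta and meta.get("session_id") == session_id and meta.get("doc_type") == "profile":
            return doc
    return None


class InMemoryCollection:
    """Illustrative stand-in for a Chroma collection; supports flat and $and equality filters only."""

    def __init__(self, rows: List[Tuple[str, str, Dict[str, str]]]):
        self.rows = rows  # (id, document, metadata)

    def get(self, where: Optional[Dict[str, Any]] = None,
            include: Optional[List[str]] = None) -> Dict[str, list]:
        conditions = where.get("$and", [where]) if where else []
        hits = [row for row in self.rows
                if all(row[2].get(k) == v for cond in conditions for k, v in cond.items())]
        return {"ids": [r[0] for r in hits],
                "documents": [r[1] for r in hits],
                "metadatas": [r[2] for r in hits]}
```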
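The four gates listed for `faithfulness.py` run in order and stop at the first failure, with blocks logged as JSON lines. The sketch below assumes a `Chunk` record carrying a retrieval score, a hypothetical `min_score` floor of 0.35, a numeric-grounding rule of "every number in the reply must appear in some retrieved chunk", and an injected `judge` callable standing in for the LLM judge. None of these names or thresholds come from the module.

```python
import json
import re
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Set


@dataclass
class Chunk:
    chunk_id: str
    text: str
    score: float  # retrieval similarity, higher is better


_NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def _numbers(text: str) -> Set[str]:
    """All numeric tokens in text, with thousands separators removed."""
    return {n.replace(",", "") for n in _NUMBER_RE.findall(text)}


def check_faithfulness(
    reply: str,
    cited_ids: Sequence[str],
    chunks: List[Chunk],
    judge: Callable[[str, List[str]], bool],
    log_path: str,
    min_score: float = 0.35,  # hypothetical retrieval floor
) -> Optional[str]:
    """Run the four gates in order; return None on pass, else the first failing gate's name."""
    failed: Optional[str] = None
    if not chunks or max(c.score for c in chunks) < min_score:
        failed = "retrieval_floor"
    elif not set(cited_ids) <= {c.chunk_id for c in chunks}:
        failed = "citation_integrity"
    elif not _numbers(reply) <= set().union(*(_numbers(c.text) for c in chunks)):
        failed = "numeric_grounding"
    elif not judge(reply, [c.text for c in chunks]):  # the expensive gate runs last
        failed = "llm_judge"
    if failed is not None:
        with Path(log_path).open("a", encoding="utf-8") as log:
            log.write(json.dumps({"ts": time.time(), "gate": failed, "reply": reply}) + "\n")
    return failed
```

Ordering the cheap deterministic gates first means the LLM judge is only paid for when the reply has already passed the structural checks.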
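The `voice_format.py` row lists the kinds of text `tts_preprocess` removes. A regex-based sketch of that pipeline follows; the `[/INTERNAL]` closing marker, the "label plus its paragraph" scope for `**Reasoning:**` / `**Thought:**`, the list of CoT starter phrases and the `FALLBACK_ACK` text are assumptions, and `tts_preprocess_sketch` is a hypothetical name, not the module's function.

```python
import re

FALLBACK_ACK = "Sure, let me look into that for you."  # hypothetical acknowledger text

# [INTERNAL] ... [/INTERNAL] block; an unclosed block runs to the end of the reply.
_INTERNAL = re.compile(r"\[INTERNAL\].*?(?:\[/INTERNAL\]|$)", re.DOTALL)
# A **Reasoning:** or **Thought:** label and the paragraph it introduces.
_LABELLED = re.compile(r"\*\*(?:Reasoning|Thought):\*\*.*?(?=\n\s*\n|\Z)", re.DOTALL)
# Hypothetical CoT starters, anchored to a line start or a sentence boundary.
_COT_STARTER = re.compile(
    r"(?:^|(?<=[.!?]\s))(?:Let me think|I need to figure out|Step \d+:)[^.!?]*[.!?]\s*",
    re.IGNORECASE | re.MULTILINE,
)
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)  # list glyphs at line start
_HEADING = re.compile(r"^\s*#{1,6}\s*", re.MULTILINE)
_EMPHASIS = re.compile(r"(\*\*|\*|`)(.+?)\1")  # **bold**, *italic*, `code`


def tts_preprocess_sketch(reply: str) -> str:
    """Strip CoT leakage and markdown so the text reads naturally aloud."""
    text = _INTERNAL.sub("", reply)
    text = _LABELLED.sub("", text)
    text = _COT_STARTER.sub("", text)
    text = _BULLET.sub("", text)
    text = _HEADING.sub("", text)
    text = _EMPHASIS.sub(r"\2", text)
    text = re.sub(r"\s+", " ", text).strip()
    # If nothing speakable survives, the whole reply was CoT-shaped: fall back.
    return text or FALLBACK_ACK
```

CoT removal runs before the markdown passes because the reasoning labels themselves use `**` emphasis.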
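`security.py` is listed as doing request rate-limiting, but its algorithm is not documented here. As one common option, a per-client sliding-window limiter is sketched below; `SlidingWindowLimiter`, the choice of key (for example a client IP) and the injectable clock are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_s-second window."""

    def __init__(self, max_requests: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Evict timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```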
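`llm_health.py` is described as a probe that pings each provider and writes `40-data/llm_health.json` for the admin tab. A minimal sketch, assuming each provider exposes a cheap zero-argument `ping` callable (hypothetical) and an assumed report shape: it times each call, records failures instead of raising, and replaces the JSON file atomically so the admin tab never reads a half-written report.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable, Dict


def probe_providers(probes: Dict[str, Callable[[], None]], out_path: str) -> Dict[str, Any]:
    """Ping every provider, then atomically write and return the health report."""
    report: Dict[str, Any] = {"checked_at": time.time(), "providers": {}}
    for name, ping in probes.items():
        start = time.perf_counter()
        try:
            ping()
            status: Dict[str, Any] = {"ok": True, "error": None}
        except Exception as exc:  # one failing provider must not abort the probe
            status = {"ok": False, "error": repr(exc)}
        status["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        report["providers"][name] = status

    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    os.replace(tmp_path, out_path)
    return report
```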
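The provider tiering under `providers/` (Gemini as Tier 0, NIM as Tier 1, OpenRouter as Tier 2) amounts to an ordered fallback. A minimal sketch, assuming each provider is wrapped as a plain `prompt -> text` callable; `TieredChain` and `AllProvidersFailed` are hypothetical names, and the sketch deliberately omits the elector and BACKUP machinery that `NimChainLLM` adds (KI-080).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every tier in the chain has failed."""


@dataclass
class TieredChain:
    tiers: List[Tuple[str, Callable[[str], str]]]  # (provider name, prompt -> text), in priority order
    errors: List[Tuple[str, str]] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.errors.clear()
        for name, call in self.tiers:
            try:
                return call(prompt)  # first tier that answers wins
            except Exception as exc:
                self.errors.append((name, repr(exc)))  # remember why, then fall through
        raise AllProvidersFailed(f"all tiers failed: {self.errors}")
```

Callers hold a chain rather than a single client, which is the property the "never instantiate a single-provider client directly" rule protects.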