# `backend/` — FastAPI orchestrator

FastAPI + Pydantic service that fronts every chat turn. The HTTP entry point is `main.py`; everything else is a typed module the orchestrator composes per turn.

## Entry points

| File | Role |
| --- | --- |
| `main.py` | FastAPI app, all HTTP routes (`/api/chat`, `/api/profile`, `/api/policies`, admin endpoints). Wires CORS and request validation, and returns the typed response that the frontend's `openapi-typescript` codegen consumes. |
| `orchestrator.py` | The brain of a turn: `classify_intent`, `pick_brain`, fact-find routing, profile-RAG injection, faithfulness gate dispatch. Pinned by `tests/test_routing_regression.py`. |
| `config.py` | Pydantic-settings: single source of truth for env-vars, model IDs, chunk sizes, chain budgets. |

## Per-turn helpers

| File | Role | Related ADR |
| --- | --- | --- |
| `needs_finder.py` | 9-slot fact-find SCHEMA (`GRAPH`). Post-KI-167 the `prompt_en` / `prompt_hi` strings are dead text — the LLM owns voice + cadence end-to-end via its system prompt. The data structure is retained as the schema source for `single_brain.py`. | [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md), [ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md) |
| `single_brain.py` | KI-225 single-LLM brain (2026-05-15). One Gemini-with-function-calling call per turn. Tools: `save_profile_field`, `retrieve_policies`, `get_policy_facts`, `mark_recommendation`. Replaced the multi-brain stack (`sales_brain` + `qa_brain` + `faithfulness` + `persona` + `translator` + `profile_extractor` + `orchestrator`, ~5,200 LOC removed). | [ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md), KI-225 |
| `sales_brain_normalizer.py` | KI-167 deterministic post-processor. Pure-rule mapping of the LLM's loose `captures` dict to canonical `{field: validated_value}`: alias resolution (`location` → `location_tier`), enum coercion (`Bangalore` → `metro`), INR parsing, type / bounds validation, KI-094 null-drop. No LLM calls. | [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| ~~`fact_find_brain.py`~~ (deleted in KI-167) | Was the ADR-030 one-call brain with `<FF>...</FF>` trailer convention + `_canonical_fallback`. Removed entirely. | superseded by [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| ~~`question_paraphraser.py`~~ (deleted in KI-070) | Was an LLM rewrite of canonical slot questions (ADR-027). Superseded first by `fact_find_brain.py`, now by `sales_brain.py`. | superseded by [ADR-039](../70-docs/60-decisions/ADR-039-llm-driven-sales-brain.md) |
| `fact_find_normalizer.py` | LLM-driven free-text → slot-value coercion (e.g. "32 lakh" → `3200000`). Goes through `NimChainLLM`, not a single client (KI-033). | — |
| `profile_extractor.py` | LLM extractor that pulls profile updates out of conversational asides ("by the way, my dad has diabetes"). Chain-pattern, never a hardcoded model. | [ADR-022](../70-docs/60-decisions/ADR-022-conversational-profile-updates.md) |
| `profile_store.py` | NEW (KI-040). Persistent named-profile JSON store under `40-data/profiles/`. O(1) name-keyed lookup; mirrors into `profile_rag` on every save. | — |
| `profile_rag.py` | Embeds the user's profile as a Chroma chunk so the brain sees it alongside policy chunks for "what's best for me?" turns. Per-chunk `session_id` metadata + `doc_type=profile` exclusion from main retrieval + Python-side triple-check on per-session lookup (KI-102). All `collection.get(...)` calls wrapped in `_safe_collection_get` so never-existed sessions return `None` instead of raising (KI-107). | [ADR-022](../70-docs/60-decisions/ADR-022-conversational-profile-updates.md) |
| `session_state.py` | In-memory session map; tracks fact-find progress + chat history per `session_id`. | — |
| `faithfulness.py` | 4-gate hallucination guard (retrieval floor → citation integrity → regex numeric grounding → LLM judge). Blocks land in `logs/hallucinations.jsonl`. | — |
| `scorecard.py` | Pure-function 6-sub-score scorer over the 62-field extracted JSON. No LLM. | — |
| `translator.py` | Sarvam-M Indic ↔ English translator wrapper. | [ADR-006](../70-docs/60-decisions/ADR-006-sarvam-first-stack.md) |
| `translation_check.py` | Post-hoc detector for mixed-script replies; flags Hinglish leakage. | — |
| `persona.py` | The consultative-advisor system prompt + view-aware prompt overlays. | [ADR-008](../70-docs/60-decisions/ADR-008-consultative-advisor-persona.md), [ADR-021](../70-docs/60-decisions/ADR-021-view-aware-system-prompt.md) |
| `voice_format.py` | Strips markdown / lists / bullet glyphs so TTS sounds natural. `tts_preprocess` also kills CoT leakage: `<think>...</think>` blocks, `Reasoning:` / `Thought:` labels, `[INTERNAL]` blocks, sentence-anchored CoT starters; emergency-fallback acknowledger if the whole reply is CoT-shaped (KI-104). | — |
| `premium_calculator.py` | Looks up `40-data/premiums/illustrative_premiums.json` + applies the documented scaling factors. Never claims a real quote. | [ADR-007](../70-docs/60-decisions/ADR-007-illustrative-pricing.md) |
| `security.py` | Request rate-limiting, input sanitisation. | — |
| `admin.py` | Admin-only routes (live LLM-health, usage rollups, hallucination tail). | ADR-023 |
| `llm_health.py` | Lightweight probe that pings each provider and writes `40-data/llm_health.json` for the admin tab. | — |
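
To make the `sales_brain_normalizer.py` row more concrete, here is a minimal sketch of a rule-only normalizer of that kind. It is not the module's actual code. The function names (`normalize_captures`, `parse_inr`), the lookup tables, the `annual_income_inr` and `age` fields, and the bounds are all assumptions made for illustration; only `location` → `location_tier` and `Bangalore` → `metro` come from the table above. "Null-drop" is read here as simply discarding `None` or empty values.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical lookup tables; the real module's contents are not shown in this README.
FIELD_ALIASES = {"location": "location_tier", "city": "location_tier",
                 "income": "annual_income_inr"}
CITY_TO_TIER = {"bangalore": "metro", "mumbai": "metro", "delhi": "metro"}
UNIT_MULTIPLIERS = {"k": 1_000, "lakh": 100_000, "lakhs": 100_000,
                    "crore": 10_000_000, "crores": 10_000_000}
MAX_INCOME_INR = 10_000_000_000  # assumed sanity bound

# Optional currency prefix, a number with optional commas, an optional unit word.
_INR_RE = re.compile(r"^\s*(?:₹|rs\.?|inr)?\s*([\d,]*\.?\d+)\s*([a-z]*)\s*$",
                     re.IGNORECASE)


def parse_inr(raw: Any) -> Optional[int]:
    """Parse '32 lakh' or '₹5,00,000' into whole rupees; None if unparseable."""
    if isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        return int(raw)
    if not isinstance(raw, str):
        return None
    match = _INR_RE.match(raw)
    if match is None:
        return None
    unit = match.group(2).lower()
    if unit and unit not in UNIT_MULTIPLIERS:
        return None
    number = float(match.group(1).replace(",", ""))
    return int(round(number * UNIT_MULTIPLIERS.get(unit, 1)))


def normalize_captures(captures: Dict[str, Any]) -> Dict[str, Any]:
    """Map a loose captures dict to canonical {field: validated_value}."""
    clean: Dict[str, Any] = {}
    for key, value in captures.items():
        if value is None or value == "":
            continue  # null-drop: never overwrite a slot with an empty value
        field = FIELD_ALIASES.get(key, key)  # alias resolution
        if field == "location_tier":
            tier = CITY_TO_TIER.get(str(value).strip().lower())  # enum coercion
            if tier is not None:
                clean[field] = tier
        elif field == "annual_income_inr":
            amount = parse_inr(value)
            if amount is not None and 0 < amount <= MAX_INCOME_INR:  # bounds check
                clean[field] = amount
        elif field == "age":
            if isinstance(value, int) and not isinstance(value, bool) and 18 <= value <= 100:
                clean[field] = value
        # Fields outside the known schema are dropped silently.
    return clean
```

Calling `normalize_captures({"location": "Bangalore", "income": "32 lakh", "age": None})` under these assumptions yields `{"location_tier": "metro", "annual_income_inr": 3200000}`. Keeping this step free of LLM calls means the same input always produces the same canonical profile, which makes it straightforward to unit-test.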

## Subdirectory

`providers/` — concrete STT / TTS / LLM / embeddings client implementations:

- `sarvam_stt.py` / `sarvam_tts.py` — Sarvam Saarika v2.5 + Bulbul v2.
- `google_gemini_llm.py` — Google AI Studio (`gemini-2.0-flash`, `gemini-2.5-flash`); Tier 0 primary on Brain Fast + Brain Main ([ADR-040](../70-docs/60-decisions/ADR-040-google-gemini-primary.md)).
- `nvidia_nim_llm.py` — NIM client + `NimChainLLM` dispatcher (the elector + cross-provider BACKUP machinery from KI-080). Tier 1 fallback across all chains; judge primary stays on NIM Mistral Large 3 675B.
- `openrouter_llm.py` — OpenRouter client; Tier 2 cross-provider diversity using OR's `models: [...]` server-side fallback (KI-176).
- `local_embeddings.py` — BGE-small-en-v1.5 sentence-transformers.

All LLM access goes through `NimChainLLM(chain=...)` — never instantiate a single-provider client directly. See `backend/providers/README.md`.

## Where to read for what

- System tour: root `README.md`
- Stable contracts a new contributor must know: root `CLAUDE.md`
- Decisions with alternatives: `70-docs/60-decisions/ADR-*.md`
- Routing invariants: `tests/test_routing_regression.py`
- Defect register: `80-audit/ENTERPRISE_AUDIT.md`
