# 05 — Needs Analysis Flow (single-LLM-with-tools) | Field | Value | | --- | --- | | Project | Insurance Sales Portfolio Expert | | Version | 0.3 | | Date | 2026-05-17 | | Implementation | `backend/single_brain.py` + `backend/brain_tools.py`. The slot schema still lives in `backend/needs_finder.py::GRAPH`. | > ℹ️ **Design intent below is still accurate; implementation pointers > updated.** The core idea — *one LLM call per turn owns the entire > fact-find surface* — is what the single-LLM-with-tools handler does > today. Fact-find, retrieval, QA, and recommendation are all handled in > that one call via function-calling tools; the separate `sales_brain` / > `qa_brain` split, the orchestrator, the profile_extractor, and the > separate faithfulness judge were removed. Present-state authority: > [`README.md`](../../README.md) §4. ## 0. Why one LLM call per turn (not "scripted question graph + paraphraser") A good Independent Financial Advisor opens with a stable, repeatable set of slots to fill — and adapts which slot to drive next based on what the buyer just said, what they've already told us, and what's still missing. We replicate this with **one LLM call per turn** that owns the entire fact-find surface (voice, cadence, slot-selection, multi-fact capture in a single turn), not a hardcoded state machine with a paraphraser glued on top. The original v0.1 of this doc was an explicit `GRAPH` of canonical questions with `prompt_en` / `prompt_hi` strings, rendered by a scripted state machine. That scripted approach (and a later one-call-brain variant with a structured trailer + canonical fallback) produced robotic cadence: scripted prompts leaked into fallback turns. The current design removes all of that. There are no scripted prompts and no canonical fallback. One **Gemini 2.5-flash call per turn with function-calling** owns voice, cadence, and slot selection; the `GRAPH` is consulted only as the slot *schema*. What we kept and what we changed: | Concern | v0.1 (scripted graph) | Current (single LLM + tools) | |---|---|---| | **Slot schema** | `GRAPH` of 9 `Question(id, prompt_en, prompt_hi, field, is_core, condition, parser)` entries | `GRAPH` data structure retained as the schema source; `prompt_en` / `prompt_hi` are dead strings now | | **Question text** | Hardcoded `prompt_en` rendered to user | LLM generates natural prose in the advisor's voice per turn (no template) | | **Slot selection** | `next_question(profile)` walked the graph in order | The single LLM decides which slot to drive next from the schema + conversation so far (required slots first) | | **Multi-fact capture** | One slot per turn (a user saying "I'm 32, just myself, in Mumbai" only filled `age`) | One utterance can fill 2-4 slots; the LLM calls `save_profile_field` per captured fact | | **Auditable behavior** | Graph order was the trace | Per-turn LLM call + tool calls are logged (`save_profile_field` / `retrieve_policies` / `mark_recommendation`) in `40-data/llm_usage.jsonl` + `logs/turns.jsonl` | | **Fail-soft** | Graph survived LLM degradation | On a transient Gemini error the small `backend/nim_fallback.py` (NVIDIA NIM) completes the turn; fail-loud otherwise — no scripted reply | | **Bilingual** | Hand-authored `prompt_en` + `prompt_hi` | Sarvam-M translation on the LLM's output (English authoring + Indic translation), same UX, fewer hand-authored strings | ## 1. The 9-slot schema (data-only — the LLM consults this through its system prompt) ``` ┌─────────────────────────────┐ │ Q1: age (core) │ │ "What is your age?" │ └────────────┬────────────────┘ ▼ ┌─────────────────────────────┐ │ Q2: dependents (core) │ │ "Who else do you cover?" │ └────────────┬────────────────┘ ▼ ┌─────────────────────────────┐ │ Q3: income_band (core) │ │ "Annual income?" │ └────────────┬────────────────┘ ▼ ┌─────────────────────────────┐ │ Q4: existing_cover (core) │ │ "Already have health ins?" │ └────────────┬────────────────┘ ▼ ┌─────────────────────────────┐ │ Q5: primary_goal (core) │ │ "Why are you here?" │ └────────────┬────────────────┘ ▼ ┌─────────────────────────────┐ │ Q6: location (core) │ │ "Which city / tier?" │ └────────────┬────────────────┘ ▼ (conditional branches) ▼ ┌────────────────────────┴────────────────────────┐ ▼ ▼ ┌──────────────────────────────┐ ┌────────────────────────────────┐ │ Q7: parents_age (cond) │ │ Q8: health_conditions (always) │ │ asked IF dependents include │ │ "Any pre-existing condition │ │ 'parent' │ │ on your side?" │ └──────────────┬───────────────┘ └─────────────────┬──────────────┘ ▼ ▼ ┌──────────────────────────────────────┐ │ Q9: budget_band (core) │ │ "Premium budget?" │ └──────────────┬───────────────────────┘ ▼ ┌─────────────────────────────────────┐ │ Profile complete → readback + │ │ policy recommendation │ └─────────────────────────────────────┘ ``` ## 2. Termination criteria Fact-find is "done" when all required slots are filled. Because the single LLM owns the whole turn, it stops asking discovery questions and moves to retrieval/recommendation once the profile is sufficiently complete; required-slot checks against the `GRAPH` schema keep it from declaring done prematurely. The user can also opt out — if the user immediately asks "compare Star and HDFC ERGO" or "show me the top 3 policies for me", the same LLM call can skip remaining discovery and go straight to `retrieve_policies` / `mark_recommendation` within that turn. ## 3. Bilingual handling The LLM authors in English; Sarvam-M translates Hinglish ↔ English at the cascade boundary. The "Indic-native, not translated" principle (problem-statement §5.3) is preserved by the LLM's native-quality output rather than by hand-authored Hindi strings — the cascade itself is the bilingual surface. ## 4. Capturing user answers When the user states a fact, the LLM calls the `save_profile_field` tool (e.g. `save_profile_field(field="age", value=34)`). The tool implementation in `backend/brain_tools.py` is a pure-function handler that: - Resolves field-name aliases (`location` → `location_tier`) - Coerces enums (`Bangalore` → `metro`) - Parses INR amounts (`"32 lakh"` → `3200000`) - Drops null / empty values (a `None`/empty value never overwrites a filled field) - Validates types and bounds (rejects `age: 250`) The handler is pure rules — no nested LLM call. Captured fields are persisted to the session profile. ## 5. How the single LLM uses this There is no separate orchestrator / intent-classifier / sales-brain split. One `single_brain` call per turn receives the system prompt (which embeds the slot schema), the recent history, and the user's text, and uses function-calling tools to do everything in that one turn: - `save_profile_field` — record a captured fact (fact-find). - `retrieve_policies` — structured + vector retrieval over the corpus when the turn needs policy evidence (QA / comparison / recommendation). - `mark_recommendation` — record the policies the LLM recommends, surfaced with the profile-tuned scorecard. Because the LLM can only answer from what `retrieve_policies` returned and must cite it, grounding is structural — there is no separate faithfulness-judge LLM in the loop. On a transient Gemini error, `backend/nim_fallback.py` completes the turn. ## 6. What the user sees A natural conversation, not a script. Example (real session, 2026-05-15): ``` Bot: "Hi! I'm your insurance advisor. To get you the right shortlist I'll need to learn a bit about you and your situation. What's your name and a rough sense of who we're looking to protect?" User: "Hi, I'm Rohit Sar. I'm 32, just myself, living in Mumbai." Bot: "Nice to meet you, Rohit. Got it — 32, covering yourself, Mumbai-based. Two more quick things and we can look at policies: roughly what's your annual income, and have you had any health conditions diagnosed — diabetes, BP, anything chronic?" User: "around 18 lakhs, no conditions" Bot: "Perfect. And what's drawing you to insurance right now — first policy, top-up on employer cover, or something specific?" User: "first policy, employer cover feels too thin" Bot: "Makes sense. Last one: comfortable annual premium range — under 15K, 15-30K, 30-60K, or higher?" User: "15-30 is fine" Bot: "Great — that's everything I need. Based on your profile, three policies stand out: [...]" ``` Note: the LLM captured `{name, age, dependents, location_tier}` in a single turn from the user's opener via `save_profile_field` calls. The exact turn count varies — some sessions finish in 3-4 turns, others in 6-7 depending on what the user volunteers. ## 7. v2 enhancements | # | Enhancement | Why | | --- | --- | --- | | 1 | Stream the prose token-by-token to the frontend | Cuts perceived latency on longer replies | | 2 | Skip-confirm flow ("you can skip this — say 'skip'") | Buyer autonomy | | 3 | Save profile across sessions ✅ shipped KI-040 | Returning user picks up where they left off | | 4 | Tone-match the user's energy (formal vs casual) | The LLM already does this implicitly via the system prompt; could be explicit per-session signal |