InsuranceBot / 70-docs /30-engineering /needs-analysis-flow.md
rohitsar567's picture
chore(cleanup): purge stale narrative/tombstones/dead code β€” codebase reads as the current standard
23b8fad
|
Raw
History Blame Contribute Delete
12.1 kB
# 05 β€” Needs Analysis Flow (single-LLM-with-tools)
| Field | Value |
| --- | --- |
| Project | Insurance Sales Portfolio Expert |
| Version | 0.3 |
| Date | 2026-05-17 |
| Implementation | `backend/single_brain.py` + `backend/brain_tools.py`. The slot schema still lives in `backend/needs_finder.py::GRAPH`. |
> ℹ️ **Design intent below is still accurate; implementation pointers
> updated.** The core idea β€” *one LLM call per turn owns the entire
> fact-find surface* β€” is what the single-LLM-with-tools handler does
> today. Fact-find, retrieval, QA, and recommendation are all handled in
> that one call via function-calling tools; the separate `sales_brain` /
> `qa_brain` split, the orchestrator, the profile_extractor, and the
> separate faithfulness judge were removed. Present-state authority:
> [`README.md`](../../README.md) Β§4.
## 0. Why one LLM call per turn (not "scripted question graph + paraphraser")
A good Independent Financial Advisor opens with a stable, repeatable set of slots to fill β€” and adapts which slot to drive next based on what the buyer just said, what they've already told us, and what's still missing. We replicate this with **one LLM call per turn** that owns the entire fact-find surface (voice, cadence, slot-selection, multi-fact capture in a single turn), not a hardcoded state machine with a paraphraser glued on top.
The original v0.1 of this doc was an explicit `GRAPH` of canonical questions with `prompt_en` / `prompt_hi` strings, rendered by a scripted state machine. That scripted approach (and a later one-call-brain variant with a structured trailer + canonical fallback) produced robotic cadence: scripted prompts leaked into fallback turns. The current design removes all of that. There are no scripted prompts and no canonical fallback. One **Gemini 2.5-flash call per turn with function-calling** owns voice, cadence, and slot selection; the `GRAPH` is consulted only as the slot *schema*.
What we kept and what we changed:
| Concern | v0.1 (scripted graph) | Current (single LLM + tools) |
|---|---|---|
| **Slot schema** | `GRAPH` of 9 `Question(id, prompt_en, prompt_hi, field, is_core, condition, parser)` entries | `GRAPH` data structure retained as the schema source; `prompt_en` / `prompt_hi` are dead strings now |
| **Question text** | Hardcoded `prompt_en` rendered to user | LLM generates natural prose in the advisor's voice per turn (no template) |
| **Slot selection** | `next_question(profile)` walked the graph in order | The single LLM decides which slot to drive next from the schema + conversation so far (required slots first) |
| **Multi-fact capture** | One slot per turn (a user saying "I'm 32, just myself, in Mumbai" only filled `age`) | One utterance can fill 2-4 slots; the LLM calls `save_profile_field` per captured fact |
| **Auditable behavior** | Graph order was the trace | Per-turn LLM call + tool calls are logged (`save_profile_field` / `retrieve_policies` / `mark_recommendation`) in `40-data/llm_usage.jsonl` + `logs/turns.jsonl` |
| **Fail-soft** | Graph survived LLM degradation | On a transient Gemini error the small `backend/nim_fallback.py` (NVIDIA NIM) completes the turn; fail-loud otherwise β€” no scripted reply |
| **Bilingual** | Hand-authored `prompt_en` + `prompt_hi` | Sarvam-M translation on the LLM's output (English authoring + Indic translation), same UX, fewer hand-authored strings |
## 1. The 9-slot schema (data-only β€” the LLM consults this through its system prompt)
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q1: age (core) β”‚
β”‚ "What is your age?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q2: dependents (core) β”‚
β”‚ "Who else do you cover?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q3: income_band (core) β”‚
β”‚ "Annual income?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q4: existing_cover (core) β”‚
β”‚ "Already have health ins?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q5: primary_goal (core) β”‚
β”‚ "Why are you here?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q6: location (core) β”‚
β”‚ "Which city / tier?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
(conditional branches)
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q7: parents_age (cond) β”‚ β”‚ Q8: health_conditions (always) β”‚
β”‚ asked IF dependents include β”‚ β”‚ "Any pre-existing condition β”‚
β”‚ 'parent' β”‚ β”‚ on your side?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Q9: budget_band (core) β”‚
β”‚ "Premium budget?" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Profile complete β†’ readback + β”‚
β”‚ policy recommendation β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## 2. Termination criteria
Fact-find is "done" when all required slots are filled. Because the single LLM owns the whole turn, it stops asking discovery questions and moves to retrieval/recommendation once the profile is sufficiently complete; required-slot checks against the `GRAPH` schema keep it from declaring done prematurely.
The user can also opt out β€” if the user immediately asks "compare Star and HDFC ERGO" or "show me the top 3 policies for me", the same LLM call can skip remaining discovery and go straight to `retrieve_policies` / `mark_recommendation` within that turn.
## 3. Bilingual handling
The LLM authors in English; Sarvam-M translates Hinglish ↔ English at the cascade boundary. The "Indic-native, not translated" principle (problem-statement Β§5.3) is preserved by the LLM's native-quality output rather than by hand-authored Hindi strings β€” the cascade itself is the bilingual surface.
## 4. Capturing user answers
When the user states a fact, the LLM calls the `save_profile_field` tool (e.g. `save_profile_field(field="age", value=34)`). The tool implementation in `backend/brain_tools.py` is a pure-function handler that:
- Resolves field-name aliases (`location` β†’ `location_tier`)
- Coerces enums (`Bangalore` β†’ `metro`)
- Parses INR amounts (`"32 lakh"` β†’ `3200000`)
- Drops null / empty values (a `None`/empty value never overwrites a filled field)
- Validates types and bounds (rejects `age: 250`)
The handler is pure rules β€” no nested LLM call. Captured fields are persisted to the session profile.
## 5. How the single LLM uses this
There is no separate orchestrator / intent-classifier / sales-brain split. One `single_brain` call per turn receives the system prompt (which embeds the slot schema), the recent history, and the user's text, and uses function-calling tools to do everything in that one turn:
- `save_profile_field` β€” record a captured fact (fact-find).
- `retrieve_policies` β€” structured + vector retrieval over the corpus when the turn needs policy evidence (QA / comparison / recommendation).
- `mark_recommendation` β€” record the policies the LLM recommends, surfaced with the profile-tuned scorecard.
Because the LLM can only answer from what `retrieve_policies` returned and must cite it, grounding is structural β€” there is no separate faithfulness-judge LLM in the loop. On a transient Gemini error, `backend/nim_fallback.py` completes the turn.
## 6. What the user sees
A natural conversation, not a script. Example (real session, 2026-05-15):
```
Bot: "Hi! I'm your insurance advisor. To get you the right shortlist I'll need
to learn a bit about you and your situation. What's your name and a rough
sense of who we're looking to protect?"
User: "Hi, I'm Rohit Sar. I'm 32, just myself, living in Mumbai."
Bot: "Nice to meet you, Rohit. Got it β€” 32, covering yourself, Mumbai-based.
Two more quick things and we can look at policies: roughly what's your
annual income, and have you had any health conditions diagnosed β€”
diabetes, BP, anything chronic?"
User: "around 18 lakhs, no conditions"
Bot: "Perfect. And what's drawing you to insurance right now β€” first policy,
top-up on employer cover, or something specific?"
User: "first policy, employer cover feels too thin"
Bot: "Makes sense. Last one: comfortable annual premium range β€” under 15K,
15-30K, 30-60K, or higher?"
User: "15-30 is fine"
Bot: "Great β€” that's everything I need. Based on your profile, three policies
stand out: [...]"
```
Note: the LLM captured `{name, age, dependents, location_tier}` in a single turn from the user's opener via `save_profile_field` calls. The exact turn count varies β€” some sessions finish in 3-4 turns, others in 6-7 depending on what the user volunteers.
## 7. v2 enhancements
| # | Enhancement | Why |
| --- | --- | --- |
| 1 | Stream the prose token-by-token to the frontend | Cuts perceived latency on longer replies |
| 2 | Skip-confirm flow ("you can skip this β€” say 'skip'") | Buyer autonomy |
| 3 | Save profile across sessions βœ… shipped KI-040 | Returning user picks up where they left off |
| 4 | Tone-match the user's energy (formal vs casual) | The LLM already does this implicitly via the system prompt; could be explicit per-session signal |