
05 - Needs Analysis Flow (single-LLM-with-tools)

| Field | Value |
|---|---|
| Project | Insurance Sales Portfolio Expert |
| Version | 0.3 |
| Date | 2026-05-17 |
| Implementation | backend/single_brain.py + backend/brain_tools.py. The slot schema still lives in backend/needs_finder.py::GRAPH. |

ℹ️ Design intent below is still accurate; implementation pointers updated. The core idea (one LLM call per turn owns the entire fact-find surface) is what the single-LLM-with-tools handler does today. Fact-find, retrieval, QA, and recommendation are all handled in that one call via function-calling tools; the separate sales_brain / qa_brain split, the orchestrator, the profile_extractor, and the separate faithfulness judge were removed. Present-state authority: README.md §4.

0. Why one LLM call per turn (not "scripted question graph + paraphraser")

A good Independent Financial Advisor opens with a stable, repeatable set of slots to fill, then adapts which slot to drive next based on what the buyer just said, what they've already told us, and what's still missing. We replicate this with one LLM call per turn that owns the entire fact-find surface (voice, cadence, slot-selection, multi-fact capture in a single turn), not a hardcoded state machine with a paraphraser glued on top.

The original v0.1 of this doc was an explicit GRAPH of canonical questions with prompt_en / prompt_hi strings, rendered by a scripted state machine. That scripted approach (and a later one-call-brain variant with a structured trailer + canonical fallback) produced robotic cadence: scripted prompts leaked into fallback turns. The current design removes all of that. There are no scripted prompts and no canonical fallback. One Gemini 2.5-flash call per turn with function-calling owns voice, cadence, and slot selection; the GRAPH is consulted only as the slot schema.

What we kept and what we changed:

| Concern | v0.1 (scripted graph) | Current (single LLM + tools) |
|---|---|---|
| Slot schema | GRAPH of 9 Question(id, prompt_en, prompt_hi, field, is_core, condition, parser) entries | GRAPH data structure retained as the schema source; prompt_en / prompt_hi are dead strings now |
| Question text | Hardcoded prompt_en rendered to user | LLM generates natural prose in the advisor's voice per turn (no template) |
| Slot selection | next_question(profile) walked the graph in order | The single LLM decides which slot to drive next from the schema + conversation so far (required slots first) |
| Multi-fact capture | One slot per turn (a user saying "I'm 32, just myself, in Mumbai" only filled age) | One utterance can fill 2-4 slots; the LLM calls save_profile_field per captured fact |
| Auditable behavior | Graph order was the trace | Per-turn LLM call + tool calls are logged (save_profile_field / retrieve_policies / mark_recommendation) in 40-data/llm_usage.jsonl + logs/turns.jsonl |
| Fail-soft | Graph survived LLM degradation | On a transient Gemini error the small backend/nim_fallback.py (NVIDIA NIM) completes the turn; fail-loud otherwise, with no scripted reply |
| Bilingual | Hand-authored prompt_en + prompt_hi | Sarvam-M translation on the LLM's output (English authoring + Indic translation), same UX, fewer hand-authored strings |

1. The 9-slot schema (data-only; the LLM consults this through its system prompt)

                    ┌─────────────────────────────┐
                    │  Q1: age (core)             │
                    │  "What is your age?"        │
                    └────────────┬────────────────┘
                                 ▼
                    ┌─────────────────────────────┐
                    │  Q2: dependents (core)      │
                    │  "Who else do you cover?"   │
                    └────────────┬────────────────┘
                                 ▼
                    ┌─────────────────────────────┐
                    │  Q3: income_band (core)     │
                    │  "Annual income?"           │
                    └────────────┬────────────────┘
                                 ▼
                    ┌─────────────────────────────┐
                    │  Q4: existing_cover (core)  │
                    │  "Already have health ins?" │
                    └────────────┬────────────────┘
                                 ▼
                    ┌─────────────────────────────┐
                    │  Q5: primary_goal (core)    │
                    │  "Why are you here?"        │
                    └────────────┬────────────────┘
                                 ▼
                    ┌─────────────────────────────┐
                    │  Q6: location (core)        │
                    │  "Which city / tier?"       │
                    └────────────┬────────────────┘
                                 ▼
                         (conditional branches)
                                 ▼
        ┌────────────────────────┴─────────────────────────┐
        ▼                                                  ▼
┌──────────────────────────────┐         ┌────────────────────────────────┐
│ Q7: parents_age (cond)       │         │ Q8: health_conditions (always) │
│ asked IF dependents include  │         │ "Any pre-existing condition    │
│ 'parent'                     │         │  on your side?"                │
└──────────────┬───────────────┘         └─────────────────┬──────────────┘
               ▼                                           ▼
                                ┌──────────────────────────────────────┐
                                │  Q9: budget_band (core)              │
                                │  "Premium budget?"                   │
                                └──────────────┬───────────────────────┘
                                               ▼
                          ┌─────────────────────────────────────┐
                          │  Profile complete → readback +      │
                          │  policy recommendation              │
                          └─────────────────────────────────────┘
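
The schema in the diagram can be held as plain data and rendered into the system prompt. The following is a minimal sketch under stated assumptions: `Slot`, `SLOTS`, `covers_parent`, and `schema_prompt` are hypothetical names, the record is a trimmed-down stand-in for the real `Question` entries in backend/needs_finder.py::GRAPH (which also carry prompt_en, prompt_hi, and parser), and treating Q8 ("always") as core is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Profile = dict  # hypothetical: field name -> captured value


@dataclass(frozen=True)
class Slot:
    id: str
    field: str
    is_core: bool
    # Optional predicate; the slot applies only when it returns True.
    condition: Optional[Callable[[Profile], bool]] = None


def covers_parent(profile: Profile) -> bool:
    """Q7 applies only if the dependents include a parent."""
    return "parent" in (profile.get("dependents") or [])


# Order and flags follow the diagram; Q8 as core is an assumption.
SLOTS = [
    Slot("Q1", "age", True),
    Slot("Q2", "dependents", True),
    Slot("Q3", "income_band", True),
    Slot("Q4", "existing_cover", True),
    Slot("Q5", "primary_goal", True),
    Slot("Q6", "location", True),
    Slot("Q7", "parents_age", False, covers_parent),
    Slot("Q8", "health_conditions", True),
    Slot("Q9", "budget_band", True),
]


def schema_prompt(slots=SLOTS) -> str:
    """Render the schema as text that could be embedded in a system prompt."""
    lines = []
    for slot in slots:
        kind = "core" if slot.is_core else "conditional"
        lines.append(f"- {slot.field} ({kind})")
    return "\n".join(lines)
```

Keeping the schema as data with no prompt strings matches the stated design: the LLM owns the wording, and the schema only says what must be captured.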

2. Termination criteria

Fact-find is "done" when all required slots are filled. Because the single LLM owns the whole turn, it stops asking discovery questions and moves to retrieval/recommendation once the profile is sufficiently complete; required-slot checks against the GRAPH schema keep it from declaring done prematurely.

The user can also opt out: if the user immediately asks "compare Star and HDFC ERGO" or "show me the top 3 policies for me", the same LLM call can skip remaining discovery and go straight to retrieve_policies / mark_recommendation within that turn.
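
The termination rule can be sketched as a small required-slot check. This is illustrative only: `REQUIRED`, `missing_slots`, `fact_find_done`, and the `user_opted_out` flag are hypothetical names, the field names are taken from the 9-slot schema, and the real check in the codebase may differ.

```python
# Hypothetical required-slot list; names follow the 9-slot schema.
REQUIRED = [
    "age", "dependents", "income_band", "existing_cover",
    "primary_goal", "location", "health_conditions", "budget_band",
]


def missing_slots(profile: dict) -> list:
    """Return required fields that are still unfilled, in schema order."""
    required = list(REQUIRED)
    # parents_age becomes required only when a parent is covered.
    if "parent" in (profile.get("dependents") or []):
        required.insert(required.index("health_conditions"), "parents_age")
    return [f for f in required if profile.get(f) in (None, "", [])]


def fact_find_done(profile: dict, user_opted_out: bool = False) -> bool:
    """Done when nothing required is missing, or the user asked to skip ahead."""
    return user_opted_out or not missing_slots(profile)
```

A check like this guards against the LLM declaring the profile complete too early, while the opt-out flag models a user who asks for a comparison or shortlist straight away.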

3. Bilingual handling

The LLM authors in English; Sarvam-M translates Hinglish ↔ English at the cascade boundary. The "Indic-native, not translated" principle (problem-statement §5.3) is preserved by the LLM's native-quality output rather than by hand-authored Hindi strings; the cascade itself is the bilingual surface.

4. Capturing user answers

When the user states a fact, the LLM calls the save_profile_field tool (e.g. save_profile_field(field="age", value=34)). The tool implementation in backend/brain_tools.py is a pure-function handler that:

  • Resolves field-name aliases (location → location_tier)
  • Coerces enums (Bangalore → metro)
  • Parses INR amounts ("32 lakh" → 3200000)
  • Drops null / empty values (a None/empty value never overwrites a filled field)
  • Validates types and bounds (rejects age: 250)

The handler is pure rules, with no nested LLM call. Captured fields are persisted to the session profile.
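
The five rules above can be sketched as one pure function. This is not the code in backend/brain_tools.py: the tables `ALIASES`, `CITY_TO_TIER`, and `INR_UNITS`, the `AGE_BOUNDS` range, the `income` field name, and the signature that takes and returns a profile dict are all assumptions made for illustration.

```python
import re

ALIASES = {"location": "location_tier"}                   # hypothetical alias table
CITY_TO_TIER = {"bangalore": "metro", "mumbai": "metro"}  # illustrative subset
INR_UNITS = {"lakh": 100_000, "lakhs": 100_000,
             "crore": 10_000_000, "crores": 10_000_000}
AGE_BOUNDS = (0, 120)                                     # assumed bounds


def parse_inr(text: str) -> int:
    """Parse amounts such as '32 lakh' or '1.5 crore' into rupees."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([a-z]*)\s*", text.lower())
    if not m or (m.group(2) and m.group(2) not in INR_UNITS):
        raise ValueError(f"unrecognised amount: {text!r}")
    return round(float(m.group(1)) * INR_UNITS.get(m.group(2), 1))


def save_profile_field(profile: dict, field: str, value) -> dict:
    """Return a new profile with the field applied; raise ValueError on bad input."""
    field = ALIASES.get(field, field)                     # resolve aliases
    if value is None or (isinstance(value, str) and not value.strip()):
        return profile                                    # never overwrite with nothing
    if field == "location_tier" and isinstance(value, str):
        value = CITY_TO_TIER.get(value.strip().lower(), value)  # coerce enum
    elif field == "income" and isinstance(value, str):
        value = parse_inr(value)                          # parse INR amount
    elif field == "age":
        value = int(value)                                # validate type
        low, high = AGE_BOUNDS
        if not low <= value <= high:                      # validate bounds
            raise ValueError(f"age out of bounds: {value}")
    return {**profile, field: value}
```

Returning a new dict rather than mutating the input keeps the handler side-effect free, so persistence to the session profile can happen in one place after validation succeeds.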

5. How the single LLM uses this

There is no separate orchestrator / intent-classifier / sales-brain split. One single_brain call per turn receives the system prompt (which embeds the slot schema), the recent history, and the user's text, and uses function-calling tools to do everything in that one turn:

  • save_profile_field: record a captured fact (fact-find).
  • retrieve_policies: structured + vector retrieval over the corpus when the turn needs policy evidence (QA / comparison / recommendation).
  • mark_recommendation: record the policies the LLM recommends, surfaced with the profile-tuned scorecard.

Because the LLM can only answer from what retrieve_policies returned and must cite it, grounding is structural: there is no separate faithfulness-judge LLM in the loop. On a transient Gemini error, backend/nim_fallback.py completes the turn.
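
The fail-soft behaviour can be sketched as a thin wrapper around the two model calls. `TransientLLMError` and `run_turn` are hypothetical names, and the sketch assumes each model is a plain callable; how backend/nim_fallback.py actually classifies an error as transient is not described here.

```python
class TransientLLMError(Exception):
    """Hypothetical marker for retryable upstream failures (timeouts, 5xx, rate limits)."""


def run_turn(primary, fallback, prompt: str) -> str:
    """Try the primary model; on a transient error let the fallback finish the turn."""
    try:
        return primary(prompt)
    except TransientLLMError:
        # Fail-soft: the fallback model completes the turn.
        return fallback(prompt)
    # Any other exception propagates (fail-loud); no scripted reply is substituted.
```

Catching only the transient error class keeps real bugs visible instead of masking them behind a fallback reply.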

6. What the user sees

A natural conversation, not a script. Example (real session, 2026-05-15):

Bot: "Hi! I'm your insurance advisor. To get you the right shortlist I'll need
      to learn a bit about you and your situation. What's your name and a rough
      sense of who we're looking to protect?"
User: "Hi, I'm Rohit Sar. I'm 32, just myself, living in Mumbai."
Bot: "Nice to meet you, Rohit. Got it: 32, covering yourself, Mumbai-based.
      Two more quick things and we can look at policies: roughly what's your
      annual income, and have you had any health conditions diagnosed
      (diabetes, BP, anything chronic)?"
User: "around 18 lakhs, no conditions"
Bot: "Perfect. And what's drawing you to insurance right now: first policy,
      top-up on employer cover, or something specific?"
User: "first policy, employer cover feels too thin"
Bot: "Makes sense. Last one: comfortable annual premium range (under 15K,
      15-30K, 30-60K, or higher)?"
User: "15-30 is fine"
Bot: "Great, that's everything I need. Based on your profile, three policies
      stand out: [...]"

Note: the LLM captured {name, age, dependents, location_tier} in a single turn from the user's opener via save_profile_field calls. The exact turn count varies: some sessions finish in 3-4 turns, others in 6-7, depending on what the user volunteers.

7. v2 enhancements

| # | Enhancement | Why |
|---|---|---|
| 1 | Stream the prose token-by-token to the frontend | Cuts perceived latency on longer replies |
| 2 | Skip-confirm flow ("you can skip this: say 'skip'") | Buyer autonomy |
| 3 | Save profile across sessions ✅ shipped KI-040 | Returning user picks up where they left off |
| 4 | Tone-match the user's energy (formal vs casual) | The LLM already does this implicitly via the system prompt; could be explicit per-session signal |