Spaces:
Sleeping
05 β Needs Analysis Flow (single-LLM-with-tools)
| Field | Value |
|---|---|
| Project | Insurance Sales Portfolio Expert |
| Version | 0.3 |
| Date | 2026-05-17 |
| Implementation | backend/single_brain.py + backend/brain_tools.py. The slot schema still lives in backend/needs_finder.py::GRAPH. |
βΉοΈ Design intent below is still accurate; implementation pointers updated. The core idea β one LLM call per turn owns the entire fact-find surface β is what the single-LLM-with-tools handler does today. Fact-find, retrieval, QA, and recommendation are all handled in that one call via function-calling tools; the separate
sales_brain/qa_brainsplit, the orchestrator, the profile_extractor, and the separate faithfulness judge were removed. Present-state authority:README.mdΒ§4.
0. Why one LLM call per turn (not "scripted question graph + paraphraser")
A good Independent Financial Advisor opens with a stable, repeatable set of slots to fill β and adapts which slot to drive next based on what the buyer just said, what they've already told us, and what's still missing. We replicate this with one LLM call per turn that owns the entire fact-find surface (voice, cadence, slot-selection, multi-fact capture in a single turn), not a hardcoded state machine with a paraphraser glued on top.
The original v0.1 of this doc was an explicit GRAPH of canonical questions with prompt_en / prompt_hi strings, rendered by a scripted state machine. That scripted approach (and a later one-call-brain variant with a structured trailer + canonical fallback) produced robotic cadence: scripted prompts leaked into fallback turns. The current design removes all of that. There are no scripted prompts and no canonical fallback. One Gemini 2.5-flash call per turn with function-calling owns voice, cadence, and slot selection; the GRAPH is consulted only as the slot schema.
What we kept and what we changed:
| Concern | v0.1 (scripted graph) | Current (single LLM + tools) |
|---|---|---|
| Slot schema | GRAPH of 9 Question(id, prompt_en, prompt_hi, field, is_core, condition, parser) entries |
GRAPH data structure retained as the schema source; prompt_en / prompt_hi are dead strings now |
| Question text | Hardcoded prompt_en rendered to user |
LLM generates natural prose in the advisor's voice per turn (no template) |
| Slot selection | next_question(profile) walked the graph in order |
The single LLM decides which slot to drive next from the schema + conversation so far (required slots first) |
| Multi-fact capture | One slot per turn (a user saying "I'm 32, just myself, in Mumbai" only filled age) |
One utterance can fill 2-4 slots; the LLM calls save_profile_field per captured fact |
| Auditable behavior | Graph order was the trace | Per-turn LLM call + tool calls are logged (save_profile_field / retrieve_policies / mark_recommendation) in 40-data/llm_usage.jsonl + logs/turns.jsonl |
| Fail-soft | Graph survived LLM degradation | On a transient Gemini error the small backend/nim_fallback.py (NVIDIA NIM) completes the turn; fail-loud otherwise β no scripted reply |
| Bilingual | Hand-authored prompt_en + prompt_hi |
Sarvam-M translation on the LLM's output (English authoring + Indic translation), same UX, fewer hand-authored strings |
1. The 9-slot schema (data-only β the LLM consults this through its system prompt)
βββββββββββββββββββββββββββββββ
β Q1: age (core) β
β "What is your age?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Q2: dependents (core) β
β "Who else do you cover?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Q3: income_band (core) β
β "Annual income?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Q4: existing_cover (core) β
β "Already have health ins?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Q5: primary_goal (core) β
β "Why are you here?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββ
β Q6: location (core) β
β "Which city / tier?" β
ββββββββββββββ¬βββββββββββββββββ
βΌ
(conditional branches)
βΌ
ββββββββββββββββββββββββββ΄βββββββββββββββββββββββββ
βΌ βΌ
ββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββ
β Q7: parents_age (cond) β β Q8: health_conditions (always) β
β asked IF dependents include β β "Any pre-existing condition β
β 'parent' β β on your side?" β
ββββββββββββββββ¬ββββββββββββββββ βββββββββββββββββββ¬βββββββββββββββ
βΌ βΌ
ββββββββββββββββββββββββββββββββββββββββ
β Q9: budget_band (core) β
β "Premium budget?" β
ββββββββββββββββ¬ββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββ
β Profile complete β readback + β
β policy recommendation β
βββββββββββββββββββββββββββββββββββββββ
2. Termination criteria
Fact-find is "done" when all required slots are filled. Because the single LLM owns the whole turn, it stops asking discovery questions and moves to retrieval/recommendation once the profile is sufficiently complete; required-slot checks against the GRAPH schema keep it from declaring done prematurely.
The user can also opt out β if the user immediately asks "compare Star and HDFC ERGO" or "show me the top 3 policies for me", the same LLM call can skip remaining discovery and go straight to retrieve_policies / mark_recommendation within that turn.
3. Bilingual handling
The LLM authors in English; Sarvam-M translates Hinglish β English at the cascade boundary. The "Indic-native, not translated" principle (problem-statement Β§5.3) is preserved by the LLM's native-quality output rather than by hand-authored Hindi strings β the cascade itself is the bilingual surface.
4. Capturing user answers
When the user states a fact, the LLM calls the save_profile_field tool (e.g. save_profile_field(field="age", value=34)). The tool implementation in backend/brain_tools.py is a pure-function handler that:
- Resolves field-name aliases (
locationβlocation_tier) - Coerces enums (
Bangaloreβmetro) - Parses INR amounts (
"32 lakh"β3200000) - Drops null / empty values (a
None/empty value never overwrites a filled field) - Validates types and bounds (rejects
age: 250)
The handler is pure rules β no nested LLM call. Captured fields are persisted to the session profile.
5. How the single LLM uses this
There is no separate orchestrator / intent-classifier / sales-brain split. One single_brain call per turn receives the system prompt (which embeds the slot schema), the recent history, and the user's text, and uses function-calling tools to do everything in that one turn:
save_profile_fieldβ record a captured fact (fact-find).retrieve_policiesβ structured + vector retrieval over the corpus when the turn needs policy evidence (QA / comparison / recommendation).mark_recommendationβ record the policies the LLM recommends, surfaced with the profile-tuned scorecard.
Because the LLM can only answer from what retrieve_policies returned and must cite it, grounding is structural β there is no separate faithfulness-judge LLM in the loop. On a transient Gemini error, backend/nim_fallback.py completes the turn.
6. What the user sees
A natural conversation, not a script. Example (real session, 2026-05-15):
Bot: "Hi! I'm your insurance advisor. To get you the right shortlist I'll need
to learn a bit about you and your situation. What's your name and a rough
sense of who we're looking to protect?"
User: "Hi, I'm Rohit Sar. I'm 32, just myself, living in Mumbai."
Bot: "Nice to meet you, Rohit. Got it β 32, covering yourself, Mumbai-based.
Two more quick things and we can look at policies: roughly what's your
annual income, and have you had any health conditions diagnosed β
diabetes, BP, anything chronic?"
User: "around 18 lakhs, no conditions"
Bot: "Perfect. And what's drawing you to insurance right now β first policy,
top-up on employer cover, or something specific?"
User: "first policy, employer cover feels too thin"
Bot: "Makes sense. Last one: comfortable annual premium range β under 15K,
15-30K, 30-60K, or higher?"
User: "15-30 is fine"
Bot: "Great β that's everything I need. Based on your profile, three policies
stand out: [...]"
Note: the LLM captured {name, age, dependents, location_tier} in a single turn from the user's opener via save_profile_field calls. The exact turn count varies β some sessions finish in 3-4 turns, others in 6-7 depending on what the user volunteers.
7. v2 enhancements
| # | Enhancement | Why |
|---|---|---|
| 1 | Stream the prose token-by-token to the frontend | Cuts perceived latency on longer replies |
| 2 | Skip-confirm flow ("you can skip this β say 'skip'") | Buyer autonomy |
| 3 | Save profile across sessions β shipped KI-040 | Returning user picks up where they left off |
| 4 | Tone-match the user's energy (formal vs casual) | The LLM already does this implicitly via the system prompt; could be explicit per-session signal |