kredd25 committed 86c05e4 · 1 Parent(s): bee9a1c

v0.9: multimodal risk analyst — vision + reasoning + 6 data sources in one Gemma 4 call

The risk analyst now receives the Street View photo as an image_url
content part alongside the text data from every other agent, with
reasoning mode enabled. Single Gemma 4 inference pass composes:
vision + reasoning + interleaved multimodal + structured JSON +
long context.

Per Gemma 4 best practice the image goes BEFORE the text in the
user message content array.
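
As a rough illustration of the ordering described above, the sketch below
builds an OpenAI-style content array with the image part first and the text
part second, and returns a plain string when no image is available. The
helper name `build_user_content` and the sample data URL are hypothetical,
used only for this example; the commit itself builds the array inline in
risk_agent.py.

```python
from typing import Optional, Union


def build_user_content(
    text_prompt: str,
    image_data_url: Optional[str] = None,
) -> Union[str, list]:
    """Return user-message content: image part first, then the text part.

    Hypothetical helper for illustration; falls back to a plain string
    when there is no image, mirroring the legacy text-only path.
    """
    if not image_data_url:
        return text_prompt
    return [
        {"type": "image_url", "image_url": {"url": image_data_url}},
        {"type": "text", "text": text_prompt},
    ]


# Placeholder data URL, not a real image.
content = build_user_content("Assess flood risk.", "data:image/jpeg;base64,AAAA")
print([part["type"] for part in content])  # ['image_url', 'text']
```

Returning a plain string in the text-only case keeps the request shape
unchanged for callers that never supply a photo.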

Concretely, the chain-of-thought trace now reads like a single mind
reasoning across both modalities, e.g. for 4521 S Drexel Blvd:

'I can see basement-level windows (partially obscured by the
fence). This is a critical vulnerability for surface water
ingress. The driveway is paved (impervious) and appears to slope
slightly toward the building/parking area, which could channel
water toward the foundation.'

…then cross-references with the FEMA zone, 311 count, weather
forecast, and AEP math. The model is examining the patient, not
reading the nurse's notes.
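
The commit does not show the formula behind the "AEP math". A common way to
turn an annual exceedance probability into a multi-year probability, assuming
each year is independent, is 1 - (1 - AEP)^n. The sketch below implements
that standard formula; the function name is hypothetical and this is not
necessarily what the app computes.

```python
def cumulative_flood_probability(aep: float, years: int = 30) -> float:
    """Chance of at least one exceedance in `years`, assuming independent years."""
    if not 0.0 <= aep <= 1.0:
        raise ValueError("aep must be between 0 and 1")
    return 1.0 - (1.0 - aep) ** years


# A "100-year flood" (1% AEP) over a 30-year mortgage:
print(round(cumulative_flood_probability(0.01), 2))  # 0.26
```

Under that assumption a 1% annual chance compounds to roughly a one-in-four
chance over 30 years, which is why "100-year flood" does not mean once per
century.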

Backend:
- risk_agent.py accepts streetview_image_data_url; builds a
structured user-content array (image then text) when present,
falls back to the legacy text-only string when not. Adds a new JSON
field 'visual_corroboration' (2-3 sentence summary of what the
photo confirms/contradicts vs the data) and a 'used_streetview_image'
flag for the UI.
- orchestrator.py extracts the data URL from the streetview agent's
result (when streetview.available) and passes it to the risk
agent. No second Google Maps fetch: the risk agent reuses the same
in-memory data URL the streetview agent already had.
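
The two backend steps above can be sketched together: pick up the data URL
only when the streetview agent reports success, and keep the image bytes out
of the JSON that goes into the text prompt. The helper name
`split_streetview_result` and the sample `findings` key are hypothetical;
only `available` and `image_data_url` appear in the commit.

```python
import json
from typing import Optional, Tuple


def split_streetview_result(results: dict) -> Tuple[Optional[str], dict]:
    """Return (image data URL or None, streetview findings without the image)."""
    sv = results.get("streetview") or {}
    # Only hand over the image when the streetview agent succeeded.
    image_url = sv.get("image_data_url") if sv.get("available") else None
    # Drop the (large) data URL so it is not serialized into the text prompt.
    text_findings = {k: v for k, v in sv.items() if k != "image_data_url"}
    return image_url, text_findings


results = {
    "streetview": {
        "available": True,
        "image_data_url": "data:image/jpeg;base64,AAAA",  # placeholder
        "findings": "basement windows visible",  # hypothetical key
    }
}
image_url, findings = split_streetview_result(results)
print(image_url is not None)  # True
print(json.dumps(findings, indent=2))
```

Filtering out the data URL keeps a base64 image from being counted a second
time as text tokens.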

Frontend:
- mapDossier surfaces visual_corroboration + used_streetview_image.
- New 'Gemma 4 multimodal reasoning' callout inside the FEMA-gap
section, sitting above the existing reasoning trace toggle.
Purple-accented to match the trace styling. Subtitle reads
'image + 6 data sources + chain-of-thought · one inference call'
so the composition is explicit to a casual reader.

Version markers:
- chrome wordmark: v0.8 → v0.9
- FastAPI app version: 0.8.0 → 0.9.0

app/agents/orchestrator.py CHANGED
@@ -168,10 +168,19 @@ async def run_assessment(
         "status": "working",
         "summary": "Synthesizing risk score with reasoning mode...",
     })
+    # If the streetview agent succeeded, hand its image to the risk
+    # analyst so the analyst can do its own visual reasoning instead
+    # of just reading another agent's text findings. v0.9 multimodal.
+    sv_image_data_url = None
+    sv_result = results.get("streetview") or {}
+    if sv_result.get("available"):
+        sv_image_data_url = sv_result.get("image_data_url")
+
     try:
         risk_result = await run_risk_agent(
             results, geo["lat"], geo["lon"], geo["display_name"],
             language=language,
+            streetview_image_data_url=sv_image_data_url,
         )
         results["risk"] = risk_result
         yield sse("agent_update", {
app/agents/risk_agent.py CHANGED
@@ -1,10 +1,27 @@
 """Risk-analyst agent — THE Gemma 4 reasoning showcase.
 
-Synthesizes every data agent's output into a single risk score using
-Gemma 4 with reasoning mode enabled. The reasoning trace itself is
-preserved on the dossier for the writeup/demo.
+As of v0.9 this agent is **multimodal**: it receives the Street View
+photograph of the property as an `image_url` content part alongside
+the text data from every other agent, with reasoning mode enabled.
+That means a single Gemma 4 inference pass composes:
+
+- Image understanding (the property photo)
+- Reasoning mode (chain-of-thought trace)
+- Interleaved multimodal input (image + text mixed in one prompt)
+- Structured JSON output (the dossier risk schema)
+- Long context (~6-10K text tokens + image tokens)
+
+The chain-of-thought trace is preserved on the dossier — the model
+is explicitly asked to weave together what it SEES in the photo with
+what the data SAYS, so the trace reads like a single mind reasoning
+across both modalities, not like one agent summarizing another's
+notes.
+
+Per Gemma 4 best practice, the image is placed BEFORE the text in
+the user message content array.
 """
 import json
+from typing import Optional
 
 from app.data.languages import prompt_directive
 from app.llm.client import (
@@ -22,9 +39,35 @@ async def run_risk_agent(
     lon: float,
     address: str,
     language: str = "en",
+    streetview_image_data_url: Optional[str] = None,
 ) -> dict:
-    user_prompt = f"""You are analyzing flood risk for: {address} ({lat}, {lon})
+    has_image = bool(streetview_image_data_url)
+
+    image_section = ""
+    if has_image:
+        image_section = """
+## Property photograph (Street View)
+A street-level photo of the property is included with this prompt
+(it appears immediately above this text). EXAMINE IT YOURSELF before
+reading the data sections. Look for:
+- Lot elevation relative to street grade (above, level, or below)
+- Basement-level windows, below-grade entries, sunken stairwells
+- Downspout connections (running into ground? into sewer? disconnected?)
+- Visible drainage infrastructure (French drains, catch basins, swales)
+- Ground-floor HVAC equipment, electrical panels, or utilities
+- Evidence of prior water damage (staining, erosion, repair patches)
+- Impervious surface coverage (concrete / asphalt vs. permeable ground)
+- Distance to obvious water features (canals, low-lying parks)
+
+You will get the Street View agent's text findings below in the
+'Street View Visual Analysis' section, but rely on YOUR OWN
+inspection of the photo as the primary source. If you see something
+the Street View agent missed, say so. If you disagree with its
+assessment, explain why based on what YOU see.
+"""
 
+    text_prompt = f"""You are analyzing flood risk for: {address} ({lat}, {lon})
+{image_section}
 Here is all the data collected by our investigation team:
 
 ## FEMA Expert Findings
@@ -33,6 +76,9 @@ Here is all the data collected by our investigation team:
 ## Local Infrastructure Findings (311 data, sewer type)
 {json.dumps(all_data.get('local', {}), indent=2, default=str)}
 
+## Street View Visual Analysis (from the streetview agent)
+{json.dumps({k: v for k, v in (all_data.get('streetview') or {}).items() if k != 'image_data_url'}, indent=2, default=str)}
+
 ## Weather & Hydrology Findings
 {json.dumps(all_data.get('weather', {}), indent=2, default=str)}
 
@@ -44,7 +90,8 @@ Here is all the data collected by our investigation team:
 
 ---
 
-TASK: Synthesize all of this data into a flood risk assessment.
+TASK: Synthesize ALL of this data including your own visual
+inspection of the property photo — into a flood risk assessment.
 
 IMPORTANT CONTEXT:
 - A "100-year flood" means 1% annual exceedance probability (AEP), NOT once per century
@@ -62,17 +109,34 @@ Return a JSON object with:
   "aep_estimate": <estimated annual exceedance probability as decimal, e.g. 0.04>,
   "mortgage_30yr_probability": <cumulative probability over 30 years, e.g. 0.68>,
   "fema_gap_explanation": "<2-3 sentences explaining if/why FEMA designation is misleading>",
+  "visual_corroboration": {"<2-3 sentences on what the photo confirms, contradicts, or adds beyond the data; '' if no image was provided>" if has_image else "''"},
   "key_risk_factors": ["<ranked list of top risk factors>"],
   "mitigating_factors": ["<factors that reduce risk>"],
   "summary": "<1 sentence for the status feed>"
 }}
 
-Think step by step. Show your reasoning. Return ONLY the JSON object at the end."""
+Think step by step. Integrate visual and data evidence. Reference the
+photo directly in your reasoning ("I can see ...", "The image shows ...")
+when relevant. Return ONLY the JSON object at the end."""
+
+    # Build the user message content. Per Gemma 4 best practice,
+    # image content parts go BEFORE the text part.
+    user_content: list = []
+    if has_image:
+        user_content.append({
+            "type": "image_url",
+            "image_url": {"url": streetview_image_data_url},
+        })
+    user_content.append({"type": "text", "text": text_prompt})
+
+    # Some providers prefer a plain string for text-only requests; only
+    # send the structured content list when we actually have an image.
+    user_message_content = user_content if has_image else text_prompt
 
     response = await call_gemma4(
         messages=[
             {"role": "system", "content": RISK_AGENT_SYSTEM_PROMPT + prompt_directive(language)},
-            {"role": "user", "content": user_prompt},
+            {"role": "user", "content": user_message_content},
         ],
         reasoning=True,
         temperature=0.2,
@@ -85,6 +149,7 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
     parsed = parse_json_response(text)
     if parsed:
         parsed["reasoning_trace"] = reasoning
+        parsed["used_streetview_image"] = has_image
         return parsed
 
     return {
@@ -93,4 +158,5 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
         "summary": "Risk analysis returned non-JSON output; using fallback",
         "raw_response": text,
         "reasoning_trace": reasoning,
+        "used_streetview_image": has_image,
     }
app/main.py CHANGED
@@ -8,7 +8,7 @@ from fastapi.staticfiles import StaticFiles
 from app.api.assess import router as assess_router
 from app.api.health import router as health_router
 
-app = FastAPI(title="FlutIQ", version="0.8.0")
+app = FastAPI(title="FlutIQ", version="0.9.0")
 
 # CORS still permissive for split-deployment scenarios. With the
 # bundled deploy (frontend served from FastAPI) it's a no-op because
static/index.html CHANGED
@@ -1110,6 +1110,8 @@ const mapDossier = (raw) => {
     risk_factors: risk.key_risk_factors || [],
     mitigating_factors: risk.mitigating_factors || [],
     reasoning_trace: risk.reasoning_trace || "",
+    visual_corroboration: risk.visual_corroboration || "",
+    used_streetview_image: !!risk.used_streetview_image,
     advisor_tldr,
     streetview: raw.streetview || {},
   };
@@ -1380,8 +1382,25 @@ const DossierScreen = ({ onBack, dossier }) => {
               )}
             </div>
           )}
+          {(D.visual_corroboration || D.used_streetview_image) && (
+            <div style={{marginTop: 18, padding: "14px 16px", background: "linear-gradient(135deg, rgba(110,95,216,0.08), rgba(43,111,212,0.06))", border: "1px solid rgba(110,95,216,0.25)", borderRadius: 10}}>
+              <div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 11, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.06em", textTransform: "uppercase", color: "var(--purple)", marginBottom: 8}}>
+                <span>◆ Gemma 4 multimodal reasoning</span>
+                <span style={{color: "var(--ink-4)", fontWeight: 400, textTransform: "none", letterSpacing: "0.02em"}}>· image + 6 data sources + chain-of-thought · one inference call</span>
+              </div>
+              {D.visual_corroboration ? (
+                <p style={{margin: 0, fontSize: 14, lineHeight: 1.55, color: "var(--ink)"}}>
+                  <em>{D.visual_corroboration}</em>
+                </p>
+              ) : (
+                <p style={{margin: 0, fontSize: 13, color: "var(--ink-3)"}}>
+                  The risk analyst received the Street View photograph and reasoned about it directly alongside the data — see the trace below.
+                </p>
+              )}
+            </div>
+          )}
           {D.reasoning_trace && (
-            <div style={{marginTop: 18, padding: "12px 14px", background: "rgba(110,95,216,0.06)", border: "1px solid rgba(110,95,216,0.2)", borderRadius: 10}}>
+            <div style={{marginTop: 12, padding: "12px 14px", background: "rgba(110,95,216,0.06)", border: "1px solid rgba(110,95,216,0.2)", borderRadius: 10}}>
               <div style={{display:"flex", alignItems:"center", justifyContent:"space-between", cursor:"pointer"}} onClick={() => setShowReasoning(s => !s)}>
                 <div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 12, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.04em", textTransform: "uppercase", color: "var(--purple)"}}>
                   <span>◆ Gemma 4 reasoning trace</span>
@@ -1445,7 +1464,7 @@ const Chrome = ({ screen, onJump, dark, onToggleDark, language, onLanguageChange
       <div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
         <span className="glyph">F</span>
         <span>FlutIQ</span>
-        <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.8 · beta</span>
+        <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.9 · beta</span>
       </div>
       <div className="chrome-meta">
         <span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>