v0.9: multimodal risk analyst — vision + reasoning + 6 data sources in one Gemma 4 call
The risk analyst now receives the Street View photo as an image_url
content part alongside the text data from every other agent, with
reasoning mode enabled. A single Gemma 4 inference pass composes
vision, reasoning, interleaved multimodal input, structured JSON
output, and long context.
Per Gemma 4 best practice the image goes BEFORE the text in the
user message content array.
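
As an illustration of the ordering described above, here is a minimal sketch of how the user message content could be assembled. The helper name `build_user_content` is hypothetical and does not appear in this commit; the content-part shapes (`image_url`, `text`) follow the ones the diff itself uses.

```python
from typing import Optional, Union


def build_user_content(
    text_prompt: str,
    image_data_url: Optional[str] = None,
) -> Union[str, list]:
    """Return the `content` value for the user message (hypothetical helper)."""
    if not image_data_url:
        # Text-only request: keep the plain string, as in the legacy path.
        return text_prompt
    # Multimodal request: the image part comes first, then the text part.
    return [
        {"type": "image_url", "image_url": {"url": image_data_url}},
        {"type": "text", "text": text_prompt},
    ]


if __name__ == "__main__":
    content = build_user_content("Assess flood risk.", "data:image/jpeg;base64,AAAA")
    print([part["type"] for part in content])
```

Returning a plain string in the text-only case mirrors the fallback this commit keeps for requests without a photo.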
Concretely, the chain-of-thought trace now reads like a single mind
reasoning across both modalities, e.g. for 4521 S Drexel Blvd:
'I can see basement-level windows (partially obscured by the
fence). This is a critical vulnerability for surface water
ingress. The driveway is paved (impervious) and appears to slope
slightly toward the building/parking area, which could channel
water toward the foundation.'
…then cross-references with the FEMA zone, 311 count, weather
forecast, and AEP math. The model is examining the patient, not
reading the nurse's notes.
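
For readers unfamiliar with the "AEP math" mentioned above, the sketch below shows one common way to turn an annual exceedance probability into a cumulative probability over a 30-year mortgage. It assumes each year is independent with the same AEP, which is a simplification; the commit does not show the formula the model actually applies, and the function name is hypothetical.

```python
def cumulative_flood_probability(aep: float, years: int = 30) -> float:
    """Probability of at least one exceedance in `years` years.

    Assumes independent years with a constant annual exceedance
    probability `aep` (an illustrative simplification).
    """
    if not 0.0 <= aep <= 1.0:
        raise ValueError("aep must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # P(at least one flood) = 1 - P(no flood in any single year) ** years
    return 1.0 - (1.0 - aep) ** years


if __name__ == "__main__":
    # A "100-year flood" (1% AEP) over a 30-year mortgage is roughly 26%.
    print(round(cumulative_flood_probability(0.01), 4))
```

Under this assumption a 1% AEP does not mean "once per century": over 30 years it compounds to roughly a one-in-four chance.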
Backend:
- risk_agent.py accepts streetview_image_data_url; when it is present
the agent builds a structured user-content array (image, then text),
and otherwise falls back to the legacy text-only string. Adds two
fields to the returned JSON: 'visual_corroboration' (a 2-3 sentence
summary of what the photo confirms or contradicts relative to the
data) and a 'used_streetview_image' flag for the UI.
- orchestrator.py extracts the data URL from the streetview agent's
result (when streetview.available) and passes it to the risk
agent. No second Google Maps fetch — the same in-memory data URL
the streetview agent already had.
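
The hand-off between the two backend pieces can be sketched as follows. The function names here are hypothetical and exist only for illustration; the dictionary keys (`streetview`, `available`, `image_data_url`) are the ones used in the diff. The second helper reflects how the text prompt appears to omit the data URL so the image is not also embedded as text.

```python
from typing import Any, Optional


def extract_streetview_image(results: dict) -> Optional[str]:
    """Return the in-memory data URL if the streetview agent succeeded."""
    sv_result = results.get("streetview") or {}
    if sv_result.get("available"):
        return sv_result.get("image_data_url")
    return None


def streetview_text_findings(results: dict) -> dict:
    """Streetview findings for the text prompt, minus the bulky data URL."""
    sv_result: dict[str, Any] = results.get("streetview") or {}
    return {k: v for k, v in sv_result.items() if k != "image_data_url"}


if __name__ == "__main__":
    results = {
        "streetview": {
            "available": True,
            "image_data_url": "data:image/jpeg;base64,AAAA",
            "findings": "basement windows visible",
        }
    }
    print(extract_streetview_image(results) is not None)
    print(streetview_text_findings(results))
```

Both helpers tolerate a missing or `None` streetview result, so the risk agent can still run text-only when no photo is available.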
Frontend:
- mapDossier surfaces visual_corroboration + used_streetview_image.
- New 'Gemma 4 multimodal reasoning' callout inside the FEMA-gap
section, sitting above the existing reasoning trace toggle.
Purple-accented to match the trace styling. Subtitle reads
'image + 6 data sources + chain-of-thought · one inference call'
so the composition is explicit to a casual reader.
Version markers:
- chrome wordmark: v0.8 → v0.9
- FastAPI app version: 0.8.0 → 0.9.0
Files changed:
- app/agents/orchestrator.py +9 -0
- app/agents/risk_agent.py +73 -7
- app/main.py +1 -1
- static/index.html +21 -2
|
@@ -168,10 +168,19 @@ async def run_assessment(
|
|
| 168 |
"status": "working",
|
| 169 |
"summary": "Synthesizing risk score with reasoning mode...",
|
| 170 |
})
|
| 171 |
try:
|
| 172 |
risk_result = await run_risk_agent(
|
| 173 |
results, geo["lat"], geo["lon"], geo["display_name"],
|
| 174 |
language=language,
|
|
|
|
| 175 |
)
|
| 176 |
results["risk"] = risk_result
|
| 177 |
yield sse("agent_update", {
|
|
|
|
| 168 |
"status": "working",
|
| 169 |
"summary": "Synthesizing risk score with reasoning mode...",
|
| 170 |
})
|
| 171 |
+
# If the streetview agent succeeded, hand its image to the risk
|
| 172 |
+
# analyst so the analyst can do its own visual reasoning instead
|
| 173 |
+
# of just reading another agent's text findings. v0.9 multimodal.
|
| 174 |
+
sv_image_data_url = None
|
| 175 |
+
sv_result = results.get("streetview") or {}
|
| 176 |
+
if sv_result.get("available"):
|
| 177 |
+
sv_image_data_url = sv_result.get("image_data_url")
|
| 178 |
+
|
| 179 |
try:
|
| 180 |
risk_result = await run_risk_agent(
|
| 181 |
results, geo["lat"], geo["lon"], geo["display_name"],
|
| 182 |
language=language,
|
| 183 |
+
streetview_image_data_url=sv_image_data_url,
|
| 184 |
)
|
| 185 |
results["risk"] = risk_result
|
| 186 |
yield sse("agent_update", {
|
|
@@ -1,10 +1,27 @@
|
|
| 1 |
"""Risk-analyst agent — THE Gemma 4 reasoning showcase.
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
"""
|
| 7 |
import json
|
|
|
|
| 8 |
|
| 9 |
from app.data.languages import prompt_directive
|
| 10 |
from app.llm.client import (
|
|
@@ -22,9 +39,35 @@ async def run_risk_agent(
|
|
| 22 |
lon: float,
|
| 23 |
address: str,
|
| 24 |
language: str = "en",
|
|
|
|
| 25 |
) -> dict:
|
| 26 |
-
|
| 27 |
|
|
|
|
|
|
|
| 28 |
Here is all the data collected by our investigation team:
|
| 29 |
|
| 30 |
## FEMA Expert Findings
|
|
@@ -33,6 +76,9 @@ Here is all the data collected by our investigation team:
|
|
| 33 |
## Local Infrastructure Findings (311 data, sewer type)
|
| 34 |
{json.dumps(all_data.get('local', {}), indent=2, default=str)}
|
| 35 |
|
|
|
|
|
|
|
|
|
|
| 36 |
## Weather & Hydrology Findings
|
| 37 |
{json.dumps(all_data.get('weather', {}), indent=2, default=str)}
|
| 38 |
|
|
@@ -44,7 +90,8 @@ Here is all the data collected by our investigation team:
|
|
| 44 |
|
| 45 |
---
|
| 46 |
|
| 47 |
-
TASK: Synthesize
|
|
|
|
| 48 |
|
| 49 |
IMPORTANT CONTEXT:
|
| 50 |
- A "100-year flood" means 1% annual exceedance probability (AEP), NOT once per century
|
|
@@ -62,17 +109,34 @@ Return a JSON object with:
|
|
| 62 |
"aep_estimate": <estimated annual exceedance probability as decimal, e.g. 0.04>,
|
| 63 |
"mortgage_30yr_probability": <cumulative probability over 30 years, e.g. 0.68>,
|
| 64 |
"fema_gap_explanation": "<2-3 sentences explaining if/why FEMA designation is misleading>",
|
|
|
|
| 65 |
"key_risk_factors": ["<ranked list of top risk factors>"],
|
| 66 |
"mitigating_factors": ["<factors that reduce risk>"],
|
| 67 |
"summary": "<1 sentence for the status feed>"
|
| 68 |
}}
|
| 69 |
|
| 70 |
-
Think step by step.
|
| 71 |
|
| 72 |
response = await call_gemma4(
|
| 73 |
messages=[
|
| 74 |
{"role": "system", "content": RISK_AGENT_SYSTEM_PROMPT + prompt_directive(language)},
|
| 75 |
-
{"role": "user", "content":
|
| 76 |
],
|
| 77 |
reasoning=True,
|
| 78 |
temperature=0.2,
|
|
@@ -85,6 +149,7 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
|
|
| 85 |
parsed = parse_json_response(text)
|
| 86 |
if parsed:
|
| 87 |
parsed["reasoning_trace"] = reasoning
|
|
|
|
| 88 |
return parsed
|
| 89 |
|
| 90 |
return {
|
|
@@ -93,4 +158,5 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
|
|
| 93 |
"summary": "Risk analysis returned non-JSON output; using fallback",
|
| 94 |
"raw_response": text,
|
| 95 |
"reasoning_trace": reasoning,
|
|
|
|
| 96 |
}
|
|
|
|
| 1 |
"""Risk-analyst agent — THE Gemma 4 reasoning showcase.
|
| 2 |
|
| 3 |
+
As of v0.9 this agent is **multimodal**: it receives the Street View
|
| 4 |
+
photograph of the property as an `image_url` content part alongside
|
| 5 |
+
the text data from every other agent, with reasoning mode enabled.
|
| 6 |
+
That means a single Gemma 4 inference pass composes:
|
| 7 |
+
|
| 8 |
+
- Image understanding (the property photo)
|
| 9 |
+
- Reasoning mode (chain-of-thought trace)
|
| 10 |
+
- Interleaved multimodal input (image + text mixed in one prompt)
|
| 11 |
+
- Structured JSON output (the dossier risk schema)
|
| 12 |
+
- Long context (~6-10K text tokens + image tokens)
|
| 13 |
+
|
| 14 |
+
The chain-of-thought trace is preserved on the dossier — the model
|
| 15 |
+
is explicitly asked to weave together what it SEES in the photo with
|
| 16 |
+
what the data SAYS, so the trace reads like a single mind reasoning
|
| 17 |
+
across both modalities, not like one agent summarizing another's
|
| 18 |
+
notes.
|
| 19 |
+
|
| 20 |
+
Per Gemma 4 best practice, the image is placed BEFORE the text in
|
| 21 |
+
the user message content array.
|
| 22 |
"""
|
| 23 |
import json
|
| 24 |
+
from typing import Optional
|
| 25 |
|
| 26 |
from app.data.languages import prompt_directive
|
| 27 |
from app.llm.client import (
|
|
|
|
| 39 |
lon: float,
|
| 40 |
address: str,
|
| 41 |
language: str = "en",
|
| 42 |
+
streetview_image_data_url: Optional[str] = None,
|
| 43 |
) -> dict:
|
| 44 |
+
has_image = bool(streetview_image_data_url)
|
| 45 |
+
|
| 46 |
+
image_section = ""
|
| 47 |
+
if has_image:
|
| 48 |
+
image_section = """
|
| 49 |
+
## Property photograph (Street View)
|
| 50 |
+
A street-level photo of the property is included with this prompt
|
| 51 |
+
(it appears immediately above this text). EXAMINE IT YOURSELF before
|
| 52 |
+
reading the data sections. Look for:
|
| 53 |
+
- Lot elevation relative to street grade (above, level, or below)
|
| 54 |
+
- Basement-level windows, below-grade entries, sunken stairwells
|
| 55 |
+
- Downspout connections (running into ground? into sewer? disconnected?)
|
| 56 |
+
- Visible drainage infrastructure (French drains, catch basins, swales)
|
| 57 |
+
- Ground-floor HVAC equipment, electrical panels, or utilities
|
| 58 |
+
- Evidence of prior water damage (staining, erosion, repair patches)
|
| 59 |
+
- Impervious surface coverage (concrete / asphalt vs. permeable ground)
|
| 60 |
+
- Distance to obvious water features (canals, low-lying parks)
|
| 61 |
+
|
| 62 |
+
You will get the Street View agent's text findings below in the
|
| 63 |
+
'Street View Visual Analysis' section, but rely on YOUR OWN
|
| 64 |
+
inspection of the photo as the primary source. If you see something
|
| 65 |
+
the Street View agent missed, say so. If you disagree with its
|
| 66 |
+
assessment, explain why based on what YOU see.
|
| 67 |
+
"""
|
| 68 |
|
| 69 |
+
text_prompt = f"""You are analyzing flood risk for: {address} ({lat}, {lon})
|
| 70 |
+
{image_section}
|
| 71 |
Here is all the data collected by our investigation team:
|
| 72 |
|
| 73 |
## FEMA Expert Findings
|
|
|
|
| 76 |
## Local Infrastructure Findings (311 data, sewer type)
|
| 77 |
{json.dumps(all_data.get('local', {}), indent=2, default=str)}
|
| 78 |
|
| 79 |
+
## Street View Visual Analysis (from the streetview agent)
|
| 80 |
+
{json.dumps({k: v for k, v in (all_data.get('streetview') or {}).items() if k != 'image_data_url'}, indent=2, default=str)}
|
| 81 |
+
|
| 82 |
## Weather & Hydrology Findings
|
| 83 |
{json.dumps(all_data.get('weather', {}), indent=2, default=str)}
|
| 84 |
|
|
|
|
| 90 |
|
| 91 |
---
|
| 92 |
|
| 93 |
+
TASK: Synthesize ALL of this data — including your own visual
|
| 94 |
+
inspection of the property photo — into a flood risk assessment.
|
| 95 |
|
| 96 |
IMPORTANT CONTEXT:
|
| 97 |
- A "100-year flood" means 1% annual exceedance probability (AEP), NOT once per century
|
|
|
|
| 109 |
"aep_estimate": <estimated annual exceedance probability as decimal, e.g. 0.04>,
|
| 110 |
"mortgage_30yr_probability": <cumulative probability over 30 years, e.g. 0.68>,
|
| 111 |
"fema_gap_explanation": "<2-3 sentences explaining if/why FEMA designation is misleading>",
|
| 112 |
+
"visual_corroboration": {"<2-3 sentences on what the photo confirms, contradicts, or adds beyond the data; '' if no image was provided>" if has_image else "''"},
|
| 113 |
"key_risk_factors": ["<ranked list of top risk factors>"],
|
| 114 |
"mitigating_factors": ["<factors that reduce risk>"],
|
| 115 |
"summary": "<1 sentence for the status feed>"
|
| 116 |
}}
|
| 117 |
|
| 118 |
+
Think step by step. Integrate visual and data evidence. Reference the
|
| 119 |
+
photo directly in your reasoning ("I can see ...", "The image shows ...")
|
| 120 |
+
when relevant. Return ONLY the JSON object at the end."""
|
| 121 |
+
|
| 122 |
+
# Build the user message content. Per Gemma 4 best practice,
|
| 123 |
+
# image content parts go BEFORE the text part.
|
| 124 |
+
user_content: list = []
|
| 125 |
+
if has_image:
|
| 126 |
+
user_content.append({
|
| 127 |
+
"type": "image_url",
|
| 128 |
+
"image_url": {"url": streetview_image_data_url},
|
| 129 |
+
})
|
| 130 |
+
user_content.append({"type": "text", "text": text_prompt})
|
| 131 |
+
|
| 132 |
+
# Some providers prefer a plain string for text-only requests; only
|
| 133 |
+
# send the structured content list when we actually have an image.
|
| 134 |
+
user_message_content = user_content if has_image else text_prompt
|
| 135 |
|
| 136 |
response = await call_gemma4(
|
| 137 |
messages=[
|
| 138 |
{"role": "system", "content": RISK_AGENT_SYSTEM_PROMPT + prompt_directive(language)},
|
| 139 |
+
{"role": "user", "content": user_message_content},
|
| 140 |
],
|
| 141 |
reasoning=True,
|
| 142 |
temperature=0.2,
|
|
|
|
| 149 |
parsed = parse_json_response(text)
|
| 150 |
if parsed:
|
| 151 |
parsed["reasoning_trace"] = reasoning
|
| 152 |
+
parsed["used_streetview_image"] = has_image
|
| 153 |
return parsed
|
| 154 |
|
| 155 |
return {
|
|
|
|
| 158 |
"summary": "Risk analysis returned non-JSON output; using fallback",
|
| 159 |
"raw_response": text,
|
| 160 |
"reasoning_trace": reasoning,
|
| 161 |
+
"used_streetview_image": has_image,
|
| 162 |
}
|
|
@@ -8,7 +8,7 @@ from fastapi.staticfiles import StaticFiles
|
|
| 8 |
from app.api.assess import router as assess_router
|
| 9 |
from app.api.health import router as health_router
|
| 10 |
|
| 11 |
-
app = FastAPI(title="FlutIQ", version="0.
|
| 12 |
|
| 13 |
# CORS still permissive for split-deployment scenarios. With the
|
| 14 |
# bundled deploy (frontend served from FastAPI) it's a no-op because
|
|
|
|
| 8 |
from app.api.assess import router as assess_router
|
| 9 |
from app.api.health import router as health_router
|
| 10 |
|
| 11 |
+
app = FastAPI(title="FlutIQ", version="0.9.0")
|
| 12 |
|
| 13 |
# CORS still permissive for split-deployment scenarios. With the
|
| 14 |
# bundled deploy (frontend served from FastAPI) it's a no-op because
|
|
@@ -1110,6 +1110,8 @@ const mapDossier = (raw) => {
|
|
| 1110 |
risk_factors: risk.key_risk_factors || [],
|
| 1111 |
mitigating_factors: risk.mitigating_factors || [],
|
| 1112 |
reasoning_trace: risk.reasoning_trace || "",
|
|
|
|
|
|
|
| 1113 |
advisor_tldr,
|
| 1114 |
streetview: raw.streetview || {},
|
| 1115 |
};
|
|
@@ -1380,8 +1382,25 @@ const DossierScreen = ({ onBack, dossier }) => {
|
|
| 1380 |
)}
|
| 1381 |
</div>
|
| 1382 |
)}
|
| 1383 |
{D.reasoning_trace && (
|
| 1384 |
-
<div style={{marginTop:
|
| 1385 |
<div style={{display:"flex", alignItems:"center", justifyContent:"space-between", cursor:"pointer"}} onClick={() => setShowReasoning(s => !s)}>
|
| 1386 |
<div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 12, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.04em", textTransform: "uppercase", color: "var(--purple)"}}>
|
| 1387 |
<span>◆ Gemma 4 reasoning trace</span>
|
|
@@ -1445,7 +1464,7 @@ const Chrome = ({ screen, onJump, dark, onToggleDark, language, onLanguageChange
|
|
| 1445 |
<div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
|
| 1446 |
<span className="glyph">F</span>
|
| 1447 |
<span>FlutIQ</span>
|
| 1448 |
-
<span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.
|
| 1449 |
</div>
|
| 1450 |
<div className="chrome-meta">
|
| 1451 |
<span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>
|
|
|
|
| 1110 |
risk_factors: risk.key_risk_factors || [],
|
| 1111 |
mitigating_factors: risk.mitigating_factors || [],
|
| 1112 |
reasoning_trace: risk.reasoning_trace || "",
|
| 1113 |
+
visual_corroboration: risk.visual_corroboration || "",
|
| 1114 |
+
used_streetview_image: !!risk.used_streetview_image,
|
| 1115 |
advisor_tldr,
|
| 1116 |
streetview: raw.streetview || {},
|
| 1117 |
};
|
|
|
|
| 1382 |
)}
|
| 1383 |
</div>
|
| 1384 |
)}
|
| 1385 |
+
{(D.visual_corroboration || D.used_streetview_image) && (
|
| 1386 |
+
<div style={{marginTop: 18, padding: "14px 16px", background: "linear-gradient(135deg, rgba(110,95,216,0.08), rgba(43,111,212,0.06))", border: "1px solid rgba(110,95,216,0.25)", borderRadius: 10}}>
|
| 1387 |
+
<div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 11, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.06em", textTransform: "uppercase", color: "var(--purple)", marginBottom: 8}}>
|
| 1388 |
+
<span>◆ Gemma 4 multimodal reasoning</span>
|
| 1389 |
+
<span style={{color: "var(--ink-4)", fontWeight: 400, textTransform: "none", letterSpacing: "0.02em"}}>· image + 6 data sources + chain-of-thought · one inference call</span>
|
| 1390 |
+
</div>
|
| 1391 |
+
{D.visual_corroboration ? (
|
| 1392 |
+
<p style={{margin: 0, fontSize: 14, lineHeight: 1.55, color: "var(--ink)"}}>
|
| 1393 |
+
<em>{D.visual_corroboration}</em>
|
| 1394 |
+
</p>
|
| 1395 |
+
) : (
|
| 1396 |
+
<p style={{margin: 0, fontSize: 13, color: "var(--ink-3)"}}>
|
| 1397 |
+
The risk analyst received the Street View photograph and reasoned about it directly alongside the data — see the trace below.
|
| 1398 |
+
</p>
|
| 1399 |
+
)}
|
| 1400 |
+
</div>
|
| 1401 |
+
)}
|
| 1402 |
{D.reasoning_trace && (
|
| 1403 |
+
<div style={{marginTop: 12, padding: "12px 14px", background: "rgba(110,95,216,0.06)", border: "1px solid rgba(110,95,216,0.2)", borderRadius: 10}}>
|
| 1404 |
<div style={{display:"flex", alignItems:"center", justifyContent:"space-between", cursor:"pointer"}} onClick={() => setShowReasoning(s => !s)}>
|
| 1405 |
<div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 12, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.04em", textTransform: "uppercase", color: "var(--purple)"}}>
|
| 1406 |
<span>◆ Gemma 4 reasoning trace</span>
|
|
|
|
| 1464 |
<div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
|
| 1465 |
<span className="glyph">F</span>
|
| 1466 |
<span>FlutIQ</span>
|
| 1467 |
+
<span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.9 · beta</span>
|
| 1468 |
</div>
|
| 1469 |
<div className="chrome-meta">
|
| 1470 |
<span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>
|