Spaces: kredd25/FlutIQ

kredd25 committed on May 3

Commit 86c05e4
1 Parent(s): bee9a1c

v0.9: multimodal risk analyst — vision + reasoning + 6 data sources in one Gemma 4 call

The risk analyst now receives the Street View photo as an image_url
content part alongside the text data from every other agent, with
reasoning mode enabled. A single Gemma 4 inference pass now combines
vision, reasoning, interleaved multimodal input, structured JSON
output, and long context.

Per Gemma 4 best practice the image goes BEFORE the text in the
user message content array.
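
A minimal sketch of that ordering, assuming the OpenAI-style content-part
shape that the diff uses. The helper name build_user_content is
hypothetical and the data URL is a placeholder, not a real image:

```python
from typing import Optional, Union


def build_user_content(
    text_prompt: str, image_data_url: Optional[str] = None
) -> Union[str, list]:
    """Return the user message content: image part first, then text.

    Falls back to a plain string when no image is available, which
    mirrors the legacy text-only path described in this commit.
    """
    if not image_data_url:
        return text_prompt
    return [
        # Image part goes first, text part second.
        {"type": "image_url", "image_url": {"url": image_data_url}},
        {"type": "text", "text": text_prompt},
    ]


# Placeholder payload for illustration only.
content = build_user_content("Assess flood risk.", "data:image/jpeg;base64,AAAA")
print([part["type"] for part in content])
```

Returning a plain string in the text-only case keeps the request
identical to the pre-v0.9 shape for providers that expect it.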

Concretely, the chain-of-thought trace now reads like a single mind
reasoning across both modalities, e.g. for 4521 S Drexel Blvd:

'I can see basement-level windows (partially obscured by the
fence). This is a critical vulnerability for surface water
ingress. The driveway is paved (impervious) and appears to slope
slightly toward the building/parking area, which could channel
water toward the foundation.'

…then cross-references with the FEMA zone, 311 count, weather
forecast, and AEP math. The model is examining the patient, not
reading the nurse's notes.
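
The commit message does not spell out the "AEP math". One standard
relation, assuming each year is an independent trial with the same
annual exceedance probability, is 1 - (1 - p)^n. The function name
below is hypothetical and this is a sketch, not the agent's actual
computation:

```python
def cumulative_exceedance_probability(aep: float, years: int = 30) -> float:
    """Chance of at least one exceedance in `years`, assuming independent years."""
    if not 0.0 <= aep <= 1.0:
        raise ValueError("aep must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Complement of "no exceedance in any of the years".
    return 1.0 - (1.0 - aep) ** years


# A 1% AEP ("100-year") flood over a 30-year mortgage: roughly 26%.
print(round(cumulative_exceedance_probability(0.01, 30), 2))
```

This is why the prompt stresses that a "100-year flood" is not a
once-per-century event: over a typical mortgage the cumulative chance
is about one in four.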

Backend:
- risk_agent.py accepts streetview_image_data_url; builds a
structured user-content array (image then text) when present,
falls back to the legacy text-only string when not. The JSON
schema gains 'visual_corroboration' (2-3 sentence summary of what
the photo confirms/contradicts vs the data), and the agent sets a
'used_streetview_image' flag on every result for the UI.
- orchestrator.py extracts the data URL from the streetview agent's
result (when streetview.available) and passes it to the risk
agent. No second Google Maps fetch — the same in-memory data URL
the streetview agent already had.
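
The hand-off can be sketched as below. The helper names are
hypothetical, and the JPEG MIME type is an assumption, since the commit
does not show how the streetview agent encodes its image. Only the
result keys ("available", "image_data_url") come from the diff:

```python
import base64
from typing import Optional


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (MIME type is assumed)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def pick_streetview_image(results: dict) -> Optional[str]:
    """Return the data URL only when the streetview agent reported success."""
    sv_result = results.get("streetview") or {}
    if sv_result.get("available"):
        return sv_result.get("image_data_url")
    return None


def without_image(sv_result: dict) -> dict:
    """Drop the bulky data URL before dumping findings into the text prompt."""
    return {k: v for k, v in sv_result.items() if k != "image_data_url"}


results = {"streetview": {"available": True,
                          "image_data_url": to_data_url(b"abc"),
                          "findings": "basement windows visible"}}
print(pick_streetview_image(results))
print(without_image(results["streetview"]))
```

Filtering out image_data_url from the text section avoids sending the
base64 payload twice: once as an image part and once as prompt text.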

Frontend:
- mapDossier surfaces visual_corroboration + used_streetview_image.
- New 'Gemma 4 multimodal reasoning' callout inside the FEMA-gap
section, sitting above the existing reasoning trace toggle.
Purple-accented to match the trace styling. Subtitle reads
'image + 6 data sources + chain-of-thought · one inference call'
so the composition is explicit to a casual reader.

Version markers:
- chrome wordmark: v0.8 → v0.9
- FastAPI app version: 0.8.0 → 0.9.0

Files changed (4)

app/agents/orchestrator.py +9 -0
app/agents/risk_agent.py +73 -7
app/main.py +1 -1
static/index.html +21 -2

app/agents/orchestrator.py CHANGED

@@ -168,10 +168,19 @@ async def run_assessment(
         "status": "working",
         "summary": "Synthesizing risk score with reasoning mode...",
     })
+    # If the streetview agent succeeded, hand its image to the risk
+    # analyst so the analyst can do its own visual reasoning instead
+    # of just reading another agent's text findings. v0.9 multimodal.
+    sv_image_data_url = None
+    sv_result = results.get("streetview") or {}
+    if sv_result.get("available"):
+        sv_image_data_url = sv_result.get("image_data_url")
     try:
         risk_result = await run_risk_agent(
             results, geo["lat"], geo["lon"], geo["display_name"],
             language=language,
+            streetview_image_data_url=sv_image_data_url,
         )
         results["risk"] = risk_result
         yield sse("agent_update", {

app/agents/risk_agent.py CHANGED

@@ -1,10 +1,27 @@
 """Risk-analyst agent — THE Gemma 4 reasoning showcase.
-Synthesizes every data agent's output into a single risk score using
-Gemma 4 with reasoning mode enabled. The reasoning trace itself is
-preserved on the dossier for the writeup/demo.
+As of v0.9 this agent is **multimodal**: it receives the Street View
+photograph of the property as an `image_url` content part alongside
+the text data from every other agent, with reasoning mode enabled.
+That means a single Gemma 4 inference pass composes:
+  - Image understanding (the property photo)
+  - Reasoning mode (chain-of-thought trace)
+  - Interleaved multimodal input (image + text mixed in one prompt)
+  - Structured JSON output (the dossier risk schema)
+  - Long context (~6-10K text tokens + image tokens)
+The chain-of-thought trace is preserved on the dossier — the model
+is explicitly asked to weave together what it SEES in the photo with
+what the data SAYS, so the trace reads like a single mind reasoning
+across both modalities, not like one agent summarizing another's
+notes.
+Per Gemma 4 best practice, the image is placed BEFORE the text in
+the user message content array.
 """
 import json
+from typing import Optional
 from app.data.languages import prompt_directive
 from app.llm.client import (
@@ -22,9 +39,35 @@ async def run_risk_agent(
     lon: float,
     address: str,
     language: str = "en",
+    streetview_image_data_url: Optional[str] = None,
 ) -> dict:
-    user_prompt = f"""You are analyzing flood risk for: {address} ({lat}, {lon})
+    has_image = bool(streetview_image_data_url)
+    image_section = ""
+    if has_image:
+        image_section = """
+## Property photograph (Street View)
+A street-level photo of the property is included with this prompt
+(it appears immediately above this text). EXAMINE IT YOURSELF before
+reading the data sections. Look for:
+- Lot elevation relative to street grade (above, level, or below)
+- Basement-level windows, below-grade entries, sunken stairwells
+- Downspout connections (running into ground? into sewer? disconnected?)
+- Visible drainage infrastructure (French drains, catch basins, swales)
+- Ground-floor HVAC equipment, electrical panels, or utilities
+- Evidence of prior water damage (staining, erosion, repair patches)
+- Impervious surface coverage (concrete / asphalt vs. permeable ground)
+- Distance to obvious water features (canals, low-lying parks)
+You will get the Street View agent's text findings below in the
+'Street View Visual Analysis' section, but rely on YOUR OWN
+inspection of the photo as the primary source. If you see something
+the Street View agent missed, say so. If you disagree with its
+assessment, explain why based on what YOU see.
+"""
+    text_prompt = f"""You are analyzing flood risk for: {address} ({lat}, {lon})
+{image_section}
 Here is all the data collected by our investigation team:
 ## FEMA Expert Findings
@@ -33,6 +76,9 @@ Here is all the data collected by our investigation team:
 ## Local Infrastructure Findings (311 data, sewer type)
 {json.dumps(all_data.get('local', {}), indent=2, default=str)}
+## Street View Visual Analysis (from the streetview agent)
+{json.dumps({k: v for k, v in (all_data.get('streetview') or {}).items() if k != 'image_data_url'}, indent=2, default=str)}
 ## Weather & Hydrology Findings
 {json.dumps(all_data.get('weather', {}), indent=2, default=str)}
@@ -44,7 +90,8 @@ Here is all the data collected by our investigation team:
 ---
-TASK: Synthesize all of this data into a flood risk assessment.
+TASK: Synthesize ALL of this data — including your own visual
+inspection of the property photo — into a flood risk assessment.
 IMPORTANT CONTEXT:
 - A "100-year flood" means 1% annual exceedance probability (AEP), NOT once per century
@@ -62,17 +109,34 @@ Return a JSON object with:
   "aep_estimate": <estimated annual exceedance probability as decimal, e.g. 0.04>,
   "mortgage_30yr_probability": <cumulative probability over 30 years, e.g. 0.68>,
   "fema_gap_explanation": "<2-3 sentences explaining if/why FEMA designation is misleading>",
+  "visual_corroboration": {"<2-3 sentences on what the photo confirms, contradicts, or adds beyond the data; '' if no image was provided>" if has_image else "''"},
   "key_risk_factors": ["<ranked list of top risk factors>"],
   "mitigating_factors": ["<factors that reduce risk>"],
   "summary": "<1 sentence for the status feed>"
 }}
-Think step by step. Show your reasoning. Return ONLY the JSON object at the end."""
+Think step by step. Integrate visual and data evidence. Reference the
+photo directly in your reasoning ("I can see ...", "The image shows ...")
+when relevant. Return ONLY the JSON object at the end."""
+    # Build the user message content. Per Gemma 4 best practice,
+    # image content parts go BEFORE the text part.
+    user_content: list = []
+    if has_image:
+        user_content.append({
+            "type": "image_url",
+            "image_url": {"url": streetview_image_data_url},
+        })
+    user_content.append({"type": "text", "text": text_prompt})
+    # Some providers prefer a plain string for text-only requests; only
+    # send the structured content list when we actually have an image.
+    user_message_content = user_content if has_image else text_prompt
     response = await call_gemma4(
         messages=[
             {"role": "system", "content": RISK_AGENT_SYSTEM_PROMPT + prompt_directive(language)},
-            {"role": "user", "content": user_prompt},
+            {"role": "user", "content": user_message_content},
         ],
         reasoning=True,
         temperature=0.2,
@@ -85,6 +149,7 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
     parsed = parse_json_response(text)
     if parsed:
         parsed["reasoning_trace"] = reasoning
+        parsed["used_streetview_image"] = has_image
         return parsed
     return {
@@ -93,4 +158,5 @@ Think step by step. Show your reasoning. Return ONLY the JSON object at the end.
         "summary": "Risk analysis returned non-JSON output; using fallback",
         "raw_response": text,
         "reasoning_trace": reasoning,
+        "used_streetview_image": has_image,
     }
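
The diff calls parse_json_response but does not show its body. The
sketch below is one possible way such a parser and the fallback path
could work, under the assumption that the model emits free-form
reasoning followed by a JSON object. Both function names are
hypothetical and this is not the project's actual implementation:

```python
import json
from typing import Optional


def extract_last_json_object(text: str) -> Optional[dict]:
    """Return the last top-level JSON object embedded in `text`, if any."""
    decoder = json.JSONDecoder()
    found = None
    idx = text.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            idx = text.find("{", idx + 1)
            continue
        if isinstance(obj, dict):
            found = obj
        # Skip past the decoded object so nested braces are not rescanned.
        idx = text.find("{", end)
    return found


def finalize_risk_result(text: str, reasoning: str, has_image: bool) -> dict:
    """Attach the trace and image flag on both the parsed and fallback paths."""
    parsed = extract_last_json_object(text)
    if parsed:
        parsed["reasoning_trace"] = reasoning
        parsed["used_streetview_image"] = has_image
        return parsed
    return {
        "summary": "Risk analysis returned non-JSON output; using fallback",
        "raw_response": text,
        "reasoning_trace": reasoning,
        "used_streetview_image": has_image,
    }


sample = 'Reasoning {not json} ... {"summary": "ok", "nested": {"a": 1}}'
print(finalize_risk_result(sample, "trace", True))
```

Setting used_streetview_image on both paths means the UI can show the
multimodal callout even when the model's JSON could not be parsed.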

app/main.py CHANGED

@@ -8,7 +8,7 @@ from fastapi.staticfiles import StaticFiles
 from app.api.assess import router as assess_router
 from app.api.health import router as health_router
-app = FastAPI(title="FlutIQ", version="0.8.0")
+app = FastAPI(title="FlutIQ", version="0.9.0")
 # CORS still permissive for split-deployment scenarios. With the
 # bundled deploy (frontend served from FastAPI) it's a no-op because

static/index.html CHANGED

@@ -1110,6 +1110,8 @@ const mapDossier = (raw) => {
     risk_factors: risk.key_risk_factors || [],
     mitigating_factors: risk.mitigating_factors || [],
     reasoning_trace: risk.reasoning_trace || "",
+    visual_corroboration: risk.visual_corroboration || "",
+    used_streetview_image: !!risk.used_streetview_image,
     advisor_tldr,
     streetview: raw.streetview || {},
   };
@@ -1380,8 +1382,25 @@ const DossierScreen = ({ onBack, dossier }) => {
               )}
             </div>
           )}
+          {(D.visual_corroboration || D.used_streetview_image) && (
+            <div style={{marginTop: 18, padding: "14px 16px", background: "linear-gradient(135deg, rgba(110,95,216,0.08), rgba(43,111,212,0.06))", border: "1px solid rgba(110,95,216,0.25)", borderRadius: 10}}>
+              <div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 11, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.06em", textTransform: "uppercase", color: "var(--purple)", marginBottom: 8}}>
+                <span>◆ Gemma 4 multimodal reasoning</span>
+                <span style={{color: "var(--ink-4)", fontWeight: 400, textTransform: "none", letterSpacing: "0.02em"}}>· image + 6 data sources + chain-of-thought · one inference call</span>
+              </div>
+              {D.visual_corroboration ? (
+                <p style={{margin: 0, fontSize: 14, lineHeight: 1.55, color: "var(--ink)"}}>
+                  <em>{D.visual_corroboration}</em>
+                </p>
+              ) : (
+                <p style={{margin: 0, fontSize: 13, color: "var(--ink-3)"}}>
+                  The risk analyst received the Street View photograph and reasoned about it directly alongside the data — see the trace below.
+                </p>
+              )}
+            </div>
+          )}
           {D.reasoning_trace && (
-            <div style={{marginTop: 18, padding: "12px 14px", background: "rgba(110,95,216,0.06)", border: "1px solid rgba(110,95,216,0.2)", borderRadius: 10}}>
+            <div style={{marginTop: 12, padding: "12px 14px", background: "rgba(110,95,216,0.06)", border: "1px solid rgba(110,95,216,0.2)", borderRadius: 10}}>
               <div style={{display:"flex", alignItems:"center", justifyContent:"space-between", cursor:"pointer"}} onClick={() => setShowReasoning(s => !s)}>
                 <div style={{display:"flex", alignItems:"center", gap: 8, fontSize: 12, fontFamily: "'JetBrains Mono', monospace", letterSpacing: "0.04em", textTransform: "uppercase", color: "var(--purple)"}}>
                   <span>◆ Gemma 4 reasoning trace</span>
@@ -1445,7 +1464,7 @@ const Chrome = ({ screen, onJump, dark, onToggleDark, language, onLanguageChange
     <div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
       <span className="glyph">F</span>
       <span>FlutIQ</span>
-      <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.8 · beta</span>
+      <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.9 · beta</span>
     </div>
     <div className="chrome-meta">
       <span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>