kredd25 committed on
Commit 719cc9b · 1 Parent(s): 82900b3

v0.13: Tier-1 multi-city expansion — Chicago + NYC + SF + LA + Austin

Refactors the Chicago-only local_agent into a city-aware system driven
by a registry. Each registry entry encodes the city's Socrata dataset
IDs, field names, flood-related category codes, and a context blurb
that gets injected into the local-agent prompt so Gemma 4 reasons
about each city's actual sewer system (combined vs separated, dominant
flood mode, local hazard pattern).
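To make the injection idea concrete, here is a minimal sketch of how a registry entry's context blurb and its signal availability could be rendered into the prompt. `EXAMPLE_CFG`, `SIGNALS` and `build_context_section` are hypothetical names, and the entry is a trimmed-down stand-in for the real ones in app/data/cities.py. It shows the pattern, not the exact prompt text local_agent.py sends.

```python
# Hypothetical, trimmed-down registry entry: the real entries in
# app/data/cities.py also carry dataset URLs, field names, category codes, etc.
EXAMPLE_CFG: dict = {
    "id": "la",
    "name": "Los Angeles",
    "context_blurb": "Los Angeles has a separated storm-sanitary sewer system.",
    "311": None,                    # no 311 signal wired for this city
    "permits": {"has_cost": True},  # permits signal available
}

# (registry key, label used in the prompt) for each signal the agent knows about
SIGNALS = (
    ("311", "311 service-request signal"),
    ("permits", "Building permits signal"),
)


def build_context_section(cfg: dict) -> str:
    """Render the city-context and signal-availability part of a prompt.

    Hypothetical helper: it illustrates injecting the registry's context
    blurb and telling the model which signals are missing.
    """
    lines = [
        "City context (use this to calibrate your reasoning):",
        cfg["context_blurb"],
    ]
    for key, label in SIGNALS:
        if cfg.get(key) is not None:
            lines.append(f"{label}: available for {cfg['name']}")
        else:
            lines.append(f"{label}: NOT AVAILABLE for this city; skip this reasoning.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context_section(EXAMPLE_CFG))
```

Stating absent signals explicitly, instead of silently omitting them, is what lets the model say "no 311 data" rather than invent a 311 narrative.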

Honest per-city availability — what each city's dossier actually shows:

Chicago:  full. 311 (WIB/SFL) + permits with reported_cost
          (e.g. $80M new medical office at 4822 S Cottage Grove).
NYC:      311 only. 391 sewer-flood reports for a test query at
          350 5th Ave. Permits intentionally deferred: the DOB datasets
          are fragmented (legacy is stale, modern lacks dates, active is
          filtered to currently-open jobs).
SF:       full. 941 sewer reports + 75 permits totaling $366M (Mission
          test point: '595 basement reports, increasing development
          pressure, concern: high').
LA:       permits only. 142 permits, $320M total, largest $99M downtown.
          LA's separated storm-sanitary system makes basement sewer
          backup rare, and 311 doesn't categorize floods cleanly enough,
          so 311 is deferred.
Austin:   311 + permit counts. 91 flood-coded 311 reports + 500 permits,
          but the Socrata dataset has no cost field, so the dossier
          honestly says count-only.

Atlanta + every other city: graceful 'Tier 1 cities are X' message.
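The availability matrix above falls out of the registry itself. A minimal sketch of the lookup and the flags it yields, assuming a simplified entry shape: `city_tokens` and `states` are hypothetical keys standing in for the real entries' `match_fn` lambdas, SF is omitted for brevity, and `find_entry`/`availability` are illustrative names rather than the project's `find_city`.

```python
from typing import Optional


def _state_match(input_state: str, codes: tuple[str, ...]) -> bool:
    # Same idea as cities.py: accept the 2-letter code or the full state name.
    s = (input_state or "").strip().lower()
    return any(s == c.lower() for c in codes)


# Hypothetical miniature registry (SF omitted for brevity).
MINI_REGISTRY: list[dict] = [
    {"id": "chicago", "name": "Chicago", "city_tokens": ("chicago",),
     "states": ("IL", "Illinois"), "311": {}, "permits": {"has_cost": True}},
    {"id": "nyc", "name": "New York City",
     "city_tokens": ("new york", "manhattan", "brooklyn", "queens", "bronx", "staten island"),
     "states": ("NY", "New York"), "311": {}, "permits": None},
    {"id": "la", "name": "Los Angeles", "city_tokens": ("los angeles",),
     "states": ("CA", "California"), "311": None, "permits": {"has_cost": True}},
    {"id": "austin", "name": "Austin", "city_tokens": ("austin",),
     "states": ("TX", "Texas"), "311": {}, "permits": {"has_cost": False}},
]


def find_entry(city: str, state: str) -> Optional[dict]:
    """Return the first entry whose city token and state both match, else None."""
    c = (city or "").lower()
    for entry in MINI_REGISTRY:
        if any(t in c for t in entry["city_tokens"]) and _state_match(state, entry["states"]):
            return entry
    return None


def availability(city: str, state: str) -> dict:
    """Per-city flags, or a graceful 'not supported' summary for unknown cities."""
    entry = find_entry(city, state)
    if entry is None:
        names = ", ".join(e["name"] for e in MINI_REGISTRY)
        return {"city_supported": False, "summary": f"Tier 1 cities are {names}."}
    permits = entry["permits"]
    return {
        "city_supported": True,
        "city_id": entry["id"],
        "has_311": entry["311"] is not None,
        "has_permits": permits is not None,
        "has_permit_cost": bool(permits and permits.get("has_cost")),
    }
```

For example, `availability("Brooklyn", "NY")` reports 311 but no permits, while `availability("Atlanta", "GA")` returns the graceful message.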

Verified end-to-end via scripts/smoke_test_cities.py and full pipeline
runs. Per-city context blurbs visibly steer Gemma's reasoning:
- NYC output references 'combined-sewer overflow (CSO) failures in
  Manhattan' specifically
- SF output references 'fully combined sewer system'
- LA blurb (separated system, flash-flood mode) calibrates the model
  away from sewer-backup framing

Backend changes:
+ backend/app/data/cities.py: registry of 5 cities with verified
  dataset IDs, field names, and category values. A match function takes
  the geocoded city/state and returns the right config (or None for
  graceful degrade).
~ backend/app/tools/chicago_311.py: refactored to take a config dict,
  so it is generic across cities. Uses within_circle() when the dataset
  has a Socrata Point column and falls back to a bounding box with a
  ::number cast for cities (NYC, LA) that store lat/lon as text.
~ backend/app/tools/building_permits.py: same refactor. Also handles
  cities (SF, LA) where the cost field itself is stored as text, casting
  it with PostgreSQL-style ::number syntax (to_number() doesn't exist on
  these Socrata flavors).
~ backend/app/agents/local_agent.py: uses find_city() instead of the
  hardcoded 'is_chicago' check. Runs whatever signals the city has
  (both, 311-only, or permits-only). The prompt explicitly tells Gemma
  which signals are available and what's missing, so it doesn't
  fabricate a narrative for absent data. New flags surfaced on the
  result dict: city_supported, city_id, has_311, has_permits,
  has_permit_cost; the frontend uses these for honest UI degradation.
+ backend/scripts/smoke_test_cities.py: pre-deploy regression check
  for all 5 supported cities + 1 graceful-degrade case.
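The local_agent.py change runs only the signals a city has and keeps one failing dataset from sinking the whole agent. A minimal sketch of that pattern, using `asyncio.gather(..., return_exceptions=True)` in place of the per-task `try`/`await` loop the new `run_local_agent` uses; `run_available` and its `signals` mapping are hypothetical names for illustration.

```python
import asyncio


async def run_available(signals: dict) -> dict:
    """Run only the signal coroutines that exist; capture per-signal errors.

    `signals` maps a name ("reports", "construction") to a coroutine, or to
    None when the city has no such dataset. A failure in one signal becomes
    an {"error": ...} dict instead of failing the whole agent.
    """
    names = [n for n, coro in signals.items() if coro is not None]
    coros = [signals[n] for n in names]
    # All coroutines run concurrently; exceptions are returned, not raised.
    outcomes = await asyncio.gather(*coros, return_exceptions=True)
    results: dict = {}
    for name, out in zip(names, outcomes):
        if isinstance(out, BaseException):
            results[name] = {"error": f"{type(out).__name__}: {str(out)[:100]}"}
        else:
            results[name] = out
    return results
```

Both variants run the calls concurrently and isolate failures; `gather` just folds the error handling into one place. A permits-only city such as LA would pass `{"reports": None, "construction": ...}`.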

Bugs caught and fixed during the refactor:
- to_number() doesn't exist in Socrata SoQL; use the PostgreSQL-style
  :: cast instead.
- Some cities store BOTH lat/lon AND cost as text strings, so the cost
  filter needs the same :: cast trick.
- The Austin 311 column was sr_location_long_text in the spec; the real
  name is sr_location, with a separate sr_location_lat_long Point column.
- SF permits has a real Socrata Point 'location' column, so we can use
  within_circle() instead of bbox math.
- NYC dates are MM/DD/YYYY strings in the legacy dataset, so a string
  comparison against an ISO-format cutoff excludes every row.
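Most of these bugs come down to two habits: cast text-stored columns explicitly, and never string-compare mixed date formats. A minimal sketch of both, assuming config keys modeled on app/data/cities.py (`location_field`, `lat_field`, `lon_field`, `lat_lon_is_string`, `cost_field`, `cost_is_string`). The helpers `location_filter`, `cost_filter` and `mdy_to_iso` are hypothetical, not the tools' actual code, and the bounding box uses a flat-earth degrees-per-meter approximation that is adequate at the ~1 km scale.

```python
import math
from datetime import datetime

METERS_PER_DEG_LAT = 111_320.0  # approximate; adequate for a ~1 km search box


def location_filter(cfg: dict, lat: float, lon: float, radius_m: int) -> str:
    """Spatial predicate for a SoQL $where clause."""
    if cfg.get("location_field"):
        # Socrata Point column available: within_circle() does the distance test.
        return f"within_circle({cfg['location_field']}, {lat}, {lon}, {radius_m})"
    # No Point column: approximate the circle with a lat/lon bounding box.
    dlat = radius_m / METERS_PER_DEG_LAT
    dlon = radius_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    # Text-stored coordinates need an explicit ::number cast.
    cast = "::number" if cfg.get("lat_lon_is_string") else ""
    lat_col = f"{cfg['lat_field']}{cast}"
    lon_col = f"{cfg['lon_field']}{cast}"
    return (
        f"{lat_col} between {lat - dlat:.6f} and {lat + dlat:.6f}"
        f" AND {lon_col} between {lon - dlon:.6f} and {lon + dlon:.6f}"
    )


def cost_filter(cfg: dict, min_cost: int) -> str:
    """Cost predicate, or '' when the dataset exposes no cost column."""
    field = cfg.get("cost_field")
    if not field:
        return ""
    cast = "::number" if cfg.get("cost_is_string") else ""
    return f"{field}{cast} > {min_cost}"


def mdy_to_iso(date_str: str) -> str:
    """Normalize an 'MM/DD/YYYY' string to ISO 'YYYY-MM-DD' before comparing."""
    return datetime.strptime(date_str, "%m/%d/%Y").strftime("%Y-%m-%d")


if __name__ == "__main__":
    cutoff = "2023-05-04"
    print("05/04/2025" > cutoff)              # False: '0' sorts before '2'
    print(mdy_to_iso("05/04/2025") > cutoff)  # True once both sides are ISO
```

Because every MM/DD/YYYY string starts with '0' or '1' and every 20xx ISO date starts with '2', the naive comparison is false for every row, which matches the "excludes everything" symptom.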

Future work (deferred): Boston (CKAN, not Socrata), Dallas (no working
311 dataset ID), Houston/Miami (different platforms). Each requires a
custom integration; the five Tier-1 cities are the realistic hackathon scope.

Version: chrome wordmark v0.12 → v0.13, app version 0.12.0 → 0.13.0.

app/agents/local_agent.py CHANGED
@@ -1,25 +1,23 @@
- """Local infrastructure / 311 + building-permits agent.
-
- Currently Chicago-only. Pulls TWO Socrata signals in parallel and asks
- Gemma 4 to interpret them together:
-
-   1. 311 flood reports (WIB / SFL) within 500m / 5y (the historical
-      symptom): where flooding has actually been reported.
-   2. Building permits (new construction + major renovations >$100K)
-      within 1km / 3y (the leading indicator): where the impervious
-      surface is growing and the combined sewer load is increasing.
-
- These signals compound. A property in a stable block with 0 311
- reports is one risk profile. A property in a stable block with 0
- 311 reports BUT $80M of new construction permitted nearby is another;
- the model can read the trajectory, not just the snapshot.
-
- For non-Chicago cities, returns a graceful "no local dataset wired"
- message so the rest of the dossier still completes.
  """
  import asyncio
  import json

  from app.llm.client import call_gemma4, extract_text, parse_json_response
  from app.llm.prompts import LOCAL_AGENT_SYSTEM_PROMPT
  from app.tools.building_permits import get_nearby_construction
@@ -32,43 +30,52 @@ async def run_local_agent(
      city: str,
      state: str,
  ) -> dict:
-     is_chicago = (
-         city.lower() == "chicago" or "chicago" in city.lower()
-     ) and state.lower() in ("illinois", "il")

-     if not is_chicago:
          return {
              "city": city,
              "data_available": False,
              "summary": (
-                 f"No local-311 dataset wired for {city or 'this location'}; "
-                 "Chicago is the demo city for the 311 + permits signals."
              ),
              "basement_flooding_reports": 0,
              "street_flooding_reports": 0,
              "construction": {"permits_found": False},
          }

-     # Two Socrata calls in parallel: same platform, different datasets.
-     reports_task = asyncio.create_task(
-         get_flood_reports(lat, lon, radius_m=500, years=5)
-     )
-     construction_task = asyncio.create_task(
-         get_nearby_construction(lat, lon, radius_m=1000, years=3, min_cost=100_000)
-     )
-     reports, construction = await asyncio.gather(
-         reports_task, construction_task, return_exceptions=False
-     )
-
-     # Sanitize construction for the prompt: drop the verbose major-projects
-     # descriptions to keep the prompt tight; we'll keep the full data on
-     # the agent's return for the dossier UI.
      construction_for_prompt = {
-         k: v for k, v in (construction or {}).items()
-         if k not in ("major_projects",)
      }
      construction_for_prompt["major_projects_count"] = len(
-         (construction or {}).get("major_projects") or []
      )
      construction_for_prompt["major_projects_top3"] = [
          {
@@ -78,52 +85,54 @@ async def run_local_agent(
              "description": (p.get("description") or "")[:120],
              "address": p.get("address"),
          }
-         for p in ((construction or {}).get("major_projects") or [])[:3]
      ]

-     user_prompt = f"""Interpret these TWO Chicago Socrata signals together for the property at ({lat}, {lon}).

- ## 311 flood reports (the historical symptom)
- {json.dumps(reports, indent=2, default=str)}

- Context for 311:
- - "WIB" = Water in Basement, "SFL" = Street Flooding
- - Search radius: {reports.get('radius_m')} m, time window: {reports.get('years')} years
- - Chicago combined sewer overflows after ~0.67 in/hr of rain

- ## Building permits (the leading indicator of impervious-surface change)
- {json.dumps(construction_for_prompt, indent=2, default=str)}

- Context for permits:
- - New construction + major renovations within 1km, last 3 years, > $100K
- - Each new structure replaces absorbent ground with roof + foundation
- - Each new parking lot replaces vegetation with asphalt
- - The combined sewer system serving these blocks does NOT get upgraded
-   when density increases, so each new permit shifts more stormwater
-   load onto the same shared pipes
- - Trend "increasing" = neighborhood is actively densifying = future
-   flood risk is rising even if current 311 signal is clean

- Return a JSON object combining both signals with these fields:
  {{
    "basement_flooding_reports": int,
    "street_flooding_reports": int,
    "total_reports": int,
-   "density_assessment": "low" | "moderate" | "high",
-   "pattern_notes": "1-2 sentences on the 311 recency / clustering",
    "construction": {{
      "permits_count": int,
      "new_construction_count": int,
-     "total_cost": number,
      "trend_direction": "increasing" | "stable" | "decreasing",
-     "interpretation": "1-2 sentences on what the development pressure means for THIS property's future flood risk; explicitly connect to impervious-surface change and the combined-sewer load",
      "concern_level": "low" | "moderate" | "high"
    }},
-   "compound_signal": "1-2 sentences on how the 311 history and the permit trajectory interact, e.g. 'clean 311 today + heavy densification = rising risk' or 'high 311 + active densification = compounding risk'",
-   "summary": "1 sentence for the status feed (covers BOTH signals)"
  }}

- Return ONLY the JSON object, no other text."""

      response = await call_gemma4(
          messages=[
@@ -137,15 +146,25 @@ Return ONLY the JSON object, no other text."""
      text = extract_text(response)
      parsed = parse_json_response(text)
      if parsed:
-         parsed["raw_311"] = reports
-         parsed["raw_construction"] = construction
          parsed["data_available"] = True
          return parsed

-     # Fallback: still return both raw signals so the UI + risk analyst
-     # can see them even if Gemma's interpretation failed to parse.
      return {
          "data_available": True,
          "basement_flooding_reports": reports.get("basement_flooding", 0),
          "street_flooding_reports": reports.get("street_flooding", 0),
          "total_reports": reports.get("total_reports", 0),
@@ -153,8 +172,8 @@ Return ONLY the JSON object, no other text."""
          "summary": (
              f"{reports.get('basement_flooding', 0)} basement + "
              f"{reports.get('street_flooding', 0)} street flood reports · "
-             f"{construction_for_prompt.get('total_permits', 0)} construction "
-             f"permits within 1km / 3y"
          ),
          "raw_311": reports,
          "raw_construction": construction,
 
+ """Local infrastructure agent: 311 + building permits, city-aware.
+
+ Now driven by the registry in app.data.cities. For any supported city
+ we run whichever Socrata signals it has in parallel and ask Gemma 4 to
+ interpret them together, with city-specific context (combined-vs-separated
+ sewers, dominant flood mode, local hazard pattern) injected into the
+ prompt so the reasoning is calibrated to the right system.
+
+ Honest about per-city data quality:
+   - Cities with both signals (Chicago, SF, Austin) get the full
+     "compound signal" narrative.
+   - Cities with one signal (NYC: 311-only, LA: permits-only) get a
+     partial narrative that explicitly says what's missing.
+   - Cities not in the registry get a clean "not supported here"
+     summary so the rest of the dossier still completes.
  """
  import asyncio
  import json

+ from app.data.cities import find_city
  from app.llm.client import call_gemma4, extract_text, parse_json_response
  from app.llm.prompts import LOCAL_AGENT_SYSTEM_PROMPT
  from app.tools.building_permits import get_nearby_construction

      city: str,
      state: str,
  ) -> dict:
+     cfg = find_city(city, state)

+     if cfg is None:
          return {
              "city": city,
              "data_available": False,
              "summary": (
+                 f"No local-data integration wired for {city or 'this location'}; "
+                 "Tier 1 cities are Chicago, NYC, San Francisco, Los Angeles, Austin."
              ),
              "basement_flooding_reports": 0,
              "street_flooding_reports": 0,
              "construction": {"permits_found": False},
          }

+     has_311 = cfg.get("311") is not None
+     has_permits = cfg.get("permits") is not None
+
+     # Run whichever signals this city actually has, in parallel.
+     tasks = []
+     if has_311:
+         tasks.append(("reports", asyncio.create_task(
+             get_flood_reports(cfg, lat, lon, radius_m=500, years=5)
+         )))
+     if has_permits:
+         tasks.append(("construction", asyncio.create_task(
+             get_nearby_construction(cfg, lat, lon, radius_m=1000, years=3, min_cost=100_000)
+         )))
+     results: dict = {}
+     for name, t in tasks:
+         try:
+             results[name] = await t
+         except Exception as e:
+             results[name] = {"error": f"{type(e).__name__}: {str(e)[:100]}"}
+
+     reports = results.get("reports") or {}
+     construction = results.get("construction") or {}
+
+     # Trim construction for the prompt: don't dump 5 long descriptions
+     # into the system; we keep the full data on the agent's return for
+     # the dossier UI to render.
      construction_for_prompt = {
+         k: v for k, v in construction.items() if k not in ("major_projects",)
      }
      construction_for_prompt["major_projects_count"] = len(
+         construction.get("major_projects") or []
      )
      construction_for_prompt["major_projects_top3"] = [
          {

              "description": (p.get("description") or "")[:120],
              "address": p.get("address"),
          }
+         for p in (construction.get("major_projects") or [])[:3]
      ]

+     has_cost = construction.get("has_cost", False)
+     cost_note = (
+         ""
+         if has_cost
+         else (
+             "NOTE: this city's permits dataset does NOT expose project cost. "
+             "Reason about permit COUNT, project type, and trend direction; "
+             "do NOT report dollar amounts you don't actually have."
+         )
+     )

+     user_prompt = f"""You are interpreting LOCAL infrastructure signals for a property in {cfg['name']} ({lat}, {lon}).

+ City context (use this to calibrate your reasoning):
+ {cfg['context_blurb']}

+ ## 311 service-request signal: historical symptom
+ {f"Available for {cfg['name']}" if has_311 else "NOT AVAILABLE for this city; skip 311 reasoning."}
+ {json.dumps(reports, indent=2, default=str) if has_311 else ""}

+ ## Building permits signal: leading indicator of impervious-surface change
+ {f"Available for {cfg['name']}" if has_permits else "NOT AVAILABLE for this city; skip permit reasoning."}
+ {json.dumps(construction_for_prompt, indent=2, default=str) if has_permits else ""}
+ {cost_note}

+ Return a JSON object combining whatever signals were available with these fields:
  {{
    "basement_flooding_reports": int,
    "street_flooding_reports": int,
    "total_reports": int,
+   "density_assessment": "low" | "moderate" | "high" | "n/a",
+   "pattern_notes": "1-2 sentences on the 311 pattern, or 'No 311 signal available for this city.'",
    "construction": {{
      "permits_count": int,
      "new_construction_count": int,
+     "total_cost": number_or_null,
      "trend_direction": "increasing" | "stable" | "decreasing",
+     "interpretation": "1-2 sentences on what the development pressure means for THIS property's future flood risk in THIS city's drainage system",
      "concern_level": "low" | "moderate" | "high"
    }},
+   "compound_signal": "1-2 sentences on how the available signals interact for {cfg['name']} specifically. If only one signal is available, say so explicitly.",
+   "summary": "1 sentence for the status feed"
  }}

+ Return ONLY the JSON object."""

      response = await call_gemma4(
          messages=[

      text = extract_text(response)
      parsed = parse_json_response(text)
      if parsed:
+         parsed["raw_311"] = reports if has_311 else {"supported": False}
+         parsed["raw_construction"] = construction if has_permits else {"supported": False}
          parsed["data_available"] = True
+         parsed["city_supported"] = True
+         parsed["city_id"] = cfg["id"]
+         parsed["has_311"] = has_311
+         parsed["has_permits"] = has_permits
+         parsed["has_permit_cost"] = has_cost
          return parsed

+     # Fallback if Gemma's JSON didn't parse: still return the raw signals
+     # so the dossier shows something useful.
      return {
          "data_available": True,
+         "city_supported": True,
+         "city_id": cfg["id"],
+         "has_311": has_311,
+         "has_permits": has_permits,
+         "has_permit_cost": has_cost,
          "basement_flooding_reports": reports.get("basement_flooding", 0),
          "street_flooding_reports": reports.get("street_flooding", 0),
          "total_reports": reports.get("total_reports", 0),

          "summary": (
              f"{reports.get('basement_flooding', 0)} basement + "
              f"{reports.get('street_flooding', 0)} street flood reports · "
+             f"{construction.get('total_permits', 0)} construction permits "
+             f"within 1km / 3y in {cfg['name']}"
          ),
          "raw_311": reports,
          "raw_construction": construction,
app/data/cities.py ADDED
@@ -0,0 +1,272 @@
+ """
+ City registry for the local agent.
+
+ Each entry encodes everything the 311 + permits tools need to query
+ the city's Socrata-hosted open data: dataset IDs, the field names
+ used for category/cost/date/location, the actual flood-related
+ category values, and a short context blurb that gets injected into
+ the local-agent prompt so Gemma 4 has accurate sewer-system info
+ when it interprets the signals.
+
+ All entries verified live on 2026-05-04 against the cities' open-data
+ portals. The honesty-tax: not every city exposes project cost in its
+ permits dataset, and not every city categorizes 311 floods
+ explicitly. Each config field documents what's actually available
+ so the dossier can degrade gracefully (e.g. show permit COUNT but
+ suppress the dollar narrative when no cost field exists).
+ """
+ from typing import Optional
+
+
+ # Standardize state matching: accept full name OR 2-letter code.
+ def _state_match(input_state: str, codes: tuple[str, ...]) -> bool:
+     s = (input_state or "").strip().lower()
+     return any(s == c.lower() for c in codes)
+
+
+ CITIES: list[dict] = [
+     # ---- CHICAGO --------------------------------------------------------
+     {
+         "id": "chicago",
+         "name": "Chicago",
+         "state_codes": ("IL", "Illinois"),
+         "match_fn": lambda city, state: "chicago" in (city or "").lower() and _state_match(state, ("IL", "Illinois")),
+         "context_blurb": (
+             "Chicago has a combined sewer system covering ~80% of the city. "
+             "Combined sewer overflows after ~0.67 in/hr of rain. 42% of Cook "
+             "County is impervious surface. MWRD's Deep Tunnel (TARP) provides "
+             "buffering but local sewers still bottleneck at neighborhood scale."
+         ),
+         "311": {
+             "url": "https://data.cityofchicago.org/resource/v6vf-nfxy.json",
+             "category_field": "sr_short_code",
+             # Chicago uses short codes: WIB = Water in Basement, SFL = Street Flooding
+             "flood_categories": ("WIB", "SFL"),
+             "category_in_clause": "sr_short_code in('WIB','SFL')",
+             "date_field": "created_date",
+             "location_field": "location",
+             "select_fields": "sr_short_code,created_date,street_address,ward",
+             "address_field_template": "street_address",
+         },
+         "permits": {
+             "url": "https://data.cityofchicago.org/resource/ydr8-5enu.json",
+             "permit_type_field": "permit_type",
+             "permit_type_values": (
+                 "PERMIT - NEW CONSTRUCTION",
+                 "PERMIT - RENOVATION/ALTERATION",
+             ),
+             "cost_field": "reported_cost",
+             "date_field": "issue_date",
+             "location_field": "location",
+             "select_fields": (
+                 "permit_type,work_description,reported_cost,issue_date,"
+                 "latitude,longitude,street_number,street_direction,"
+                 "street_name,total_fee"
+             ),
+             "address_keys": ("street_number", "street_direction", "street_name"),
+             "new_construction_marker": "NEW CONSTRUCTION",
+             "renovation_marker": "RENOVATION",
+             "has_cost": True,
+         },
+     },
+     # ---- NEW YORK CITY --------------------------------------------------
+     {
+         "id": "nyc",
+         "name": "New York City",
+         "state_codes": ("NY", "New York"),
+         "match_fn": lambda city, state: any(t in (city or "").lower() for t in ("new york", "manhattan", "brooklyn", "queens", "bronx", "staten island")) and _state_match(state, ("NY", "New York")),
+         "context_blurb": (
+             "NYC has combined sewer systems across 60% of the city, including "
+             "all of Manhattan, much of Brooklyn, and parts of Queens and the "
+             "Bronx. Storm intensity above ~1.5 in/hr triggers combined-sewer "
+             "overflows into NY Harbor and basement backups. NYC DEP's bluebelts "
+             "and grey infrastructure are unevenly distributed."
+         ),
+         "311": {
+             "url": "https://data.cityofnewyork.us/resource/erm2-nwe9.json",
+             "category_field": "complaint_type",
+             # NYC complaint_type values verified 2026-05 (top flood-related)
+             "flood_categories": ("Sewer", "Sewer Maintenance"),
+             "category_in_clause": "complaint_type in('Sewer','Sewer Maintenance')",
+             "date_field": "created_date",
+             "location_field": "location",
+             # NYC uses lat/lon directly:
+             "select_fields": "complaint_type,descriptor,created_date,incident_address,borough",
+             "address_field_template": "incident_address",
+         },
+         # NYC permits intentionally DEFERRED. The DOB datasets are fragmented:
+         #   ipu4-2q9a (legacy "DOB Permit Issuance"): most recent NB rows
+         #     are from 2022; the dataset stopped being updated regularly.
+         #   rbx6-tga4 (DOB NOW: Build – Approved Permits): has cost
+         #     (estimated_job_costs) and lat/lon, but NO date column
+         #     suitable for "last 3 years" filtering.
+         #   w9ak-ipjd (Active Construction Permits): has filing_date and
+         #     initial_cost in the right format, but is filtered to currently-
+         #     active permits, so the historical 3-year window is empty for
+         #     most areas.
+         # The right fix is a custom NYC client that joins multiple datasets;
+         # for the hackathon scope NYC ships with 311-only and the dossier
+         # honestly says permits-deferred.
+         "permits": None,
+     },
+     # ---- SAN FRANCISCO --------------------------------------------------
+     {
+         "id": "sf",
+         "name": "San Francisco",
+         "state_codes": ("CA", "California"),
+         "match_fn": lambda city, state: "san francisco" in (city or "").lower() and _state_match(state, ("CA", "California")),
+         "context_blurb": (
+             "San Francisco operates a fully combined sewer system citywide, "
+             "the only major California city to do so. Heavy rain plus high "
+             "tide concentrates overflows. SF Public Utilities Commission has "
+             "documented chronic flooding hotspots in Mission, Bayview, and "
+             "the South of Market areas."
+         ),
+         "311": {
+             "url": "https://data.sfgov.org/resource/vw6y-z8j6.json",
+             "category_field": "service_name",
+             "flood_categories": ("Sewer Issues", "Sewer"),
+             "category_in_clause": "service_name in('Sewer Issues','Sewer')",
+             "date_field": "requested_datetime",
+             "location_field": "point",
+             "select_fields": "service_name,service_subtype,requested_datetime,address",
+             "address_field_template": "address",
+         },
+         "permits": {
+             "url": "https://data.sfgov.org/resource/i98e-djp9.json",
+             "permit_type_field": "permit_type_definition",
+             "permit_type_values": (
+                 "new construction",
+                 "new construction wood frame",
+                 "additions alterations or repairs",
+             ),
+             "cost_field": "estimated_cost",
+             "cost_is_string": True,  # SF stores estimated_cost as text
+             "date_field": "issued_date",
+             "location_field": "location",  # SF permits has a real Socrata Point column
+             "select_fields": (
+                 "permit_type,permit_type_definition,description,estimated_cost,"
+                 "revised_cost,issued_date,filed_date,street_number,street_name,"
+                 "street_suffix,zipcode"
+             ),
+             "address_keys": ("street_number", "street_name", "street_suffix"),
+             "new_construction_marker": "new construction",
+             "renovation_marker": "alterations",
+             "has_cost": True,
+         },
+     },
+     # ---- LOS ANGELES ----------------------------------------------------
+     {
+         "id": "la",
+         "name": "Los Angeles",
+         "state_codes": ("CA", "California"),
+         "match_fn": lambda city, state: any(t in (city or "").lower() for t in ("los angeles", "los-angeles")) and _state_match(state, ("CA", "California")),
+         "context_blurb": (
+             "Los Angeles has a separated storm-sanitary sewer system: the "
+             "two pipe networks don't co-mingle, so basement sewer-backup "
+             "flooding is rare. The dominant LA flood mode is FLASH FLOODING "
+             "during winter atmospheric-river events, when concrete-channelized "
+             "rivers (LA River, Ballona Creek) and soft-bottomed creeks rise "
+             "rapidly. Hillside debris flows after wildfire are a separate "
+             "wet-season hazard."
+         ),
+         "311": None,  # LA's 311 dataset doesn't categorize floods cleanly enough; deferred
+         "permits": {
+             "url": "https://data.lacity.org/resource/pi9x-tg5x.json",
+             "permit_type_field": "permit_type",
+             "permit_type_values": ("Bldg-New", "Bldg-Alter/Repair", "Bldg-Addition"),
+             "cost_field": "valuation",
+             "cost_is_string": True,  # LA stores valuation as text
+             "date_field": "issue_date",
+             "location_field": None,
+             "select_fields": (
+                 "permit_type,permit_sub_type,work_desc,valuation,issue_date,"
+                 "primary_address,zip_code,lat,lon"
+             ),
+             "address_keys": ("primary_address",),
+             "new_construction_marker": "New",
+             "renovation_marker": "Alter",
+             "has_cost": True,
+             "lat_field": "lat",
+             "lon_field": "lon",
+             "lat_lon_is_string": True,  # LA stores lat/lon as text; needs a cast
+         },
+     },
+     # ---- AUSTIN ---------------------------------------------------------
+     {
+         "id": "austin",
+         "name": "Austin",
+         "state_codes": ("TX", "Texas"),
+         "match_fn": lambda city, state: "austin" in (city or "").lower() and _state_match(state, ("TX", "Texas")),
+         "context_blurb": (
+             "Austin has separated storm and sanitary sewers, but the storm "
+             "drain system is undersized for the increasingly extreme rainfall "
+             "events of the post-2010 Texas climate. Onion Creek and Williamson "
+             "Creek have historic flash-flood corridors. Austin's hilly topo "
+             "concentrates runoff into well-known low-point intersections; the "
+             "city publishes a 'Flood Early Warning System' map for these spots."
+         ),
+         "311": {
+             "url": "https://data.austintexas.gov/resource/xwdj-i9he.json",
+             "category_field": "sr_type_desc",
+             "flood_categories": (
+                 "Flooding Current (Non-Emergency)",
+                 "Flooding - Past",
+                 "WPD - Flooding Current",
+                 "WPD - Flooding Past",
+                 "WPD - Channels/Creek/Drainage Issues",
+                 "WPD - Storm Drain Services",
+             ),
+             "category_in_clause": (
+                 "sr_type_desc in("
+                 "'Flooding Current (Non-Emergency)',"
+                 "'Flooding - Past',"
+                 "'WPD - Flooding Current',"
+                 "'WPD - Flooding Past',"
+                 "'WPD - Channels/Creek/Drainage Issues',"
+                 "'WPD - Storm Drain Services'"
+                 ")"
+             ),
+             "date_field": "sr_created_date",
+             "location_field": "sr_location_lat_long",  # real Point column
+             "select_fields": "sr_type_desc,sr_created_date,sr_location",
+             "address_field_template": "sr_location",
+         },
+         "permits": {
+             "url": "https://data.austintexas.gov/resource/3syk-w9eu.json",
+             "permit_type_field": "permittype",
+             # Austin permittype: BP=building, MP=mechanical, etc. We want the
+             # construction-relevant ones via permit_class_mapped + work_class.
+             "permit_type_values": ("BP",),
+             "cost_field": None,  # Austin permits dataset has no cost field
+             "date_field": "issue_date",
+             "location_field": "location",
+             "select_fields": (
+                 "permittype,permit_type_desc,permit_class_mapped,work_class,"
+                 "description,issue_date,permit_location,latitude,longitude,"
+                 "original_address1,original_zip"
+             ),
+             "address_keys": ("original_address1",),
+             "new_construction_marker": "New",
+             "renovation_marker": "Remodel",
+             "has_cost": False,
+             "lat_field": "latitude",
+             "lon_field": "longitude",
+         },
+     },
+ ]
+
+
+ def find_city(city: str, state: str) -> Optional[dict]:
+     """Look up a registry entry for the given geocoded city/state, or None."""
+     for entry in CITIES:
+         try:
+             if entry["match_fn"](city, state):
+                 return entry
+         except Exception:
+             continue
+     return None
+
+
+ def supported_city_names() -> list[str]:
+     return [c["name"] for c in CITIES]
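Several of the registry's permit fields have to agree with each other for the tools to build a valid query: `has_cost` tracks whether a `cost_field` exists, and an entry without a Point `location_field` needs `lat_field`/`lon_field` for the bounding-box fallback. A minimal consistency check, sketched under the assumption that these are the intended invariants (they hold for all four permit entries above); `check_permits_config` is a hypothetical helper that a smoke test could call, not part of this commit.

```python
from typing import Optional


def check_permits_config(city_id: str, permits: Optional[dict]) -> list[str]:
    """Return a list of consistency problems for one 'permits' sub-config."""
    if permits is None:
        return []  # signal deliberately absent (e.g. NYC); nothing to check
    problems: list[str] = []
    # has_cost should be True exactly when a cost column is configured.
    if permits.get("has_cost") != (permits.get("cost_field") is not None):
        problems.append(f"{city_id}: has_cost disagrees with cost_field")
    # A text-cost cast only makes sense if there is a cost column to cast.
    if permits.get("cost_is_string") and not permits.get("cost_field"):
        problems.append(f"{city_id}: cost_is_string set but no cost_field")
    # Without a Point column, the bbox fallback needs explicit lat/lon fields.
    if permits.get("location_field") is None and not (
        permits.get("lat_field") and permits.get("lon_field")
    ):
        problems.append(f"{city_id}: no Point column and no lat/lon fields")
    return problems
```

Running it over `CITIES` (e.g. `[p for c in CITIES for p in check_permits_config(c["id"], c["permits"])]`) should yield an empty list for the current registry.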
app/main.py CHANGED
@@ -8,7 +8,7 @@ from fastapi.staticfiles import StaticFiles
  from app.api.assess import router as assess_router
  from app.api.health import router as health_router

- app = FastAPI(title="FlutIQ", version="0.12.0")
+ app = FastAPI(title="FlutIQ", version="0.13.0")

  # CORS still permissive for split-deployment scenarios. With the
  # bundled deploy (frontend served from FastAPI) it's a no-op because
app/tools/building_permits.py CHANGED
@@ -1,136 +1,215 @@
1
  """
2
- Chicago Building Permits via Socrata SODA API.
3
-
4
- Why this matters for flood risk:
5
- A property's flood risk is not static. As the surrounding block
6
- densifies new condos, parking lots, commercial buildings — more
7
- absorbent ground is replaced with impervious surface. The combined
8
- sewer system handling stormwater doesn't get upgraded; it just gets
9
- more overwhelmed. So a house that hasn't flooded in 50 years can
10
- suddenly be vulnerable because of what a developer built nearby.
11
-
12
- This tool surfaces that signal: significant construction projects
13
- (new builds + major renovations >$100K) within a configurable radius
14
- of the property in the last few years, with cost/scale information
15
- and a year-over-year trend.
16
-
17
- Free dataset (`ydr8-5enu`, no auth required) same Socrata platform
18
- as our Chicago 311 data. Non-Chicago addresses get this from the
19
- local_agent's graceful degrade path, so this tool is Chicago-only by
20
- construction.
21
  """
 
22
  from datetime import datetime, timedelta, timezone
 
23
 
24
  import httpx
25
 
26
- PERMITS_URL = "https://data.cityofchicago.org/resource/ydr8-5enu.json"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
27
 
28
 
29
  async def get_nearby_construction(
 
30
  lat: float,
31
  lon: float,
32
  radius_m: int = 1000,
33
  years: int = 3,
34
  min_cost: int = 100_000,
35
  ) -> dict:
36
- """Find significant construction within `radius_m` over the last `years`.
37
 
38
- Returns a structured dict with counts, total cost, top-5 major
39
- projects, and a YoY trend ("increasing" / "stable" / "decreasing").
40
  """
41
- now = datetime.now(timezone.utc)
42
- since = (now - timedelta(days=365 * years)).strftime("%Y-%m-%dT00:00:00")
43
 
44
- where = (
45
- "permit_type in('PERMIT - NEW CONSTRUCTION','PERMIT - RENOVATION/ALTERATION')"
46
- f" AND issue_date > '{since}'"
47
- f" AND within_circle(location, {lat}, {lon}, {radius_m})"
48
- f" AND reported_cost > {min_cost}"
49
- )
50
  params = {
51
- "$where": where,
52
  "$limit": 500,
53
- # Schema verified against the live dataset 2026-05-04: there is
54
- # no `suffix`, `_total_sqft`, or `community_area` column despite
55
- # what the source spec suggested. Reported cost is the best
56
- # available proxy for project scale.
57
- "$select": (
58
- "permit_type,work_description,reported_cost,issue_date,"
59
- "latitude,longitude,street_number,street_direction,"
60
- "street_name,total_fee"
61
- ),
62
- "$order": "reported_cost DESC",
63
  }
64
 
65
- async with httpx.AsyncClient(timeout=30) as client:
66
- resp = await client.get(PERMITS_URL, params=params)
67
 
68
  if resp.status_code != 200:
69
  return {
70
  "error": f"HTTP {resp.status_code}",
71
  "permits_found": False,
72
  "total_permits": 0,
 
73
  }
74
 
75
  try:
76
  permits = resp.json()
77
  except ValueError:
78
- return {"error": "non-JSON response", "permits_found": False, "total_permits": 0}
79
 
80
  if isinstance(permits, dict) and "error" in permits:
81
  return {
82
  "error": permits.get("message", "Socrata error"),
83
  "permits_found": False,
84
  "total_permits": 0,
 
85
  }
86
 
87
- new_construction = [
88
- p for p in permits if "NEW CONSTRUCTION" in (p.get("permit_type") or "")
89
- ]
90
- renovations = [
91
- p for p in permits if "RENOVATION" in (p.get("permit_type") or "")
92
- ]
93
 
94
  total_cost = 0.0
95
  for p in permits:
96
- try:
97
- total_cost += float(p.get("reported_cost") or 0)
98
- except (ValueError, TypeError):
99
- pass
 
100
 
101
  def _addr(p: dict) -> str:
102
- parts = [
103
- p.get("street_number"),
104
- p.get("street_direction"),
105
- p.get("street_name"),
106
- ]
107
- return " ".join(part for part in parts if part).strip()
108
-
109
- # Top 5 by reported cost (already sorted DESC by Socrata).
110
  major = []
111
  for p in permits:
112
- try:
113
- cost = float(p.get("reported_cost") or 0)
114
- except (ValueError, TypeError):
115
  continue
116
- if cost > 500_000:
117
- major.append({
118
- "type": "new" if "NEW CONSTRUCTION" in (p.get("permit_type") or "") else "renovation",
119
- "description": (p.get("work_description") or "").strip()[:200],
120
- "cost": cost,
121
- "date": (p.get("issue_date") or "")[:10],
122
- "address": _addr(p),
123
- })
124
  if len(major) >= 5:
125
  break
126
 
127
- # Year-over-year trend (last 12mo vs prior 12mo).
 
128
  cutoff_12 = (now - timedelta(days=365)).strftime("%Y-%m-%d")
129
  cutoff_24 = (now - timedelta(days=730)).strftime("%Y-%m-%d")
130
- last_12 = [p for p in permits if (p.get("issue_date") or "") > cutoff_12]
131
  prior_12 = [
132
  p for p in permits
133
- if cutoff_24 < (p.get("issue_date") or "") <= cutoff_12
134
  ]
135
 
136
  if len(prior_12) == 0:
@@ -143,11 +222,14 @@ async def get_nearby_construction(
143
  direction = "stable"
144
 
145
  return {
146
  "permits_found": True,
 
147
  "total_permits": len(permits),
148
  "new_construction_count": len(new_construction),
149
  "renovation_count": len(renovations),
150
- "total_reported_cost": round(total_cost, 2),
151
  "major_projects": major,
152
  "trend": {
153
  "last_12_months": len(last_12),
@@ -156,6 +238,6 @@ async def get_nearby_construction(
156
  },
157
  "radius_m": radius_m,
158
  "years": years,
159
- "min_cost_filter": min_cost,
160
  "since": since,
161
  }
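Both the old and new versions compute the year-over-year trend by comparing raw date strings against `YYYY-MM-DD` cutoffs rather than parsing them. The sketch below isolates that comparison, assuming the date field comes back as an ISO-8601 floating timestamp like `2025-05-04T00:00:00.000` (the sample dates and the helper name `bucket_counts` are hypothetical). ISO strings sort in date order, and a timestamp that starts with the cutoff date compares greater than the bare date because it is longer with an equal prefix, so a permit issued on the cutoff day lands in the last-12-months bucket.

```python
from datetime import datetime, timedelta, timezone


def bucket_counts(dates: list[str], now: datetime) -> tuple[int, int]:
    """Count ISO date strings in the last 12 months vs. the prior 12 months,
    using plain string comparison the way the permits tool does."""
    cutoff_12 = (now - timedelta(days=365)).strftime("%Y-%m-%d")
    cutoff_24 = (now - timedelta(days=730)).strftime("%Y-%m-%d")
    last_12 = [d for d in dates if (d or "") > cutoff_12]
    prior_12 = [d for d in dates if cutoff_24 < (d or "") <= cutoff_12]
    return len(last_12), len(prior_12)


now = datetime(2026, 5, 4, tzinfo=timezone.utc)
sample = [
    "2026-01-15T00:00:00.000",  # inside the last 12 months
    "2025-05-04T00:00:00.000",  # on the cutoff day: still counts as "last 12"
    "2024-11-20T00:00:00.000",  # prior 12 months
    "2023-01-01T00:00:00.000",  # older than 24 months: ignored
    "",                         # missing date: ignored
]
print(bucket_counts(sample, now))  # → (2, 1)
```

Missing dates become `""`, which compares below every cutoff and drops out of both buckets.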
 
1
  """
2
+ City building-permits tool: Socrata SODA, city-aware.
3
+
4
+ Was originally Chicago-only; now driven by app.data.cities so any
5
+ supported city can plug in. The exported function name
6
+ `get_nearby_construction` is preserved so callers don't change.
7
+
8
+ Honest about per-city data quality:
9
+ - Some cities (Chicago, SF, LA, Dallas) expose a project cost
10
+ field, so the dossier can show '$80M new medical office'.
11
+ - Other cities (NYC DOB, Austin) don't expose cost in their
12
+ public dataset, so the dossier degrades to permit COUNT and
13
+ project type narrative ('14 new buildings, increasing trend').
14
+
15
+ Each city's config encodes the right field names for permit_type,
16
+ cost (if any), date, and the geo column type (Socrata Point column
17
+ for `within_circle` vs. bbox-on-lat/lon for cities without one).
18
  """
19
+ import math
20
  from datetime import datetime, timedelta, timezone
21
+ from typing import Optional
22
 
23
  import httpx
24
 
25
+
26
+ def _bbox_clause(
27
+ lat: float,
28
+ lon: float,
29
+ radius_m: int,
30
+ lat_field: str,
31
+ lon_field: str,
32
+ is_string: bool = False,
33
+ ) -> str:
34
+ """SODA WHERE fragment that bounds rows within ~radius_m of (lat, lon).
35
+ When `is_string=True` (NYC DOB, LA permits) the fields get a `::number`
36
+ cast because the dataset stores lat/lon as text."""
37
+ lat_deg = radius_m / 111_000
38
+ lon_deg = radius_m / (111_000 * max(0.1, math.cos(math.radians(lat))))
39
+ # SoQL doesn't have to_number(); use PostgreSQL :: cast for cities
40
+ # that store lat/lon as text.
41
+ lf = f"{lat_field}::number" if is_string else lat_field
42
+ lnf = f"{lon_field}::number" if is_string else lon_field
43
+ return (
44
+ f"{lf} >= {lat - lat_deg} AND {lf} <= {lat + lat_deg} "
45
+ f"AND {lnf} >= {lon - lon_deg} AND {lnf} <= {lon + lon_deg}"
46
+ )
47
+
48
+
49
+ def _coerce_float(v) -> Optional[float]:
50
+ if v is None or v == "":
51
+ return None
52
+ try:
53
+ return float(v)
54
+ except (TypeError, ValueError):
55
+ return None
56
 
57
 
58
  async def get_nearby_construction(
59
+ config: dict,
60
  lat: float,
61
  lon: float,
62
  radius_m: int = 1000,
63
  years: int = 3,
64
  min_cost: int = 100_000,
65
  ) -> dict:
66
+ """Fetch significant construction permits near a location for a city.
67
 
68
+ Returns the same shape regardless of city, with `has_cost` flagging
69
+ whether the dollar-amount narrative is meaningful for this city.
70
  """
71
+ cfg = (config or {}).get("permits")
72
+ if not cfg:
73
+ return {
74
+ "permits_found": False,
75
+ "supported": False,
76
+ "total_permits": 0,
77
+ "city": (config or {}).get("name", ""),
78
+ }
79
+
80
+ since = (
81
+ datetime.now(timezone.utc) - timedelta(days=365 * years)
82
+ ).strftime("%Y-%m-%dT00:00:00")
83
+
84
+ type_field = cfg["permit_type_field"]
85
+ type_values = cfg["permit_type_values"]
86
+ type_clause = " OR ".join(f"{type_field}='{v}'" for v in type_values)
87
+
88
+ cost_field = cfg.get("cost_field")
89
+ has_cost = bool(cost_field and cfg.get("has_cost"))
90
+
91
+ location_field = cfg.get("location_field")
92
+ if location_field:
93
+ geo_clause = f"within_circle({location_field}, {lat}, {lon}, {radius_m})"
94
+ else:
95
+ geo_clause = _bbox_clause(
96
+ lat, lon, radius_m,
97
+ cfg.get("lat_field", "latitude"),
98
+ cfg.get("lon_field", "longitude"),
99
+ is_string=cfg.get("lat_lon_is_string", False),
100
+ )
101
+
102
+ where_parts = [
103
+ f"({type_clause})",
104
+ f"{cfg['date_field']} > '{since}'",
105
+ f"({geo_clause})",
106
+ ]
107
+ if has_cost and min_cost:
108
+ # Some cities (SF) store cost as text — apply same :: cast as for lat/lon.
109
+ cost_expr = (
110
+ f"{cost_field}::number"
111
+ if cfg.get("cost_is_string")
112
+ else cost_field
113
+ )
114
+ where_parts.append(f"{cost_expr} > {min_cost}")
115
 
116
  params = {
117
+ "$where": " AND ".join(where_parts),
118
  "$limit": 500,
119
+ "$select": cfg["select_fields"],
120
+ "$order": (f"{cost_field}::number DESC" if cfg.get("cost_is_string") else f"{cost_field} DESC") if has_cost else f"{cfg['date_field']} DESC",
121
  }
122
 
123
+ async with httpx.AsyncClient(timeout=30, follow_redirects=True) as client:
124
+ resp = await client.get(cfg["url"], params=params)
125
 
126
  if resp.status_code != 200:
127
  return {
128
+ "city": config["name"],
129
+ "supported": True,
130
  "error": f"HTTP {resp.status_code}",
131
  "permits_found": False,
132
  "total_permits": 0,
133
+ "has_cost": has_cost,
134
  }
135
 
136
  try:
137
  permits = resp.json()
138
  except ValueError:
139
+ return {
140
+ "city": config["name"],
141
+ "supported": True,
142
+ "error": "non-JSON response",
143
+ "permits_found": False,
144
+ "total_permits": 0,
145
+ "has_cost": has_cost,
146
+ }
147
 
148
  if isinstance(permits, dict) and "error" in permits:
149
  return {
150
+ "city": config["name"],
151
+ "supported": True,
152
  "error": permits.get("message", "Socrata error"),
153
  "permits_found": False,
154
  "total_permits": 0,
155
+ "has_cost": has_cost,
156
  }
157
 
158
+ new_marker = cfg.get("new_construction_marker", "")
159
+ reno_marker = cfg.get("renovation_marker", "")
160
+
161
+ new_construction = []
162
+ renovations = []
163
+ for p in permits:
164
+ ptype = (p.get(type_field) or "")
165
+ if new_marker and new_marker in ptype:
166
+ new_construction.append(p)
167
+ elif reno_marker and reno_marker in ptype:
168
+ renovations.append(p)
169
+ else:
170
+ renovations.append(p)
171
 
172
  total_cost = 0.0
173
  for p in permits:
174
+ c = _coerce_float(p.get(cost_field) if cost_field else None)
175
+ if c:
176
+ total_cost += c
177
+
178
+ address_keys = cfg.get("address_keys") or ()
179
 
180
  def _addr(p: dict) -> str:
181
+ parts = [p.get(k) for k in address_keys]
182
+ return " ".join(str(part) for part in parts if part).strip()
183
+
184
+ # Top 5 — by cost if we have it, else by date.
185
  major = []
186
  for p in permits:
187
+ cost = _coerce_float(p.get(cost_field) if cost_field else None) or 0.0
188
+ if has_cost and cost <= 500_000:
 
189
  continue
190
+ ptype = p.get(type_field) or ""
191
+ major.append({
192
+ "type": "new" if (new_marker and new_marker in ptype) else "renovation",
193
+ "description": (
194
+ p.get("work_description") or p.get("description") or p.get("work_desc") or ""
195
+ ).strip()[:200],
196
+ "cost": cost if has_cost else None,
197
+ "date": (
198
+ p.get(cfg["date_field"]) or ""
199
+ )[:10],
200
+ "address": _addr(p),
201
+ })
202
  if len(major) >= 5:
203
  break
204
 
205
+ # Year-over-year trend (last 12mo vs prior 12mo)
206
+ now = datetime.now(timezone.utc)
207
  cutoff_12 = (now - timedelta(days=365)).strftime("%Y-%m-%d")
208
  cutoff_24 = (now - timedelta(days=730)).strftime("%Y-%m-%d")
209
+ last_12 = [p for p in permits if (p.get(cfg["date_field"]) or "") > cutoff_12]
210
  prior_12 = [
211
  p for p in permits
212
+ if cutoff_24 < (p.get(cfg["date_field"]) or "") <= cutoff_12
213
  ]
214
 
215
  if len(prior_12) == 0:
 
222
  direction = "stable"
223
 
224
  return {
225
+ "city": config["name"],
226
+ "supported": True,
227
  "permits_found": True,
228
+ "has_cost": has_cost,
229
  "total_permits": len(permits),
230
  "new_construction_count": len(new_construction),
231
  "renovation_count": len(renovations),
232
+ "total_reported_cost": round(total_cost, 2) if has_cost else None,
233
  "major_projects": major,
234
  "trend": {
235
  "last_12_months": len(last_12),
 
238
  },
239
  "radius_m": radius_m,
240
  "years": years,
241
+ "min_cost_filter": min_cost if has_cost else None,
242
  "since": since,
243
  }
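The new `get_nearby_construction` builds its `$where` entirely from registry keys. The registry itself (`app/data/cities.py`) isn't shown in this diff, so the sketch below uses a hypothetical Chicago entry reconstructed from the constants the old hard-coded version used (`ydr8-5enu`, `permit_type`, `issue_date`, `location`, `reported_cost`), plus a hypothetical bbox-only entry (its field names and the `Bldg-New` value are invented) to show the fallback path. `build_permits_where` is an illustrative helper that mirrors the tool's logic; it isn't part of the codebase.

```python
import math

# Hypothetical registry entries. Key names mirror what get_nearby_construction
# reads from cfg; the Chicago values come from the old hard-coded query.
CHICAGO_PERMITS = {
    "url": "https://data.cityofchicago.org/resource/ydr8-5enu.json",
    "permit_type_field": "permit_type",
    "permit_type_values": ["PERMIT - NEW CONSTRUCTION", "PERMIT - RENOVATION/ALTERATION"],
    "date_field": "issue_date",
    "location_field": "location",
    "cost_field": "reported_cost",
    "has_cost": True,
}
BBOX_ONLY = {  # hypothetical city: text lat/lon columns, no Point column, no cost
    "permit_type_field": "permit_type",
    "permit_type_values": ["Bldg-New"],
    "date_field": "issue_date",
    "lat_field": "lat",
    "lon_field": "lon",
    "lat_lon_is_string": True,
}


def build_permits_where(cfg: dict, lat: float, lon: float, radius_m: int,
                        since: str, min_cost: int = 100_000) -> str:
    """Assemble a SODA $where the same way the permits tool does."""
    type_field = cfg["permit_type_field"]
    type_clause = " OR ".join(f"{type_field}='{v}'" for v in cfg["permit_type_values"])
    if cfg.get("location_field"):
        geo = f"within_circle({cfg['location_field']}, {lat}, {lon}, {radius_m})"
    else:
        # Bounding box: ~111 km per degree of latitude; longitude shrinks by cos(lat).
        dlat = radius_m / 111_000
        dlon = radius_m / (111_000 * max(0.1, math.cos(math.radians(lat))))
        cast = "::number" if cfg.get("lat_lon_is_string") else ""
        la, lo = f"{cfg['lat_field']}{cast}", f"{cfg['lon_field']}{cast}"
        geo = (f"{la} >= {lat - dlat} AND {la} <= {lat + dlat} "
               f"AND {lo} >= {lon - dlon} AND {lo} <= {lon + dlon}")
    # The OR chain is parenthesized so it can't swallow the AND terms that follow.
    parts = [f"({type_clause})", f"{cfg['date_field']} > '{since}'", f"({geo})"]
    if cfg.get("cost_field") and cfg.get("has_cost") and min_cost:
        parts.append(f"{cfg['cost_field']} > {min_cost}")
    return " AND ".join(parts)


print(build_permits_where(CHICAGO_PERMITS, 41.8127, -87.6045, 1000, "2023-05-04T00:00:00"))
```

The tool itself derives `since` from the current UTC date; it is fixed here so the output is reproducible. Cities flagged `cost_is_string` additionally get a `::number` cast in the cost filter, which this sketch leaves out.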
app/tools/chicago_311.py CHANGED
@@ -1,48 +1,157 @@
1
- """Chicago 311 service requests via Socrata SODA API. Free, no auth."""
2
  from datetime import datetime, timedelta, timezone
3
 
4
  import httpx
5
 
6
- # v6vf-nfxy is the unified 311 service requests dataset.
7
- CHICAGO_311_URL = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
8
 
9
- # WIB = Water in Basement, SFL = Street Flooding (these are the
10
- # Chicago 311 sr_short_codes that signal urban flooding).
11
- FLOOD_CODES = ("WIB", "SFL")
12
 
13
 
14
  async def get_flood_reports(
 
15
  lat: float,
16
  lon: float,
17
  radius_m: int = 500,
18
  years: int = 5,
19
  ) -> dict:
20
  since = (
21
  datetime.now(timezone.utc) - timedelta(days=365 * years)
22
  ).strftime("%Y-%m-%dT00:00:00")
23
 
24
- codes = ",".join(f"'{c}'" for c in FLOOD_CODES)
25
- where = (
26
- f"sr_short_code in({codes}) "
27
- f"AND created_date > '{since}' "
28
- f"AND within_circle(location, {lat}, {lon}, {radius_m})"
29
- )
30
  params = {
31
  "$where": where,
32
  "$limit": 1000,
33
- "$select": "sr_short_code,created_date,street_address,ward",
34
- "$order": "created_date DESC",
35
  }
36
 
37
- async with httpx.AsyncClient(timeout=30) as client:
38
- resp = await client.get(CHICAGO_311_URL, params=params)
39
- resp.raise_for_status()
40
- reports = resp.json()
41
 
42
- basement = [r for r in reports if r.get("sr_short_code") == "WIB"]
43
- street = [r for r in reports if r.get("sr_short_code") == "SFL"]
44
 
45
  return {
46
  "total_reports": len(reports),
47
  "basement_flooding": len(basement),
48
  "street_flooding": len(street),
@@ -50,4 +159,6 @@ async def get_flood_reports(
50
  "years": years,
51
  "since": since,
52
  "recent_reports": reports[:10],
53
  }
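The old Chicago-only query is a useful reference for what the registry's Chicago `311` entry has to reproduce. The new tool takes the category filter as a ready-made SoQL fragment (`cfg["category_in_clause"]`), so presumably the Chicago entry holds the same `in(...)` expression the old code built from `FLOOD_CODES`. The registry isn't shown in this diff, and both helpers below are hypothetical.

```python
FLOOD_CODES = ("WIB", "SFL")  # Water in Basement, Street Flooding


def chicago_category_clause(codes: tuple[str, ...] = FLOOD_CODES) -> str:
    """Rebuild the category filter the old chicago_311.py hard-coded."""
    quoted = ",".join(f"'{c}'" for c in codes)
    return f"sr_short_code in({quoted})"


def build_311_where(category_in_clause: str, date_field: str, since: str,
                    lat: float, lon: float, radius_m: int,
                    location_field: str = "location") -> str:
    """Combine category, date, and geo filters in the same order as the new tool."""
    geo = f"within_circle({location_field}, {lat}, {lon}, {radius_m})"
    return f"{category_in_clause} AND {date_field} > '{since}' AND ({geo})"


print(build_311_where(chicago_category_clause(), "created_date",
                      "2021-05-04T00:00:00", 41.8127, -87.6045, 500))
```

Because the new tool splices `category_in_clause` in without parentheses, it should stay a single `in(...)` term like this one rather than an `OR` chain, assuming SoQL follows SQL's usual precedence where `AND` binds tighter than `OR`.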
 
1
+ """City 311 flood-report tool: Socrata SODA, city-aware.
2
+
3
+ Was originally chicago_311 only; now driven by the registry in
4
+ app.data.cities so any supported city can plug in without changing
5
+ caller code. The exported function name `get_flood_reports` is kept
6
+ for backward compat with the local_agent.
7
+
8
+ Each city's 311 dataset has different field names (sr_short_code in
9
+ Chicago, complaint_type in NYC, service_name in SF, sr_type_desc in
10
+ Austin) and different flood-related category VALUES. The registry
11
+ encodes both, plus the date field name and the geo column type
12
+ (some cities expose a Socrata Point column for `within_circle`,
13
+ others only have separate latitude/longitude columns and need a
14
+ bbox query instead).
15
+ """
16
+ import math
17
  from datetime import datetime, timedelta, timezone
18
 
19
  import httpx
20
 
21
 
22
+ def _bbox_clause(
23
+ lat: float,
24
+ lon: float,
25
+ radius_m: int,
26
+ lat_field: str,
27
+ lon_field: str,
28
+ is_string: bool = False,
29
+ ) -> str:
30
+ """Build a SODA WHERE clause filtering to rows within ~radius_m of
31
+ (lat, lon) using a lat/lon bounding box. When `is_string=True` the
32
+ fields get a `::number` cast because some cities (NYC DOB,
33
+ LA building permits) store lat/lon as text rather than numeric."""
34
+ lat_deg = radius_m / 111_000
35
+ lon_deg = radius_m / (111_000 * max(0.1, math.cos(math.radians(lat))))
36
+ # SoQL doesn't have to_number(); use PostgreSQL :: cast syntax for
37
+ # cities that store lat/lon as text (NYC DOB, LA building permits).
38
+ lf = f"{lat_field}::number" if is_string else lat_field
39
+ lnf = f"{lon_field}::number" if is_string else lon_field
40
+ return (
41
+ f"{lf} >= {lat - lat_deg} AND {lf} <= {lat + lat_deg} "
42
+ f"AND {lnf} >= {lon - lon_deg} AND {lnf} <= {lon + lon_deg}"
43
+ )
44
 
45
 
46
  async def get_flood_reports(
47
+ config: dict,
48
  lat: float,
49
  lon: float,
50
  radius_m: int = 500,
51
  years: int = 5,
52
  ) -> dict:
53
+ """Fetch flood-related 311 reports near a location for the given city
54
+ config. Returns a dict ready for the local_agent to interpret.
55
+
56
+ The shape of the return is intentionally Chicago-shaped (basement /
57
+ street counts) so the rest of the pipeline doesn't have to change —
58
+ cities without that distinction surface under "total_reports".
59
+ """
60
+ cfg = config.get("311") if config else None
61
+ if not cfg:
62
+ return {
63
+ "city": (config or {}).get("name", ""),
64
+ "supported": False,
65
+ "total_reports": 0,
66
+ "basement_flooding": 0,
67
+ "street_flooding": 0,
68
+ "recent_reports": [],
69
+ "radius_m": radius_m,
70
+ "years": years,
71
+ }
72
+
73
  since = (
74
  datetime.now(timezone.utc) - timedelta(days=365 * years)
75
  ).strftime("%Y-%m-%dT00:00:00")
76
 
77
+ date_field = cfg["date_field"]
78
+ cat_clause = cfg["category_in_clause"]
79
+ location_field = cfg.get("location_field")
80
+
81
+ # Some cities have a Socrata Point column → within_circle. Others only
82
+ # expose lat/lon scalars → bbox.
83
+ if location_field:
84
+ geo_clause = f"within_circle({location_field}, {lat}, {lon}, {radius_m})"
85
+ else:
86
+ geo_clause = _bbox_clause(
87
+ lat, lon, radius_m,
88
+ cfg.get("lat_field", "latitude"),
89
+ cfg.get("lon_field", "longitude"),
90
+ is_string=cfg.get("lat_lon_is_string", False),
91
+ )
92
+
93
+ where = f"{cat_clause} AND {date_field} > '{since}' AND ({geo_clause})"
94
  params = {
95
  "$where": where,
96
  "$limit": 1000,
97
+ "$select": cfg["select_fields"],
98
+ "$order": f"{date_field} DESC",
99
  }
100
 
101
+ async with httpx.AsyncClient(timeout=30, follow_redirects=True) as client:
102
+ resp = await client.get(cfg["url"], params=params)
103
+
104
+ if resp.status_code != 200:
105
+ return {
106
+ "city": config["name"],
107
+ "supported": True,
108
+ "error": f"HTTP {resp.status_code}",
109
+ "total_reports": 0,
110
+ "basement_flooding": 0,
111
+ "street_flooding": 0,
112
+ "recent_reports": [],
113
+ "radius_m": radius_m,
114
+ "years": years,
115
+ }
116
+
117
+ try:
118
+ reports = resp.json()
119
+ except ValueError:
120
+ return {
121
+ "city": config["name"],
122
+ "supported": True,
123
+ "error": "non-JSON response",
124
+ "total_reports": 0,
125
+ "basement_flooding": 0,
126
+ "street_flooding": 0,
127
+ "recent_reports": [],
128
+ "radius_m": radius_m,
129
+ "years": years,
130
+ }
131
+
132
+ # Heuristically split into "basement-flavored" vs "street-flavored"
133
+ # 311 reports for cities that surface that distinction. Chicago has
134
+ # explicit codes (WIB / SFL); other cities have to be inferred from
135
+ # the category value and descriptor.
136
+ cat_field = cfg["category_field"]
137
+ basement_keywords = ("WIB", "Sewer", "basement", "Basement")
138
+ street_keywords = ("SFL", "Street", "Drain", "Storm", "Flooding")
139
 
140
+ basement = []
141
+ street = []
142
+ for r in reports:
143
+ cat = (r.get(cat_field) or "")
144
+ descr = " ".join(str(v) for v in r.values()).lower()
145
+ if any(k in cat for k in basement_keywords) or "basement" in descr:
146
+ basement.append(r)
147
+ elif any(k in cat for k in street_keywords):
148
+ street.append(r)
149
+ else:
150
+ street.append(r) # default bucket
151
 
152
  return {
153
+ "city": config["name"],
154
+ "supported": True,
155
  "total_reports": len(reports),
156
  "basement_flooding": len(basement),
157
  "street_flooding": len(street),
 
159
  "years": years,
160
  "since": since,
161
  "recent_reports": reports[:10],
162
+ "category_field": cat_field,
163
+ "categories_queried": list(cfg["flood_categories"]),
164
  }
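The basement/street split is a keyword heuristic, so it helps to see how it treats a few rows. The sketch below lifts the loop into a standalone function; the name `split_reports` and the sample rows are hypothetical, while the keyword tuples are copied from the tool. In the tool, the explicit street-keyword branch and the default branch both append to `street`, so here they collapse into a single `else` without changing the result.

```python
BASEMENT_KEYWORDS = ("WIB", "Sewer", "basement", "Basement")
STREET_KEYWORDS = ("SFL", "Street", "Drain", "Storm", "Flooding")  # kept for reference


def split_reports(reports: list[dict], cat_field: str) -> tuple[list[dict], list[dict]]:
    """Split 311 rows into basement vs. street buckets, mirroring the keyword
    heuristic in get_flood_reports (street is also the default bucket)."""
    basement, street = [], []
    for r in reports:
        cat = r.get(cat_field) or ""
        # Every field value, lowercased, so free text mentioning "basement" counts.
        descr = " ".join(str(v) for v in r.values()).lower()
        if any(k in cat for k in BASEMENT_KEYWORDS) or "basement" in descr:
            basement.append(r)
        else:
            # Street-keyword matches and unmatched rows both land here.
            street.append(r)
    return basement, street


chicago_rows = [
    {"sr_short_code": "WIB", "street_address": "100 E Main St"},
    {"sr_short_code": "SFL", "street_address": "200 W Oak St"},
]
b, s = split_reports(chicago_rows, "sr_short_code")
print(len(b), len(s))  # → 1 1
```

Two consequences follow from the keywords: any category containing `Sewer` counts as basement flooding even if the complaint is about something else, and a row whose free-text fields mention "basement" moves to the basement bucket regardless of its category.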
scripts/smoke_test_cities.py ADDED
@@ -0,0 +1,67 @@
1
+ """End-to-end smoke test for all Tier-1 cities + a graceful-degrade check.
2
+
3
+ Run:
4
+ cd backend && set -a && source .env && set +a
5
+ PYTHONPATH=. .venv/bin/python scripts/smoke_test_cities.py
6
+ """
7
+ import asyncio
8
+
9
+ from app.data.cities import find_city
10
+ from app.tools.building_permits import get_nearby_construction
11
+ from app.tools.chicago_311 import get_flood_reports
12
+
13
+
14
+ TARGETS = [
15
+ ("Chicago", 41.8127, -87.6045, "Chicago", "Illinois"),
16
+ ("New York", 40.7589, -73.9851, "New York", "NY"),
17
+ ("San Francisco", 37.7749, -122.4194, "San Francisco", "California"),
18
+ ("Los Angeles", 34.0522, -118.2437, "Los Angeles", "California"),
19
+ ("Austin", 30.2672, -97.7431, "Austin", "Texas"),
20
+ ("Atlanta", 33.7490, -84.3880, "Atlanta", "Georgia"), # unsupported
21
+ ]
22
+
23
+
24
+ async def main() -> None:
25
+ for name, lat, lon, city, state in TARGETS:
26
+ cfg = find_city(city, state)
27
+ if cfg is None:
28
+ print(f"\n=== {name} → unsupported (graceful degrade) ===")
29
+ continue
30
+ print(f"\n=== {name} ===")
31
+
32
+ if cfg.get("311"):
33
+ r = await get_flood_reports(cfg, lat, lon)
34
+ if r.get("error"):
35
+ print(f" 311 ERROR: {r['error']}")
36
+ else:
37
+ print(f" 311: {r.get('total_reports', 0)} reports "
38
+ f"(basement={r.get('basement_flooding', 0)}, "
39
+ f"street={r.get('street_flooding', 0)})")
40
+ else:
41
+ print(" 311: not wired in registry")
42
+
43
+ if cfg.get("permits"):
44
+ p = await get_nearby_construction(cfg, lat, lon)
45
+ if p.get("error"):
46
+ print(f" permits ERROR: {p['error']}")
47
+ else:
48
+ cost_str = (
49
+ f"${p.get('total_reported_cost', 0):,.0f}"
50
+ if p.get("has_cost") else "count-only"
51
+ )
52
+ print(f" permits: {p.get('total_permits', 0)} permits, "
53
+ f"{p.get('new_construction_count', 0)} new, "
54
+ f"{cost_str}, trend={p.get('trend', {}).get('direction')}")
55
+ top = (p.get("major_projects") or [None])[0]
56
+ if top:
57
+ top_cost = (
58
+ f"${top.get('cost', 0):,.0f}" if top.get("cost") else ""
59
+ )
60
+ print(f" top: {top_cost} {top.get('date', '')} "
61
+ f"{(top.get('address') or '')[:60]}")
62
+ else:
63
+ print(" permits: not wired in registry (intentional)")
64
+
65
+
66
+ if __name__ == "__main__":
67
+ asyncio.run(main())
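For datasets without a Socrata Point column, the tools fall back to `_bbox_clause`, whose longitude half-width widens as latitude increases. The sketch below applies the same approximation to the smoke-test coordinates; the helper `bbox_half_widths` is hypothetical, while the formula is copied from `_bbox_clause`. One design consequence worth keeping in mind: a box with half-width r covers 4r² against πr² for `within_circle`, roughly 27% more area, so bbox-queried datasets can return somewhat higher counts than Point-column datasets at the same `radius_m`.

```python
import math

# Latitudes copied from TARGETS in scripts/smoke_test_cities.py.
TARGET_LATS = {
    "Chicago": 41.8127,
    "New York": 40.7589,
    "San Francisco": 37.7749,
    "Los Angeles": 34.0522,
    "Austin": 30.2672,
}


def bbox_half_widths(lat: float, radius_m: int) -> tuple[float, float]:
    """Half-widths in degrees (lat, lon), same approximation as _bbox_clause."""
    lat_deg = radius_m / 111_000
    lon_deg = radius_m / (111_000 * max(0.1, math.cos(math.radians(lat))))
    return lat_deg, lon_deg


for city, lat in TARGET_LATS.items():
    dlat, dlon = bbox_half_widths(lat, 1000)
    print(f"{city:14s} dlat={dlat:.5f} dlon={dlon:.5f}")

# Square (side 2r) vs. circle (radius r): area ratio 4/pi.
print(f"bbox/circle area ratio: {4 / math.pi:.3f}")
```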
static/index.html CHANGED
@@ -1594,7 +1594,7 @@ const Chrome = ({ screen, onJump, dark, onToggleDark, language, onLanguageChange
1594
  <div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
1595
  <span className="glyph">F</span>
1596
  <span>FlutIQ</span>
1597
- <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.12 · beta</span>
1598
  </div>
1599
  <div className="chrome-meta">
1600
  <span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>
 
1594
  <div className="wordmark" onClick={()=>onJump("search")} style={{cursor:"pointer"}}>
1595
  <span className="glyph">F</span>
1596
  <span>FlutIQ</span>
1597
+ <span style={{color:"var(--ink-4)",fontSize:12,marginLeft:8,fontFamily:"JetBrains Mono"}}>v0.13 · beta</span>
1598
  </div>
1599
  <div className="chrome-meta">
1600
  <span className="pill static"><span className="dot"/>gemma-4 · OpenRouter</span>