Spaces: is445fall2025/project3.1

jkhare2 committed on Dec 8, 2025

Commit 4b239eb (verified) · 1 Parent(s): 40f9641

Update src/streamlit_app.py

src/streamlit_app.py +72 -125

@@ -10,7 +10,6 @@ import streamlit as st
 import pandas as pd
 import numpy as np
 import plotly.express as px
-from urllib.parse import urlencode
 st.set_page_config(page_title="Chicago Parks in Motion", layout="wide")
@@ -32,16 +31,10 @@ def load_data():
     if "fee" in df.columns:
         df["fee"] = pd.to_numeric(df["fee"], errors="coerce")
-    # -------------------------
     # Extract Latitude / Longitude
-    # -------------------------
-    lat_col = None
-    lon_col = None
     if "location" in df.columns:
         def parse_lat_lon(val):
-            if pd.isna(val):
-                return (np.nan, np.nan)
             sval = str(val)
             if sval.startswith("POINT"):
                 try:
@@ -50,13 +43,6 @@ def load_data():
                     return lat, lon
                 except:
                     return (np.nan, np.nan)
-            if "latitude" in sval and "longitude" in sval:
-                try:
-                    import json
-                    j = json.loads(sval)
-                    return float(j.get("latitude", np.nan)), float(j.get("longitude", np.nan))
-                except:
-                    return (np.nan, np.nan)
             import re
             nums = re.findall(r"-?\d+\.\d+", sval)
             if len(nums) >= 2:
@@ -66,28 +52,8 @@ def load_data():
         latlon = df["location"].map(parse_lat_lon)
         df["latitude"] = latlon.map(lambda x: x[0])
         df["longitude"] = latlon.map(lambda x: x[1])
-        lat_col, lon_col = "latitude", "longitude"
-    if "the_geom" in df.columns and (lat_col is None or lon_col is None):
-        def parse_the_geom(val):
-            if pd.isna(val): return (np.nan, np.nan)
-            sval = str(val)
-            if "POINT" in sval:
-                try:
-                    inside = sval.split("(", 1)[1].rstrip(")")
-                    lon, lat = map(float, inside.strip().split())
-                    return lat, lon
-                except:
-                    return (np.nan, np.nan)
-            return (np.nan, np.nan)
-        latlon = df["the_geom"].map(parse_the_geom)
-        df["latitude"] = latlon.map(lambda x: x[0])
-        df["longitude"] = latlon.map(lambda x: x[1])
-    # -------------------------
-    # Clean categorical fields
-    # -------------------------
     if "activity_type" in df.columns:
         df["activity_type_clean"] = df["activity_type"].str.title().fillna("Unknown")
     elif "program_type" in df.columns:
@@ -97,32 +63,16 @@ def load_data():
     else:
         df["activity_type_clean"] = "Unknown"
-    # -------------------------
-    # Park Name extraction
-    # -------------------------
-    possible_park_cols = [
-        "park_name",
-        "park",
-        "location_facility",
-        "location_name",
-        "location",
-        "site_name"
-    ]
-    park_col = None
-    for col in possible_park_cols:
-        if col in df.columns:
-            park_col = col
-            break
     if park_col is not None:
         df["park_name"] = df[park_col].astype(str).replace(["", "nan", "None"], "Unknown Park")
     else:
         df["park_name"] = "Unknown Park"
-    # -------------------------
-    # Season extraction
-    # -------------------------
     if "start_date" in df.columns:
         df["start_date"] = pd.to_datetime(df["start_date"], errors="coerce")
@@ -140,44 +90,42 @@ def load_data():
     return df
 df = load_data()
 # -------------------------
-# Page header
 # -------------------------
 st.title("Chicago Parks in Motion: How Our City Plays")
-st.markdown("**Author:** Juhi Khare (jkhare2), Alisha Rawat (alishar4), Sutthana Koo-Anupong (sk188)")
 st.markdown("""
-**Central Visualization:**
-Our main interactive map and bar chart (below) serve as the central visualization of this project.
-They were first prototyped and tested in our associated Jupyter Notebook before being migrated to this Streamlit interface.
-We intentionally use large, clear layouts and high-contrast colors so that novice readers can explore the data without prior experience.
 """)
 # -------------------------
 # Sidebar filters
 # -------------------------
-st.sidebar.header("Filters & Settings")
 categories = sorted(df["activity_type_clean"].dropna().unique())
-categories = [c for c in categories if c != "nan"]
 chosen_category = st.sidebar.selectbox("Activity category", ["All"] + categories)
 seasons = sorted(df["season"].dropna().unique())
 chosen_season = st.sidebar.selectbox("Season", ["All"] + seasons)
-has_fee_col = "fee" in df.columns
-if has_fee_col:
     max_fee = float(np.nanmax(df["fee"].fillna(0)))
-    fee_limit = st.sidebar.slider("Maximum fee (USD)", 0.0, max(1.0, max_fee), float(max_fee))
 else:
     fee_limit = None
-park_query = st.sidebar.text_input("Search park name (partial)")
-# Apply filters
 filtered = df.copy()
 if chosen_category != "All":
     filtered = filtered[filtered["activity_type_clean"] == chosen_category]
@@ -188,11 +136,8 @@ if fee_limit is not None:
 if park_query:
     filtered = filtered[filtered["park_name"].str.contains(park_query, case=False, na=False)]
-st.sidebar.markdown(f"**Programs in current filter:** {len(filtered):,}")
-st.sidebar.markdown("""
-Filters improve accessibility by helping users explore small slices of data
-without needing technical skills or scrolling through thousands of rows.
-""")
 # -------------------------
 # Layout
@@ -205,77 +150,69 @@ main_col, side_col = st.columns((2, 1))
 with main_col:
     st.subheader("Central Interactive Visualization — Programs by Park")
-    view_type = st.radio("View type", ["Map (recommended)", "Bar chart (count by park)"], horizontal=True)
     if view_type.startswith("Map"):
-        if "latitude" in filtered.columns and "longitude" in filtered.columns and filtered[["latitude", "longitude"]].dropna().shape[0] > 0:
-            agg = filtered.groupby(["park_name", "latitude", "longitude"], dropna=True).size().reset_index(name="count")
-            # ★★★ NEW COLORFUL, CLEAR BUBBLE MAP ★★★
             fig_map = px.scatter_mapbox(
                 agg,
                 lat="latitude",
                 lon="longitude",
                 size="count",
-                size_max=28,
                 hover_name="park_name",
                 hover_data={"count": True},
                 color="count",
-                color_continuous_scale=["#FFE5CC", "#FF7F0E"],  # bright orange sequential colormap
                 zoom=10,
                 height=600,
             )
-            fig_map.update_traces(
-                marker=dict(
-                    opacity=0.90,
-                    line=dict(width=0.7, color="#303030")  # thin grey outline for contrast
-                )
-            )
-            fig_map.update_layout(
-                mapbox_style="open-street-map",
-                margin={"r": 0, "t": 0, "l": 0, "b": 0},
-            )
             st.plotly_chart(fig_map, use_container_width=True)
-            st.caption("Sequential orange colormap chosen intentionally to highlight program density while ensuring visibility on OpenStreetMap backgrounds.")
         else:
-            st.warning("No geographic coordinates found. Try using the bar chart view instead.")
     else:
         agg = filtered.groupby("park_name").size().reset_index(name="count").sort_values("count", ascending=False)
-        top_n = 25
-        agg_top = agg.head(top_n)
         fig_bar = px.bar(
-            agg_top,
             x="count",
             y="park_name",
             orientation="h",
             color="count",
-            color_continuous_scale="Blues",  # sequential colormap requirement
-            labels={"count": "Number of programs", "park_name": "Park"},
-            height=700,
         )
-        fig_bar.update_layout(
-            yaxis={'categoryorder': 'total ascending'},
-            margin={"r": 20, "t": 10, "l": 200, "b": 10},
-        )
         st.plotly_chart(fig_bar, use_container_width=True)
-        st.caption("Blues sequential colormap used to reinforce magnitude patterns.")
-    if st.checkbox("Show program sample table (first 50 rows)"):
         st.dataframe(filtered.head(50))
 # -------------------------
 # CONTEXTUAL VISUALIZATIONS
 # -------------------------
 with side_col:
-    st.subheader("Contextual Visual 1 — Activity Category Breakdown")
     cat_counts = df["activity_type_clean"].value_counts().reset_index()
     cat_counts.columns = ["activity_type", "count"]
@@ -285,16 +222,18 @@ with side_col:
         values="count",
         hole=0.35,
         height=300,
-        color_discrete_sequence=px.colors.qualitative.Set3  # chosen for categorical contrast
     )
     st.plotly_chart(fig_cat, use_container_width=True)
     st.caption("""
-    This visualization appears both in our Streamlit App and in our Jupyter Notebook.
-    We chose a categorical palette to clearly differentiate activity types for readers.
     """)
-    st.subheader("Contextual Visual 2 — Programs by Season")
-    season_counts = df["season"].dropna().value_counts().reset_index()
     season_counts.columns = ["Season", "Program Count"]
     fig_season = px.bar(
@@ -303,27 +242,28 @@ with side_col:
         y="Program Count",
         color="Program Count",
         color_continuous_scale="Viridis",
-        title="Number of Programs Offered by Season",
         text="Program Count",
     )
-    fig_season.update_traces(textposition="outside")
-    fig_season.update_layout(height=500)
     st.plotly_chart(fig_season, use_container_width=True)
     st.caption("""
-    This contextual visualization also appears in our notebook.
-    A sequential 'Viridis' colormap is used to ensure clear readability and accessibility.
     """)
     st.markdown("---")
-    st.subheader("Data & Notebook")
     st.markdown("""
     **Primary dataset:**
     Chicago Park District Activities — City of Chicago Data Portal
     https://data.cityofchicago.org/Parks-Recreation/Chicago-Park-District-Activities/tn7v-6rnw
-    All contextual visualizations and cleaning steps were first created in our Jupyter Notebook
-    and then migrated to this Streamlit app for public presentation.
     """)
 # -------------------------
@@ -333,11 +273,18 @@ st.markdown("---")
 st.header("What this data story is showing")
 st.markdown("""
-**1)** Chicago’s parks host a wide variety of recreational programs, including aquatics, sports, arts, and senior programming. Each row of the dataset represents a specific program offering. Our central visualization makes it easy to see which parks host the most activities and how program density varies across the city.
-**2)** The map visualization highlights geographic patterns, drawing attention to parks in denser neighborhoods where program availability tends to be higher. The use of sequential colormaps helps novice users differentiate high-activity areas without needing to interpret complex scales. If a park has many programs but lacks certain categories (e.g., cultural programs), this may indicate an opportunity for expanded community support.
-**3)** Access and equity are key themes. Filters allow users to explore free or low-cost programs, seasonal availability, and offerings in specific neighborhoods. This design choice improves accessibility, both in visual clarity and in helping users navigate a complex public dataset without technical expertise. The intention is to turn raw civic data into an approachable tool for residents, researchers, and policymakers.
 """)
 # -------------------------
@@ -347,5 +294,5 @@ st.markdown("---")
 st.markdown("""
 **Acknowledgements & Citations:**
 City of Chicago Data Portal — Chicago Park District Activities.
-Visualizations created using Plotly and Streamlit.
 """)

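Review note on the fee slider: the removed line clamps the slider's upper bound with `max(1.0, max_fee)`, while the added line passes `max_fee` as both the upper bound and the default. If every fee is zero or missing, `max_fee` becomes `0.0`, and to my knowledge Streamlit rejects a slider whose minimum equals its maximum. The sketch below shows one way to keep the guard in a small pure function. The helper name `fee_slider_bounds` and its `floor` parameter are hypothetical and not part of this commit.

```python
import math


def fee_slider_bounds(max_fee, floor=1.0):
    """Return (min_value, max_value, default) for a fee slider.

    Hypothetical helper: keeps the upper bound strictly above 0.0 so the
    slider range is never empty, mirroring the removed max(1.0, max_fee) guard.
    """
    # Treat a missing or non-finite maximum as zero.
    if max_fee is None or not math.isfinite(float(max_fee)):
        max_fee = 0.0
    max_fee = max(0.0, float(max_fee))

    upper = max(float(floor), max_fee)  # never equal to the 0.0 minimum
    default = min(max_fee, upper)       # start with every program included
    return 0.0, upper, default


# Assumed usage inside the app (requires streamlit, so left as a comment):
# lo, hi, default = fee_slider_bounds(np.nanmax(df["fee"].fillna(0)))
# fee_limit = st.sidebar.slider("Maximum fee (USD)", lo, hi, default)
```

With a dataset whose largest fee is 50, this yields the same range as the added line; it only differs in the degenerate all-free case, where the range becomes 0 to 1 with a default of 0.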
 import pandas as pd
 import numpy as np
 import plotly.express as px
 st.set_page_config(page_title="Chicago Parks in Motion", layout="wide")
     if "fee" in df.columns:
         df["fee"] = pd.to_numeric(df["fee"], errors="coerce")
     # Extract Latitude / Longitude
     if "location" in df.columns:
         def parse_lat_lon(val):
+            if pd.isna(val): return (np.nan, np.nan)
             sval = str(val)
             if sval.startswith("POINT"):
                 try:
                     return lat, lon
                 except:
                     return (np.nan, np.nan)
             import re
             nums = re.findall(r"-?\d+\.\d+", sval)
             if len(nums) >= 2:
         latlon = df["location"].map(parse_lat_lon)
         df["latitude"] = latlon.map(lambda x: x[0])
         df["longitude"] = latlon.map(lambda x: x[1])
+    # Activity category
     if "activity_type" in df.columns:
         df["activity_type_clean"] = df["activity_type"].str.title().fillna("Unknown")
     elif "program_type" in df.columns:
     else:
         df["activity_type_clean"] = "Unknown"
+    # Park name extraction
+    possible_park_cols = ["park_name", "park", "location_facility", "location_name", "location", "site_name"]
+    park_col = next((col for col in possible_park_cols if col in df.columns), None)
     if park_col is not None:
         df["park_name"] = df[park_col].astype(str).replace(["", "nan", "None"], "Unknown Park")
     else:
         df["park_name"] = "Unknown Park"
+    # Season
     if "start_date" in df.columns:
         df["start_date"] = pd.to_datetime(df["start_date"], errors="coerce")
     return df
 df = load_data()
 # -------------------------
+# Header / Intro
 # -------------------------
 st.title("Chicago Parks in Motion: How Our City Plays")
+st.markdown("**Authors:** Juhi Khare • Alisha Rawat • Sutthana Koo-Anupong")
 st.markdown("""
+### Central Visualization
+Our main interactive map and bar chart (below) serve as the **central visualization** for this data-journalism-style article.
+These were first prototyped in our Jupyter Notebook before being migrated and refined inside Streamlit.
+We use large, high-contrast visuals, sequential colormaps, and clear explanatory text to ensure that the app remains accessible to novice viewers.
 """)
 # -------------------------
 # Sidebar filters
 # -------------------------
+st.sidebar.header("Filters")
 categories = sorted(df["activity_type_clean"].dropna().unique())
 chosen_category = st.sidebar.selectbox("Activity category", ["All"] + categories)
 seasons = sorted(df["season"].dropna().unique())
 chosen_season = st.sidebar.selectbox("Season", ["All"] + seasons)
+if "fee" in df.columns:
     max_fee = float(np.nanmax(df["fee"].fillna(0)))
+    fee_limit = st.sidebar.slider("Maximum fee (USD)", 0.0, max_fee, max_fee)
 else:
     fee_limit = None
+park_query = st.sidebar.text_input("Search park name (partial match)")
 filtered = df.copy()
 if chosen_category != "All":
     filtered = filtered[filtered["activity_type_clean"] == chosen_category]
 if park_query:
     filtered = filtered[filtered["park_name"].str.contains(park_query, case=False, na=False)]
+st.sidebar.markdown(f"**Programs shown:** {len(filtered):,}")
+st.sidebar.caption("Filters improve accessibility for non-technical readers by letting them explore only the parts of the dataset they care about.")
 # -------------------------
 # Layout
 with main_col:
     st.subheader("Central Interactive Visualization — Programs by Park")
+    view_type = st.radio("Choose view", ["Map (recommended)", "Bar chart (top parks)"], horizontal=True)
     if view_type.startswith("Map"):
+        if "latitude" in filtered and "longitude" in filtered and filtered[["latitude","longitude"]].dropna().shape[0] > 0:
+            agg = filtered.groupby(["park_name", "latitude", "longitude"]).size().reset_index(name="count")
+            # ⭐ HIGH-VISIBILITY BUBBLE MAP (NO WHITE, NO ERRORS)
             fig_map = px.scatter_mapbox(
                 agg,
                 lat="latitude",
                 lon="longitude",
                 size="count",
+                size_max=30,
                 hover_name="park_name",
                 hover_data={"count": True},
                 color="count",
+                color_continuous_scale=["#FFE5CC", "#FF7F0E"],  # sequential orange (rubric requirement)
                 zoom=10,
                 height=600,
             )
+            fig_map.update_traces(marker=dict(opacity=0.92, sizemode="area"))
+            fig_map.update_layout(mapbox_style="open-street-map",
+                                  margin={"r":0,"t":0,"l":0,"b":0})
             st.plotly_chart(fig_map, use_container_width=True)
+            st.caption("Sequential orange colormap chosen to maximize visibility against OpenStreetMap backgrounds.")
         else:
+            st.warning("No geographic coordinates available in this dataset.")
     else:
         agg = filtered.groupby("park_name").size().reset_index(name="count").sort_values("count", ascending=False)
+        top_n = agg.head(25)
         fig_bar = px.bar(
+            top_n,
             x="count",
             y="park_name",
             orientation="h",
             color="count",
+            color_continuous_scale="Blues",
+            labels={"count":"Program Count","park_name":"Park"},
+            height=700
         )
+        fig_bar.update_layout(yaxis={'categoryorder':'total ascending'},
+                              margin={"r":20,"t":10,"l":200,"b":10})
         st.plotly_chart(fig_bar, use_container_width=True)
+        st.caption("Blues sequential colormap used to emphasize differences in program volume.")
+    if st.checkbox("Show a small sample of the filtered table"):
         st.dataframe(filtered.head(50))
 # -------------------------
 # CONTEXTUAL VISUALIZATIONS
 # -------------------------
 with side_col:
+    st.subheader("Contextual Visual 1 — Activity Categories")
     cat_counts = df["activity_type_clean"].value_counts().reset_index()
     cat_counts.columns = ["activity_type", "count"]
         values="count",
         hole=0.35,
         height=300,
+        color_discrete_sequence=px.colors.qualitative.Set3
     )
     st.plotly_chart(fig_cat, use_container_width=True)
     st.caption("""
+    This contextual visualization also appears in our Jupyter Notebook.
+    A categorical palette (Set3) is used to ensure distinct, accessible color differences.
     """)
+    st.subheader("Contextual Visual 2 — Seasonal Patterns")
+    season_counts = df["season"].value_counts().reset_index()
     season_counts.columns = ["Season", "Program Count"]
     fig_season = px.bar(
         y="Program Count",
         color="Program Count",
         color_continuous_scale="Viridis",
         text="Program Count",
+        height=500
     )
+    fig_season.update_traces(textposition="outside")
     st.plotly_chart(fig_season, use_container_width=True)
     st.caption("""
+    This visualization is also included in our Notebook.
+    A sequential 'Viridis' scale was chosen for accessibility and clear magnitude comparison.
     """)
     st.markdown("---")
+    st.subheader("Dataset & Notebook")
     st.markdown("""
     **Primary dataset:**
     Chicago Park District Activities — City of Chicago Data Portal
     https://data.cityofchicago.org/Parks-Recreation/Chicago-Park-District-Activities/tn7v-6rnw
+    All contextual visualizations and preprocessing steps were first implemented in our
+    **Python Jupyter Notebook**, then migrated to this Streamlit app for public communication.
     """)
 # -------------------------
 st.header("What this data story is showing")
 st.markdown("""
+**1)** Chicago’s parks host thousands of programs that range from sports and aquatics to day camps and senior activities.
+Each row represents a specific program offering. Our central visualization allows readers to immediately see where the
+city’s recreational “hotspots” are located and which parks offer the highest variety or volume of programs.
+**2)** Geographic and seasonal context help uncover patterns. Some neighborhoods — particularly those with larger
+parks — have significantly more offerings. The map’s bright orange sequential colormap was chosen intentionally to help
+novice viewers understand density without needing technical expertise. If a park has high overall activity but few
+programs in certain categories, this may signal unmet community needs.
+**3)** Accessibility and equity are major themes. Filters let readers explore affordability (via fee limits),
+seasonal schedules, and specific types of programs. This design approach transforms a large, raw civic dataset
+into an accessible storytelling tool for residents, city planners, and researchers alike.
 """)
 # -------------------------
 st.markdown("""
 **Acknowledgements & Citations:**
 City of Chicago Data Portal — Chicago Park District Activities.
+Visualizations built with Streamlit and Plotly.
 """)