import streamlit as st
import pandas as pd
import altair as alt
# Title for the application
st.title("Streamlit App for IS445: Building Inventory Data Visualization")
# Informing the user about the app URL
st.text("The URL for this app is: https://huggingface.co/spaces/SmeetPatel/building_inventory_viz")
# Load the dataset
# The dataset contains information about various buildings, including location, usage, year constructed, and size.
data_url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/building_inventory.csv"
data = pd.read_csv(data_url)
# Visualization 1: Distribution of Building Usage
# This bar chart visualizes the number of buildings categorized by their primary usage type.
# Each bar represents a usage category, and the length of the bar corresponds to the number of buildings.
usage_bar = (
    alt.Chart(data)
    .mark_bar()
    .encode(
        x=alt.X("count():Q", title="Number of Buildings"),
        y=alt.Y("Usage Description:N", sort="-x", title="Building Usage"),
        color=alt.Color("Usage Description:N", legend=None),
        tooltip=["Usage Description", "count()"]
    )
    .properties(
        title="Building Usage Distribution",
        width=550,
        height=300
    )
)
# Explanation for Visualization 1
st.subheader("Visualization 1: Distribution of Building Usage")
st.altair_chart(usage_bar, use_container_width=True)
st.text("""
This bar chart shows the distribution of buildings based on their primary usage type. The following design choices were made:
- Horizontal Layout: Horizontal bars keep the text-heavy Usage Description labels readable, prevent truncation, and make the categories easier to compare.
- Descending Sort: Sorting the categories by count in descending order makes the most common usage types visible first.
- Color: Each bar has its own color for visual distinction. The legend is hidden because the y-axis labels already name each category.
Future Improvements:
- Filters on Region, Bldg Status, or other columns would let users view the usage distribution for specific subsets of the data.
- A drill-down feature would let users pick a specific Usage Description category and see detailed statistics for its subcategories.
""")
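# Sketch of the "filters" improvement described above. This is a hypothetical
# helper, not part of the original app. It assumes the dataset has a "Region"
# column, as the text above suggests; adjust the column name if it differs.
import pandas as pd


def filter_by_value(df: pd.DataFrame, column: str, selected: str) -> pd.DataFrame:
    """Return the rows where `column` equals `selected`; "All" returns every row."""
    if selected == "All":
        return df
    return df[df[column] == selected]


# One possible wiring (left commented out so the app's behavior is unchanged):
# regions = ["All"] + sorted(data["Region"].dropna().unique())
# region = st.selectbox("Region", regions)
# subset = filter_by_value(data, "Region", region)  # pass `subset` to alt.Chart() above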
# Visualization 2: Relationship Between Square Footage and Total Floors
# This scatter plot explores the relationship between the square footage of a building and its total floors.
# Points are color-coded by the building's status, with tooltips providing additional details.
# Convert 'Square Footage' to numeric (invalid values become NaN), then drop rows where it is missing.
data['Square Footage'] = pd.to_numeric(data['Square Footage'], errors='coerce')
filtered_data = data.dropna(subset=['Square Footage'])
scatter_plot = (
    alt.Chart(filtered_data)
    .mark_circle(size=60)
    .encode(
        x=alt.X(
            "Square Footage:Q",
            title="Square Footage",
            scale=alt.Scale(domain=[filtered_data['Square Footage'].min(), filtered_data['Square Footage'].max()])
        ),
        y=alt.Y("Total Floors:Q", title="Total Floors"),
        color=alt.Color("Bldg Status:N", title="Building Status"),
        tooltip=["Agency Name", "Location Name", "Square Footage", "Total Floors", "Bldg Status"]
    )
    .properties(
        title="Relationship Between Square Footage and Total Floors",
        width=550,
        height=300
    )
)
# Explanation for Visualization 2
st.subheader("Visualization 2: Square Footage vs. Total Floors")
st.altair_chart(scatter_plot, use_container_width=True)
st.text("""
This scatter plot highlights the relationship between the square footage of a building and its total floors.
Design choices include:
- Scatter Plot Representation: A scatter plot was chosen as it is ideal for exploring relationships between two continuous variables (Square Footage and Total Floors). This allows patterns or clusters to be identified, such as whether larger buildings tend to have more floors.
- Color Coding by Building Status: Points were color-coded by Bldg Status to differentiate operational and non-operational buildings. This helps identify trends within specific categories of buildings.
- Dynamic Domain: The x-axis (Square Footage) domain is set from the minimum and maximum values in the data, and the y-axis (Total Floors) uses Altair's default scaling, so the full range of data is shown and no points are excluded.
- Interactive Tooltips: Tooltips show details for each point: agency name, location name, square footage, total floors, and building status.
- Point Size: A fixed point size ensures clarity while maintaining focus on relationships between variables without visual clutter.
Future Improvements:
- Adding a Third Variable: Introduce point size encoding to represent a third variable, such as Floors Above Grade, for richer insights.
- Filters for Building Usage or Region: Enable filtering by Usage Description or Region to analyze specific subsets of data, like residential buildings in a particular county.
- Trend Line or Clustering: Add a regression line or clustering to help identify trends or group similar buildings based on size and floors.
- Zoom and Pan Functionality: Incorporating zoom and pan features would improve navigation for datasets with wide ranges in square footage.
""")
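# Sketch of the trend-line and zoom/pan improvements listed above. This is a
# hypothetical helper, not part of the original app. It assumes an Altair
# version that provides transform_regression() and interactive(), and reuses
# the column names from the scatter plot above.
import altair as alt
import pandas as pd


def scatter_with_trend(df: pd.DataFrame) -> alt.LayerChart:
    """Layer a linear regression line over the scatter plot and enable zoom/pan."""
    base = alt.Chart(df)
    points = base.mark_circle(size=60).encode(
        x=alt.X("Square Footage:Q", title="Square Footage"),
        y=alt.Y("Total Floors:Q", title="Total Floors"),
        color=alt.Color("Bldg Status:N", title="Building Status"),
    )
    # Fit Total Floors as a linear function of Square Footage over all buildings.
    trend = (
        base.transform_regression("Square Footage", "Total Floors")
        .mark_line(color="black")
        .encode(x="Square Footage:Q", y="Total Floors:Q")
    )
    # interactive() binds scroll-to-zoom and drag-to-pan to the chart's axes.
    return (points + trend).interactive()


# Possible usage (left commented out so the app's behavior is unchanged):
# st.altair_chart(scatter_with_trend(filtered_data), use_container_width=True)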