import pandas as pd
import altair as alt
import streamlit as st
url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/building_inventory.csv"
data = pd.read_csv(url)
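# Handle missing values. Assigning the result back (instead of calling
# data[col].fillna(..., inplace=True)) avoids the chained-assignment FutureWarning
# in pandas 2.2+; under pandas 3.0 copy-on-write the in-place form would not update `data`.
data['County'] = data['County'].fillna('Unknown')
data['Rep Full Name'] = data['Rep Full Name'].fillna('Not Available')
data['Year Constructed'] = data['Year Constructed'].fillna(data['Year Constructed'].median())
data['Senator Full Name'] = data['Senator Full Name'].fillna('Unknown')
data['Usage Description 2'] = data['Usage Description 2'].fillna(data['Usage Description 2'].mode()[0])
data['Usage Description 3'] = data['Usage Description 3'].fillna(data['Usage Description 3'].mode()[0])
data['Address'] = data['Address'].fillna('Address Not Available')
data['Congressional Full Name'] = data['Congressional Full Name'].fillna('Unknown Congressional Name')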
st.markdown("<h1>IS 445 HOMEWORK 5.1</h1>", unsafe_allow_html=True)
st.markdown("<h2>Data Visualizations for Building Inventory Dataset</h2>", unsafe_allow_html=True)
st.markdown("<h3>1. Distribution of Building Usage Types</h3>", unsafe_allow_html=True)
usage_counts = data['Usage Description'].value_counts().reset_index()
usage_counts.columns = ['Usage Description', 'Count']
bar_chart = alt.Chart(usage_counts).mark_bar().encode(
x=alt.X('Count:Q', title='Number of Buildings'),
y=alt.Y('Usage Description:N', sort='-x', title='Building Usage'),
color=alt.Color('Usage Description:N', legend=None)
).properties(
width=600,
height=400,
title="Distribution of Building Usage"
)
st.altair_chart(bar_chart, use_container_width=True)
st.write("""
In this visualization, I have plotted the distribution of various building usage types. The bar chart shows how many buildings belong to each usage category. I used horizontal bars for better readability, especially for longer labels in the 'Usage Description' column. The x-axis shows the number of buildings, and the y-axis shows the different building usage categories. The distinct colors for each category allow easy differentiation between them without overwhelming the viewer.
If I had more time, I would add interactive features such as tooltips that display the percentage of buildings in each usage type when hovering over a bar. I would also add a control for switching the bar order between the current descending-count sort and alphabetical order by usage type.
""")
st.markdown("<h3>2. Exploring Square Footage vs. Total Floors with Usage Types</h3>", unsafe_allow_html=True)
bubble_chart = alt.Chart(data).mark_circle().encode(
x=alt.X('Square Footage:Q', title='Square Footage'),
y=alt.Y('Total Floors:Q', title='Total Floors'),
size=alt.Size('Total Floors:Q', legend=None, scale=alt.Scale(range=[50, 500])),
color=alt.Color('Usage Description:N', title='Building Usage', legend=alt.Legend(orient='top', symbolType='circle')),
tooltip=['Location Name', 'Square Footage', 'Total Floors', 'Usage Description']
).properties(
width=600,
height=400,
title="Bubble Chart of Square Footage vs. Total Floors with Usage Description"
).interactive()
st.altair_chart(bubble_chart, use_container_width=True)
st.write("""
This bubble chart shows the relationship between square footage and the total number of floors in each building. The x-axis represents each building's square footage, and the y-axis represents its number of floors. Bubble size also encodes the total number of floors, so it reinforces the y-axis position rather than adding a separate dimension; its main effect is to make the tallest buildings stand out. I also used color encoding for 'Usage Description' to group buildings by usage type, and the tooltip shows each building's location name, square footage, floor count, and usage.
If I had more time, I would add a trendline to explore potential correlations further, or a filter that lets users focus on specific usage types or buildings of interest. Adjusting the bubble size scaling would also help reduce overlapping bubbles in areas with high data density.
""")