Homework5.1 / app.py

JanhaviZarapkar

Final Commit

f12e548 verified about 1 year ago


import altair as alt
import pandas as pd
import streamlit as st
import streamlit.components.v1 as components

# Page configuration
st.set_page_config(page_title="Building Inventory Analysis", layout="wide")

# Layout tweak: try to stretch the embedding iframe to the full viewport height.
components.html(
    """
<script>
document.querySelector('iframe').style.height = '100vh';
</script>
""",
    height=0,
)


# Pin the page and the app container to the viewport height and hide overflow.
st.markdown(
    """
<style>
html, body, [data-testid="stAppViewContainer"] {
    height: 100vh;
    overflow: hidden;
}
</style>
""",
    unsafe_allow_html=True,
)


# Load the dataset, treating 0 as missing in the year and square-footage columns
url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/building_inventory.csv"
df = pd.read_csv(url, na_values={'Year Acquired': 0, 'Year Constructed': 0, 'Square Footage': 0})


# Display the dataset overview
st.header("Building Inventory Dataset Analysis")
st.write("Below are the first 10 rows of the dataset:")
st.write(df.head(10))
st.write(f"The shape of the dataset before cleaning is: {df.shape}")

# Drop irrelevant columns
columns_to_drop = [
    'Rep Full Name', 'Senator Full Name', 'Usage Description 3',
    'Usage Description 2', 'Congressional Full Name', 'Address',
]
df = df.drop(columns=columns_to_drop)

# Check and handle missing values
missing_values = df.isnull().sum()
st.subheader("Missing Values Before Cleaning")
st.write(missing_values)

# Drop rows where 'Year Acquired' or 'Year Constructed' is NaN
df = df.dropna(subset=['Year Acquired', 'Year Constructed'])

# Fill the remaining gaps: label missing counties, use the column mean for missing square footage
df['County'] = df['County'].fillna('Unknown')
df['Square Footage'] = df['Square Footage'].fillna(df['Square Footage'].mean())

st.subheader("Missing Values After Cleaning")
st.write(df.isnull().sum())
st.write(f"The shape of the dataset after cleaning is: {df.shape}")


# Visualization 1: Number of Buildings by County and Agency

st.markdown("""
<h4>Visualization 1: Number of Buildings by County and Agency</h4>
""", unsafe_allow_html=True)

# Group the data by 'County' and 'Agency Name' to get the count of buildings
county_agency_building = df.groupby(['County', 'Agency Name']).size().reset_index(name='Number of Buildings')



# Create a stacked bar chart with adjusted legend properties
stacked_bar_chart_county = (
    alt.Chart(county_agency_building)
    .mark_bar()
    .encode(
        alt.X('County:N', title='County', sort='-y'),
        alt.Y(
            'Number of Buildings:Q',
            title='Number of Buildings',
            scale=alt.Scale(domain=[0, 550]),
        ),
        color=alt.Color(
            'Agency Name:N',
            title='Agency Name',
            legend=alt.Legend(
                orient='right', padding=0, symbolSize=50, labelFontSize=10,
                labelOverlap="greedy", columnPadding=0, rowPadding=0,
            ),
        ),
        tooltip=['County', 'Agency Name', 'Number of Buildings'],
    )
    .properties(
        width=800,
        height=500,
        title="Stacked Bar Graph for Number of Buildings by County and Agency",
    )
)

# Display the stacked bar chart in Streamlit
st.altair_chart(stacked_bar_chart_county, theme="streamlit", use_container_width=True)

# Write-up for the stacked bar chart
st.markdown("""

Number of Buildings by County and Agency

For this visualization, I wanted to highlight the distribution of buildings across counties and agencies. The primary goal was to show how each county's buildings are split among agencies, allowing for a clear comparison of agency activity within a county.

I chose a stacked bar chart because it conveys the breakdown of buildings by agency in each county. This chart type lets viewers compare the total number of buildings in each county while also seeing the proportion of buildings held by each agency within that county. The x-axis shows the counties, as they are the main categories for comparison, while the y-axis represents the number of buildings. I used color encoding to differentiate between agencies, so that users can tell them apart. The tooltips provide additional context by displaying the exact county, agency, and building count when hovering over a bar segment.

If I had more time, I would add interactivity by implementing filters so users can select the specific counties or agencies they are interested in. This would allow them to drill down into the data and explore individual trends. Additionally, it would be valuable to show proportions in the tooltip, giving more insight into each agency's share of the buildings within a county. This would make the chart more dynamic and more informative.

""", unsafe_allow_html=True)

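# Sketch for the "proportions in the tooltip" idea from the write-up above.
# add_county_share and the 'Share of County' column are hypothetical names that
# are not part of the original app, and nothing in the app calls this helper.
import pandas as pd  # repeated here so the sketch stands on its own


def add_county_share(counts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the grouped counts with each agency's share of its county total."""
    out = counts.copy()
    # Broadcast each county's total building count back onto its rows
    county_totals = out.groupby('County')['Number of Buildings'].transform('sum')
    out['Share of County'] = out['Number of Buildings'] / county_totals
    return out

# One possible use, assuming the chart is built from the returned frame: pass
# add_county_share(county_agency_building) to alt.Chart and append
# alt.Tooltip('Share of County:Q', format='.1%') to the tooltip list.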


# Visualization 2: Bubble Chart for County, Total Floors, and Square Footage

st.markdown("""
<h4>Visualization 2: County, Total Floors, and Square Footage</h4>
""", unsafe_allow_html=True)

# One bubble per county: y is the summed square footage, bubble size the summed floor count
bubble_chart = alt.Chart(df).mark_circle().encode(
    x=alt.X('County:N', title='County'),
    y=alt.Y('sum(Square Footage):Q', title='Total Square Footage (sq ft)'),
    size=alt.Size('sum(Total Floors):Q', title='Total Floors'),
    color=alt.Color('County:N', scale=alt.Scale(scheme='category20'), title='County'),
    tooltip=['County', 'sum(Square Footage)', 'sum(Total Floors)'],
).properties(
    width=800,
    height=500,
    title="Relationship Between County, Square Footage, and Total Floors",
)

st.altair_chart(bubble_chart, theme="streamlit", use_container_width=True)


# Write-up for the bubble chart
st.markdown("""

County, Total Floors, and Square Footage Relationship

In this bubble chart, I aimed to highlight the relationship between three key features: the total square footage of buildings, the number of floors, and the counties where these buildings are located. The purpose of this chart is to show how counties compare in terms of building size and number of floors. By plotting total square footage on the y-axis and using the size of the bubbles to represent the number of floors, I wanted to illustrate how total building size relates to the total number of floors across counties.

For the design choices, I chose a bubble chart because it can show three variables at once, making it a good fit for this data. The x-axis represents the counties, while the y-axis shows the total square footage of buildings. The size of each bubble represents the total number of floors. Because both values are summed per county, each bubble describes a county rather than a single building. I applied the "category20" color scheme to the counties to help tell them apart, although the scheme has only 20 colors, so colors repeat when there are more than 20 counties. The tooltips provide detailed information when hovering over each bubble, giving the specific county, its total square footage, and its total floors.

If I had more time, I would add interactive filters that let users filter by specific counties or by a range of floors, making the chart more customizable and user-friendly. Another improvement would be a scroll bar on the legend: there are a large number of counties, and currently not all of them are visible at once, which limits the user's ability to tell them apart.

""", unsafe_allow_html=True)

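# Sketch for the "interactive filters" idea from the write-up above.
# filter_buildings is a hypothetical helper, not part of the original app, and
# nothing in the app calls it. It assumes the 'County' and 'Total Floors' columns
# used by the bubble chart.
import pandas as pd  # repeated here so the sketch stands on its own


def filter_buildings(data, counties=None, floor_range=None):
    """Keep rows in the chosen counties whose 'Total Floors' lies in floor_range (inclusive)."""
    mask = pd.Series(True, index=data.index)
    if counties:
        mask &= data['County'].isin(counties)
    if floor_range is not None:
        low, high = floor_range
        mask &= data['Total Floors'].between(low, high)
    return data[mask]

# One possible wiring, as an assumption rather than the author's design: take
# counties from st.multiselect and floor_range from a two-value st.slider, then
# build the bubble chart from filter_buildings(df, counties, floor_range).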

# Visualization 3: Heatmap for Building Count by County and Status

st.markdown("""
<h4>Visualization 3: Building Count by County and Status</h4>
""", unsafe_allow_html=True)

heatmap = alt.Chart(df).mark_rect().encode(
    x=alt.X('County:N', title='County'),
    y=alt.Y('Bldg Status:N', title='Building Status'),
    color=alt.Color('count():Q', scale=alt.Scale(scheme='blues'), title='Count of Buildings'),
    tooltip=['County', 'Bldg Status', 'count()'],
).properties(
    width=800,
    height=500,
    title="Heatmap of Building Count by County and Status",
)

st.altair_chart(heatmap, theme="streamlit", use_container_width=True)



# Write-up for the heatmap
st.markdown("""

Building Count by County and Status

In this heatmap, I aimed to highlight the distribution of building counts across counties and their statuses, such as whether they are in use, in progress, or abandoned. The goal was to help users quickly see which counties have high building activity and how building statuses are spread across those counties.

For the design choices, I selected a heatmap because it visualizes data across two categorical variables: counties and building statuses. The x-axis represents the counties, while the y-axis shows the different building statuses (e.g., In use, Progress, Abandon). To make the differences in building counts easier to see, I chose the "blues" color scheme, which provides a gradient where darker shades represent higher counts of buildings. This lets users spot areas with high building activity while keeping the color scheme accessible. Tooltips display the exact count of buildings for each county and status combination when users hover over a heatmap cell.

If I had more time, I would enhance the interactivity of the heatmap by allowing users to zoom into specific counties or statuses, enabling them to focus on areas of interest. Additionally, I would incorporate a filter to select specific counties, as the heatmap contains a large number of them and not all are immediately visible. This would allow users to dive deeper into particular regions for a more customized analysis.

""", unsafe_allow_html=True)

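# Sketch related to the county filter mentioned in the write-up above. Instead of
# a manual selection, this shows one simple alternative: keep only the n counties
# with the most buildings so the heatmap columns stay readable. top_counties and
# its default n=20 are hypothetical, not part of the original app, and nothing in
# the app calls this helper.
import pandas as pd  # repeated here so the sketch stands on its own


def top_counties(data, n=20):
    """Return the rows that belong to the n counties with the most buildings."""
    # value_counts sorts counties by row count, largest first
    busiest = data['County'].value_counts().head(n).index
    return data[data['County'].isin(busiest)]

# One possible use: build the heatmap from top_counties(df, n) with n chosen by
# the user, for example through a Streamlit number input.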

st.markdown("""
Thank you for exploring the Building Inventory Analysis with me!
""", unsafe_allow_html=True)
st.markdown("""
~By Janhavi Tushar Zarapkar
""", unsafe_allow_html=True)