# is445_demo / app.py
import streamlit as st
import pandas as pd
import altair as alt
st.title('IS 445 Homework 5: Visualization with Streamlit')
st.text("The URL for this app is: https://huggingface.co/spaces/namdini/is445_demo")
source = "https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/licenses_fall2022.csv"
license_df = pd.read_csv(source)
# First visualization: License Status Distribution
license_status = license_df['License Status'].value_counts().reset_index()
license_status.columns = ['License Status', 'Count']
license_status = license_status.sort_values(by='Count', ascending=False)
bar_plot = alt.Chart(license_status).mark_bar().encode(
    x=alt.X('License Status:N', title='License Status', sort='-y'),
    y=alt.Y('Count:Q', title='License Count'),
    color=alt.Color('License Status:N'),
).properties(
    title=alt.TitleParams(text="1. License Status Distribution", fontSize=30),
    width=550,
    height=300,
)
st.altair_chart(bar_plot, theme="streamlit", use_container_width=True)
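# A minimal sketch of the follow-up idea raised in the write-up below: showing
# full status names instead of an ellipsis. It assumes the truncation comes from
# Vega-Lite's default pixel limit on axis and legend labels, which this app does
# not confirm; labelLimit=0 lifts that limit. The helper name
# build_untruncated_status_chart is hypothetical and is not called by this app.
import altair as alt
import pandas as pd


def build_untruncated_status_chart(status_counts: pd.DataFrame) -> alt.Chart:
    """Bar chart of license statuses with label truncation disabled."""
    return alt.Chart(status_counts).mark_bar().encode(
        x=alt.X('License Status:N', title='License Status', sort='-y',
                axis=alt.Axis(labelLimit=0)),  # 0 removes the pixel limit on tick labels
        y=alt.Y('Count:Q', title='License Count'),
        color=alt.Color('License Status:N',
                        legend=alt.Legend(labelLimit=0)),  # same for legend labels
    )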
st.text("""
This bar plot displays the distribution of license statuses. The x-axis was
originally in alphabetical order, but has been reorganized by count to
provide users with a more intuitive visualization, highlighting the
most common license statuses first. It shows how many licenses are active,
not renewed, cancelled, and so on, providing a clear overview of the
current state of various licenses. I also increased the font size of
each plot title for better readability. If I had more time, I would have
removed the ellipsis that truncates some status names so that users
could see each status name in full.
""")
# Second visualization: Issued License Over Time by License Type
license_df["Issue Year"] = pd.to_datetime(license_df['Original Issue Date'], errors='coerce').dt.year
yearly_license_count = license_df.dropna(subset=['Issue Year'])
yearly_license_count = yearly_license_count.groupby(['Issue Year', 'License Type']).size().reset_index(name='Count')
# Flag the three license types with the highest total counts so they can be drawn thicker
top3_license_types = yearly_license_count.groupby('License Type')['Count'].sum().nlargest(3).index.tolist()
yearly_license_count['Top3'] = yearly_license_count['License Type'].isin(top3_license_types).map({True: 'Top 3', False: 'Other'})
line_plot = alt.Chart(yearly_license_count).mark_line().encode(
    x=alt.X('Issue Year:O', title='Year of Issue'),
    y=alt.Y('Count:Q', title='License Issued Count'),
    color=alt.Color('License Type:N', title="License Type"),
    size=alt.Size('Top3:N', scale=alt.Scale(domain=['Top 3', 'Other'], range=[3, 1])),
).properties(
    title=alt.TitleParams(text="2. Issued License Over Time by License Type", fontSize=30),
    width=700,
    height=400,
)
st.altair_chart(line_plot, theme="streamlit", use_container_width=True)
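# A minimal sketch of the time-range idea raised in the write-up below, assuming
# a yearly count table shaped like yearly_license_count (with an 'Issue Year'
# column). The helper name filter_by_year_range is hypothetical and is not
# called by this app.
import pandas as pd


def filter_by_year_range(counts: pd.DataFrame, start_year: int, end_year: int) -> pd.DataFrame:
    """Keep only rows whose 'Issue Year' lies in [start_year, end_year]."""
    in_range = counts['Issue Year'].between(start_year, end_year)  # inclusive on both ends
    return counts[in_range]


# One possible wiring with a range slider, left commented out so the app's
# behavior is unchanged:
# min_year = int(yearly_license_count['Issue Year'].min())
# max_year = int(yearly_license_count['Issue Year'].max())
# start, end = st.slider('Issue year range', min_year, max_year, (min_year, max_year))
# filtered = filter_by_year_range(yearly_license_count, start, end)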
st.text("""
This line plot shows the trends in license issuance for all license types
over time. Each line represents a different license type, allowing users to
see how the issuance of each type has changed over the years. The top 3 license
types are represented with thicker lines to help users quickly identify the most
significant trends. This visualization helps identify which license types have
increased in popularity and which have seen declines. If I had more time, I
would add a slider widget that lets users select a specific time range and
focus on particular periods.
""")