| import streamlit as st | |
| import pandas as pd | |
| import altair as alt | |
| # Loading the dataset | |
| url = 'https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/licenses_fall2022.csv' | |
| df = pd.read_csv(url) | |
| # Dropping unnecessary columns to prevent serialization errors and reduce data size | |
| df.drop(columns=['Title', 'Prefix', 'Suffix', 'BusinessDBA', '_id', 'Delegated Controlled Substance Schedule', 'Case Number'], inplace=True, errors='ignore') | |
| # Converting date columns to datetime format, handling errors | |
| date_columns = ['Original Issue Date', 'Effective Date', 'Expiration Date', 'LastModifiedDate'] | |
| for col in date_columns: | |
|     df[col] = pd.to_datetime(df[col], errors='coerce') | |
| # Converting object columns to category for performance improvement | |
| for col in df.select_dtypes(include='object').columns: | |
|     df[col] = df[col].astype('category') | |
| # Dropping rows with missing values in essential columns to ensure Arrow compatibility | |
| df.dropna(subset=['License Type', 'License Status', 'Original Issue Date'], inplace=True) | |
| # Adding Year columns for visualizations | |
| df['Original Issue Year'] = df['Original Issue Date'].dt.year | |
| # Visualization 1: Bar chart of licenses by status (top 5) | |
| st.subheader("Licenses by Status") | |
| category_counts = df['License Status'].value_counts().reset_index() | |
| category_counts.columns = ['License Status', 'Count'] | |
| # Selecting the top 5 statuses and sorting them by count in descending order | |
| category_counts_top5 = category_counts.head(5).sort_values(by='Count', ascending=False) | |
| if not category_counts_top5.empty: | |
|     chart1 = alt.Chart(category_counts_top5).mark_bar().encode( | |
|         x=alt.X(field="License Status", type="nominal", title="License Status", | |
|                 sort='-y'),  # Sorting bars by count in descending order | |
|         y=alt.Y(field="Count", type="quantitative", title="Number of Licenses"), | |
|         color=alt.Color(field="License Status", type="nominal", legend=alt.Legend(title="License Status"), | |
|                         scale=alt.Scale(scheme='pastel1')), | |
|         tooltip=['License Status', 'Count'] | |
|     ).properties( | |
|         width=600, | |
|         height=400, | |
|         title="Licenses by Status (Top 5)" | |
|     ) | |
|     st.altair_chart(chart1) | |
| else: | |
|     st.write("No data available for the 'License Status' plot.") | |
| # Write-up for Visualization 1 | |
| st.write(""" | |
| **Licenses by Status**: I chose this visualization because it gives a clear overview of how licenses are distributed across statuses, showing the five most common ones. Arranging the bars in descending order makes it easy to see which statuses are most prevalent. | |
| I color-coded each bar by status so that users can tell the statuses apart quickly and take in the overall distribution at a glance. Sorting the statuses in descending order emphasizes the larger categories and makes comparisons straightforward. | |
| If I had more time, I would enhance the visualization with interactivity, such as filters for date range, license type, or location. Users could then focus on specific subsets of the data, which would make the analysis more relevant to different scenarios and decision-making needs. | |
| """) | |
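The filtering idea mentioned in the write-up above could be sketched roughly as follows. This is only an illustration, not part of the app: the helper name `top_status_counts`, its parameters, and the sample data are hypothetical, and it assumes the same `License Type` and `License Status` columns used earlier.

```python
import pandas as pd


def top_status_counts(df, license_types=None, top_n=5):
    """Count licenses per status, optionally restricted to some license types.

    Hypothetical helper for illustration; not part of the original app.
    """
    if license_types:
        # Keep only rows whose license type was selected
        df = df[df["License Type"].isin(license_types)]
    counts = (
        df["License Status"]
        .astype(str)  # avoid zero-count rows for unused categorical levels
        .value_counts()  # already sorted by count, descending
        .head(top_n)
        .rename_axis("License Status")
        .reset_index(name="Count")
    )
    return counts


# Made-up sample data, only to show the shape of the result
sample = pd.DataFrame({
    "License Type": ["A", "A", "B", "B", "B"],
    "License Status": ["ACTIVE", "EXPIRED", "ACTIVE", "ACTIVE", "EXPIRED"],
})
filtered = top_status_counts(sample, license_types=["B"])
unfiltered = top_status_counts(sample)
```

In the app, the selected types could come from a widget such as `st.multiselect`, and the returned frame could be passed to the same bar chart definition in place of `category_counts_top5`.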
| # Visualization 2: Line chart showing licenses issued over time | |
| st.subheader("Licenses Over Time") | |
| time_data = df.groupby('Original Issue Year').size().reset_index(name='License Count') | |
| if not time_data.empty: | |
|     chart2 = alt.Chart(time_data).mark_line(point=True).encode( | |
|         x=alt.X('Original Issue Year:O', title='Year'),  # Ordinal scale for better control | |
|         y=alt.Y('License Count:Q', title='Number of Licenses'), | |
|         tooltip=[ | |
|             alt.Tooltip('Original Issue Year:O', title='Year'), | |
|             alt.Tooltip('License Count:Q', title='Number of Licenses') | |
|         ], | |
|         color=alt.value('brown') | |
|     ).properties( | |
|         width=600, | |
|         height=400, | |
|         title="Number of Licenses Issued Over Time" | |
|     ).interactive()  # Enables zooming and panning | |
|     st.altair_chart(chart2) | |
| else: | |
|     st.write("No data available for the 'Licenses Over Time' plot.") | |
| # Write-up for Visualization 2 | |
| st.write(""" | |
| **Licenses Over Time**: I chose a line chart because it shows how the number of licenses issued each year has changed over time. Tracking the counts year over year helps identify patterns, growth periods, and downturns in license issuance. | |
| The line chart format is particularly useful here, as it highlights gradual changes across years and makes peaks, dips, and other shifts easy to spot. A single consistent color keeps the chart readable, and the point markers help users pick out specific years or periods of interest. | |
| This annual overview serves as a solid foundation for deeper analysis. If I had more time, I would break the data down by month or quarter to reveal finer-grained trends. For example, monthly or seasonal fluctuations in license issuance might be linked to industry cycles or external factors. Examining these patterns could yield valuable insights for planning and decision-making. | |
| """) | |
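The monthly or quarterly breakdown described in the write-up above might look something like this minimal sketch. The helper name `licenses_per_period`, the `Period` column, and the sample dates are assumptions made for illustration; only the `Original Issue Date` column comes from the app.

```python
import pandas as pd


def licenses_per_period(df, freq="Q"):
    """Count licenses per period ("Q" for quarters, "M" for months).

    Hypothetical helper for illustration; not part of the original app.
    """
    issued = df["Original Issue Date"].dropna()
    # Map each timestamp to the quarter or month it falls in
    periods = issued.dt.to_period(freq)
    # Count per period and put the periods in chronological order
    counts = periods.value_counts().sort_index()
    return pd.DataFrame({
        "Period": counts.index.astype(str),
        "License Count": counts.to_numpy(),
    })


# Made-up sample dates, only to show the shape of the result
sample = pd.DataFrame({
    "Original Issue Date": pd.to_datetime(
        ["2020-01-15", "2020-02-20", "2020-05-01", "2021-11-30"]
    )
})
quarterly = licenses_per_period(sample, freq="Q")
monthly = licenses_per_period(sample, freq="M")
```

Periods with no licenses are simply absent from the result, so a chart built from this frame would skip them rather than show zeros. The frame could be plotted with the same line chart definition by encoding `Period` on the x-axis instead of `Original Issue Year`.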