import streamlit as st
import pandas as pd
import altair as alt

# Loading the dataset
url = 'https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/licenses_fall2022.csv'
df = pd.read_csv(url)

# Dropping unnecessary columns to prevent serialization errors and reduce data size
df.drop(columns=['Title', 'Prefix', 'Suffix', 'BusinessDBA', '_id', 'Delegated Controlled Substance Schedule', 'Case Number'], inplace=True, errors='ignore')

# Converting date columns to datetime format, handling errors
date_columns = ['Original Issue Date', 'Effective Date', 'Expiration Date', 'LastModifiedDate']
for col in date_columns:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Converting object columns to category for performance improvement
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].astype('category')

# Dropping rows with missing values in essential columns to ensure Arrow compatibility
df.dropna(subset=['License Type', 'License Status', 'Original Issue Date'], inplace=True)

# Adding Year columns for visualizations
df['Original Issue Year'] = df['Original Issue Date'].dt.year

# Visualization 1: Bar chart of licenses by status (top 5)
st.subheader("Licenses by Status")
category_counts = df['License Status'].value_counts().reset_index()
category_counts.columns = ['License Status', 'Count']

# Selecting the top 5 statuses (value_counts() already sorts by count) and keeping them in descending order
category_counts_top5 = category_counts.head(5).sort_values(by='Count', ascending=False)

if not category_counts_top5.empty:
    chart1 = alt.Chart(category_counts_top5).mark_bar().encode(
        x=alt.X(field="License Status", type="nominal", title="License Status",
                sort='-y'),  # Sorting bars by count in descending order
        y=alt.Y(field="Count", type="quantitative", title="Number of Licenses"),
        color=alt.Color(field="License Status", type="nominal", legend=alt.Legend(title="License Status"),
                        scale=alt.Scale(scheme='pastel1')),
        tooltip=['License Status', 'Count']
    ).properties(
        width=600,
        height=400,
        title="Licenses by Status (Top 5)"
    )
    st.altair_chart(chart1)
else:
    st.write("No data available for the 'License Status' plot.")


# Write-up for Visualization 1
st.write("""
**Licenses by Status**: I chose this visualization because it gives a clear overview of how licenses are distributed across statuses, showing the five most common ones. Arranging the bars in descending order makes it easy to see which statuses are most prevalent.

I applied a distinct color to each bar to enhance visual distinction, so users can quickly tell the statuses apart and grasp the overall distribution at a glance. Sorting the statuses in descending order emphasizes the larger categories, making comparisons straightforward and helping users spot the dominant statuses.

If I had more time, I would enhance the visualization with interactivity, such as the ability to filter by date range, license type, or location. Users could then focus on specific subsets of the data, adding depth and relevance to the analysis for different scenarios or decision-making needs.
""")


# Visualization 2: Line chart showing licenses issued over time

st.subheader("Licenses Over Time")

time_data = df.groupby('Original Issue Year').size().reset_index(name='License Count')

if not time_data.empty:
    chart2 = alt.Chart(time_data).mark_line(point=True).encode(
        x=alt.X('Original Issue Year:O', title='Year'),  # Ordinal scale so each year appears as a discrete tick
        y=alt.Y('License Count:Q', title='Number of Licenses'),
        tooltip=[
            alt.Tooltip('Original Issue Year:O', title='Year'),
            alt.Tooltip('License Count:Q', title='Number of Licenses')
        ],
        color=alt.value('brown')
    ).properties(
        width=600,
        height=400,
        title="Number of Licenses Issued Over Time"
    ).interactive()  # Enables zooming and panning

    st.altair_chart(chart2)
else:
    st.write("No data available for the 'Licenses Over Time' plot.")
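
# A minimal sketch of a finer-grained (quarterly) view of the same issue dates,
# as a possible complement to the yearly counts above.
# Assumptions: the helper name `count_by_quarter` and the output column names
# 'Quarter' and 'License Count' are hypothetical and not part of the original app.
import pandas as pd


def count_by_quarter(frame, date_col='Original Issue Date'):
    """Return one row per calendar quarter with the number of rows falling in it."""
    # Coerce unparseable values to NaT and drop them before grouping
    dates = pd.to_datetime(frame[date_col], errors='coerce').dropna()
    # Convert each date to a quarter label such as '2021Q3'
    quarters = dates.dt.to_period('Q').astype(str)
    # Count rows per quarter and order the quarters chronologically
    counts = quarters.value_counts().sort_index()
    return pd.DataFrame({'Quarter': counts.index, 'License Count': counts.values})


# Possible usage in this app (left commented out, since it is only a sketch):
# quarterly = count_by_quarter(df)
# st.altair_chart(alt.Chart(quarterly).mark_line(point=True).encode(
#     x='Quarter:O', y='License Count:Q'))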

# Write-up for Visualization 2
st.write("""
**Licenses Over Time**: I chose a line chart because it shows the number of licenses issued each year, giving a clear view of how issuance has changed over time. Tracking these counts year over year helps identify patterns, growth periods, or downturns in license issuance.

The line chart format is particularly useful here because it highlights gradual changes across years, making it easy to spot peaks, dips, and other shifts at a glance. Using a single, consistent color keeps the chart readable, so users can quickly pick out specific years or periods of interest.

This annual overview serves as a solid foundation for deeper analysis. If I had more time, I would break the data down by month or quarter to reveal finer-grained trends. For example, monthly or seasonal fluctuations in license issuance might be linked to industry cycles or external factors. Examining these more specific patterns could yield valuable insights for planning and decision-making.
""")