qua605 committed on
Commit
610c352
·
1 Parent(s): b4291ef
Files changed (1)
  1. p2.py +93 -0
p2.py ADDED
@@ -0,0 +1,93 @@
+ import streamlit as st
+ import pandas as pd
+ import altair as alt
+
+ original_data = pd.read_csv('https://huggingface.co/spaces/fallinginfall65/final_project/resolve/main/2015_salary_reporting.csv')
+
+ original_data["Pay Difference"] = original_data["2016 Budgeted Salary"] - original_data["2015 Total Pay"]
+
+ numeric_data = original_data.select_dtypes(include=['float64', 'int64'])
+ top_20_jobs = (
+     original_data.groupby("Current Job Title")[numeric_data.columns]
+     .mean()
+     .sort_values("2015 Pay", ascending=False)
+     .head(20)
+     .reset_index()
+ )
+
+ data = original_data[original_data["Current Job Title"].isin(top_20_jobs["Current Job Title"])]
+
+ st.title("2015 Salary Dashboard")
+ st.subheader("Description")
+ st.text(
+     """
+ The dataset is small, so it is hosted on Hugging Face. To use the dashboard,
+ select a bar in the chart on the left; the chart on the right then shows only
+ the selected department. The data is filtered to the 20 job titles with the highest
+ average "2015 Pay" in the dataset. Several employees share the same "Current Job Title" and
+ "Department Location", so a department can show multiple data points when it is selected.
+ The purpose of the visualization is to show the average total pay in 2015 for each
+ department and to compare it with the change in wages budgeted for 2016.
+     """
+ )
+ st.subheader("Visualization")
+ department_selection = alt.selection_point(fields=["Department Location"])
+
+ bar_chart = alt.Chart(data).mark_bar().encode(
+     x=alt.X("mean(2015 Total Pay):Q", title="Average 2015 Total Pay"),
+     y=alt.Y("Department Location:N", title="Department Location", sort="-x"),
+     color=alt.condition(
+         department_selection,
+         alt.Color("Department Location:N", title="Department"),
+         alt.value("lightgray")  # De-emphasized color for unselected bars
+     ),
+     tooltip=[
+         alt.Tooltip("Department Location:N", title="Department"),
+         alt.Tooltip("mean(2015 Total Pay):Q", title="Average Total Pay"),
+     ]
+ ).add_params(
+     department_selection
+ ).properties(
+     title="Average 2015 Total Pay by Department Location",
+     width=500,
+     height=400
+ )
+
+ scatter_chart = alt.Chart(data).mark_circle(size=100).encode(
+     x=alt.X("Current Job Title:N", title="Job Title"),
+     y=alt.Y("Pay Difference:Q", title="Pay Difference"),
+     color=alt.Color("Department Location:N", title="Department"),
+     tooltip=[
+         alt.Tooltip("First Name:N", title="First Name"),
+         alt.Tooltip("Last Name:N", title="Last Name"),
+         alt.Tooltip("Pay Difference:Q", title="Pay Difference"),
+         alt.Tooltip("Department Location:N", title="Department"),
+         alt.Tooltip("Current Job Title:N", title="Job Title"),
+     ]
+ ).transform_filter(
+     department_selection
+ ).properties(
+     title="Pay Difference (Filtered by Department)",
+     width=500,
+     height=400
+ )
+ combined_chart = bar_chart | scatter_chart
+
+ st.altair_chart(combined_chart)
+
+ st.subheader("Contextual Dataset")
+ st.markdown(
+     """
+ The contextual dataset comes from data.illinois.gov. Its structure is almost the same as the main dataset, but it covers 2014.
+ It could be used to check whether the 2015 budgeted salary matches the 2015 pay. It would also be interesting to analyze the difference between 2014 pay and 2015 pay.
+ https://data.illinois.gov/dataset/e30b5cb2-c1e8-428c-ae64-546498276690/resource/0a6f537a-a233-48f5-9967-34e54e2eaa79/download/2014_salary_repo
+     """
+ )
+ st.subheader("Write Up")
+ st.markdown(
+     """
+ The dashboard contains two charts side by side: the bar chart on the left is the driver, and the scatter plot on the right is the driven chart. Users can click a bar in the bar chart to see a more detailed scatter plot. When a single bar is selected, the scatter
+ plot shows only the data points from that department. Some departments have more than one data point because several of their employees hold job titles in the top-20 "2015 Pay" group. The scatter plot places job titles on the X-axis, so employees with the same job title fall in the same column.
+ Each department has its own color, which makes the departments easier to tell apart when no bar is selected. Clicking the empty space in the bar chart resets the selection. Hovering over a data point in the scatter plot shows a tooltip with more details about that data point.
+     """
+ )
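
Note on the top-20 filter in p2.py: the description says the data is limited to the job titles with the highest "2015 Pay", and the script gets there by averaging every numeric column per title before sorting. A minimal sketch of the same idea, averaging only the pay column, might look like the block below. The helper name `filter_to_top_titles` and the toy rows are hypothetical and invented for illustration; they are not part of the commit or the real CSV.

```python
import pandas as pd


def filter_to_top_titles(df, title_col, pay_col, n):
    """Keep only rows whose job title is among the n titles with the highest mean pay."""
    top_titles = (
        df.groupby(title_col)[pay_col]
        .mean()          # average pay per job title
        .nlargest(n)     # the n titles with the highest average
        .index
    )
    return df[df[title_col].isin(top_titles)]


# Toy rows, invented for illustration; not taken from the real dataset.
toy = pd.DataFrame(
    {
        "Current Job Title": ["Director", "Director", "Analyst", "Clerk", "Clerk"],
        "Department Location": ["Springfield", "Chicago", "Chicago", "Springfield", "Chicago"],
        "2015 Pay": [120000.0, 110000.0, 70000.0, 40000.0, 42000.0],
    }
)

filtered = filter_to_top_titles(toy, "Current Job Title", "2015 Pay", n=2)
titles_kept = sorted(filtered["Current Job Title"].unique())
```

Because the filter keeps every row for a qualifying title, a single department can still contribute several points to the scatter plot, which matches the behavior the write-up describes.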