| import streamlit as st | |
| import pandas as pd | |
| import altair as alt | |
| original_data = pd.read_csv('https://huggingface.co/spaces/fallinginfall65/final_project/resolve/main/2015_salary_reporting.csv') | |
| original_data["Pay Difference"] = original_data["2016 Budgeted Salary"] - original_data["2015 Total Pay"] | |
| numeric_data = original_data.select_dtypes(include=['float64', 'int64']) | |
| top_20_jobs = ( | |
| original_data.groupby("Current Job Title")[numeric_data.columns] | |
| .mean() | |
| .sort_values("2015 Pay", ascending=False) | |
| .head(20) | |
| .reset_index() | |
| ) | |
| data = original_data[original_data["Current Job Title"].isin(top_20_jobs["Current Job Title"])] | |
| st.title("2015 Salary Dashboard") | |
| st.subheader("Description") | |
| st.text( | |
| """ | |
| The dataset is small, so it is hosted on HuggingFace. To use the dashboard, select a bar in the | |
| chart on the left; the chart on the right then shows only the selected department. The data is | |
| filtered to the 20 job titles with the highest average "2015 Pay". Some rows share the same | |
| "Job Title" and "Department", so a department can have multiple data points when it is selected. | |
| The purpose of the visualization is to show the average total pay in 2015 for each department and | |
| to show how wages change in 2016. | |
| """ | |
| ) | |
| st.subheader("Visualization") | |
| department_selection = alt.selection_point(fields=["Department Location"]) | |
| bar_chart = alt.Chart(data).mark_bar().encode( | |
| x=alt.X("mean(2015 Total Pay):Q", title="Average 2015 Total Pay"), | |
| y=alt.Y("Department Location:N", title="Department Location", sort="-x"), | |
| color=alt.condition( | |
| department_selection, | |
| alt.Color("Department Location:N", title="Department"), | |
| alt.value("lightgray") # De-emphasized color for unselected | |
| ), | |
| tooltip=[ | |
| alt.Tooltip("Department Location:N", title="Department"), | |
| alt.Tooltip("mean(2015 Total Pay):Q", title="Average Total Pay"), | |
| ] | |
| ).add_params( | |
| department_selection | |
| ).properties( | |
| title="Average 2015 Total Pay by Department Location", | |
| width=500, | |
| height=400 | |
| ) | |
| scatter_chart = alt.Chart(data).mark_circle(size=100).encode( | |
| x=alt.X("Current Job Title:N", title="Job Title"), | |
| y=alt.Y("Pay Difference:Q", title="Pay Difference"), | |
| color=alt.Color("Department Location:N", title="Department"), | |
| tooltip=[ | |
| alt.Tooltip("First Name:N", title="First Name"), | |
| alt.Tooltip("Last Name:N", title="Last Name"), | |
| alt.Tooltip("Pay Difference:Q", title="Pay Difference"), | |
| alt.Tooltip("Department Location:N", title="Department"), | |
| alt.Tooltip("Current Job Title:N", title="Job Title"), | |
| ] | |
| ).transform_filter( | |
| department_selection | |
| ).properties( | |
| title="Pay Difference (Filtered by Department)", | |
| width=500, | |
| height=400 | |
| ) | |
| combined_chart = bar_chart | scatter_chart | |
| st.altair_chart(combined_chart) | |
| st.subheader("Contextual Dataset") | |
| st.markdown( | |
| """ | |
| The contextual dataset comes from data.illinois.gov. Its structure is almost the same as the main dataset, but it covers 2014. | |
| It could be used to check whether the 2015 budgeted salary matches the 2015 pay. It would also be interesting to analyze the difference between 2014 pay and 2015 pay. | |
| https://data.illinois.gov/dataset/e30b5cb2-c1e8-428c-ae64-546498276690/resource/0a6f537a-a233-48f5-9967-34e54e2eaa79/download/2014_salary_repo | |
| """ | |
| ) | |
| st.subheader("Write Up") | |
| st.markdown( | |
| """ | |
| The dashboard contains two graphs side by side: the bar graph on the left is the driver and the scatter plot on the right is the driven graph. Users can click a bar in the bar graph to see a more detailed scatter plot. When a single bar is selected, the scatter | |
| plot shows only data points from that department. Some departments have more than one data point because several of their rows fall within the top-twenty 2015 pay group. The scatter plot places job titles on the X-axis, so identical job titles fall in the same column. | |
| Each department has its own color, which makes the departments easier to tell apart when no bar is selected. Clicking on empty space in the bar graph resets the selection. In addition, hovering over a data point in the scatter plot shows a more detailed description of that point. | |
| """ | |
| ) |
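
The write-up notes that a department can show several points because the data is first narrowed to the top job titles by average pay. Below is a minimal, dependency-free sketch of that filtering step, mirroring the `groupby(...).mean()`, `sort_values`, `head`, and `isin` chain in the app. The helper names (`top_n_titles`, `filter_to_titles`), the short keys (`title`, `dept`, `pay`), and the toy rows are all hypothetical and are not taken from the real dataset.

```python
from collections import defaultdict
from statistics import mean


def top_n_titles(rows, n, title_key="title", pay_key="pay"):
    """Return the set of n job titles with the highest average pay."""
    pay_by_title = defaultdict(list)
    for row in rows:
        pay_by_title[row[title_key]].append(row[pay_key])
    # Average pay per title, analogous to groupby(...).mean()
    averages = {title: mean(pays) for title, pays in pay_by_title.items()}
    # Highest averages first, analogous to sort_values(ascending=False).head(n)
    ranked = sorted(averages, key=averages.get, reverse=True)
    return set(ranked[:n])


def filter_to_titles(rows, titles, title_key="title"):
    """Keep every row whose job title is in the given set (like .isin)."""
    return [row for row in rows if row[title_key] in titles]


# Hypothetical toy rows; the real app reads these from the hosted CSV.
rows = [
    {"title": "Director", "dept": "Finance", "pay": 120000},
    {"title": "Director", "dept": "Finance", "pay": 110000},
    {"title": "Engineer", "dept": "Public Works", "pay": 90000},
    {"title": "Clerk", "dept": "Finance", "pay": 40000},
]

top_titles = top_n_titles(rows, n=2)
kept = filter_to_titles(rows, top_titles)
```

Because the filter keeps every row for a top-ranked title rather than one row per title, both hypothetical "Director" rows survive, which is why a single department can contribute more than one point to the scatter plot.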