'''
Phase 1 Graded Challenge 5
Name      : Achmad Dhani
Batch     : HCK-009
Objective : Build the main app page, which lets the user navigate to and run the other pages.
'''

import streamlit as st

import eda
import model

# Sidebar selector used to switch between pages
page = st.sidebar.selectbox(
    label='Select Page:',
    options=['Home Page', 'Exploration Data Analysis', 'Prediction Model'],
)

if page == 'Home Page':  # render the page that matches the selection
    st.header('Home Page')
    st.write('')
    st.write('Phase 1 Graded Challenge 5')
    st.write('Name : Achmad Dhani')
    st.write('Batch : HCK-009')
    st.write('Dataset: `bigquery-public-data.ml_datasets.credit_card_default`')
    st.write('Objective : Explore and analyze a credit card dataset to build a model that can predict whether a customer will default on payment.')
    st.write('')
    st.caption('Please pick an option in the Select Page box on the left of the screen to start!')
    st.write('')
    st.write('')

    with st.expander("Background Information"):
        st.caption('The dataset used, `bigquery-public-data.ml_datasets.credit_card_default`, is a public dataset that consists of credit card records of clients. It has 2965 entries and 24 columns. Google BigQuery was used to extract the dataset.')

    with st.expander("Work Flow"):  # work flow
        st.caption(
            '''
            - Loading the data and checking basic information about the dataset
            - EDA on the dataset, which involves cleaning and further analyzing the data to gain insights
            - Feature engineering to get the features and target for modelling
            - Defining models using pipelines
            - Training the models
            - Evaluating the models, including hyperparameter tuning to find the best model
            - Saving the model
            - Conclusion
            - Deployment on Hugging Face
            '''
        )

    with st.expander("Conclusion"):  # conclusion
        st.caption(
            '''
            The dataset is well maintained and clean, but its classes are heavily imbalanced when it is used to model default payments. EDA gives insight into the demographics of clients with and without default payments. Over 79% of clients have no default payment and only 21% do. Male clients have a default rate 1.8% higher than female clients. Clients with a graduate school education are less likely to default than those with a university or high school education. EDA also shows that most clients are in their 30s, followed by those in their 20s and lastly those above 40. Within these groups, clients in their 30s are less likely to default than those in their 20s or 40s. These EDA insights can be useful for credit card marketing, since they describe the demographics of default payments. The classification models used are logistic regression, SVC and KNN, of which SVC has the best score. Although it is the best, its score is not high. The model is a good fit: it captures 55% of all actual class 1 instances (Have Default Payments) on the training data and 51% on the test data. The score is likely low because the data is imbalanced and there is not enough of it to train on, so the model cannot capture the distribution well. Undersampling improved the train score but not the test score, which made the model overfit, so I did not end up using it. For future work, I suggest gathering more balanced data and tuning the model again, which might produce a model with a higher score.
            '''
        )

elif page == 'Exploration Data Analysis':  # switch to the EDA page
    eda.run()
else:
    model.run()
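
The page routing above can also be written as a lookup table instead of an if/elif chain, which keeps the page labels and their handlers in one place. The following is only a minimal sketch under stated assumptions: `render_home`, `run_eda`, `run_model`, `PAGES`, `render` and the `calls` list are hypothetical names that do not appear in the app, and the stand-in functions record their name rather than calling Streamlit, `eda.run` or `model.run`, so the sketch runs without those modules.

```python
from typing import Callable, Dict, List

# Records which handler ran; used only to make the sketch observable.
calls: List[str] = []


def render_home() -> None:
    # Hypothetical stand-in for the 'Home Page' branch of the app.
    calls.append('home')


def run_eda() -> None:
    # Hypothetical stand-in for eda.run().
    calls.append('eda')


def run_model() -> None:
    # Hypothetical stand-in for model.run().
    calls.append('model')


# Map each sidebar label to the function that renders that page.
PAGES: Dict[str, Callable[[], None]] = {
    'Home Page': render_home,
    'Exploration Data Analysis': run_eda,
    'Prediction Model': run_model,
}


def render(page: str) -> None:
    # Any unknown label falls back to the model page,
    # mirroring the final else branch of the original chain.
    PAGES.get(page, run_model)()
```

In the real app, the value returned by `st.sidebar.selectbox` would be passed to `render`, and the dictionary keys could double as the `options` list so the two never drift apart.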