| import streamlit as st | |
| import nannyml as nml | |
| def get_data(): | |
|     reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset() | |
|     return reference_df, analysis_df, analysis_target_df | |
| st.title('Is your model degrading?') | |
| st.caption('### :violet[_Estimate_] the performance of an ML model. :violet[_Without ground truth_].') | |
| st.markdown(""" | |
| If you have been previously exposed to concepts like [covariate shift or concept drift](https://www.nannyml.com/blog/types-of-data-shift), | |
| you may be aware that changes in the distribution of | |
| the production data can affect the model's performance. | |
| """) | |
| st.markdown("""A recent paper from MIT, Harvard, and other institutions found that [91% of the ML models | |
| they tested degraded](https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time) over time.""") | |
| st.markdown("""Typically, we need access to ground truth to know if a model is degrading. | |
| But most of the time, getting new labeled data is expensive, time-consuming, or impossible. | |
| So we end up blind, not knowing how the model performs in production. | |
| """) | |
| st.markdown(""" | |
| To overcome this issue, we at NannyML created two methods to :violet[_estimate_] the performance of ML models without needing access to | |
| new labeled data. In this demo, we show the **Confidence-based Performance Estimation (CBPE)** method, specially designed to estimate | |
| the performance of **classification** models. | |
| """) | |
| reference_df, analysis_df, analysis_target_df = get_data() | |
| st.markdown("#### The prediction task") | |
| st.markdown(""" | |
| A model was trained to predict whether or not a person will repay their car loan. The model uses features such as | |
| car_value, salary_range, and loan_length. | |
| """) | |
| st.dataframe(analysis_df.head(3)) | |
| st.markdown(""" | |
| We know that the model had a **test F1-score of 0.943**. But what guarantees that the F1-score | |
| will remain good on production data? | |
| """) | |
| st.markdown("#### Estimating the Model Performance") | |
| st.markdown(""" | |
| Instead of waiting for ground truth, we can use NannyML's | |
| [CBPE](https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html) | |
| method to estimate the performance of an ML model. | |
| CBPE's trick is to use the confidence scores of the ML model. It calibrates the scores to turn them into actual probabilities. | |
| Once the probabilities are calibrated, it can estimate any performance metric that can be computed from the confusion matrix elements. | |
| """) | |
| chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500) | |
| metric = st.selectbox( | |
| 'Performance Metric', | |
| ('f1', 'roc_auc', 'precision', 'recall', 'specificity', 'accuracy')) | |
| plot_realized_performance = st.checkbox('Compare NannyML estimation with actual outcomes') | |
| if st.button('**_Estimate_ Performance**'): | |
|     with st.spinner('Running...'): | |
|         estimator = nml.CBPE( | |
|             y_pred_proba='y_pred_proba', | |
|             y_pred='y_pred', | |
|             y_true='repaid', | |
|             timestamp_column_name='timestamp', | |
|             metrics=[metric], | |
|             chunk_size=chunk_size, | |
|             problem_type='classification_binary' | |
|         ) | |
|         estimator.fit(reference_df) | |
|         estimated_performance = estimator.estimate(analysis_df) | |
|         if plot_realized_performance: | |
|             analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True) | |
|             calculator = nml.PerformanceCalculator( | |
|                 y_pred_proba='y_pred_proba', | |
|                 y_pred='y_pred', | |
|                 y_true='repaid', | |
|                 timestamp_column_name='timestamp', | |
|                 metrics=[metric], | |
|                 chunk_size=chunk_size, | |
|                 problem_type='classification_binary' | |
|             ) | |
|             calculator.fit(reference_df) | |
|             realized_performance = calculator.calculate(analysis_with_targets) | |
|             st.plotly_chart(estimated_performance.compare(realized_performance).plot(), use_container_width=False) | |
|         else: | |
|             st.plotly_chart(estimated_performance.plot(), use_container_width=False) | |
| st.divider() | |
| st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML.""") | |
| st.markdown(""" | |
| NannyML is an open-source library for post-deployment data science. Leave us a 🌟 on [GitHub](https://github.com/NannyML/nannyml) | |
| or [check our docs](https://nannyml.readthedocs.io/en/stable/landing_page.html) to learn more. | |
| """) |