Spaces:
Sleeping
Sleeping
| import streamlit as st | |
| st.set_page_config(page_title="Model Evaluation Metrics", page_icon="π", layout="wide") | |
| st.sidebar.title("π Model Evaluation Metrics") | |
| st.sidebar.markdown("Select a metric category from below.") | |
| st.markdown("<h1 style='text-align: center;'>π Model Evaluation Metrics</h1>", unsafe_allow_html=True) | |
| metric_type = st.radio( | |
| "Select the type of model evaluation:", | |
| ["π― Classification Metrics", "π Regression Metrics"] | |
| ) | |
| if metric_type == "π― Classification Metrics": | |
| st.markdown("## π― Classification Metrics") | |
| st.write("Used when the target variable is **categorical**.") | |
| st.markdown("### 1. Accuracy") | |
| st.write(""" | |
| - **Definition**: Correct predictions out of total predictions | |
| - **Formula**: | |
| Accuracy = (TP + TN) / (TP + FP + FN + TN) | |
| - β οΈ Avoid using when classes are imbalanced. | |
| """) | |
| st.markdown("### 2. Confusion Matrix") | |
| st.write(""" | |
| A matrix that compares actual and predicted labels. | |
| Useful for understanding **true positives**, **false positives**, **true negatives**, and **false negatives**. | |
| | | Predicted Positive | Predicted Negative | | |
| |---------------|--------------------|--------------------| | |
| | Actual Positive | True Positive (TP) | False Negative (FN) | | |
| | Actual Negative | False Positive (FP) | True Negative (TN) | | |
| - Use for binary and multiclass classification. | |
| """) | |
| st.markdown("### 3. Precision") | |
| st.latex(r"Precision = \frac{TP}{TP + FP}") | |
| st.write("Of all predicted positives, how many were correct.") | |
| st.markdown("### 4. Recall (Sensitivity)") | |
| st.latex(r"Recall = \frac{TP}{TP + FN}") | |
| st.write("Of all actual positives, how many were correctly identified.") | |
| st.markdown("### 5. F1 Score") | |
| st.latex(r"F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}") | |
| st.write("Harmonic mean of precision and recall. Good for imbalanced classes.") | |
| st.markdown("### 6. Specificity (True Negative Rate)") | |
| st.latex(r"Specificity = \frac{TN}{TN + FP}") | |
| st.write("Measures how well the model identifies negatives.") | |
| st.markdown("### 7. ROC Curve and AUC") | |
| st.write(""" | |
| - **ROC Curve**: Plot of True Positive Rate (Recall) vs False Positive Rate | |
| - **AUC** (Area Under the Curve): Measures model's ability to distinguish classes. | |
| - AUC = 1: Perfect | |
| - AUC = 0.5: Random | |
| """) | |
| st.markdown("### 8. Log Loss (Logarithmic Loss)") | |
| st.latex(r"LogLoss = -\frac{1}{n} \sum \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]") | |
| st.write(""" | |
| - Evaluates predicted probabilities instead of just labels | |
| - Lower log loss indicates better performance | |
| - Especially useful for probabilistic models | |
| """) | |
| elif metric_type == "π Regression Metrics": | |
| st.markdown("## π Regression Metrics") | |
| st.write("Used when the target variable is **continuous**.") | |
| st.markdown("### 1. Mean Absolute Error (MAE)") | |
| st.latex(r"MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|") | |
| st.write("Measures average absolute difference between actual and predicted values. More robust to outliers.") | |
| st.markdown("### 2. Mean Squared Error (MSE)") | |
| st.latex(r"MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2") | |
| st.write("Penalizes large errors more than MAE. Sensitive to outliers.") | |
| st.markdown("### 3. Root Mean Squared Error (RMSE)") | |
| st.latex(r"RMSE = \sqrt{MSE}") | |
| st.write("Square root of MSE. Easy to interpret since it has same units as output.") | |
| st.markdown("### 4. RΒ² Score (Coefficient of Determination)") | |
| st.latex(r"R^2 = 1 - \frac{SS_{res}}{SS_{tot}}") | |
| st.write(""" | |
| Indicates how well model explains variation in data: | |
| - **1.0** β perfect | |
| - **0.0** β same as predicting mean | |
| - **< 0** β worse than mean | |
| """) | |
| st.markdown("### 5. Adjusted RΒ² Score") | |
| st.latex(r"\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right)") | |
| st.write(""" | |
| - Adjusts RΒ² for number of predictors (k) | |
| - Prevents overestimating performance from adding irrelevant features | |
| """) | |
| st.markdown("### 6. Mean Absolute Percentage Error (MAPE)") | |
| st.latex(r"MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|") | |
| st.write("Expresses error as a percentage. Avoid if actual values can be 0.") | |
| st.markdown("### 7. Median Absolute Error") | |
| st.write("Robust metric not influenced by outliers. Takes the median of all absolute differences.") | |
| st.markdown("---") | |
| st.markdown("### β Choosing the Right Metric") | |
| st.write(""" | |
| - **Classification**: | |
| - Use **F1-score** for imbalanced data. | |
| - Use **AUC-ROC** for probabilistic classifiers. | |
| - Use **Log Loss** if working with predicted probabilities. | |
| - **Regression**: | |
| - Use **RMSE** when large errors are more serious. | |
| - Use **MAE** when all errors matter equally. | |
| - Use **RΒ²** to evaluate explained variance. | |
| - Always compare with a **baseline model**. | |
| """) | |
| st.success("Choosing the right metric helps you evaluate and improve your model with confidence!") | |