import streamlit as st

st.set_page_config(page_title="Model Evaluation Metrics", page_icon="📊", layout="wide")

st.sidebar.title("📊 Model Evaluation Metrics")
st.sidebar.markdown("Select a metric category from below.")

st.markdown("<h1 style='text-align: center;'>📏 Model Evaluation Metrics</h1>", unsafe_allow_html=True)

metric_type = st.radio(
    "Select the type of model evaluation:",
    ["🎯 Classification Metrics", "πŸ“ˆ Regression Metrics"]
)

if metric_type == "🎯 Classification Metrics":
    st.markdown("## 🎯 Classification Metrics")
    st.write("Used when the target variable is **categorical**.")

    st.markdown("### 1. Accuracy")
    st.write("**Definition**: Correct predictions out of total predictions.")
    st.latex(r"Accuracy = \frac{TP + TN}{TP + TN + FP + FN}")
    st.write("⚠️ Avoid relying on accuracy alone when classes are imbalanced.")
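    # Illustrative sketch (hypothetical helper, not wired into the UI):
    # accuracy computed directly from confusion-matrix counts, mirroring
    # the formula above.
    def accuracy_from_counts(tp, tn, fp, fn):
        """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
        return (tp + tn) / (tp + tn + fp + fn)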

    st.markdown("### 2. Confusion Matrix")
    st.write("""
    A matrix that compares actual and predicted labels.  
    Useful for understanding **true positives**, **false positives**, **true negatives**, and **false negatives**.
    
    |               | Predicted Positive | Predicted Negative |
    |---------------|--------------------|--------------------|
    | Actual Positive | True Positive (TP) | False Negative (FN) |
    | Actual Negative | False Positive (FP) | True Negative (TN) |
    
    - Use for binary and multiclass classification.
    """)
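    # Illustrative sketch (hypothetical helper): tallying the four
    # confusion-matrix cells by hand from binary label lists
    # (1 = positive, 0 = negative), laid out as in the table above.
    def confusion_counts(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        return tp, fp, fn, tn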

    st.markdown("### 3. Precision")
    st.latex(r"Precision = \frac{TP}{TP + FP}")
    st.write("Of all predicted positives, how many were correct.")

    st.markdown("### 4. Recall (Sensitivity)")
    st.latex(r"Recall = \frac{TP}{TP + FN}")
    st.write("Of all actual positives, how many were correctly identified.")

    st.markdown("### 5. F1 Score")
    st.latex(r"F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}")
    st.write("Harmonic mean of precision and recall. Good for imbalanced classes.")
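    # Illustrative sketch (hypothetical helper): precision, recall, and F1
    # from confusion-matrix counts, matching the three formulas above.
    def precision_recall_f1(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1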

    st.markdown("### 6. Specificity (True Negative Rate)")
    st.latex(r"Specificity = \frac{TN}{TN + FP}")
    st.write("Measures how well the model identifies negatives.")
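    # Illustrative sketch (hypothetical helper): specificity from counts,
    # per the formula above.
    def specificity(tn, fp):
        return tn / (tn + fp)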

    st.markdown("### 7. ROC Curve and AUC")
    st.write("""
    - **ROC Curve**: Plot of True Positive Rate (Recall) vs False Positive Rate at each classification threshold  
    - **AUC** (Area Under the Curve): Measures the model's ability to distinguish between classes.  
      - AUC = 1: Perfect separation  
      - AUC = 0.5: No better than random guessing
    """)
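    # Illustrative sketch (hypothetical helper): AUC via its rank
    # interpretation -- the probability that a randomly chosen positive
    # example is scored above a randomly chosen negative one (ties count
    # as half). Equivalent to integrating the ROC curve.
    def roc_auc(y_true, y_score):
        pos = [s for y, s in zip(y_true, y_score) if y == 1]
        neg = [s for y, s in zip(y_true, y_score) if y == 0]
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        return wins / (len(pos) * len(neg))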

    st.markdown("### 8. Log Loss (Logarithmic Loss)")
    st.latex(r"LogLoss = -\frac{1}{n} \sum \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]")
    st.write("""
    - Evaluates predicted probabilities instead of just labels  
    - Lower log loss indicates better performance  
    - Especially useful for probabilistic models
    """)
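    # Illustrative sketch (hypothetical helper): binary log loss for labels
    # and predicted probabilities, matching the formula above; probabilities
    # are clipped so log(0) never occurs.
    from math import log

    def binary_log_loss(y_true, y_prob, eps=1e-15):
        total = 0.0
        for y, p in zip(y_true, y_prob):
            p = min(max(p, eps), 1 - eps)  # clip away from 0 and 1
            total += y * log(p) + (1 - y) * log(1 - p)
        return -total / len(y_true)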

elif metric_type == "πŸ“ˆ Regression Metrics":
    st.markdown("## 📈 Regression Metrics")
    st.write("Used when the target variable is **continuous**.")

    st.markdown("### 1. Mean Absolute Error (MAE)")
    st.latex(r"MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|")
    st.write("Measures the average absolute difference between actual and predicted values. More robust to outliers than squared-error metrics such as MSE.")

    st.markdown("### 2. Mean Squared Error (MSE)")
    st.latex(r"MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2")
    st.write("Penalizes large errors more than MAE. Sensitive to outliers.")

    st.markdown("### 3. Root Mean Squared Error (RMSE)")
    st.latex(r"RMSE = \sqrt{MSE}")
    st.write("Square root of MSE. Easy to interpret because it is in the same units as the target variable.")
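    # Illustrative sketch (hypothetical helper): MAE, MSE, and RMSE computed
    # by hand for paired actual/predicted values, matching the formulas above.
    def regression_errors(y_true, y_pred):
        n = len(y_true)
        mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
        mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
        rmse = mse ** 0.5
        return mae, mse, rmse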

    st.markdown("### 4. R² Score (Coefficient of Determination)")
    st.latex(r"R^2 = 1 - \frac{SS_{res}}{SS_{tot}}")
    st.write("""
    where SS_res is the residual sum of squares and SS_tot is the total sum of squares.  
    Indicates how well the model explains the variation in the data:  
    - **1.0** → perfect fit  
    - **0.0** → no better than predicting the mean  
    - **< 0** → worse than predicting the mean
    """)
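    # Illustrative sketch (hypothetical helper): R² from its definition,
    # using the residual and total sums of squares.
    def r2_score_manual(y_true, y_pred):
        mean_y = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean_y) ** 2 for t in y_true)
        return 1 - ss_res / ss_tot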

    st.markdown("### 5. Adjusted R² Score")
    st.latex(r"\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right)")
    st.write("""
    - Adjusts R² for the number of predictors (k) relative to the sample size (n)  
    - Penalizes adding features that do not genuinely improve the model
    """)
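    # Illustrative sketch (hypothetical helper): adjusting a given R² for
    # sample size n and number of predictors k, per the formula above.
    def adjusted_r2(r2, n, k):
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)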

    st.markdown("### 6. Mean Absolute Percentage Error (MAPE)")
    st.latex(r"MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|")
    st.write("Expresses error as a percentage of the actual value. Avoid it when actual values can be zero or near zero, since the division is undefined or unstable.")
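    # Illustrative sketch (hypothetical helper): MAPE as a percentage;
    # assumes no actual value is zero, per the caveat above.
    def mape(y_true, y_pred):
        n = len(y_true)
        return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))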

    st.markdown("### 7. Median Absolute Error")
    st.write("A robust metric largely unaffected by outliers: the median of the absolute differences between actual and predicted values.")
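    # Illustrative sketch (hypothetical helper): median absolute error via
    # the standard library; a single huge error barely moves the result.
    from statistics import median

    def median_absolute_error(y_true, y_pred):
        return median(abs(t - p) for t, p in zip(y_true, y_pred))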

st.markdown("---")
st.markdown("### ✅ Choosing the Right Metric")
st.write("""
- **Classification**:
    - Use **F1-score** for imbalanced data.
    - Use **AUC-ROC** for probabilistic classifiers.
    - Use **Log Loss** if working with predicted probabilities.
- **Regression**:
    - Use **RMSE** when large errors are more serious.
    - Use **MAE** when all errors matter equally.
    - Use **R²** to evaluate explained variance.
- Always compare with a **baseline model**.
""")

st.success("Choosing the right metric helps you evaluate and improve your model with confidence!")