---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- bert
- student-checkins
- roadblock-detection
- nlp
- active-learning
- education
- classification
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
---
# Roadblock Classification Model (v2)
## Overview
The **Roadblock Classification Model (v2)** is a BERT-based transformer model fine-tuned to classify student check-ins into two categories:
- **ROADBLOCK**: The student cannot move forward
- **NOT_ROADBLOCK**: The student is still making progress

This model is designed to understand **semantic meaning**, not just keywords, enabling it to differentiate between **difficulty** and **true blockage**.

---
## Motivation
### Problem with Version 1
The first version of this model grouped three different situations under **a single label**:
- struggles
- confusion
- being stuck

This created a major issue:
> The model could not distinguish between **temporary difficulty** and **actual inability to proceed**
---
### Why Version 2 Was Created
Version 2 was developed to **separate definitions clearly**:
| Concept | Meaning |
|--------|--------|
| **Struggle** | The student is experiencing difficulty |
| **Roadblock** | The student cannot move forward |
---
### Key Insight
> Not all struggles are roadblocks.

Example:
| Check-in | Correct Label |
|--------|--------------|
| "I had problems but made progress" | NOT_ROADBLOCK |
| "I can't fix my code and I'm stuck" | ROADBLOCK |
---
## Model Architecture
- Base Model: `bert-base-uncased`
- Task: Binary Classification
- Framework: Hugging Face Transformers
- Training Environment: Google Colab (GPU)
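
The setup listed above could be instantiated roughly as follows. This is a minimal sketch, not the author's training script: the label-to-ID order and the `build_model` helper are assumptions made for illustration, since the card does not state them. The `transformers` import is deferred so the label mapping can be reused on its own.

```python
# Assumed label order; the card does not say which ID maps to which label.
ID2LABEL = {0: "NOT_ROADBLOCK", 1: "ROADBLOCK"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def build_model(base_model: str = "bert-base-uncased"):
    """Load the tokenizer and a BERT encoder with a two-class classification head.

    Hypothetical helper: requires the `transformers` package and network access
    (or a local cache) to fetch the base checkpoint.
    """
    # Imported lazily so the mapping above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```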
---
## Dataset Design
The dataset was **synthetically generated and refined iteratively** to ensure:
### Semantic Accuracy
- Focus on meaning, not keywords
### Balanced Classes
- ROADBLOCK vs NOT_ROADBLOCK distribution controlled
### Language Diversity
- Includes:
  - formal phrasing
  - informal/slang expressions
  - varied sentence structures
---
## Bias Identification and Correction
### Initial Problem
Early versions of the dataset showed **strong keyword bias**, such as:
- `"problem"` → always NOT_ROADBLOCK
- `"can't"` → always ROADBLOCK
- `"stuck"` → always ROADBLOCK
---
### Why This Was Dangerous
The model learned:
> ❌ keyword → label

instead of

> ✅ meaning → label

This caused incorrect predictions in real-world scenarios.

---
### Bias Mitigation Strategy
To reduce keyword bias, the dataset was redesigned to include:
#### 1. Keyword Symmetry
Each keyword appears in **both labels**:
| Keyword | ROADBLOCK | NOT_ROADBLOCK |
|--------|----------|---------------|
| "problem" | ✔️ | ✔️ |
| "can't" | ✔️ | ✔️ |
| "stuck" | ✔️ | ✔️ |
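
One way to check keyword symmetry is to count, per keyword, how many examples of each label contain it and flag any keyword that is missing from a label. The sketch below assumes a dataset shaped as `(text, label)` pairs and uses simple substring matching; both function names are hypothetical, and the sample sentences are taken from this card.

```python
from collections import defaultdict

LABELS = ("ROADBLOCK", "NOT_ROADBLOCK")


def keyword_label_counts(examples, keywords):
    """Count, per keyword, how many examples of each label contain it."""
    counts = {kw: defaultdict(int) for kw in keywords}
    for text, label in examples:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:  # substring match, so "problem" also hits "problems"
                counts[kw][label] += 1
    return {kw: dict(per_label) for kw, per_label in counts.items()}


def one_sided_keywords(counts, labels=LABELS):
    """Return keywords that never appear under at least one label."""
    return [
        kw for kw, per_label in counts.items()
        if any(per_label.get(label, 0) == 0 for label in labels)
    ]


examples = [
    ("I can't fix my code and I'm stuck", "ROADBLOCK"),
    ("I can't fix it yet but I'm making progress", "NOT_ROADBLOCK"),
    ("I had problems but made progress", "NOT_ROADBLOCK"),
    ("lowkey stuck but I think I got it", "NOT_ROADBLOCK"),
]

counts = keyword_label_counts(examples, ["problem", "can't", "stuck"])
# In this tiny sample, "problem" only occurs under NOT_ROADBLOCK, so it is flagged.
flagged = one_sided_keywords(counts)
print(flagged)
```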
---
#### 2. Contrastive Examples
Pairs of sentences with similar wording but different meanings:
- "I can't fix it and I'm stuck" → ROADBLOCK
- "I can't fix it yet but I'm making progress" → NOT_ROADBLOCK
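
A contrastive pair can be validated mechanically: the two sentences should share much of their wording yet carry opposite labels. The sketch below uses Jaccard overlap on whitespace tokens as a rough similarity measure; the `min_overlap` threshold of 0.3 and both function names are assumptions for illustration, not part of the original pipeline.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase whitespace tokens of two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_contrastive(pair, min_overlap: float = 0.3) -> bool:
    """True when the labels differ and the wording overlaps enough.

    min_overlap is an illustrative threshold, not a tuned value.
    """
    (text_a, label_a), (text_b, label_b) = pair
    return label_a != label_b and token_overlap(text_a, text_b) >= min_overlap


pair = (
    ("I can't fix it and I'm stuck", "ROADBLOCK"),
    ("I can't fix it yet but I'm making progress", "NOT_ROADBLOCK"),
)
overlap = token_overlap(pair[0][0], pair[1][0])  # 5 shared tokens out of 11 distinct
print(is_contrastive(pair))
```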
---
#### 3. Pattern Diversity
Avoided over-reliance on patterns like:
- `"but"` → NOT_ROADBLOCK

Instead, the dataset also includes resolutions phrased without "but":
- "and I fixed it"
- "and it's working now"
- "and I solved it"
---
### Result
The model now learns:
> **progress vs no progress**

instead of relying on surface-level patterns.

---
## Model Evaluation
The model was tested on:
### 1. Clean Synthetic Data
- Achieved near-perfect validation scores (expected, since the validation data closely resembles the training data)
### 2. Edge Cases
- Handled ambiguous phrasing correctly
### 3. Realistic Language
Test examples:
| Input | Prediction |
|------|-----------|
| "lowkey stuck but I think I got it" | NOT_ROADBLOCK |
| "this bug annoying but I fixed it" | NOT_ROADBLOCK |
| "ngl I can't get this working" | ROADBLOCK |
| "still stuck idk what to do" | ROADBLOCK |
---
### Observed Limitation
Minor generalization gap:
- "I was confused but it's working now" → incorrectly predicted ROADBLOCK
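
To make the effect of a single miss concrete, the sketch below computes the metrics listed in the card's metadata (accuracy, precision, recall, F1) over the five sentences quoted in this section. It assumes the four tabulated predictions match their correct labels and treats ROADBLOCK as the positive class. This is a toy calculation over five sentences, not a reported evaluation result, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(gold, pred, positive="ROADBLOCK"):
    """Accuracy, precision, recall and F1 for a binary task with one positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


R, NR = "ROADBLOCK", "NOT_ROADBLOCK"
# Four table rows (assumed correct) followed by the misclassified example.
gold = [NR, NR, R, R, NR]
pred = [NR, NR, R, R, R]

metrics = binary_metrics(gold, pred)
# The one false positive lowers precision while recall stays at 1.0.
print(metrics)
```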
---
### Fix Approach
Instead of regenerating the dataset:
> Add targeted examples to cover missing language patterns
---
## Active Learning Strategy
This model is designed to serve as a **base model for active learning**.

---
### Active Learning Workflow
1. Model predicts on real check-ins
2. Identify incorrect predictions
3. Collect high-value error samples
4. Add corrected examples to dataset
5. Retrain model
---
### Key Principle
> High-confidence errors are more valuable than random samples
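
Steps 2 to 4 of the workflow, combined with this principle, could be sketched as a filter over reviewed predictions. The record fields, the 0.9 threshold, the confidence values, and the second and third sample texts are all hypothetical; only the first sentence and its labels come from this card.

```python
def high_confidence_errors(records, min_confidence=0.9):
    """Return misclassified records the model was confident about, most confident first."""
    errors = [
        r for r in records
        if r["predicted"] != r["corrected"] and r["confidence"] >= min_confidence
    ]
    return sorted(errors, key=lambda r: r["confidence"], reverse=True)


# Reviewed predictions; confidence values are made up for illustration.
records = [
    {"text": "I was confused but it's working now",
     "predicted": "ROADBLOCK", "corrected": "NOT_ROADBLOCK", "confidence": 0.97},
    {"text": "still stuck idk what to do",
     "predicted": "ROADBLOCK", "corrected": "ROADBLOCK", "confidence": 0.99},
    {"text": "kinda lost but the tests pass",  # hypothetical check-in
     "predicted": "ROADBLOCK", "corrected": "NOT_ROADBLOCK", "confidence": 0.55},
]

# Corrected (text, label) pairs to append to the training set before retraining.
training_additions = [
    (r["text"], r["corrected"]) for r in high_confidence_errors(records)
]
print(training_additions)
```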
---
### Goal
Continuously improve the model using **real-world feedback**, not just synthetic data.

---
## Future Improvements
- Integrate real Slack check-in data
- Expand dataset with informal and noisy text
- Add confidence-based filtering for active learning
- Combine with a **Struggle Detection Model** for multi-signal analysis
---
## Final Insight
This model represents a shift from:
> ❌ pattern-based classification

to

> ✅ meaning-based understanding
---
## Conclusion
The Roadblock Classification Model (v2):
- Distinguishes **difficulty from blockage**, with a known gap on some unseen phrasings
- Handles diverse language patterns
- Minimizes keyword bias
- Serves as a strong foundation for **active learning systems**
---
> This is not just a model; it is a continuously improving system.