Update README.md
README.md
CHANGED
@@ -34,6 +34,12 @@ It fine-tunes the `Salesforce/codet5-base` model for classifying student queries
- **Structure**: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
- **Note**: Dataset does not include AI responses — only the student queries.

+## Challenges
+- **Class imbalance**: e.g., “General Question” is much more frequent.
+- **Field-based hints**: Some classes have unique fields (like `input task description`), inadvertently helping classification.
+- **Token length**: Some queries, especially with code snippets, can be very long, hitting transformer limits.
+- **Structural inconsistency**: Dataset descriptions sometimes did not match actual data.
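
Two of the challenges above, class imbalance and token length, are commonly mitigated with a few lines of preprocessing. The following is a minimal sketch, not the method used in this repository. The helper names `inverse_frequency_weights` and `truncate_head_tail`, the label names other than "General Question", the label counts, and the `max_len` default of 512 are all assumptions made for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} using weight = n_samples / (n_classes * count).

    Frequent classes get weights below 1, rare classes get weights above 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


def truncate_head_tail(token_ids, max_len=512, head_fraction=0.5):
    """Shorten an over-long token sequence by keeping its start and its end.

    max_len=512 is an assumed limit; check the actual model configuration.
    """
    if len(token_ids) <= max_len:
        return list(token_ids)
    head = int(max_len * head_fraction)
    tail = max_len - head
    # Slice from an explicit index so that tail == 0 yields an empty tail.
    return list(token_ids[:head]) + list(token_ids[len(token_ids) - tail:])


# Hypothetical label distribution; the counts are made up for illustration.
example_labels = (
    ["General Question"] * 6
    + ["Hypothetical Class A"] * 3
    + ["Hypothetical Class B"] * 1
)
class_weights = inverse_frequency_weights(example_labels)

# Hypothetical over-long query represented as integer token ids.
shortened = truncate_head_tail(list(range(1000)), max_len=512)
```

Weights like these could be passed to a weighted loss (for example, the `weight` argument of `torch.nn.CrossEntropyLoss`, after ordering them by class index). Keeping both the head and the tail of a long query is one possible choice when the question text and the code snippet sit at opposite ends of the input; plain right-truncation is the simpler alternative.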
+
### Per-Category F1 Scores