geralto / codet-classy

Text Classification


geralto committed on Jun 5, 2025

Commit 2db6db3 · verified · 1 parent: 4b58642

Update README.md

Files changed (1)

README.md +6 -0

README.md CHANGED

@@ -34,6 +34,12 @@ It fine-tunes the `Salesforce/codet5-base` model for classifying student queries
 - **Structure**: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
 - **Note**: Dataset does not include AI responses — only the student queries.
+## Challenges
+- **Class imbalance**: “General Question” appears far more often than the other categories.
+- **Field-based hints**: Some classes have fields that no other class uses (e.g., `input task description`), so the mere presence of a field can inadvertently leak the label to the classifier.
+- **Token length**: Some queries, especially those containing code snippets, are long enough to exceed the transformer's maximum input length.
+- **Structural inconsistency**: The dataset's documented structure sometimes did not match the actual entries.
 ### Per-Category F1 Scores
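Two of these challenges, class imbalance and structural inconsistency, can be checked mechanically before fine-tuning. The following is a minimal sketch, not the method used in this repository. It assumes each entry is a Python dict keyed by the field names listed in the **Structure** bullet and that labels are available as a separate list. The helper names `inverse_frequency_weights` and `undocumented_fields` and the toy label `"Other"` are hypothetical. The weights use the same formula as scikit-learn's `class_weight="balanced"`. Once ordered by label index, they could be passed to a weighted loss such as PyTorch's `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter

# Field names as listed in the README's "Structure" bullet.
DOCUMENTED_FIELDS = {
    "user_id", "time", "feature type", "feature version",
    "input question", "input code", "input intention", "input task description",
}


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent ones below 1; this is the
    same formula scikit-learn uses for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def undocumented_fields(entries):
    """Return (index, sorted extra keys) for entries with keys outside DOCUMENTED_FIELDS.

    Missing keys are deliberately not flagged, because some fields
    legitimately appear only for certain classes.
    """
    report = []
    for i, entry in enumerate(entries):
        extra = set(entry) - DOCUMENTED_FIELDS
        if extra:
            report.append((i, sorted(extra)))
    return report


# Hypothetical toy data: "Other" stands in for the minority categories.
labels = ["General Question"] * 6 + ["Other"] * 2
print(inverse_frequency_weights(labels))
# → {'General Question': 0.6666666666666666, 'Other': 2.0}

entries = [
    {"user_id": 1, "time": "t0", "input question": "Why does my loop never end?"},
    {"user_id": 2, "time": "t1", "inputcode": "while True: pass"},  # misspelled key
]
print(undocumented_fields(entries))
# → [(1, ['inputcode'])]
```

The schema check only reports unexpected keys. A per-class tally of which fields are present would be a natural next step for spotting the field-based hints described above.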