geralto committed on
Commit 2db6db3 · verified · 1 Parent(s): 4b58642

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -34,6 +34,12 @@ It fine-tunes the `Salesforce/codet5-base` model for classifying student queries
  - **Structure**: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
  - **Note**: Dataset does not include AI responses — only the student queries.
 
+ ## Challenges
+ - **Class imbalance**: Some classes, e.g., “General Question”, are much more frequent than others.
+ - **Field-based hints**: Some classes have unique fields (like `input task description`), which inadvertently reveal the class to the classifier.
+ - **Token length**: Some queries, especially those with code snippets, can be very long and exceed the transformer's input length limit.
+ - **Structural inconsistency**: The dataset descriptions sometimes did not match the actual data.
+
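The first two challenges listed above can be probed with a few lines of plain Python. The sketch below is illustrative only and is not taken from the repository: it assumes the class label is stored under `feature type` (the README does not say which field holds the label), and the function names are hypothetical. It computes inverse-frequency class weights to counter the imbalance, and finds the fields that appear in every class so that an ablation run can check how much the model leans on class-specific fields.

```python
from collections import Counter, defaultdict


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rarer classes weigh more."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def fields_shared_by_all_classes(entries, label_key="feature type"):
    """Return fields that are non-empty in at least one entry of every class.

    label_key is an assumption: the README does not state which field
    carries the class label.
    """
    fields_per_class = defaultdict(set)
    for entry in entries:
        present = {k for k, v in entry.items() if k != label_key and v}
        fields_per_class[entry[label_key]] |= present
    if not fields_per_class:
        return set()
    return set.intersection(*fields_per_class.values())


# Toy data; "Task Help" is a made-up class name for illustration.
toy_entries = [
    {"feature type": "General Question", "input question": "What is a loop?"},
    {"feature type": "General Question", "input question": "What is a list?"},
    {"feature type": "General Question", "input question": "What is a dict?"},
    {"feature type": "Task Help", "input question": "How do I start?",
     "input task description": "Sum a list of numbers."},
]
toy_weights = inverse_frequency_weights([e["feature type"] for e in toy_entries])
toy_shared = fields_shared_by_all_classes(toy_entries)
```

The resulting weights could be passed to a weighted loss (for example the `weight` argument of PyTorch's `CrossEntropyLoss`), and training only on the shared fields gives a rough measure of how much accuracy comes from field presence rather than query content.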
 
  ### Per-Category F1 Scores