---
library_name: transformers
datasets:
- majeedkazemi/students-coding-questions-from-ai-assistant
language:
- en
base_model:
- Salesforce/codet5-base
---

# Model Card for Codet-classy

A classifier for student C programming queries, built as a Deep Neural Networks course project at Vilnius University.


## Model Details
A transformer-based query classification model.


### Model Description
This model was developed as part of a Deep Neural Networks (DNN) course project at Vilnius University. 
It fine-tunes the `Salesforce/codet5-base` model for classifying student queries related to C programming into five categories: **General Question**, **Question from Code**, **Help Fix Code**, **Help Write Code**, and **Explain Code**.
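
The five categories can be represented as an integer label mapping, as classification heads usually expect. This is a minimal sketch: the category names come from the description above, but the integer order and the helper name `decode` are assumptions, not the mapping used in the project.

```python
# Category names are taken from the model description.
# ASSUMPTION: the integer order below is illustrative only.
LABELS = [
    "General Question",
    "Question from Code",
    "Help Fix Code",
    "Help Write Code",
    "Explain Code",
]

# Two lookup tables, one for each direction.
id2label = dict(enumerate(LABELS))
label2id = {name: idx for idx, name in id2label.items()}


def decode(class_id: int) -> str:
    """Map a predicted class index back to its category name (hypothetical helper)."""
    return id2label[class_id]


if __name__ == "__main__":
    for idx, name in id2label.items():
        print(idx, name)
```

Keeping both directions in one place avoids a mismatch between the labels used at training time and those shown at inference time.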




- **Developed by:** Brigita Bruškytė, Artiom Hovhannisyan, Eglė Orinaitė  
  Faculty of Mathematics and Informatics, Vilnius University

## Dataset
- **Size**: 6,776 student queries from a real C programming course.
- **Structure**: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
- **Note**: The dataset contains only the student queries, not the AI responses.
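
One way to turn such an entry into a single text input for a classifier is sketched below. The field names come from the list above; the sample record, the `build_input` helper, and the `field: value` joining format are assumptions made for illustration and may differ from the project's preprocessing.

```python
import json

# Text-bearing fields from the dataset structure described above.
TEXT_FIELDS = [
    "input question",
    "input code",
    "input intention",
    "input task description",
]


def build_input(record: dict) -> str:
    """Join the non-empty text fields of one entry into a single string.

    ASSUMPTION: the "field: value" layout is illustrative, not the
    format used in the original project.
    """
    parts = []
    for field in TEXT_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            parts.append(f"{field}: {value.strip()}")
    return "\n".join(parts)


# Invented sample entry; the values are not taken from the dataset.
raw = json.dumps({
    "user_id": "u1",
    "time": "2024-01-01T00:00:00",
    "input question": "Why does this loop never end?",
    "input code": "while (i < 10) { }",
    "input intention": "",
})
record = json.loads(raw)

if __name__ == "__main__":
    print(build_input(record))
```

Empty and missing fields are skipped, so which fields appear in the joined text depends on the entry.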

## Challenges
- **Class imbalance**: The categories are unevenly represented; for example, “General Question” is far more frequent than the others.  
- **Field-based hints**: Some classes have fields that no other class uses (such as `input task description`), so the mere presence of a field can reveal the label.  
- **Token length**: Some queries, especially those with code snippets, are long enough to exceed the transformer's input limit.  
- **Structural inconsistency**: The dataset's documentation did not always match the actual data.
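
Two common mitigations for the imbalance and length issues are sketched below: inverse-frequency class weights and head-plus-tail truncation of long token sequences. This card does not state whether either was used in the project, so both are assumptions; the label counts in the example are invented and the function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    N is the number of samples, K the number of classes, and n_c the
    count of class c, so rarer classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def head_tail_truncate(tokens: list, max_len: int, head: int) -> list:
    """Keep the first `head` tokens and the last `max_len - head` tokens."""
    if not 0 <= head <= max_len:
        raise ValueError("head must be between 0 and max_len")
    if len(tokens) <= max_len:
        return list(tokens)
    tail = max_len - head
    # Slice from an explicit index so that tail == 0 yields an empty tail.
    return list(tokens[:head]) + list(tokens[len(tokens) - tail:])


if __name__ == "__main__":
    # Invented label counts, for illustration only.
    toy = ["General Question"] * 6 + ["Explain Code"] * 4 + ["Help Write Code"] * 2
    print(balanced_class_weights(toy))
    print(head_tail_truncate(list(range(10)), max_len=4, head=2))
```

The weights would typically be passed to a weighted cross-entropy loss. Keeping both ends of a long input preserves the question at the start and the last lines of code, at the cost of dropping the middle.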


### Per-Category F1 Scores

| Category           | F1 (Codet-classy) |
|--------------------|-------------------|
| Explain Code       | 0.90              |
| General Question   | 0.97              |
| Help Fix Code      | 0.85              |
| Help Write Code    | 0.63              |
| Question from Code | 0.89              |
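
For a single summary number, the per-category scores in the table can be averaged without weighting (macro F1). The sketch below only performs that arithmetic on the reported values; the macro figure is derived here and is not reported by the authors.

```python
# Per-category F1 scores copied from the table above.
f1_scores = {
    "Explain Code": 0.90,
    "General Question": 0.97,
    "Help Fix Code": 0.85,
    "Help Write Code": 0.63,
    "Question from Code": 0.89,
}

# Macro F1: unweighted mean over categories, so each class counts equally
# regardless of how many queries it has.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# The weakest category is the one to examine first.
weakest = min(f1_scores, key=f1_scores.get)

if __name__ == "__main__":
    print(f"macro F1 = {macro_f1:.3f}")
    print(f"weakest category = {weakest}")
```

Because the classes are imbalanced, a support-weighted average would differ from this value, and the per-class counts needed to compute it are not given here.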