---
library_name: transformers
datasets:
- majeedkazemi/students-coding-questions-from-ai-assistant
language:
- en
base_model:
- Salesforce/codet5-base
---

# Model Card for Model ID

Vilnius University Deep Neural Networks course project.

## Model Details

A transformer-based query classification model.

### Model Description

This model was developed as part of a Deep Neural Networks (DNN) course project at Vilnius University. It fine-tunes the `Salesforce/codet5-base` model to classify student queries about C programming into five categories: **General Question**, **Question from Code**, **Help Fix Code**, **Help Write Code**, and **Explain Code**.

- **Developed by:** Brigita Bruškytė, Artiom Hovhannisyan, Eglė Orinaitė (Faculty of Mathematics and Informatics, Vilnius University)

## Dataset

- **Size**: 6,776 student queries from a real C programming course.
- **Structure**: JSON entries with `user_id`, `time`, `feature type`, `feature version`, `input question`, `input code`, `input intention`, `input task description`.
- **Note**: The dataset does not include AI responses, only the student queries.

## Challenges

- **Class imbalance**: e.g., “General Question” is much more frequent than the other categories.
- **Field-based hints**: Some classes have unique fields (like `input task description`), which inadvertently helps classification.
- **Token length**: Some queries, especially those with code snippets, can be very long and hit transformer input limits.
- **Structural inconsistency**: Dataset descriptions sometimes did not match the actual data.

### Per-Category F1 Scores

| Category           | Codet-classy |
|--------------------|--------------|
| Explain Code       | 0.90         |
| General Question   | 0.97         |
| Help Fix Code      | 0.85         |
| Help Write Code    | 0.63         |
| Question from Code | 0.89         |
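
For readers who want to reproduce this kind of per-category breakdown, the sketch below shows one way to compute per-class F1 from gold and predicted labels in plain Python. It is an illustration, not the evaluation code used in this project: the helper name `per_class_f1` and the toy labels are hypothetical. The last lines take the unweighted mean of the scores in the table above, a macro-average derived here from the reported numbers rather than a figure stated by the authors.

```python
from collections import Counter

CATEGORIES = [
    "Explain Code",
    "General Question",
    "Help Fix Code",
    "Help Write Code",
    "Question from Code",
]


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} for parallel lists of gold and predicted labels.

    Hypothetical helper for illustration; F1 = 2*TP / (2*TP + FP + FN).
    A label with no gold and no predicted examples gets 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for this example (not taken from the dataset).
toy_true = ["Help Fix Code", "Help Fix Code", "Explain Code", "General Question"]
toy_pred = ["Help Fix Code", "Help Write Code", "Explain Code", "General Question"]
toy_scores = per_class_f1(toy_true, toy_pred, CATEGORIES)

# Scores copied from the table above.
reported_f1 = {
    "Explain Code": 0.90,
    "General Question": 0.97,
    "Help Fix Code": 0.85,
    "Help Write Code": 0.63,
    "Question from Code": 0.89,
}

# Unweighted (macro) average over the five categories.
macro_f1 = sum(reported_f1.values()) / len(reported_f1)
print(f"Macro-averaged F1 over reported scores: {macro_f1:.3f}")
```

Because the macro-average weights every category equally, the weaker "Help Write Code" score pulls it down more than it would in a frequency-weighted average dominated by "General Question".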