# BERT-based Multi-label Cognitive Load Classifier

This model is a fine-tuned `bert-base-uncased` transformer trained to classify **students' cognitive and psychological states** (e.g., cognitive load, confidence, anxiety) from naturalistic **human-AI educational dialogues** in K-12 settings.

## 🧠 What does the model do?

The model performs **multi-label classification** on student-AI conversations, identifying whether a given interaction reflects one or more of the following cognitive and affective states:

- Math Confidence / Math Anxiety
- AI Confidence / AI Concerns
- Intrinsic Cognitive Load
- Extraneous Cognitive Load
- Germane Cognitive Load

Each input text (a single conversation) may correspond to **multiple labels simultaneously**.

---

## 📚 Training Data

The model was trained on a custom dataset collected from a large-scale empirical study involving **160 K-12 students** interacting with an AI-powered teachable agent in a math learning platform (ALTER-Math, name anonymized for review).

- **Dialogues**: 1,440 student-agent interactions over 10 days
- **Labels**: Derived from pre- and post-questionnaires grounded in Cognitive Load Theory and affective constructs
- **Label types**: Binary indicators (0/1) per psychological factor
- **Preprocessing**: Tokenized using Hugging Face's `AutoTokenizer` and padded to a maximum length of 128 tokens

---

## 🏋️‍♂️ Training Setup

- Model: `bert-base-uncased`
- Task: Multi-label text classification
- Loss: `BCEWithLogitsLoss`
- Optimizer: AdamW
- Batch Size: 16
- Epochs: 5
- Learning Rate: 1e-5
- Evaluation Strategy: Hold-out test set (20%)

---

## 🚀 Intended Use

This model is designed to support **AI-based unobtrusive assessment of cognitive load** in education, enabling:

- Researchers to monitor how students respond cognitively and emotionally to AI tutors
- Developers to build more adaptive, trustworthy AI learning agents
- Teachers to gain insight into student engagement and overload without invasive devices

---

## 📌 Limitations

- The dataset is modest in size (N=160 students), and generalization to other domains or age groups is not guaranteed.
- Labels are inferred from questionnaire-aligned criteria, which may introduce subjectivity.
- The model does not currently handle out-of-distribution input or code-switching effectively.
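
---

## 💻 Example: Decoding Multi-label Outputs (sketch)

Because the model is trained with `BCEWithLogitsLoss`, each output logit can be read as an independent yes/no decision for one label. The sketch below illustrates that idea in plain Python: it applies a sigmoid to each logit, keeps the labels that reach a threshold, and computes the per-example loss with the standard numerically stable formula. Several things here are assumptions rather than facts from this card: the label order in `LABELS`, the treatment of the paired items (e.g., Math Confidence / Math Anxiety) as separate labels, the 0.5 threshold, and the example logits, which are made up for illustration.

```python
import math

# Hypothetical label order. The model card does not state the output order,
# nor whether the paired items are separate labels; both are assumptions.
LABELS = [
    "Math Confidence",
    "Math Anxiety",
    "AI Confidence",
    "AI Concerns",
    "Intrinsic Cognitive Load",
    "Extraneous Cognitive Load",
    "Germane Cognitive Load",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [
        label
        for label, logit in zip(LABELS, logits)
        if sigmoid(logit) >= threshold
    ]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))]:
        # max(x, 0) - x*y + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Made-up logits for one conversation (illustration only, not model output).
example_logits = [2.0, -1.5, 0.3, -0.2, 1.1, -3.0, -0.7]
predicted = decode(example_logits)
print(predicted)
```

In practice the logits would come from the fine-tuned checkpoint's forward pass on a tokenized conversation; the checkpoint identifier is not given in this card, so it is left out here. A single 0.5 threshold is only a starting point, and per-label thresholds tuned on the hold-out set may work better for imbalanced labels.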