# Regulatory Capacity Classifier

A BERT-based multi-label classifier for analyzing regulatory capacities in collaborative learning dialogues.

## Model Description

| Attribute | Value |
|---|---|
| Base Model | bert-base-uncased |
| Task | Multi-label Text Classification |
| Number of Labels | 12 |
| Training Strategy | Weighted BCEWithLogitsLoss for class imbalance |
| Language | English |
| Framework | PyTorch + HuggingFace Transformers |

## Model Performance

### Overall Metrics (Validation Set, Threshold=0.5)

| Metric | Score |
|---|---|
| F1-Micro | 0.6554 |
| F1-Macro | 0.4675 |
| Precision (Micro) | 0.5600 |
| Recall (Micro) | 0.7800 |
| Weighted Avg F1 | 0.6800 |
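For reference, the micro- and macro-averaged F1 scores above follow the standard multi-label definitions over 0/1 prediction matrices. A minimal NumPy sketch (the variable names and toy data are illustrative, not from the training code):

```python
import numpy as np

def multilabel_f1(y_true, y_pred):
    """Return (f1_micro, f1_macro) for 0/1 matrices of shape (n_samples, n_labels)."""
    tp = (y_true & y_pred).sum(axis=0).astype(float)        # per-label true positives
    fp = ((1 - y_true) & y_pred).sum(axis=0).astype(float)  # per-label false positives
    fn = (y_true & (1 - y_pred)).sum(axis=0).astype(float)  # per-label false negatives

    # Micro-averaging pools counts across all labels before computing F1
    f1_micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())

    # Macro-averaging computes F1 per label, then takes the unweighted mean
    denom = 2 * tp + fp + fn
    per_label = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1_micro, per_label.mean()

# Toy stand-in for real prediction matrices (rows = utterances, cols = labels)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0]])
f1_micro, f1_macro = multilabel_f1(y_true, y_pred)
```

Micro-averaging favors frequent labels, while macro-averaging treats every label equally, which is why the macro score is noticeably lower under the class imbalance described below.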

### Per-Class Performance

| Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Cog-Evaluate | 0.78 | 0.77 | 0.77 | 104 |
| Cog-Explain | 0.25 | 0.27 | 0.26 | 22 |
| Cog-Reason | 0.51 | 0.85 | 0.64 | 47 |
| Meta-Monitor | 0.64 | 0.83 | 0.72 | 127 |
| Meta-Orient | 0.26 | 0.83 | 0.40 | 12 |
| Meta-Plan | 0.39 | 0.73 | 0.51 | 15 |
| Meta-Reflect | 0.08 | 0.50 | 0.13 | 2 |
| SE-Express | 0.55 | 0.69 | 0.61 | 35 |
| SE-Regulate | 0.25 | 0.62 | 0.36 | 8 |
| TE-Act | 0.27 | 1.00 | 0.43 | 3 |
| TE-Report | 0.69 | 0.89 | 0.78 | 72 |
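As a sanity check, the support-weighted mean of the (rounded) per-class F1 scores in the table above lands close to the reported Weighted Avg F1 of 0.68:

```python
# Per-class F1 and support, copied from the table above (rounded values)
f1 =      [0.77, 0.26, 0.64, 0.72, 0.40, 0.51, 0.13, 0.61, 0.36, 0.43, 0.78]
support = [104,  22,   47,   127,  12,   15,   2,    35,   8,    3,    72]

# Support-weighted average: each label's F1 counts in proportion to its frequency
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)
print(round(weighted_f1, 3))  # 0.675
```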

### Cross-Validation Results (5-Fold)

| Fold | F1-Micro | F1-Macro | Precision | Recall |
|---|---|---|---|---|
| Fold 1 | 0.6418 | 0.4501 | 0.3781 | 0.5717 |
| Fold 2 | 0.6310 | 0.5032 | 0.4677 | 0.5914 |
| Fold 3 | 0.4904 | 0.3569 | 0.2593 | 0.7470 |
| Fold 4 | 0.6769 | 0.5134 | 0.4460 | 0.6331 |
| Fold 5 | 0.6660 | 0.5211 | 0.4420 | 0.6584 |
| **Mean** | 0.6212 | 0.4689 | 0.3986 | 0.6403 |
| **Std** | ±0.0754 | ±0.0685 | ±0.0848 | ±0.0687 |
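The Mean/Std row can be reproduced from the per-fold scores; the reported Std matches the sample standard deviation (ddof=1), shown here for the F1-Micro column:

```python
import numpy as np

# F1-Micro per fold, copied from the cross-validation table above
f1_micro = np.array([0.6418, 0.6310, 0.4904, 0.6769, 0.6660])

# ddof=1 gives the sample (Bessel-corrected) standard deviation
print(f"{f1_micro.mean():.4f} ± {f1_micro.std(ddof=1):.4f}")  # 0.6212 ± 0.0754
```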

## Intended Use

This model is designed for analyzing collaborative learning dialogues and identifying the regulatory capacity categories defined by the taxonomy below.

### Label Taxonomy

| Category | Labels | Description |
|---|---|---|
| Cognitive (Cog-) | Evaluate, Explain, Generate, Reason | Cognitive processing and reasoning |
| Metacognitive (Meta-) | Monitor, Orient, Plan, Reflect | Self-regulation and monitoring |
| Socio-emotional (SE-) | Express, Regulate | Social and emotional expressions |
| Task Execution (TE-) | Act, Report | Task-related actions and reporting |

## Training Data

| Attribute | Session 1 | Session 2 | Total |
|---|---|---|---|
| Total Samples | 1,620 | 1,082 | 2,702 |
| AI-assisted Groups | 865 | 564 | 1,429 |
| Teams Groups | 755 | 518 | 1,273 |
| Unique Groups | 12 | 24 | 36 |
| Avg Text Length | 78.25 chars | 65.36 chars | – |
| Avg Labels/Sample | 1.37 | 1.83 | – |

### Label Distribution (Session 1)

| Label | Count | Percentage |
|---|---|---|
| Meta-Monitor | 641 | 39.57% |
| Cog-Evaluate | 448 | 27.65% |
| TE-Report | 345 | 21.30% |
| Cog-Reason | 249 | 15.37% |
| SE-Express | 148 | 9.14% |
| Meta-Orient | 108 | 6.67% |
| Meta-Plan | 108 | 6.67% |
| Cog-Explain | 93 | 5.74% |
| SE-Regulate | 33 | 2.04% |
| TE-Act | 26 | 1.60% |
| Meta-Reflect | 22 | 1.36% |

## Usage

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/regulatory-capacity-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

# Labels, in the order used by the classification head
labels = [
    'Cog-Evaluate', 'Cog-Explain', 'Cog-Reason',
    'Meta-Monitor', 'Meta-Orient', 'Meta-Plan', 'Meta-Reflect',
    'SE-Express', 'SE-Regulate',
    'TE-Act', 'TE-Report'
]

# Inference function
def predict(text, threshold=0.5):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        # Sigmoid per label (multi-label), not softmax across labels
        probs = torch.sigmoid(outputs.logits)
        predictions = (probs > threshold).int()

    predicted_labels = [label for i, label in enumerate(labels) if predictions[0][i] == 1]
    confidence = {label: float(probs[0][i]) for i, label in enumerate(labels)}

    return predicted_labels, confidence

# Example usage
text = "I think we should evaluate our approach before moving forward."
predicted, confidence = predict(text)
print(f"Predicted labels: {predicted}")
print(f"Confidence scores: {confidence}")
```

## Training Procedure

### Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 8 |
| Batch Size | 16 |
| Learning Rate | 3e-5 |
| Warmup Steps | 100 |
| Max Sequence Length | 128 |
| Loss Function | Weighted BCEWithLogitsLoss |
| Optimizer | AdamW |
| Train/Val Split | 80/20 |
| Random Seed | 42 |
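The loss setup implied by the table can be sketched as follows. This is an illustrative 3-label toy, not the training script; the real model uses one `pos_weight` entry per regulatory-capacity label (see the computed weights below):

```python
import torch
import torch.nn as nn

# Toy per-label positive-class weights (3 labels for illustration)
pos_weight = torch.tensor([72.64, 61.31, 3.70])
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.tensor([[1.2, -0.8, 0.3]])   # raw classifier outputs
targets = torch.tensor([[1.0, 0.0, 1.0]])   # multi-hot gold labels
loss = loss_fn(logits, targets)             # scalar; missed rare positives cost more
```

With `pos_weight`, a false negative on a rare label (e.g. weight 72.64) contributes far more to the loss than one on a frequent label, counteracting the class imbalance.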

### Class Weights (Computed)

Weights were computed using the formula `pos_weight = (total_samples - positive_samples) / positive_samples`:

| Label | Weight |
|---|---|
| Meta-Reflect | 72.64 |
| TE-Act | 61.31 |
| SE-Regulate | 48.09 |
| Cog-Explain | 16.42 |
| Meta-Plan | 14.00 |
| Meta-Orient | 14.00 |
| SE-Express | 9.95 |
| Cog-Reason | 5.51 |
| TE-Report | 3.70 |
| Cog-Evaluate | 2.62 |
| Meta-Monitor | 1.53 |
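Plugging the Session 1 label counts (from the Label Distribution table) into this formula reproduces the weights above:

```python
# pos_weight = (total_samples - positive_samples) / positive_samples,
# applied to the Session 1 label counts (1,620 samples total)
counts = {
    "Meta-Monitor": 641, "Cog-Evaluate": 448, "TE-Report": 345,
    "Cog-Reason": 249, "SE-Express": 148, "Meta-Orient": 108,
    "Meta-Plan": 108, "Cog-Explain": 93, "SE-Regulate": 33,
    "TE-Act": 26, "Meta-Reflect": 22,
}
total = 1620
pos_weight = {label: round((total - n) / n, 2) for label, n in counts.items()}
print(pos_weight["Meta-Reflect"], pos_weight["Meta-Monitor"])  # 72.64 1.53
```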

### Hardware

- Device: Apple Silicon (MPS) / CUDA GPU
- Training Time: ~10-15 minutes per epoch
- Total FLOPs: 6.82 × 10^14

## Limitations

1. **Domain Specificity**: Trained on collaborative learning dialogues; may not generalize to other dialogue types.
2. **Class Imbalance**: Rare labels (Meta-Reflect, TE-Act) have lower prediction accuracy.
3. **Language**: English only.
4. **Context Length**: Maximum of 128 tokens; longer texts are truncated.
5. **Session Shift**: Performance may vary across different learning sessions due to distribution shift.

## Known Confusion Patterns

| True Label | Often Confused With | Count |
|---|---|---|
| Cog-Evaluate | Cog-Reason | 33 |
| Cog-Evaluate | Meta-Monitor | 18 |
| Meta-Monitor | TE-Report | 17 |
| Meta-Monitor | Meta-Orient | 16 |
| Cog-Reason | Cog-Evaluate | 14 |

## Ethical Considerations

- This model is intended for educational research purposes.
- It should not be used as the sole basis for evaluating student performance.
- Human review is recommended for high-stakes applications.

## Citation

```bibtex
@misc{regulatory-classifier-2026,
  title={Regulatory Capacity Classifier for Collaborative Learning Dialogues},
  author={Anonymous},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/your-username/regulatory-capacity-classifier},
  note={Multi-label BERT classifier trained on 2,702 annotated utterances}
}
```

## Model Card Authors

Generated on 2026-01-24.


## Files Included

| File | Description |
|---|---|
| model.safetensors | Model weights (SafeTensors format) |
| config.json | Model configuration |
| vocab.txt | BERT vocabulary |
| tokenizer_config.json | Tokenizer configuration |
| special_tokens_map.json | Special tokens mapping |
| example_usage.py | Usage example script |