ajfranck/COPUS-analysis

@@ -2,28 +2,100 @@
 license: mit
 language:
 - en
 base_model:
 - openbmb/MiniCPM-V-4_5
 pipeline_tag: visual-question-answering
 ---
 # COPUS Classifier
-Classifier head for COPUS action recognition, trained on MiniCPM-V-4_5 features.
 ## Usage
 ```python
 # Load classifier
-classifier = torch.load('classifier.pt')
-# Extract features from video
-features = extract_features(frames, temporal_ids)
-# Predict
 logits = classifier(features)
-predictions = torch.sigmoid(logits) > 0.5
 ```
-## Actions: 24
-Trained on: 2025-11-06

 license: mit
 language:
 - en
+tags:
+- video-classification
+- education
+- classroom-observation
+- copus
+- vision-language-model
 base_model:
 - openbmb/MiniCPM-V-4_5
 pipeline_tag: visual-question-answering
 ---
 # COPUS Classifier
+This repository provides a lightweight classifier head (a three-layer MLP, 4096 → 1024 → 512 → 24) trained on top of the frozen MiniCPM-V-4_5 vision-language model. Only the classification layers are optimized; the base model's weights are not updated.
+## COPUS Framework
+The model predicts 24 COPUS codes across two categories. Prediction is multi-label, so several codes can be active for the same video:
+**Student Actions (13 codes)**: L (Listening), Ind (Individual work), CG (Clicker groups), WG (Worksheet groups), OG (Other groups), AnQ (Answering questions), SQ (Asking questions), WC (Whole class discussion), Prd (Predictions), SP (Presentations), TQ (Test/Quiz), W (Waiting), O (Other)
+**Instructor Actions (11 codes)**: Lec (Lecturing), RtW (Real-time writing), FUp (Follow-up), PQ (Posing questions), CQ (Clicker questions), AnQ (Answering questions), MG (Moving/Guiding), 1o1 (One-on-one), D/V (Demo/Video), Adm (Administration), W (Waiting)
 ## Usage
 ```python
+import torch
+import torch.nn as nn
+from transformers import AutoModel, AutoTokenizer
+from PIL import Image
+from decord import VideoReader, cpu
+class COPUSClassifier(nn.Module):
+    def __init__(self, input_dim=4096, num_classes=24):
+        super().__init__()
+        self.classifier = nn.Sequential(
+            nn.Linear(input_dim, 1024),
+            nn.ReLU(),
+            nn.Dropout(0.3),
+            nn.Linear(1024, 512),
+            nn.ReLU(),
+            nn.Dropout(0.2),
+            nn.Linear(512, num_classes)
+        )
+    def forward(self, x):
+        return self.classifier(x)
+# Load base model
+base_model = AutoModel.from_pretrained(
+    "openbmb/MiniCPM-V-4_5",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).cuda().eval()
+tokenizer = AutoTokenizer.from_pretrained(
+    "openbmb/MiniCPM-V-4_5",
+    trust_remote_code=True
+)
 # Load classifier
+classifier = COPUSClassifier().cuda()
+checkpoint = torch.load("classifier.pt", map_location="cuda")
+classifier.load_state_dict(checkpoint['classifier_state_dict'])
+classifier.eval()
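+# Extract features: generate a text response from the frozen base model, then
+# mean-pool the input embeddings of that response into a single feature vector
+def extract_features(frames, prompt):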
+    with torch.no_grad():
+        msgs = [{"role": "user", "content": frames + [prompt]}]
+        response = base_model.chat(
+            msgs=msgs,
+            tokenizer=tokenizer,
+            max_new_tokens=500,
+            sampling=False
+        )
+        tokens = tokenizer(response, return_tensors='pt', max_length=512, truncation=True)
+        embeddings = base_model.llm.get_input_embeddings()(tokens['input_ids'].cuda())
+        return embeddings.mean(dim=1).float()
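+# NOTE: load_video_frames is not defined in the original card. This is an assumed
+# helper that samples evenly spaced frames with decord and returns PIL images.
+def load_video_frames(path, num_frames=30):
+    vr = VideoReader(path, ctx=cpu(0))
+    n = min(num_frames, len(vr))
+    indices = [i * len(vr) // n for i in range(n)]
+    return [Image.fromarray(f) for f in vr.get_batch(indices).asnumpy()]
+# Classify
+# NOTE: classification_prompt is not defined in this card; set it to the prompt
+# string used when the classifier was trained before running the lines below.
+frames = load_video_frames("classroom.mp4", num_frames=30)
+features = extract_features(frames, classification_prompt)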
 logits = classifier(features)
+predictions = (torch.sigmoid(logits) > 0.5).cpu().numpy()
 ```
+## Citation
+```bibtex
+@software{copus_classifier_2025,
+  title={COPUS Video Evaluation System: Automated Classroom Observation using Vision-Language Models},
+  author={Franck, Andy and Ng, Brendan and Derrod, Zane and Fitzgerald, Ben},
+  year={2025},
+  url={https://huggingface.co/ajfranck/COPUS-analysis}
+}
+```
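
The usage snippet ends with a boolean `predictions` array, but the card does not say which output index corresponds to which COPUS code. The sketch below shows one way to turn a row of that array into readable labels. It assumes the index order follows the order in which the card lists the codes (student codes first, then instructor codes); that ordering is a guess and should be checked against the training code. The names `LABELS` and `decode_predictions` are hypothetical and not part of the repository.

```python
# ASSUMPTION: output indices follow the order the model card lists the codes,
# student codes first and instructor codes second. The card does not confirm this.
STUDENT_CODES = ["L", "Ind", "CG", "WG", "OG", "AnQ", "SQ", "WC",
                 "Prd", "SP", "TQ", "W", "O"]
INSTRUCTOR_CODES = ["Lec", "RtW", "FUp", "PQ", "CQ", "AnQ", "MG",
                    "1o1", "D/V", "Adm", "W"]

# Prefix each code with its category so the codes shared by both
# categories (AnQ and W) remain distinguishable.
LABELS = ([f"student/{code}" for code in STUDENT_CODES]
          + [f"instructor/{code}" for code in INSTRUCTOR_CODES])


def decode_predictions(row):
    """Return the labels whose entry in a 24-element boolean row is true."""
    if len(row) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(row)}")
    return [label for label, active in zip(LABELS, row) if active]
```

With the `predictions` array from the usage snippet, which has shape `(1, 24)`, the call would be `decode_predictions(predictions[0])`.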