ajfranck committed on
Commit 8dff199 · verified · 1 Parent(s): 75be037

Update README.md

Files changed (1)
  1. README.md +80 -8
README.md CHANGED
@@ -2,28 +2,100 @@
  license: mit
  language:
  - en
+ tags:
+ - video-classification
+ - education
+ - classroom-observation
+ - copus
+ - vision-language-model
  base_model:
  - openbmb/MiniCPM-V-4_5
  pipeline_tag: visual-question-answering
  ---
  # COPUS Classifier

- Classifier head for COPUS action recognition, trained on MiniCPM-V-4_5 features.
+ The system is a lightweight classifier head trained on top of the frozen MiniCPM-V-4.5 vision-language model. The base model's weights stay fixed during training; only the classification layers are optimized.
+
+ ## COPUS Framework
+
+ The model detects 24 classroom activities across two categories:
+
+ **Student Actions (13 codes)**: L (Listening), Ind (Individual work), CG (Clicker groups), WG (Worksheet groups), OG (Other groups), AnQ (Answering questions), SQ (Asking questions), WC (Whole class discussion), Prd (Predictions), SP (Presentations), TQ (Test/Quiz), W (Waiting), O (Other)
+
+ **Instructor Actions (11 codes)**: Lec (Lecturing), RtW (Real-time writing), FUp (Follow-up), PQ (Posing questions), CQ (Clicker questions), AnQ (Answering questions), MG (Moving/Guiding), 1o1 (One-on-one), D/V (Demo/Video), Adm (Administration), W (Waiting)

  ## Usage

  ```python
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel, AutoTokenizer
+ from PIL import Image
+ from decord import VideoReader, cpu
+
+ class COPUSClassifier(nn.Module):
+     def __init__(self, input_dim=4096, num_classes=24):
+         super().__init__()
+         self.classifier = nn.Sequential(
+             nn.Linear(input_dim, 1024),
+             nn.ReLU(),
+             nn.Dropout(0.3),
+             nn.Linear(1024, 512),
+             nn.ReLU(),
+             nn.Dropout(0.2),
+             nn.Linear(512, num_classes)
+         )
+
+     def forward(self, x):
+         return self.classifier(x)
+
+ # Load base model
+ base_model = AutoModel.from_pretrained(
+     "openbmb/MiniCPM-V-4_5",
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16
+ ).cuda().eval()
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     "openbmb/MiniCPM-V-4_5",
+     trust_remote_code=True
+ )
+
  # Load classifier
- classifier = torch.load('classifier.pt')
+ classifier = COPUSClassifier().cuda()
+ checkpoint = torch.load("classifier.pt", map_location="cuda")
+ classifier.load_state_dict(checkpoint['classifier_state_dict'])
+ classifier.eval()

- # Extract features from video
- features = extract_features(frames, temporal_ids)
+ # Process video
+ def extract_features(frames, prompt):
+     with torch.no_grad():
+         msgs = [{"role": "user", "content": frames + [prompt]}]
+         response = base_model.chat(
+             msgs=msgs,
+             tokenizer=tokenizer,
+             max_new_tokens=500,
+             sampling=False
+         )
+         tokens = tokenizer(response, return_tensors='pt', max_length=512, truncation=True)
+         embeddings = base_model.llm.get_input_embeddings()(tokens['input_ids'].cuda())
+         return embeddings.mean(dim=1).float()

- # Predict
+ # Classify
+ frames = load_video_frames("classroom.mp4", num_frames=30)  # load_video_frames is not defined in this snippet; supply your own frame loader
+ features = extract_features(frames, classification_prompt)  # classification_prompt is not defined in this snippet; supply your own prompt string
  logits = classifier(features)
- predictions = torch.sigmoid(logits) > 0.5
+ predictions = (torch.sigmoid(logits) > 0.5).cpu().numpy()
  ```

- ## Actions: 24

- Trained on: 2025-11-06
+ ## Citation
+
+ ```bibtex
+ @software{copus_classifier_2025,
+ title={COPUS Video Evaluation System: Automated Classroom Observation using Vision-Language Models},
+ author={Franck, Andy and Ng, Brendan and Derrod, Zane and Fitzgerald, Ben},
+ year={2025},
+ url={https://huggingface.co/ajfranck/COPUS-analysis}
+ }
+ ```
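
The updated README stops at a boolean array of 24 predictions and does not say which output index corresponds to which COPUS code. As a minimal sketch, the decoding step could look like the following. It assumes, without confirmation from the model card, that the outputs follow the order in which the README lists the codes (13 student codes, then 11 instructor codes). `LABELS`, `sigmoid`, and `decode_predictions` are hypothetical names introduced for illustration. The sketch uses plain Python so it runs without a GPU; for a single video it applies the same `sigmoid(logit) > 0.5` rule as the README to `logits[0].tolist()`.

```python
import math

# Codes copied from the README. Their ORDER here is an assumption:
# the README does not document the classifier's output index mapping.
STUDENT_CODES = ["L", "Ind", "CG", "WG", "OG", "AnQ", "SQ", "WC",
                 "Prd", "SP", "TQ", "W", "O"]
INSTRUCTOR_CODES = ["Lec", "RtW", "FUp", "PQ", "CQ", "AnQ", "MG",
                    "1o1", "D/V", "Adm", "W"]

# AnQ and W appear in both groups, so each label is prefixed with its group.
LABELS = ([f"student/{code}" for code in STUDENT_CODES]
          + [f"instructor/{code}" for code in INSTRUCTOR_CODES])


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_predictions(logits, threshold=0.5):
    """Return the labels whose sigmoid probability exceeds the threshold.

    This is multi-label decoding: each code is thresholded independently,
    so zero, one, or many codes may be active for the same clip.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, logit in zip(LABELS, logits)
            if sigmoid(logit) > threshold]


if __name__ == "__main__":
    # Made-up logits for illustration only, not real model output.
    example = [-1.0] * len(LABELS)
    example[0] = 2.0    # student/L under the assumed ordering
    example[13] = 2.0   # instructor/Lec under the assumed ordering
    print(decode_predictions(example))
```

Before relying on a mapping like this, check the label order against the training code, since a mismatch would silently assign predictions to the wrong codes.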