mjpsm committed on
Commit a4deb26 · verified · 1 Parent(s): 4f9ac6e

Create README.md

  1. README.md +354 -0
README.md ADDED
@@ -0,0 +1,354 @@
---
pipeline_tag: tabular-classification
---
# Communicative Engagement Classification API

## Overview

This project deploys an XGBoost machine learning model as a cloud-based inference API using FastAPI. The model predicts participant engagement behavior during online meetings based on Recall.ai participant event data extracted from Zoom meetings.

The API is designed to support a larger event-driven engagement analytics pipeline that processes participant activity after meetings end.

---

# Purpose

The purpose of this model is to classify meeting participants into behavioral engagement groups using engineered participation features derived from Recall.ai event streams.

The model predicts one of three engagement labels:

| Label | Meaning |
|---|---|
| Silent Observer | Participant attended but rarely or never verbally engaged |
| Occasional Participant | Participant engaged intermittently |
| Active Participant | Participant frequently engaged verbally and behaviorally |

---

# Input Features

The model uses the following engineered features:

| Feature | Description |
|---|---|
| total_time | Total time the participant remained in the meeting |
| was_webcam_on | Binary indicator for whether the participant's webcam was used |
| screenshare_usage | Number of screenshare events triggered |
| never_spoke | Binary indicator for whether the participant never spoke |
| speech_turns | Number of speaking sessions detected |

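To make the feature contract concrete, the sketch below models one feature row as a dataclass and converts it to a list in the column order of the table above. The names `FeatureRow` and `to_vector`, and the validation rules (binary flags are 0 or 1, durations and counts are non-negative), are illustrative assumptions and not part of the deployed service.

```python
from dataclasses import dataclass

# Column order taken from the feature table above.
FEATURES = [
    "total_time",
    "was_webcam_on",
    "screenshare_usage",
    "never_spoke",
    "speech_turns",
]


@dataclass(frozen=True)
class FeatureRow:
    """Hypothetical typed container for one participant's features."""

    total_time: float
    was_webcam_on: int
    screenshare_usage: int
    never_spoke: int
    speech_turns: int

    def __post_init__(self):
        # Assumed rule: binary indicators must be 0 or 1.
        for name in ("was_webcam_on", "never_spoke"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")
        # Assumed rule: durations and counts cannot be negative.
        for name in ("total_time", "screenshare_usage", "speech_turns"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    def to_vector(self):
        """Return the feature values in model column order."""
        return [getattr(self, name) for name in FEATURES]


row = FeatureRow(
    total_time=3200,
    was_webcam_on=1,
    screenshare_usage=0,
    never_spoke=0,
    speech_turns=5,
)
print(row.to_vector())  # [3200, 1, 0, 0, 5]
```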
---

# Data Source

The input data is generated from Recall.ai participant event logs collected from Zoom meetings.

Examples of participant events include:

- join
- leave
- speech_on
- speech_off
- webcam_on
- webcam_off
- screenshare_on
- screenshare_off

These events are processed into participant-level behavioral features before inference.

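The processing step described above could look roughly like the following sketch, which folds one participant's events into the five model features. The event shape (a dict with `type` and `timestamp` keys, timestamps in seconds) and the function name `build_features` are assumptions made for illustration; the actual Recall.ai payload and the project's own processing code may differ.

```python
def build_features(events):
    """Aggregate one participant's events into the five model features.

    Assumes each event looks like {"type": "join", "timestamp": 12.0},
    with timestamps in seconds. This shape is illustrative only.
    """
    total_time = 0.0
    joined_at = None
    webcam_on = 0
    screenshares = 0
    speech_turns = 0

    for event in sorted(events, key=lambda e: e["timestamp"]):
        kind = event["type"]
        if kind == "join":
            joined_at = event["timestamp"]
        elif kind == "leave" and joined_at is not None:
            # Close the current attendance interval.
            total_time += event["timestamp"] - joined_at
            joined_at = None
        elif kind == "webcam_on":
            webcam_on = 1
        elif kind == "screenshare_on":
            screenshares += 1
        elif kind == "speech_on":
            # Each speech_on is counted as the start of one speaking turn.
            speech_turns += 1

    # A join without a matching leave is ignored in this sketch.
    return {
        "total_time": total_time,
        "was_webcam_on": webcam_on,
        "screenshare_usage": screenshares,
        "never_spoke": int(speech_turns == 0),
        "speech_turns": speech_turns,
    }


example_events = [
    {"type": "join", "timestamp": 0},
    {"type": "webcam_on", "timestamp": 5},
    {"type": "speech_on", "timestamp": 10},
    {"type": "speech_off", "timestamp": 20},
    {"type": "speech_on", "timestamp": 30},
    {"type": "speech_off", "timestamp": 35},
    {"type": "leave", "timestamp": 3200},
]
features = build_features(example_events)
print(features)
```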
---

# Model Architecture

| Component | Value |
|---|---|
| Model Type | XGBoost Classifier |
| Task | Multi-class Classification |
| Output Classes | 3 |
| Training Data | Recall.ai participant meeting features |
| Framework | xgboost |
| API Framework | FastAPI |

---

# API Endpoints

## Health Check

### GET /

Returns API health status.

### Example Response

```json
{
  "status": "running"
}
```

---

## Prediction Endpoint

### POST /predict

Runs engagement classification on participant feature rows.

### Request Format

```json
{
  "rows": [
    {
      "student_name": "John",
      "meeting_id": "meeting_123",
      "total_time": 3200,
      "was_webcam_on": 1,
      "screenshare_usage": 0,
      "never_spoke": 0,
      "speech_turns": 5
    }
  ]
}
```

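A caller may want to check a payload against this format before sending it. The sketch below is one hypothetical client-side check; `validate_payload` and its rules are assumptions and not part of the API, and the `app.py` listing in this README fills missing feature columns with 0 instead of rejecting the request.

```python
REQUIRED_FEATURES = (
    "total_time",
    "was_webcam_on",
    "screenshare_usage",
    "never_spoke",
    "speech_turns",
)
ID_FIELDS = ("student_name", "meeting_id")


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list) or not rows:
        return ['payload must be an object with a non-empty "rows" list']

    problems = []
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {index}: expected an object")
            continue
        for field in ID_FIELDS + REQUIRED_FEATURES:
            if field not in row:
                problems.append(f"row {index}: missing {field}")
        for field in REQUIRED_FEATURES:
            if field not in row:
                continue
            value = row[field]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"row {index}: {field} must be numeric")
    return problems


payload = {
    "rows": [
        {
            "student_name": "John",
            "meeting_id": "meeting_123",
            "total_time": 3200,
            "was_webcam_on": 1,
            "screenshare_usage": 0,
            "never_spoke": 0,
            "speech_turns": 5,
        }
    ]
}
print(validate_payload(payload))  # []
```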
---

### Response Format

```json
[
  {
    "student_name": "John",
    "meeting_id": "meeting_123",
    "cluster_label": "Occasional Participant"
  }
]
```

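Because the response is a flat list of records, downstream code can aggregate it directly. As a minimal sketch, the hypothetical helper `summarize_by_meeting` below counts labels per meeting; the participants other than "John" are made-up sample data.

```python
from collections import Counter, defaultdict


def summarize_by_meeting(predictions):
    """Count engagement labels per meeting from a /predict response body."""
    summary = defaultdict(Counter)
    for item in predictions:
        summary[item["meeting_id"]][item["cluster_label"]] += 1
    return {meeting: dict(counts) for meeting, counts in summary.items()}


# Sample response body; "Ana" and "Lee" are invented for the example.
response = [
    {"student_name": "John", "meeting_id": "meeting_123", "cluster_label": "Occasional Participant"},
    {"student_name": "Ana", "meeting_id": "meeting_123", "cluster_label": "Active Participant"},
    {"student_name": "Lee", "meeting_id": "meeting_123", "cluster_label": "Occasional Participant"},
]
summary = summarize_by_meeting(response)
print(summary)
# {'meeting_123': {'Occasional Participant': 2, 'Active Participant': 1}}
```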
---

# Deployment Purpose

This API is intended to serve as the inference layer for a cloud-based communicative engagement analytics pipeline.

The larger architecture consists of:

```text
Recall.ai
↓
Webhook Trigger
↓
FastAPI Cloud Server
↓
Participant Event Processing
↓
XGBoost Inference API
↓
Google Sheets / Analytics Storage
```

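The last two stages of this flow can be sketched as a small function that sends feature rows to a classifier and shapes the result into spreadsheet-style rows. Everything here is hypothetical: `run_pipeline`, the injected `classify` callable (which in practice might wrap `POST /predict`), the column order, and the `fake_classify` stand-in with its toy rule are assumptions that let the sketch run offline.

```python
def run_pipeline(feature_rows, classify):
    """Send feature rows through a classifier and format rows for storage.

    `classify` is any callable that takes {"rows": [...]} and returns a list
    of records with student_name, meeting_id, and cluster_label.
    """
    predictions = classify({"rows": feature_rows})
    header = ["meeting_id", "student_name", "cluster_label"]
    table = [header]
    for record in predictions:
        table.append([record[column] for column in header])
    return table


def fake_classify(payload):
    """Stand-in for the inference API so the sketch runs without a server."""
    return [
        {
            "student_name": row["student_name"],
            "meeting_id": row["meeting_id"],
            # Toy rule for demonstration only; real labels come from the model.
            "cluster_label": "Silent Observer" if row["never_spoke"] else "Active Participant",
        }
        for row in payload["rows"]
    ]


rows = [
    {
        "student_name": "John",
        "meeting_id": "meeting_123",
        "total_time": 3200,
        "was_webcam_on": 1,
        "screenshare_usage": 0,
        "never_spoke": 0,
        "speech_turns": 5,
    }
]
table = run_pipeline(rows, fake_classify)
print(table)
```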
---

# File Structure

```text
.
├── app.py
├── engagement_xgb_model.json
├── requirements.txt
└── README.md
```

---

# requirements.txt

```txt
fastapi
uvicorn
xgboost
pandas
scikit-learn
```

---

# app.py

```python
from fastapi import FastAPI
from pydantic import BaseModel

import pandas as pd
import xgboost as xgb

app = FastAPI()

# Load the trained model once at startup; the JSON file sits next to app.py.
model = xgb.XGBClassifier()
model.load_model("engagement_xgb_model.json")

# Feature columns, in the order passed to the model.
FEATURES = [
    "total_time",
    "was_webcam_on",
    "screenshare_usage",
    "never_spoke",
    "speech_turns",
]

# Maps the predicted class index to its engagement label.
LABEL_MAP = {
    0: "Silent Observer",
    1: "Occasional Participant",
    2: "Active Participant",
}

# Identifier columns echoed back in the response.
ID_COLUMNS = ["student_name", "meeting_id"]


class PredictionRequest(BaseModel):
    # Each row is a dict of identifiers plus the feature values.
    rows: list[dict]


@app.get("/")
def home():
    """Health check."""
    return {"status": "running"}


@app.post("/predict")
def predict(request: PredictionRequest):
    """Classify each participant row and return its engagement label."""
    df = pd.DataFrame(request.rows)

    # Missing feature columns default to 0.
    for col in FEATURES:
        if col not in df:
            df[col] = 0

    # Missing identifier columns default to None instead of raising KeyError.
    for col in ID_COLUMNS:
        if col not in df:
            df[col] = None

    preds = model.predict(df[FEATURES])

    # Cast NumPy integers to int before the dictionary lookup.
    df["cluster_label"] = [LABEL_MAP[int(p)] for p in preds]

    return df[ID_COLUMNS + ["cluster_label"]].to_dict(orient="records")
```

---

# Deployment Instructions

## Step 1

Create a new Space on Hugging Face.

Use:
- SDK: Docker (Spaces has no dedicated FastAPI SDK; a FastAPI app runs inside a Docker Space)
- Visibility: Public or Private

---

## Step 2

Upload:
- app.py
- requirements.txt
- engagement_xgb_model.json
- README.md

A Docker Space also needs a Dockerfile that installs requirements.txt and starts the server, for example with uvicorn. That file is not listed in the file structure above.

---

## Step 3

Wait for Hugging Face to build the API.

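One way to tell when the build has finished is to poll the health check until it answers. The sketch below assumes the `GET /` endpoint described earlier returns `{"status": "running"}`; the function names, retry count, and delay are arbitrary choices for illustration, and `fetch` and `sleep` are injectable only so the loop can be exercised without a network.

```python
import json
import time
import urllib.request


def fetch_status(url):
    """GET the health endpoint and return the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_running(url, fetch=fetch_status, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll the health check until it reports {"status": "running"}.

    Returns True once the API answers, or False after `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "running":
                return True
        except (OSError, ValueError, AttributeError):
            # Connection errors, non-JSON bodies, and unexpected JSON shapes
            # are all treated as "not ready yet".
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call, using the placeholder URL from this README:
# wait_until_running("https://your-space-name.hf.space/")
```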
---

# Example API URL

```text
https://your-space-name.hf.space/predict
```

---

# Example cURL Request

```bash
curl -X POST \
  https://your-space-name.hf.space/predict \
  -H "Content-Type: application/json" \
  -d '{
    "rows": [
      {
        "student_name": "John",
        "meeting_id": "meeting_123",
        "total_time": 3200,
        "was_webcam_on": 1,
        "screenshare_usage": 0,
        "never_spoke": 0,
        "speech_turns": 5
      }
    ]
  }'
```

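For callers working in Python, the same request can be built with the standard library alone. This is a minimal sketch: `build_predict_request` is a hypothetical helper, the URL is the placeholder used above, and the request is constructed but not sent so the example runs offline.

```python
import json
import urllib.request


def build_predict_request(url, rows):
    """Build the same POST request as the cURL example, without sending it."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    "https://your-space-name.hf.space/predict",
    [
        {
            "student_name": "John",
            "meeting_id": "meeting_123",
            "total_time": 3200,
            "was_webcam_on": 1,
            "screenshare_usage": 0,
            "never_spoke": 0,
            "speech_turns": 5,
        }
    ],
)

# To send it, pass `request` to urllib.request.urlopen() and decode the
# response body with json.loads().
print(request.get_method(), request.full_url)
```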
---

# Notes

- This API performs inference only.
- Training is not performed inside the deployed service.
- The model is optimized for lightweight CPU inference.
- The API is intended for integration into event-driven engagement analytics systems.