---
language: ko
license: apache-2.0
tags:
- video-classification
- cnn-lstm
- pytorch
- confusion-detection
- daiSEE
datasets:
- daiSEE
metrics:
- accuracy
- f1
- precision
- recall
---
## Colab Notebook
You can run and test the model directly in Colab below.
## Training Data
https://people.iith.ac.in/vineethnb/resources/daisee/index.html
- DAiSEE dataset (of the four classes Boredom, Confusion, Engagement, and Frustration, only Confusion is used)
- Label mapping:
  - Confusion = 0 → Not Confused (0)
  - Confusion = 1–3 → Confused (1)
- Input data: video frames (sequence_length=30, image_size=112×112)
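The label mapping above can be sketched as a small helper. The function name is illustrative and not part of the released code:

```python
def binarize_confusion(confusion_level: int) -> int:
    """Map a DAiSEE Confusion level (0-3) to a binary label.

    0      -> 0 (Not Confused)
    1, 2, 3 -> 1 (Confused)
    """
    if not 0 <= confusion_level <= 3:
        raise ValueError(f"Confusion level must be in 0-3, got {confusion_level}")
    return 0 if confusion_level == 0 else 1
```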
# Face Comprehension: Confusion Binary Classification
This model is a video-understanding model, trained on the DAiSEE dataset, that analyzes a person's facial expressions and behavior to classify whether they are confused (binary classification).
## Model Architecture
- Backbone: MobileNetV2 (ImageNet pretrained weights, some layers frozen)
- Sequence modeling: LSTM (hidden_dim=256, num_layers=2)
- Attention mechanism: temporal attention
- Classifier: fully-connected layer (2 classes: Not Confused / Confused)
## Training Procedure
- Optimizer: Adam (lr=0.001)
- Loss: weighted CrossEntropyLoss (to compensate for class imbalance)
  - Not Confused: 67.5%, Confused: 32.5% → weight ratio of roughly 1 : 2.1
- Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
- Epochs: 3–10 (adjustable per experiment)
- Batch size: 8–16
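The loss, optimizer, and scheduler settings above can be set up as in this sketch. The weights are derived here by inverting the stated class frequencies (67.5% / 32.5% ≈ 1 : 2.1); the placeholder model is for illustration only, and the released code may compute the weights differently.

```python
import torch
import torch.nn as nn

# Class weights inversely proportional to class frequency, normalized so the
# majority class has weight 1; the minority class ends up ~2.1x heavier.
freqs = torch.tensor([0.675, 0.325])        # [Not Confused, Confused]
class_weights = freqs.min() / freqs         # ratio roughly 1 : 2.1
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = nn.Linear(10, 2)                    # placeholder model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

# Inside the training loop, step the scheduler on the validation loss:
# scheduler.step(val_loss)
```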
## How to Use

```python
import torch
from face import DAiSEEConfusionNet

# Load the model
model = DAiSEEConfusionNet()
model.load_state_dict(torch.load("confusion_binary_model.pth", map_location="cpu"))
model.eval()

# Input: (batch, seq_len, C, H, W)
dummy_input = torch.randn(1, 30, 3, 112, 112)
with torch.no_grad():
    outputs = model(dummy_input)
prediction = torch.argmax(outputs, dim=1).item()
print("Prediction:", "Confused" if prediction == 1 else "Not Confused")
```