language: ko
license: apache-2.0
tags:
  - video-classification
  - cnn-lstm
  - pytorch
  - confusion-detection
  - daiSEE
datasets:
  - daiSEE
metrics:
  - accuracy
  - f1
  - precision
  - recall

Colab Notebook

You can run and test the model directly in the Colab notebook below.

Open In Colab

Training Data (Dataset)

https://people.iith.ac.in/vineethnb/resources/daisee/index.html

  • DAiSEE Dataset (of the four classes Boredom, Confusion, Engagement, and Frustration, only Confusion is used)
  • Label mapping:
    • Confusion = 0 → Not Confused (0)
    • Confusion = 1 to 3 → Confused (1)
  • Input data: video frames (sequence_length=30, image_size=112×112)
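The label mapping above can be sketched as a small helper. The function name `binarize_confusion` is hypothetical and is not part of the released code; the 0 to 3 range follows the mapping listed above.

```python
def binarize_confusion(level: int) -> int:
    """Map a DAiSEE Confusion level (0-3) to a binary label.

    0 -> Not Confused (0); 1, 2, 3 -> Confused (1).
    """
    if level not in (0, 1, 2, 3):
        raise ValueError(f"Confusion level must be 0-3, got {level}")
    return 0 if level == 0 else 1


labels = [binarize_confusion(v) for v in [0, 1, 2, 3]]
print(labels)  # [0, 1, 1, 1]
```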

Face Comprehension: Confusion Binary Classification

This is a video understanding model trained on the DAiSEE dataset. It analyzes a person's facial expressions and behavior and performs binary classification of whether the person is confused.


Model Architecture

  • Backbone: MobileNetV2 (ImageNet-pretrained weights, with some layers frozen)
  • Sequence Modeling: LSTM (hidden_dim=256, num_layers=2)
  • Attention Mechanism: temporal attention
  • Classifier: Fully-connected layer (2 classes: Not Confused / Confused)


Training Procedure

  • Optimizer: Adam (lr=0.001)
  • Loss: Weighted CrossEntropyLoss (to compensate for class imbalance)
    • Not Confused: 67.5%, Confused: 32.5% → weight ratio of about 1 : 2.1
  • Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
  • Epochs: 3 to 10 (configurable via options)
  • Batch Size: 8~16
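The settings above could be assembled roughly as follows. This is a sketch under assumptions: the class weights are derived by inverse frequency (which reproduces the roughly 1 : 2.1 ratio quoted above, though the original training script may compute them differently), the scheduler is assumed to monitor a loss value, and the `nn.Linear` model and random batch are placeholders standing in for the real network and data.

```python
import torch
import torch.nn as nn

# Class frequencies from the list above: [Not Confused, Confused].
freq = torch.tensor([0.675, 0.325])

# Inverse-frequency weights, rescaled so Not Confused has weight 1 (assumed scheme).
class_weights = (1.0 / freq) * freq[0]
print(class_weights)  # second entry is 0.675 / 0.325, roughly 2.08

model = nn.Linear(8, 2)  # placeholder for the real network
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", patience=3, factor=0.5
)

# One illustrative optimization step on random data (batch size 16).
x = torch.randn(16, 8)
y = torch.randint(0, 2, (16,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# In a real loop this would be called once per epoch with the validation loss.
scheduler.step(loss.item())
```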

How to Use

import torch
from face import DAiSEEConfusionNet

# Load the model
model = DAiSEEConfusionNet()
model.load_state_dict(torch.load("confusion_binary_model.pth", map_location="cpu"))
model.eval()

# Input shape: (batch, seq_len, C, H, W)
dummy_input = torch.randn(1, 30, 3, 112, 112)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(dummy_input)

prediction = torch.argmax(outputs, dim=1).item()
print("Prediction:", "Confused" if prediction == 1 else "Not Confused")