---
language: ko
license: apache-2.0
tags:
- video-classification
- cnn-lstm
- pytorch
- confusion-detection
- daiSEE
datasets:
- daiSEE
metrics:
- accuracy
- f1
- precision
- recall
---
## Colab Notebook
You can run and test the model directly in Colab below.
## Training Data
https://people.iith.ac.in/vineethnb/resources/daisee/index.html
- DAiSEE dataset (of the four classes Boredom, Confusion, Engagement, and Frustration, only Confusion is used)
- Label mapping:
  - Confusion = 0 → Not Confused (0)
  - Confusion = 1–3 → Confused (1)
- Input data: video frames (sequence_length=30, image_size=112×112)
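The label mapping above can be sketched as a small helper. The function name is illustrative and not part of the released code:

```python
def binarize_confusion(confusion_level: int) -> int:
    """Map a DAiSEE Confusion level (0-3) to a binary label.

    0      -> 0 (Not Confused)
    1, 2, 3 -> 1 (Confused)
    """
    if not 0 <= confusion_level <= 3:
        raise ValueError(f"Confusion level must be in 0-3, got {confusion_level}")
    return 0 if confusion_level == 0 else 1
```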
# Face Comprehension: Confusion Binary Classification
This model is a video-understanding model, trained on the DAiSEE dataset, that analyzes a person's facial expressions and behavior to classify whether they are confused (binary classification).
## Model Architecture
- Backbone: MobileNetV2 (ImageNet pretrained weights, some layers frozen)
- Sequence modeling: LSTM (hidden_dim=256, num_layers=2)
- Attention mechanism: temporal attention
- Classifier: fully-connected layer (2 classes: Not Confused / Confused)
## Training Procedure
- Optimizer: Adam (lr=0.001)
- Loss: weighted CrossEntropyLoss (to compensate for class imbalance)
  - Not Confused: 67.5%, Confused: 32.5% → weight ratio of roughly 1 : 2.1
- Scheduler: ReduceLROnPlateau (patience=3, factor=0.5)
- Epochs: 3–10 (adjustable per experiment)
- Batch size: 8–16
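The loss, optimizer, and scheduler settings above can be set up as in this sketch. The weights are derived here by inverting the stated class frequencies (67.5% / 32.5% ≈ 1 : 2.1); the placeholder model is for illustration only, and the released code may compute the weights differently.

```python
import torch
import torch.nn as nn

# Class weights inversely proportional to class frequency, normalized so the
# majority class has weight 1; the minority class ends up ~2.1x heavier.
freqs = torch.tensor([0.675, 0.325])        # [Not Confused, Confused]
class_weights = freqs.min() / freqs         # ratio roughly 1 : 2.1
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = nn.Linear(10, 2)                    # placeholder model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

# Inside the training loop, step the scheduler on the validation loss:
# scheduler.step(val_loss)
```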
## How to Use

```python
import torch
from face import DAiSEEConfusionNet

# Load the model
model = DAiSEEConfusionNet()
model.load_state_dict(torch.load("confusion_binary_model.pth", map_location="cpu"))
model.eval()

# Input: (batch, seq_len, C, H, W)
dummy_input = torch.randn(1, 30, 3, 112, 112)
with torch.no_grad():
    outputs = model(dummy_input)
prediction = torch.argmax(outputs, dim=1).item()
print("Prediction:", "Confused" if prediction == 1 else "Not Confused")
```