Driver Behavior Detection Model (Epoch 2)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

Model Description

  • Architecture: Video Swin Transformer Tiny (swin3d_t)
  • Backbone pretraining: Kinetics-400
  • Parameters: 27.85M
  • Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)

Classes (5)

| Label | Class | F1-Score |
|---|---|---|
| 0 | Normal (정상) | 0.93 |
| 1 | Drowsy Driving (졸음운전) | 0.98 |
| 2 | Reaching/Searching (물건찾기) | 0.90 |
| 3 | Phone Usage (휴대폰 사용) | 0.88 |
| 4 | Driver Assault (운전자 폭행) | 1.00 |
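A lookup table that mirrors the class table is enough to turn a predicted index into a readable label. The names `LABELS` and `decode` are illustrative and are not part of the repository.

```python
# Index -> class name, mirroring the class table (names are illustrative).
LABELS = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def decode(index: int) -> str:
    """Map a predicted label index (0-4) to its class name."""
    if index not in LABELS:
        raise ValueError(f"unknown label index: {index}")
    return LABELS[index]


print(decode(3))  # Phone Usage
```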

Performance (Epoch 2)

| Metric | Value |
|---|---|
| Accuracy | 95.15% |
| Macro F1 | 0.9392 |
| Validation samples | 1,371,062 |
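Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many samples it has. Averaging the two-decimal values from the class table gives 0.938, which agrees with the reported 0.9392 up to the rounding of the per-class scores.

```python
# Macro F1 = unweighted mean of per-class F1 scores (values from the class table).
per_class_f1 = {
    "Normal": 0.93,
    "Drowsy Driving": 0.98,
    "Reaching/Searching": 0.90,
    "Phone Usage": 0.88,
    "Driver Assault": 1.00,
}

macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.938
```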

Training Configuration

| Parameter | Value |
|---|---|
| Hardware | 2× NVIDIA RTX A6000 (48 GB) |
| Distributed training | DDP (DistributedDataParallel) |
| Batch size | 32 (16 per GPU × 2 GPUs) |
| Gradient accumulation | 4 steps |
| Effective batch size | 128 |
| Optimizer | AdamW (lr = 1e-3, weight decay = 0.05) |
| Scheduler | OneCycleLR |
| Mixed precision | FP16 |
| Loss | Cross-entropy with label smoothing (0.1) |
| Regularization | Mixup (α = 0.4), Dropout (0.3) |

Usage

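```python
import torch
from model import DriverBehaviorModel  # model.py from this repository

# Load model
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
# Weights are stored under the "model" key; fall back to a bare state dict.
state_dict = checkpoint["model"] if "model" in checkpoint else checkpoint
model.load_state_dict(state_dict)
model.eval()

# Inference
# video_tensor: [1, 3, 30, 224, 224] = 30 RGB frames at 224x224, normalized
video_tensor = torch.randn(1, 3, 30, 224, 224)  # placeholder; replace with a real clip
with torch.no_grad():
    logits = model(video_tensor)        # [1, 5] class logits
    prediction = logits.argmax(dim=1)   # predicted label index (0-4)
```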
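The inference snippet expects a clip that has already been normalized. A minimal preprocessing sketch follows, assuming decoded frames arrive as a uint8 tensor of shape [T, H, W, 3] in RGB order. It also assumes the Kinetics-400 mean/std from torchvision's video classification preset apply. The card does not state which normalization was used in training, so check this against the training code. The helper name `preprocess_clip` is hypothetical.

```python
import torch
import torch.nn.functional as F

# Assumed normalization: torchvision's Kinetics-400 video preset values.
MEAN = torch.tensor([0.43216, 0.394666, 0.37645]).view(1, 3, 1, 1, 1)
STD = torch.tensor([0.22803, 0.22145, 0.216989]).view(1, 3, 1, 1, 1)


def preprocess_clip(frames: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Convert uint8 frames [T, H, W, 3] (RGB) into a model input [1, 3, T, size, size]."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected [T, H, W, 3], got {tuple(frames.shape)}")
    x = frames.permute(0, 3, 1, 2).float() / 255.0   # [T, 3, H, W] in [0, 1]
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    x = x.permute(1, 0, 2, 3).unsqueeze(0)            # [1, 3, T, size, size]
    return (x - MEAN) / STD


# Example: 30 random 480x640 RGB frames standing in for one decoded window.
frames = torch.randint(0, 256, (30, 480, 640, 3), dtype=torch.uint8)
video_tensor = preprocess_clip(frames)
print(video_tensor.shape)  # torch.Size([1, 3, 30, 224, 224])
```

This sketch resizes straight to 224×224. If training used a resize-then-center-crop pipeline instead, inference should match it.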

Dataset

  • Total Videos: 243,979
  • Total Samples (windows): 1,371,062
  • Window Size: 30 frames
  • Stride: 15 frames
  • Resolution: 224×224
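With a 30-frame window and a 15-frame stride, consecutive windows overlap by half. The sketch below enumerates window start indices for one video. It assumes that windows running past the last frame are dropped; the card does not say how short tails are handled. Dividing the totals above gives an average of roughly 5.6 windows per video.

```python
def window_starts(num_frames: int, window: int = 30, stride: int = 15) -> list:
    """Start indices of full sliding windows over a video; partial tails are dropped."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


def num_windows(num_frames: int, window: int = 30, stride: int = 15) -> int:
    """Closed form of len(window_starts(...)): floor((N - W) / S) + 1 for N >= W."""
    return 0 if num_frames < window else (num_frames - window) // stride + 1


starts = window_starts(90)
print(starts)                             # [0, 15, 30, 45, 60]
print(num_windows(90), num_windows(29))   # 5 0
```

For a 90-frame video, the last window starts at frame 60 and covers frames 60 to 89.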

Augmentation (Training)

  • RandomResizedCrop (scale 0.8-1.0)
  • HorizontalFlip (p=0.5)
  • ColorJitter, HueSaturationValue
  • Temporal Augmentation (speed change, frame drop)
  • Mixup (α=0.4)
  • CoarseDropout
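The temporal augmentations are listed without parameters. The following hypothetical sketch expresses speed change and frame drop as a resampling of frame indices, so that any clip still yields exactly 30 frames. The speed range (0.8 to 1.2) and drop probability (0.1) are illustrative choices, not values from the card. Modeling a dropped frame as a repeat of the previous one is also just one possible interpretation.

```python
import random
from typing import List, Optional, Tuple


def temporal_augment(
    num_frames: int,
    out_len: int = 30,
    speed_range: Tuple[float, float] = (0.8, 1.2),
    drop_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return out_len source-frame indices after a random speed change and frame drops.

    speed > 1 spans more source frames (faster playback, frames skipped);
    speed < 1 spans fewer (slower playback, frames repeated).
    A dropped frame is simulated by repeating the previous index.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = rng or random.Random()
    speed = rng.uniform(*speed_range)
    span = min(num_frames, max(1, round(out_len * speed)))  # source frames covered
    start = rng.randint(0, num_frames - span)
    step = (span - 1) / (out_len - 1) if out_len > 1 else 0.0
    indices = [start + round(i * step) for i in range(out_len)]
    for i in range(1, out_len):
        if rng.random() < drop_prob:
            indices[i] = indices[i - 1]  # hold previous frame
    return indices


idx = temporal_augment(num_frames=45, rng=random.Random(0))
print(len(idx), min(idx) >= 0, max(idx) < 45)  # 30 True True
```

The returned indices are non-decreasing, so the augmented clip keeps the original temporal order.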

License

This model is for research purposes only.

Citation

```bibtex
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```