Driver Behavior Detection Model (Epoch 2)
A Video Swin Transformer-based model for detecting abnormal driver behavior.
Model Description
- Architecture: Video Swin Transformer Tiny (swin3d_t)
- Backbone Pretrained: Kinetics-400
- Parameters: 27.85M
- Input: [B, 3, 30, 224, 224] (batch, channels, frames, height, width)
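The input layout listed above can be illustrated with a minimal NumPy sketch. It assumes decoded frames arrive as a `uint8` array shaped (frames, height, width, channels). The helper name `frames_to_clip` and the plain [0, 1] scaling are assumptions for illustration only, since this card does not state the normalization used during training.

```python
import numpy as np

def frames_to_clip(frames: np.ndarray) -> np.ndarray:
    """Convert (T, H, W, C) uint8 frames into a (1, C, T, H, W) float32 clip."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("expected frames shaped (T, H, W, 3)")
    clip = frames.astype(np.float32) / 255.0   # assumed scaling to [0, 1]
    clip = clip.transpose(3, 0, 1, 2)          # (T, H, W, C) -> (C, T, H, W)
    return clip[np.newaxis, ...]               # prepend the batch dimension

# 30 blank RGB frames at 224x224 stand in for a decoded video window.
dummy_frames = np.zeros((30, 224, 224, 3), dtype=np.uint8)
clip = frames_to_clip(dummy_frames)
print(clip.shape)  # (1, 3, 30, 224, 224)
```

The resulting array can be converted with `torch.from_numpy` before it is passed to the model.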
Classes (5)
| Label | Class | F1-Score |
|-------|-------|----------|
| 0 | Normal (정상) | 0.93 |
| 1 | Drowsy Driving (졸음운전) | 0.98 |
| 2 | Reaching/Searching (물건찾기) | 0.90 |
| 3 | Phone Usage (휴대폰 사용) | 0.88 |
| 4 | Driver Assault (운전자 폭행) | 1.00 |
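For convenience, the label table can be expressed as a lookup that turns predicted indices into class names. The names `ID2LABEL` and `decode` are illustrative and are not part of the released code.

```python
# Label IDs and English class names as listed in the class table.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}

def decode(predictions):
    """Map an iterable of predicted label IDs to class names."""
    return [ID2LABEL[int(p)] for p in predictions]

print(decode([1, 4, 0]))  # ['Drowsy Driving', 'Driver Assault', 'Normal']
```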
Performance (Epoch 2)
| Metric | Value |
|--------|-------|
| Accuracy | 95.15% |
| Macro F1 | 0.9392 |
| Validation Samples | 1,371,062 |
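Macro F1 is the unweighted mean of the per-class F1 scores. As a quick sanity check, the sketch below averages the two-decimal values from the class table. The small gap to the reported 0.9392 is presumably due to that rounding, which is an assumption rather than something this card states.

```python
# Per-class F1 scores, rounded to two decimals as published in the class table.
per_class_f1 = {0: 0.93, 1: 0.98, 2: 0.90, 3: 0.88, 4: 1.00}

# Macro F1: every class counts equally, regardless of how many samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.938
```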
Training Configuration
| Parameter | Value |
|-----------|-------|
| Hardware | 2x NVIDIA RTX A6000 (48GB) |
| Distributed | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 × 2 GPU) |
| Gradient Accumulation | 4 |
| Effective Batch | 128 |
| Optimizer | AdamW (lr=1e-3, wd=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
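The following is a rough single-process sketch of how the optimizer, scheduler, loss, and gradient accumulation settings above might fit together in PyTorch. It is not the original training script: the tiny stand-in model, the random data, and the step counts are hypothetical, and DDP and FP16 are left out so that the example runs on a CPU. Treating lr=1e-3 as the `max_lr` of `OneCycleLR` is also an assumption.

```python
import torch
from torch import nn

torch.manual_seed(0)

PER_GPU_BATCH = 16       # per-GPU batch size from the table
NUM_GPUS = 2             # from the table; this demo itself runs on one process
ACCUM_STEPS = 4          # gradient accumulation from the table
NUM_MICRO_BATCHES = 40   # hypothetical, keeps the demo short

# Tiny stand-in for the real network (assumption: any module producing 5 logits).
model = nn.Sequential(nn.Flatten(), nn.Dropout(0.3), nn.Linear(12, 5))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
total_updates = NUM_MICRO_BATCHES // ACCUM_STEPS
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=total_updates
)

model.train()
optimizer.zero_grad()
updates = 0
for step in range(NUM_MICRO_BATCHES):
    x = torch.randn(PER_GPU_BATCH, 3, 4)            # random stand-in features
    y = torch.randint(0, 5, (PER_GPU_BATCH,))       # random stand-in labels
    # Scale the loss so the accumulated gradient is an average over micro-batches.
    loss = criterion(model(x), y) / ACCUM_STEPS
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()       # one weight update per ACCUM_STEPS micro-batches
        scheduler.step()       # OneCycleLR advances once per weight update
        optimizer.zero_grad()
        updates += 1

effective_batch = PER_GPU_BATCH * NUM_GPUS * ACCUM_STEPS  # 16 x 2 x 4
print(updates, effective_batch)  # 10 128
```

Stepping the scheduler once per weight update, rather than once per micro-batch, keeps `total_steps` aligned with the number of optimizer steps.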
Usage
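    import torch
    from model import DriverBehaviorModel

    # Build the architecture only; the checkpoint supplies the trained weights.
    model = DriverBehaviorModel(num_classes=5, pretrained=False)
    checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    model.eval()

    # video_tensor: float tensor shaped [B, 3, 30, 224, 224]
    with torch.no_grad():
        output = model(video_tensor)
        prediction = output.argmax(dim=1)  # predicted label (0-4) per clip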
Dataset
- Total Videos: 243,979
- Total Samples (windows): 1,371,062
- Window Size: 30 frames
- Stride: 15 frames
- Resolution: 224×224
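The window and stride settings above imply overlapping clips, each sharing half of its frames with the next. A minimal sketch of the start-index computation follows. The function name `window_starts` is illustrative, and dropping trailing frames or videos shorter than one window is an assumption, since the card does not say how the original pipeline handles them.

```python
def window_starts(num_frames, window=30, stride=15):
    """Return the start frame of every full window that fits in the video."""
    if num_frames < window:
        return []  # assumption: videos shorter than one window yield no samples
    return list(range(0, num_frames - window + 1, stride))

print(window_starts(90))  # [0, 15, 30, 45, 60]

# Average number of windows per video, from the dataset totals listed above.
print(round(1_371_062 / 243_979, 2))  # 5.62
```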
Augmentation (Training)
- RandomResizedCrop (scale 0.8-1.0)
- HorizontalFlip (p=0.5)
- ColorJitter, HueSaturationValue
- Temporal Augmentation (speed change, frame drop)
- Mixup (α=0.4)
- CoarseDropout
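Of the augmentations listed, Mixup is the one that changes the labels as well as the inputs. The sketch below shows the standard formulation with α=0.4 in NumPy. It is an illustration under assumptions, not the project's implementation: the function name `mixup`, the batch-level pairing by random permutation, and the small demo resolution are all hypothetical.

```python
import numpy as np

def mixup(clips, labels, num_classes=5, alpha=0.4, rng=None):
    """Blend each clip and its one-hot label with a randomly paired clip."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))        # mixing weight drawn from Beta(α, α)
    perm = rng.permutation(len(clips))         # random partner for every clip
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    mixed_clips = lam * clips + (1.0 - lam) * clips[perm]
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_clips, mixed_labels

rng = np.random.default_rng(0)
# Four clips with a reduced 8x8 spatial size to keep the demo light.
clips = rng.random((4, 3, 30, 8, 8), dtype=np.float32)
labels = np.array([0, 1, 2, 4])
mixed_clips, mixed_labels = mixup(clips, labels, rng=rng)
print(mixed_clips.shape, mixed_labels.shape)  # (4, 3, 30, 8, 8) (4, 5)
```

Each row of `mixed_labels` still sums to 1, so it can be used as a soft target for a cross-entropy loss.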
License
This model is for research purposes only.
Citation
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}