# Driver Behavior Detection Model (Epoch 2)

A Video Swin Transformer-based model for detecting abnormal driver behavior.

## Model Description

- **Architecture**: Video Swin Transformer Tiny (`swin3d_t`)
- **Pretrained Backbone**: Kinetics-400
- **Parameters**: 27.85M
- **Input**: `[B, 3, 30, 224, 224]` (batch, channels, frames, height, width)

## Classes (5)

| Label | Class | F1-Score |
|:-----:|-------|:--------:|
| 0 | Normal | 0.93 |
| 1 | Drowsy Driving | 0.98 |
| 2 | Reaching/Searching | 0.90 |
| 3 | Phone Usage | 0.88 |
| 4 | Driver Assault | 1.00 |

## Performance (Epoch 2)

| Metric | Value |
|--------|-------|
| **Accuracy** | 95.15% |
| **Macro F1** | 0.9392 |
| **Validation Samples** | 1,371,062 |
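
As a quick consistency check, the macro F1 can be approximated from the per-class scores in the Classes table. This is only a sketch: the per-class values are rounded to two decimals, so the result can differ from the reported 0.9392 in the third decimal place. The function name `macro_f1` is illustrative and not part of this repository.

```python
# Per-class F1 scores as listed (rounded to two decimals) in the Classes table.
per_class_f1 = {
    "Normal": 0.93,
    "Drowsy Driving": 0.98,
    "Reaching/Searching": 0.90,
    "Phone Usage": 0.88,
    "Driver Assault": 1.00,
}


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


approx = macro_f1(per_class_f1)
print(f"Macro F1 from rounded per-class scores: {approx:.3f}")
```

The gap between this value and the reported figure is within what two-decimal rounding of each class score allows.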

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Hardware | 2× NVIDIA RTX A6000 (48 GB) |
| Distributed Training | DDP (DistributedDataParallel) |
| Batch Size | 32 (16 per GPU × 2 GPUs) |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 128 (32 × 4) |
| Optimizer | AdamW (lr=1e-3, weight decay=0.05) |
| Scheduler | OneCycleLR |
| Mixed Precision | FP16 |
| Loss | CrossEntropy + Label Smoothing (0.1) |
| Regularization | Mixup (α=0.4), Dropout (0.3) |
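
Two rows of the table above involve simple arithmetic that can be sketched in plain Python: the effective batch size and the target distribution produced by label smoothing. The smoothing formula below assumes the convention used by PyTorch's `CrossEntropyLoss(label_smoothing=...)`, i.e. `(1 - eps) * one_hot + eps / K`; the training code itself is not shown here, so treat that as an assumption. Both function names are illustrative.

```python
def effective_batch_size(per_gpu_batch, num_gpus, accumulation_steps):
    """Number of samples contributing to each optimizer step."""
    return per_gpu_batch * num_gpus * accumulation_steps


def smoothed_targets(true_class, num_classes, epsilon):
    """Soft targets under the assumed (1 - eps) * one_hot + eps / K convention."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


# Values from the table: 16 per GPU, 2 GPUs, 4 accumulation steps.
print(effective_batch_size(16, 2, 4))

# Label 1 (Drowsy Driving) with 5 classes and smoothing 0.1.
print([round(p, 2) for p in smoothed_targets(1, 5, 0.1)])
```

Under this convention the true class keeps about 0.92 of the probability mass and each of the other four classes receives about 0.02.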

## Usage

```python
import torch
from model import DriverBehaviorModel

# Load the model and its weights
model = DriverBehaviorModel(num_classes=5, pretrained=False)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Inference
# Input: [1, 3, 30, 224, 224] = 30 RGB frames at 224x224, normalized.
# Placeholder tensor for illustration; replace it with a preprocessed clip.
video_tensor = torch.randn(1, 3, 30, 224, 224)
with torch.no_grad():
    output = model(video_tensor)
    prediction = output.argmax(dim=1)  # predicted class label (0-4)
```
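
To turn the predicted index into a readable class name, a small helper along the following lines may be useful. It is a sketch that assumes the model returns raw logits (one score per class), which this README does not state explicitly; `ID2LABEL` and `decode_prediction` are illustrative names, not part of the repository.

```python
import math

# Label indices and names taken from the Classes table.
ID2LABEL = {
    0: "Normal",
    1: "Drowsy Driving",
    2: "Reaching/Searching",
    3: "Phone Usage",
    4: "Driver Assault",
}


def decode_prediction(logits):
    """Return (class name, softmax probability) for one sample's logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for a single clip, for illustration only.
label, confidence = decode_prediction([0.1, 2.5, -1.0, 0.3, 0.0])
print(label, round(confidence, 3))
```

With a PyTorch output tensor, the logits for the first clip could be passed as `output[0].tolist()`.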

## Dataset

- **Total Videos**: 243,979
- **Total Samples (windows)**: 1,371,062
- **Window Size**: 30 frames
- **Stride**: 15 frames
- **Resolution**: 224×224
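
The window size and stride above imply a sliding-window split with 50% overlap between consecutive windows. The sketch below shows one plausible way to compute window start indices; how the actual pipeline treats videos shorter than 30 frames or a trailing remainder is not documented here, so dropping them is an assumption, and `window_starts` is an illustrative name.

```python
def window_starts(num_frames, window=30, stride=15):
    """Start indices of full windows; incomplete trailing frames are dropped (assumed)."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


# A 90-frame video yields five overlapping 30-frame windows.
print(window_starts(90))   # [0, 15, 30, 45, 60]

# A 100-frame video yields the same windows; the last 10 frames are unused here.
print(window_starts(100))  # [0, 15, 30, 45, 60]
```

For reference, the totals listed above work out to roughly 5.6 windows per video on average (1,371,062 / 243,979).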

## Augmentation (Training)

- RandomResizedCrop (scale 0.8–1.0)
- HorizontalFlip (p=0.5)
- ColorJitter, HueSaturationValue
- Temporal augmentation (speed change, frame drop)
- Mixup (α=0.4)
- CoarseDropout
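
Of the augmentations listed, Mixup is the one whose parameter (α=0.4) most directly shapes training targets. The following is a minimal, framework-free sketch of the standard Mixup formulation, where a mixing weight is drawn from Beta(α, α) and applied to both inputs and one-hot labels. It is not the project's training code; `mixup_pair` is an illustrative name, and real inputs would be video tensors rather than short lists.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example: two 2-value "inputs" with labels 0 (Normal) and 4 (Driver Assault).
rng = random.Random(0)  # seeded only to make the example repeatable
x, y, lam = mixup_pair([1.0, 0.0], [1, 0, 0, 0, 0],
                       [0.0, 1.0], [0, 0, 0, 0, 1], rng=rng)
print(lam, x, y)
```

With α below 1, the Beta distribution concentrates near 0 and 1, so most mixed samples stay close to one of the two originals.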

## License

This model is for research purposes only.

## Citation

```
@misc{driver-behavior-detection-2026,
  title={Driver Behavior Detection using Video Swin Transformer},
  author={C-Team},
  year={2026}
}
```